Bittensor Subnet 38: Distributed Training


Subnet 38

Distributed Training Subnet

This subnet uses a distributed approach to train Large Language Models

SN38 : Distributed training

Subnet: SN38 : Distributed training
Description: Distributed training
Category: Decentralized Training
Company: DSTRBTD

This subnet uses a distributed approach to train Large Language Models on web-based datasets. The proposed solution is a subnet that incentivizes compute, bandwidth, and latency. Compute resources drive the training of each miner’s local model, while bandwidth and latency facilitate the averaging of local model weights using a process called butterfly all-reduce. Once this process completes, every miner receives a unified global averaged gradient to update their model weights.

Training Process:
Miners train the collective model on specific dataset segments. The training is iterative, with both local and global tracking of epochs and steps. Miners perform local training on their assigned data and participate in gradient averaging using the butterfly all-reduce method.
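The butterfly all-reduce step described above can be sketched in plain Python. This is a minimal single-process simulation, not the subnet's networked implementation: it assumes a power-of-two number of peers and pairs each peer with its butterfly partner (index XOR step) in log2(n) rounds, so every peer ends up with the same global average.

```python
def butterfly_all_reduce(local_grads):
    """Simulate butterfly all-reduce averaging across peers.

    local_grads: one equal-length list of gradient values per miner.
    Returns the identical global average for every peer.
    """
    n = len(local_grads)
    assert n & (n - 1) == 0, "this sketch assumes a power-of-two peer count"
    grads = [list(map(float, g)) for g in local_grads]
    step = 1
    while step < n:
        # each peer exchanges with its butterfly partner: index XOR step
        for i in range(n):
            j = i ^ step
            if i < j:
                summed = [a + b for a, b in zip(grads[i], grads[j])]
                grads[i] = summed
                grads[j] = list(summed)
        step <<= 1
    # every peer now holds the global sum; divide to get the average
    return [[x / n for x in g] for g in grads]
```

After the final round each miner holds the same averaged gradient, which is what lets all local models apply an identical update.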

Dataset:

The subnet utilizes the “HuggingFaceFW/fineweb” dataset with the “sample-350BT” configuration.
Data is streamed in real-time from Hugging Face servers for efficient large-scale data handling.
Text is tokenized with the GPT-2 tokenizer (“distilgpt2”).
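A rough sketch of the data path: stream text from the Hub, tokenize it, then pack token ids into fixed-length training blocks. The `load_dataset` and `AutoTokenizer` calls in the comments reflect the dataset and tokenizer named above; the packing helper itself is an assumption about how streamed tokens become training batches, not the subnet's actual code.

```python
# In the subnet, data would arrive from something like:
#   from datasets import load_dataset
#   from transformers import AutoTokenizer
#   ds = load_dataset("HuggingFaceFW/fineweb", "sample-350BT", streaming=True)
#   tok = AutoTokenizer.from_pretrained("distilgpt2")
# The helper below shows only the packing step.

def pack_fixed_length(token_stream, seq_len):
    """Group a stream of token ids into fixed-length training blocks,
    dropping any incomplete tail (a common packing strategy)."""
    block = []
    for tok in token_stream:
        block.append(tok)
        if len(block) == seq_len:
            yield block
            block = []
```

Streaming keeps memory bounded: blocks are yielded as tokens arrive, so the full 350BT sample never needs to be downloaded or held at once.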

Model Submission:

After each gradient averaging step, miners push the updated model to the Hugging Face Hub.
The model is tagged with the current epoch number.
In case of upload failure, the system retries within a set limit.
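The retry-within-a-limit behavior described above can be sketched as a small wrapper. The function name, retry count, and backoff policy are assumptions; `upload_fn` stands in for whatever call actually pushes the model to the Hugging Face Hub.

```python
import time

def push_with_retry(upload_fn, max_retries=3, backoff_s=1.0):
    """Attempt a Hub upload, retrying on failure up to max_retries times.

    upload_fn: a zero-argument callable wrapping the actual push.
    Re-raises the last exception once the retry limit is exhausted.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return upload_fn()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(backoff_s * attempt)  # simple linear backoff
```

A flaky upload that fails twice and then succeeds would be retried transparently, while a persistently failing one surfaces its error after the final attempt.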

Validation:

Validators perform two main queries: “Train” and “AllReduce.”
For “Train” queries, validators check miners’ loss, gradients, and dataset indices.
For “AllReduce” queries, they initiate gradient averaging and verify miner participation.

Incentive Mechanism:

Bandwidth Score: Measures miners’ efficiency in sharing model states.
Gradient Score: Compares miner-reported gradients to validator-calculated gradients.
Steps Score: Rewards miners based on the volume of data trained in each step.
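The scoring pipeline can be sketched as follows. Cosine similarity for the gradient comparison and the blending weights are assumptions chosen for illustration; the source only states that miner-reported gradients are compared to validator-calculated ones and that the three scores drive rewards.

```python
import math

def gradient_score(miner_grad, validator_grad):
    """Cosine similarity between miner-reported and validator-recomputed
    gradients (cosine similarity is an assumed choice of metric)."""
    dot = sum(a * b for a, b in zip(miner_grad, validator_grad))
    na = math.sqrt(sum(a * a for a in miner_grad))
    nb = math.sqrt(sum(b * b for b in validator_grad))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def combined_score(bandwidth, gradient, steps, weights=(0.3, 0.4, 0.3)):
    """Blend the bandwidth, gradient, and steps scores into one reward
    signal; the weights here are illustrative, not the subnet's values."""
    wb, wg, ws = weights
    return wb * bandwidth + wg * gradient + ws * steps
```

Cosine similarity rewards gradients that point in the right direction regardless of scale, which makes it hard for a miner to fake participation by reporting arbitrary numbers of the right magnitude.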

Team:

Karim Foda

Mikkel Loose