Subnet 63

Finest Data

Finest Data builds a high-quality, large-scale pretraining dataset for LLMs using Bittensor.

Subnet: SN63 : Alpha Trade Exchange
Description: Inference verification & optimization
Category: Generative AI
Company: Manifold

Finest Data (Subnet 63) starts from the observation that the performance of large language models (LLMs) is deeply tied to the quality and scale of their pretraining data. The datasets behind leading open-weight LLMs such as LLaMA 3 and Mixtral remain undisclosed, and little is known about how they were constructed. FineWeb, a new large-scale dataset built from 96 CommonCrawl snapshots, contains 15 trillion tokens (44 TB on disk) and has outperformed other open pretraining datasets.

The subnet applies the same processing pipeline behind FineWeb to build an even larger, higher-performing dataset, which is then further refined through the decentralized Bittensor network to improve quality and scalability.

The Finest Data subnet employs an optimized mechanism for dataset creation, consisting of two primary roles:

Miners: generate refined datasets from raw crawled data.

Validators: evaluate miners' performance and ensure the quality of the datasets produced.

Both miners and validators are rewarded with TAO based on their scores and trust within the network.
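As a toy illustration of score-and-trust-weighted rewards, the split could be sketched as below. The formula, names, and numbers are assumptions for illustration only; Bittensor's actual TAO emission is governed by its consensus mechanism, not by this arithmetic.

```python
# Toy sketch: split a TAO budget in proportion to score * trust.
# Illustrative only -- not Bittensor's real emission logic.

def split_rewards(participants, total_tao):
    """Return each participant's share of total_tao, weighted by score * trust."""
    weights = {name: score * trust for name, (score, trust) in participants.items()}
    total = sum(weights.values())
    return {name: total_tao * w / total for name, w in weights.items()}

participants = {
    "miner_a": (0.9, 0.8),  # (score, trust) -- hypothetical values
    "miner_b": (0.6, 0.9),
    "miner_c": (0.3, 0.5),
}
rewards = split_rewards(participants, total_tao=10.0)
```

A higher score does not guarantee a higher payout on its own; low network trust discounts it, which is the intended incentive.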

Main Mechanism of the Subnet

Miners receive tasks from the task server via the task retrieval API. This server manages and organizes tasks, primarily splitting the CommonCrawl data and tracking each miner's status. Once a miner completes its task, it uploads the refined dataset to its Hugging Face repository and submits the commit, including the Hugging Face URL, to the blockchain.
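The miner loop above can be sketched as follows. Every function body here is a placeholder: the task-server API, the refinement step, and the commit format are assumptions, not the subnet's actual interfaces.

```python
# Hypothetical sketch of the miner flow: fetch a CommonCrawl shard,
# refine it, upload to Hugging Face, and produce an on-chain commit.

def fetch_task(task_api_url):
    # In practice: an HTTP call to the task retrieval API, which hands
    # out a CommonCrawl shard and records the miner's status.
    return {"task_id": 42, "shard": "raw web text ... raw web text"}

def refine(raw_text):
    # Placeholder for the FineWeb-style filtering/deduplication pipeline.
    return raw_text.replace("...", "").strip()

def upload_to_hf(dataset, repo):
    # Placeholder for a Hugging Face dataset upload; returns the URL.
    return f"https://huggingface.co/datasets/{repo}"

def mine_once(task_api_url, repo):
    task = fetch_task(task_api_url)
    refined = refine(task["shard"])
    url = upload_to_hf(refined, repo)
    # The commit (including the HF URL) is what gets submitted on-chain.
    return {"task_id": task["task_id"], "hf_url": url}

commit = mine_once("https://example.org/tasks", "example-miner/refined-cc")
```

The key design point is that only the lightweight commit (task ID plus dataset URL) goes on-chain; the heavy dataset itself lives on Hugging Face.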

Validators check miners' commits every few blocks, retrieve the new submissions, and evaluate both the elapsed time and the quality of the resulting dataset. Validators then assign on-chain weights to miners according to these scores.
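A hedged sketch of how a validator might turn (quality, elapsed time) pairs into normalized weights is shown below. The scoring formula, the time budget, and the 70/30 quality-versus-speed split are assumptions for illustration, not the subnet's published scoring rule.

```python
# Assumed scoring: reward dataset quality, discounted by how long
# the miner took relative to a time budget.

def score(quality, elapsed_s, time_budget_s=3600.0):
    speed = max(0.0, 1.0 - elapsed_s / time_budget_s)
    return quality * (0.7 + 0.3 * speed)

def assign_weights(submissions):
    """Map miner UID -> normalized weight (weights sum to 1)."""
    scores = {uid: score(q, t) for uid, (q, t) in submissions.items()}
    total = sum(scores.values())
    return {uid: s / total for uid, s in scores.items()}

weights = assign_weights({
    1: (0.92, 600.0),   # (dataset quality, seconds elapsed) -- made up
    2: (0.80, 1800.0),
    3: (0.40, 3000.0),
})
```

Normalizing the weights matters because on-chain weight vectors express relative, not absolute, preference among miners.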

Dataset Evaluation Method

Validators train a small model on the miner's dataset and assess quality via the trained model's accuracy: if the model performs well, the dataset is likely of high quality, while poor performance suggests the dataset is suboptimal.
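The idea of "train a small model, score the data by how well the model performs" can be mirrored with a deliberately tiny stand-in: a unigram language model with Laplace smoothing, scored by average negative log-likelihood on a fixed held-out text (lower is better). The real subnet trains an actual small LLM and uses accuracy benchmarks; this sketch only demonstrates the evaluation principle.

```python
import math
from collections import Counter

# Fixed held-out evaluation text (hypothetical benchmark).
HELD_OUT = "the quick brown fox jumps over the lazy dog".split()

def avg_nll(train_text, held_out=HELD_OUT):
    """Train a Laplace-smoothed unigram LM on train_text and return its
    average negative log-likelihood on the held-out tokens."""
    counts = Counter(train_text.split())
    vocab = set(counts) | set(held_out)
    total = sum(counts.values()) + len(vocab)  # +1 smoothing per vocab word
    return -sum(math.log((counts[w] + 1) / total) for w in held_out) / len(held_out)

clean = "the quick brown fox jumps over the lazy dog " * 20   # on-domain data
noisy = "zzz qqq spam spam click here buy now " * 20          # junk data

# A dataset that matches the evaluation domain yields a lower loss,
# so the validator would score the "clean" miner higher.
```

The same logic scales up: the trained model acts as a proxy, so validators never have to inspect the dataset directly, only the model it produces.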
