Subnet 13
Data Universe
Macrocosmos
Data Universe decentralizes and scales data storage for Bittensor, supporting extensive data collection and distribution

SN13 : Data Universe
Subnet | Description | Category | Company |
---|---|---|---|
SN13 : Data Universe | Data scraping & storage | Data Pipeline Storage | Macrocosmos |
Data is a crucial pillar of AI, and Data Universe serves as that pillar for Bittensor.
Data Universe is a subnet designed to collect and store vast amounts of data from a wide range of sources for use by other subnets. It was built with a strong emphasis on decentralization and scalability. No centralized entity controls the data: it is distributed across all miners on the network and can be queried through the validators. At launch, Data Universe supported up to 50 petabytes of data across 200 miners while requiring only about 10GB of storage on each validator.
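The launch figures can be checked with some back-of-the-envelope arithmetic. This assumes decimal units (1 PB = 10^6 GB) and an even spread of the 50 PB cap across all 200 miners; the source gives only the totals. Each miner would then hold about 250 TB. A validator's roughly 10GB is a tiny fraction of the network's data, which fits with validators keeping miner indexes rather than the data itself.

```latex
\frac{50\ \text{PB}}{200\ \text{miners}} = 0.25\ \text{PB} = 250\ \text{TB per miner},
\qquad
\frac{10\ \text{GB}}{50\ \text{PB}} = \frac{10}{5 \times 10^{7}} = 2 \times 10^{-7}
```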
Macrocosmos aims to raise the standard of subnet design, with a focus on crafting incentives and mechanisms for the Bittensor network. In Data Universe, miners scrape data from defined sources, known as DataSources. Each piece of data (e.g., a webpage or BTC prices) is termed a DataEntity and is stored in the miner’s database. Every DataEntity belongs to a single DataEntityBucket. Each bucket is uniquely identified by its DataEntityBucketId, a tuple of the data’s source (DataSource), its creation time (TimeBucket) and a classification (DataLabel, e.g., a stock ticker symbol). The complete set of DataEntityBuckets on a miner is called its MinerIndex.
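The data model above can be sketched with plain dataclasses. This is a minimal illustration, not the subnet's actual code. The specific DataSource members, the hourly TimeBucket granularity, the field names and the idea of measuring a bucket by total content bytes are all assumptions. `build_miner_index` is a hypothetical helper that groups a miner's entities into a MinerIndex.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List, Optional


class DataSource(Enum):
    # Hypothetical source list, for illustration only.
    REDDIT = 1
    X = 2


@dataclass(frozen=True)
class TimeBucket:
    # Assumption: one bucket per hour since the Unix epoch.
    id: int

    @staticmethod
    def from_datetime(dt: datetime) -> "TimeBucket":
        return TimeBucket(int(dt.timestamp()) // 3600)


@dataclass(frozen=True)
class DataEntityBucketId:
    source: DataSource
    time_bucket: TimeBucket
    label: Optional[str]  # the DataLabel, e.g. a ticker symbol or subreddit


@dataclass(frozen=True)
class DataEntity:
    uri: str
    created_at: datetime
    source: DataSource
    label: Optional[str]
    content: bytes

    def bucket_id(self) -> DataEntityBucketId:
        # Every entity maps to exactly one bucket: (source, time bucket, label).
        return DataEntityBucketId(
            self.source, TimeBucket.from_datetime(self.created_at), self.label
        )


def build_miner_index(entities: List[DataEntity]) -> Dict[DataEntityBucketId, int]:
    """Hypothetical helper: summarise stored entities as total content bytes per bucket."""
    index: Dict[DataEntityBucketId, int] = {}
    for entity in entities:
        key = entity.bucket_id()
        index[key] = index.get(key, 0) + len(entity.content)
    return index


now = datetime(2023, 12, 11, 12, 30, tzinfo=timezone.utc)
entities = [
    DataEntity("https://example.com/a", now, DataSource.REDDIT, "r/bittensor", b"hello"),
    DataEntity("https://example.com/b", now, DataSource.REDDIT, "r/bittensor", b"abc"),
    DataEntity("https://example.com/c", now, DataSource.X, "#tao", b"xy"),
]
miner_index = build_miner_index(entities)
```

Here the two Reddit entities share a source, hour and label, so they fall into the same bucket. The resulting index therefore has two entries. Frozen dataclasses are used so that bucket IDs are hashable and can serve as dictionary keys.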
Validators periodically query each miner for its latest MinerIndex and store it in a local database. This gives validators a comprehensive overview of all data stored on the network and tells them which miners to query for specific types of data. Validators also regularly verify the accuracy of the data miners store and reward them based on the value of the data they have accumulated.
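The validator's bookkeeping described above can be illustrated with an in-memory store. This is a sketch only. The class and method names are hypothetical. The bucket ID is assumed to be a plain `(source, time_bucket, label)` tuple and each index is assumed to map buckets to reported bytes. A real validator persists this data in a database rather than a dict.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical bucket key: (data_source, time_bucket_id, label).
BucketId = Tuple[str, int, Optional[str]]
MinerIndex = Dict[BucketId, int]  # bucket -> bytes the miner reports


class ValidatorIndexStore:
    """Hypothetical in-memory stand-in for the validator's local MinerIndex database."""

    def __init__(self) -> None:
        self._indexes: Dict[int, MinerIndex] = {}  # miner uid -> latest index

    def update(self, miner_uid: int, index: MinerIndex) -> None:
        # A newer index fully replaces the miner's previous one.
        self._indexes[miner_uid] = dict(index)

    def miners_for(self, bucket: BucketId) -> List[int]:
        """Miners that report the bucket, largest holding first."""
        holders = [(uid, idx[bucket]) for uid, idx in self._indexes.items() if bucket in idx]
        holders.sort(key=lambda pair: pair[1], reverse=True)
        return [uid for uid, _ in holders]

    def total_bytes_by_bucket(self) -> Dict[BucketId, int]:
        """Network-wide view: total reported bytes per bucket across all miners."""
        totals: Dict[BucketId, int] = defaultdict(int)
        for idx in self._indexes.values():
            for bucket, size in idx.items():
                totals[bucket] += size
        return dict(totals)


store = ValidatorIndexStore()
b1 = ("reddit", 474_000, "r/bittensor")
b2 = ("x", 474_000, "#tao")
store.update(1, {b1: 100})
store.update(2, {b1: 300, b2: 50})
```

Replacing a miner's whole index on each refresh keeps the view consistent with what the miner currently claims, at the cost of re-sending unchanged buckets.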
Incentive Mechanism
Each miner reports its MinerIndex to validators, detailing the quantity and type of data it holds. Miners are scored along two main dimensions:
- Data Quantity and Value: The volume and the value of the data a miner has.
- Miner Credibility: The reliability of the miner.
Data Value
Not all data holds the same value. The factors determining data value include:
- Data Freshness: Fresh data is more valuable than old data. Data older than a certain threshold is not scored. As of December 11th, 2023, data older than 30 days is not scored, though this threshold may change in the future.
- Data Desirability: Data Universe defines a DataDesirabilityLookup that specifies which types of data are more desirable, and desirable data is scored more highly. Labels not listed in the lookup receive a default_scale_factor of 0.5, so their data counts for half as much as data whose label has a scale factor of 1.0. The DataDesirabilityLookup will evolve over time, and each change is announced in advance to give miners time to adjust.
- Duplication Factor: Data stored by many miners is less valuable than data stored by only a few. The more miners that store a given piece of data, the less it is worth to each of them.
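The three factors above can be combined into a toy per-bucket value function. This is a sketch under stated assumptions, not the subnet's scoring code. The linear freshness decay, the contents of the `DESIRABILITY` dict and the choice to divide value evenly among the miners holding a bucket are all hypothetical. Only the 30-day cutoff and the 0.5 default scale factor come from the text.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

MAX_AGE = timedelta(days=30)  # cutoff quoted as of December 11th, 2023
DEFAULT_SCALE_FACTOR = 0.5    # applied to labels missing from the lookup

# Hypothetical stand-in for the DataDesirabilityLookup: label -> scale factor.
DESIRABILITY: Dict[str, float] = {"$BTC": 1.0, "$ETH": 1.0}


def freshness_factor(created: datetime, now: datetime) -> float:
    """Assumed linear decay from 1.0 (brand new) to 0.0 at MAX_AGE; older data scores 0."""
    age = max(now - created, timedelta(0))
    if age >= MAX_AGE:
        return 0.0
    return 1.0 - age / MAX_AGE


def desirability_factor(label: Optional[str]) -> float:
    if label is None:
        return DEFAULT_SCALE_FACTOR
    return DESIRABILITY.get(label, DEFAULT_SCALE_FACTOR)


def bucket_value(
    size_bytes: int, created: datetime, label: Optional[str], num_miners: int, now: datetime
) -> float:
    """Hypothetical value: size x freshness x desirability, split evenly across duplicates."""
    if num_miners < 1:
        raise ValueError("num_miners must be >= 1")
    return size_bytes * freshness_factor(created, now) * desirability_factor(label) / num_miners
```

With these assumptions, 1,000 bytes of brand-new `$BTC` data held by one miner is worth 1000.0. The same data under an unlisted label is worth 500.0, and it drops to 250.0 if four miners hold it.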
Miner Credibility
Validators periodically verify a sample of the data listed in each miner’s MinerIndex. The results are used to track the miner’s credibility, which in turn scales its score. Misrepresenting data types or quantities always results in a worse score for the miner.
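One way credibility could scale a score is sketched below, to show why inflating claims backfires. Treat every detail as an assumption rather than the subnet's formula. The exponential moving average, the `ALPHA` weight and the `CREDIBILITY_EXPONENT` are hypothetical.

```python
ALPHA = 0.15  # hypothetical weight given to the newest verification result
CREDIBILITY_EXPONENT = 2.5  # hypothetical; any exponent above 1 penalises inflation


def update_credibility(credibility: float, passed: int, checked: int) -> float:
    """Blend the latest sample pass rate into a running credibility in [0, 1]."""
    if checked <= 0:
        return credibility  # nothing was verified this round
    pass_rate = passed / checked
    return (1 - ALPHA) * credibility + ALPHA * pass_rate


def scaled_score(raw_score: float, credibility: float) -> float:
    """Scale a miner's raw data-value score by its credibility."""
    return raw_score * credibility ** CREDIBILITY_EXPONENT


# An honest miner: claims 100 units of value and every sampled check passes.
honest = scaled_score(100.0, 1.0)

# An inflating miner: claims 200 units, but only half of the sampled data is real.
cred = 1.0
for _ in range(200):
    cred = update_credibility(cred, passed=5, checked=10)
inflated = scaled_score(200.0, cred)
```

Under these assumptions, a miner that claims k times the data it really has passes about 1/k of checks. Its score becomes k · (1/k)^p = k^(1 − p) times its honest score, which is below 1 whenever p > 1. In the example, the inflating miner ends up near 35 instead of 100.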
Data Universe Dashboard
Data Universe rewards diversity of data; storing multiple copies of the same data is not beneficial. To help miners understand the current data landscape, the Data Universe team hosts a dashboard showing the amount of each type of data (by DataEntityBucketId) on the subnet. Miners are encouraged to use this dashboard to optimise their Miner Configuration and maximise rewards.
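Reading the dashboard with the scoring factors in mind could look like the sketch below. It ranks candidate labels by their desirability divided by the number of miners already holding them plus one, a rough proxy for the share a newcomer might earn. The input shapes, the label names and the ranking heuristic are hypothetical. The dashboard's real export format is not described here.

```python
from typing import Dict, List

DEFAULT_SCALE_FACTOR = 0.5  # applied to labels missing from the desirability lookup


def rank_labels(
    miners_per_label: Dict[str, int],
    desirability: Dict[str, float],
    top_n: int,
) -> List[str]:
    """Hypothetical heuristic: rank labels by desirability / (current holders + 1)."""

    def expected_share(label: str) -> float:
        scale = desirability.get(label, DEFAULT_SCALE_FACTOR)
        return scale / (miners_per_label.get(label, 0) + 1)

    candidates = set(miners_per_label) | set(desirability)
    # Highest expected share first; ties broken alphabetically for stable output.
    return sorted(candidates, key=lambda lbl: (-expected_share(lbl), lbl))[:top_n]


holders = {"$BTC": 9, "$ETH": 1, "r/python": 0}
lookup = {"$BTC": 1.0, "$ETH": 1.0}
choices = rank_labels(holders, lookup, top_n=2)
```

In this example, `$BTC` is highly desirable but already crowded, so a lightly held `$ETH` and an unlisted but empty `r/python` rank ahead of it.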
Will Squires – CEO and Co-Founder
Will has dedicated his career to navigating complexity, from designing and building major infrastructure to leading the establishment of an AI accelerator. With a background in engineering, he made notable contributions to transport projects such as Crossrail and HS2. His expertise led to an invitation to serve on the Mayor of London’s infrastructure advisory panel and to lecture at UCL’s Centre for Advanced Spatial Analysis (CASA). AtkinsRéalis appointed him to develop an AI accelerator, which grew to more than 60 staff worldwide. At XYZ Reality, a company specializing in augmented reality headsets, Will played a pivotal role in product and software development, focusing on holographic technology. Since 2023, Will has advised the Opentensor Foundation, contributing to the launch of Revolution.
Steffen Cruz – CTO and Co-Founder
Steffen earned his PhD in subatomic physics from the University of British Columbia, Canada, where he developed software to improve the detection of extremely rare events (10^-7). His research contributed to the identification of novel exotic states of nuclear matter and has been published in prestigious scientific journals. As the founding engineer of SolidState AI, he pioneered techniques for physics-informed machine learning (PIML). Steffen was subsequently appointed Chief Technology Officer of the Opentensor Foundation, where he was a core developer of Subnet 1, the foundation’s flagship subnet. In this role, he improved the adoption and accessibility of Bittensor by writing technical documentation and tutorials and by collaborating on the development of the subnet template.
Pedro Ferreira – Machine Learning Engineer
Kalei Brady – Data Scientist
Sergio Champoux – Data Scientist
Brian McCrindle – Machine Learning Researcher
Elena Nesterova – Lead Technical Program Manager
Richard Hudson – Communications Lead
Alex Williams – Recruitment Lead