Bittensor Subnet 25: Protein Folding

A protein folding subnet that enables free, on-demand academic research for solving complex protein structures

SN25 : Protein Folding

Subnet	Description	Category	Company
SN25 : Protein Folding	Protein folding	DeSci	Macrocosmos

The protein folding subnet is Bittensor’s first foray into academic applications, developed and managed by Macrocosmos AI. In a landscape dominated by AI and web-scraping protocols, Macrocosmos presents it as evidence that Bittensor can adapt to very different kinds of problems.

This subnet is tailored to facilitate valuable academic research within Bittensor, enabling researchers and universities to solve complex protein structures on demand and without cost. They envision this subnet empowering researchers to conduct pioneering studies, publish impactful findings in prestigious journals, and demonstrate that decentralized systems offer cost-effective and efficient alternatives to traditional methodologies. Through this initiative, they aim to highlight not only Bittensor’s scalability and resilience but also its potential to revolutionize academic research practices.

Macrocosmos aims to raise the standard of subnet design, with a focus on crafting incentives and mechanisms for the Bittensor network.

Proteins are fundamental biological molecules that carry out a wide range of functions in biochemistry. They act as enzymes that break down food, transport oxygen in the blood (hemoglobin), and enable muscle contraction (actin filaments). Proteins are composed of long chains of amino acids whose sequence is encoded in DNA. However, a linear chain of amino acids must first adopt a functional 3D structure before it can perform these tasks.

The process by which these linear chains fold into stable 3D shapes is known as protein folding. This natural process yields a structure with significantly lower free energy than the unfolded chain. As with Lego bricks, knowing the building blocks is not enough; what matters is how they fit together. That “form defines function” is a central concept in biochemistry, so understanding and simulating protein folding is essential for uncovering what proteins do.

Why is Protein Folding an Ideal Subnet Concept?

Protein folding is a notoriously difficult research problem, which is why they chose it: to demonstrate that Bittensor can tackle the world’s hardest research problems and motivate academics and universities to build research subnets. They aim to show that Bittensor can handle computationally complex problems with significant market value. Protein engineering is a $2.6 billion/year market, set to grow significantly by 2030.

Subnet 25, dedicated to protein folding, is Bittensor’s first venture into academic use cases and demonstrates the network’s efficacy and flexibility. It uses the industry-standard GROMACS software to simulate the molecular dynamics of proteins. This involves taking an initial 3D protein structure, placing it in a cell-like environment, and applying the laws of physics to evolve the system over time. The simulation drives the protein toward its final 3D structure, which helps researchers understand the protein’s biological function.
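The "structure, environment, physics" sequence described above maps onto a handful of standard GROMACS command-line tools. The sketch below only builds the command lines and does not run them. The file names, the force field and water model choices, and the `em.mdp` parameter file are illustrative assumptions, not the subnet's actual configuration, and a production setup would normally also add ions and run equilibration steps that are omitted here.

```python
from pathlib import Path


def gromacs_pipeline(pdb_file: str, workdir: str = "run",
                     force_field: str = "charmm27", water: str = "tip3p",
                     mdp_file: str = "em.mdp") -> list[list[str]]:
    """Build (but do not execute) gmx commands for a basic energy minimization.

    All paths and defaults are placeholders chosen for illustration.
    """
    w = Path(workdir)
    gro, top = str(w / "protein.gro"), str(w / "topol.top")
    boxed, solvated = str(w / "boxed.gro"), str(w / "solvated.gro")
    return [
        # 1. Convert the PDB structure into GROMACS coordinates plus a topology.
        ["gmx", "pdb2gmx", "-f", pdb_file, "-o", gro, "-p", top,
         "-ff", force_field, "-water", water],
        # 2. Centre the protein in a cubic box with 1.0 nm of padding.
        ["gmx", "editconf", "-f", gro, "-o", boxed, "-c", "-d", "1.0", "-bt", "cubic"],
        # 3. Fill the box with water (the "cell-like environment").
        ["gmx", "solvate", "-cp", boxed, "-cs", "spc216.gro", "-o", solvated, "-p", top],
        # 4. Combine structure, topology and run parameters into one run input file.
        ["gmx", "grompp", "-f", mdp_file, "-c", solvated, "-p", top,
         "-o", str(w / "em.tpr")],
        # 5. Run the simulation engine on that input.
        ["gmx", "mdrun", "-deffnm", str(w / "em")],
    ]


for cmd in gromacs_pipeline("1ubq.pdb"):
    print(" ".join(cmd))
```

On a machine with GROMACS installed, each command could be passed to `subprocess.run(cmd, check=True)` in order.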

Protein folding is computationally intensive and can take days or weeks to fully simulate complex structures, limiting research to large, well-funded institutions. Subnet 25 aims to democratize protein folding, enabling efficient and accessible study of proteins. Researchers and universities are invited to use the subnet to solve any protein, on demand, for free. This will empower researchers to conduct world-class research and publish in top journals, demonstrating that decentralized systems are an economic and efficient alternative to traditional approaches.

Incentive Mechanism

Physical systems like proteins tend to minimize their energy, providing a succinct, exploit-resistant, and highly sensitive quality measure. Miners compete to provide protein configurations with the lowest energy, aligning the network’s optimization metric with desired biological outcomes.
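As a minimal illustration of "lowest energy wins", the sketch below picks the best submission from a set of reported energies. The function name, the miner identifiers, and the rule of discarding non-finite values are assumptions made for this example; the subnet's real validation involves additional checks.

```python
import math


def best_submission(energies: dict[str, float]) -> tuple[str, float]:
    """Return (miner_id, energy) for the lowest finite energy reported."""
    # Discard NaN or infinite values, which would indicate a failed simulation.
    valid = {uid: e for uid, e in energies.items() if math.isfinite(e)}
    if not valid:
        raise ValueError("no valid energies submitted")
    return min(valid.items(), key=lambda item: item[1])


# Hypothetical energies (arbitrary units) for three miners on one protein.
reported = {"miner_a": -1200.5, "miner_b": -1350.0, "miner_c": float("nan")}
print(best_submission(reported))
```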

Validators select proteins at random from a large public database (the RCSB Protein Data Bank) and download input files for preprocessing and simulation. Validators calculate energy using files attached to the FoldingSynapse and perform additional verification steps to ensure file integrity. Each validator maintains a queue of 10 concurrent protein folding jobs, and each job is distributed to a batch of miners. With 30 validators active, that is 300 concurrent jobs; the stated total of 3000 simultaneous simulations implies a batch of 10 miners per job. They plan to increase this by an order of magnitude, requiring miners to handle hundreds of concurrent simulations.
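The network-wide load follows from simple multiplication. In the sketch below, the figures of 30 validators and 10 jobs per validator come from the text, while the value of 10 miners per job is an inference from the stated total of 3000 and should be treated as an assumption.

```python
def concurrent_jobs(validators: int, queue_size: int) -> int:
    """Distinct protein folding jobs open across the network."""
    return validators * queue_size


def concurrent_simulations(validators: int, queue_size: int, miners_per_job: int) -> int:
    """Individual simulations running if every job goes to miners_per_job miners."""
    return concurrent_jobs(validators, queue_size) * miners_per_job


print(concurrent_jobs(30, 10))              # 300
print(concurrent_simulations(30, 10, 10))   # 3000
```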

Miners use separate random seeds for their simulations, ensuring diverse exploration of the folding space. To mitigate biases from randomization, they will introduce a rebase mechanism that lets miners copy the winning coordinates, which should intensify competition and improve results. This presents a tradeoff between exploration and exploitation.
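The exploration and exploitation tradeoff can be illustrated with a toy model. This is not the subnet's rebase implementation, which the text describes only as planned. A one-dimensional "energy" function stands in for a protein, each miner performs a seeded random search (exploration), and a rebase step copies the best position to everyone (exploitation). All names and parameters are hypothetical.

```python
import random


def energy(x: float) -> float:
    """Toy energy landscape with its minimum at x = 2."""
    return (x - 2.0) ** 2


def explore(start: float, seed: int, steps: int = 50, step_size: float = 0.5) -> float:
    """Seeded random search that only accepts moves which lower the energy."""
    rng = random.Random(seed)
    best = start
    for _ in range(steps):
        candidate = best + rng.uniform(-step_size, step_size)
        if energy(candidate) < energy(best):
            best = candidate
    return best


def rebase(positions: dict[int, float]) -> dict[int, float]:
    """Every miner adopts the current lowest-energy position."""
    winner = min(positions.values(), key=energy)
    return {uid: winner for uid in positions}


# Four miners start from the same point but use different seeds.
positions = {uid: explore(10.0, seed=uid) for uid in range(4)}
positions = rebase(positions)
# After the rebase, miners keep exploring from the shared best position.
positions = {uid: explore(x, seed=100 + uid) for uid, x in positions.items()}
print({uid: round(energy(x), 4) for uid, x in positions.items()})
```

Rebasing too often collapses every miner onto one trajectory, while never rebasing leaves weaker seeds stuck in worse regions, which is the tradeoff the text refers to.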

They use an exponential incentive curve to encourage motivated miners to innovate and provide more value. Currently, the winning miner gains 80% of possible rewards in each step, but this will shift to a winner-takes-all model. Weights are calculated from the average exponential moving average (EMA) across a miner’s assigned jobs, so miners must perform well consistently. Most miners have upgraded to GPU-enabled simulations, increasing productivity by 20-50x. Early stopping and epsilon-bounded scoring prevent wasted effort and ensure meaningful progress.
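The following sketch shows one way the pieces described above could fit together: a per-step reward split where the winner takes 80%, an EMA that smooths each job's rewards over time, and a weight formed by averaging EMAs across jobs. The equal split of the remaining 20%, the smoothing factor `alpha`, and all function names are assumptions for illustration; the source does not specify the exact curve.

```python
def step_rewards(energies: dict[str, float], winner_share: float = 0.8) -> dict[str, float]:
    """Give winner_share to the lowest-energy miner and split the rest equally."""
    if not energies:
        return {}
    ranked = sorted(energies, key=energies.get)  # lowest energy first
    if len(ranked) == 1:
        return {ranked[0]: 1.0}
    remainder = (1.0 - winner_share) / (len(ranked) - 1)
    rewards = {uid: remainder for uid in ranked[1:]}
    rewards[ranked[0]] = winner_share
    return rewards


def update_ema(previous: float, reward: float, alpha: float = 0.1) -> float:
    """Standard exponential moving average update."""
    return alpha * reward + (1.0 - alpha) * previous


def miner_weight(job_emas: list[float]) -> float:
    """Average a miner's EMA scores across all of its assigned jobs."""
    return sum(job_emas) / len(job_emas) if job_emas else 0.0


rewards = step_rewards({"m1": -100.0, "m2": -250.0, "m3": -180.0})
print(rewards)  # m2 has the lowest energy and receives the winner share
print(miner_weight([update_ema(0.0, rewards["m2"]), update_ema(0.0, rewards["m1"])]))
```

Setting `winner_share=1.0` reproduces the winner-takes-all model the text says is planned.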

Running the Subnet

Protein folding operations utilize GROMACS, a standard package requiring:

A Linux-based machine
Multiple high-performance CPU cores

A GPU-compatible GROMACS build is not initially required for mining operations. For detailed hardware specifications, refer to min_compute.yml.

IMPORTANT: GROMACS is a large package, requiring 1 to 1.5 hours for download.

How Does the Subnet Operate?

In this subnet, validators assign protein folding challenges to miners, who then conduct simulations using GROMACS to achieve stable protein configurations. GROMACS relies on many input files to configure simulations, specifying protein coordinates, biological environments, particle interactions, and the core simulator engine. Their FoldingSynapse, based on the Bittensor synapse, facilitates the exchange of serialized files between validators and miners. These files, a mixture of raw text and binary formats, are small but crucial. Only essential output files for verification and rewards are sent back, usually a few MB but potentially up to a few GB. They aim to further reduce file sizes by suppressing unnecessary logs.
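Because the files are a mix of raw text and binary, they need a text-safe encoding before they can travel inside a serialized message. The sketch below uses base64 for that purpose. The `FoldingPayload` class is a hypothetical stand-in and does not reflect the real FoldingSynapse fields, and base64 is an assumed encoding choice rather than a confirmed detail of the subnet.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class FoldingPayload:
    """Hypothetical container for the files exchanged for one protein job."""
    pdb_id: str
    files: dict[str, str] = field(default_factory=dict)  # filename -> base64 text


def pack_files(raw: dict[str, bytes]) -> dict[str, str]:
    """Encode raw file contents as ASCII-safe base64 strings."""
    return {name: base64.b64encode(data).decode("ascii") for name, data in raw.items()}


def unpack_files(encoded: dict[str, str]) -> dict[str, bytes]:
    """Reverse pack_files, recovering the original bytes."""
    return {name: base64.b64decode(text) for name, text in encoded.items()}


# One text file and one binary blob, both made up for the example.
raw = {"em.mdp": b"integrator = steep\n", "em.tpr": bytes([0, 255, 16, 128])}
payload = FoldingPayload(pdb_id="1ubq", files=pack_files(raw))
assert unpack_files(payload.files) == raw
```

Base64 inflates payloads by roughly a third, which is one reason trimming unnecessary output files matters when results can reach gigabytes.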

Validators act as job schedulers, assigning jobs to miners and querying them for progress updates every five minutes, which uses approximately 50% of validator bandwidth. Files are currently not encrypted, though adding encryption later would be straightforward. At a high level, each role has the following responsibilities:

Validation

Validators generate a specified number of protein folding tasks (neuron.queue_size).
Each task is distributed to a batch of miners (one PDB goes to neuron.sample_size miners).
Validators keep records of all distributed tasks (sample_size * queue_size in total) and query results on a predefined timer (neuron.update_interval).
For more detailed insights, refer to validation.md.
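The validator responsibilities listed above amount to a small bookkeeping structure. The sketch below is a simplified model, not the code in the repository. The class and method names are invented for illustration, the 300-second default mirrors the five-minute polling described earlier, and the default `sample_size` of 10 is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    pdb_id: str
    miner_uids: list[int]
    last_queried: float


@dataclass
class JobQueue:
    queue_size: int = 10             # analogous to neuron.queue_size
    sample_size: int = 10            # analogous to neuron.sample_size (assumed value)
    update_interval: float = 300.0   # analogous to neuron.update_interval, in seconds
    jobs: list[Job] = field(default_factory=list)

    def add(self, pdb_id: str, candidate_uids: list[int], now: float) -> bool:
        """Open a job for at most sample_size miners; refuse if the queue is full."""
        if len(self.jobs) >= self.queue_size:
            return False
        self.jobs.append(Job(pdb_id, candidate_uids[: self.sample_size], now))
        return True

    def due(self, now: float) -> list[Job]:
        """Jobs whose miners have not been queried within update_interval."""
        return [j for j in self.jobs if now - j.last_queried >= self.update_interval]

    def mark_queried(self, job: Job, now: float) -> None:
        job.last_queried = now

    def tracked_simulations(self) -> int:
        """Individual miner tasks tracked; at most sample_size * queue_size."""
        return sum(len(j.miner_uids) for j in self.jobs)


queue = JobQueue(queue_size=2, sample_size=3)
queue.add("1ubq", [1, 2, 3, 4], now=0.0)
queue.add("2lzm", [5, 6, 7], now=100.0)
print(queue.tracked_simulations(), [j.pdb_id for j in queue.due(now=300.0)])
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the scheduling logic easy to test.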

Mining

Miners execute multiple concurrent processes, each performing an energy minimization routine for a given pdb_id. The number of protein jobs handled by a miner is determined by the config.neuron.max_workers parameter.
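A worker pool capped at `max_workers` is one straightforward way to model this. In the sketch below, the toy `minimize_energy` function stands in for launching a GROMACS run, and a thread pool is used on the assumption that each worker would mostly wait on an external simulation process. The function names and the example PDB IDs are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def minimize_energy(pdb_id: str, steps: int = 200, lr: float = 0.1) -> tuple[str, float]:
    """Toy stand-in for a simulation: gradient descent on E(x) = (x - 3)^2 - 5."""
    x = 0.0
    for _ in range(steps):
        x -= lr * 2.0 * (x - 3.0)  # step against the gradient dE/dx
    return pdb_id, (x - 3.0) ** 2 - 5.0


def run_jobs(pdb_ids: list[str], max_workers: int = 4) -> dict[str, float]:
    """Process jobs concurrently, never running more than max_workers at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(minimize_energy, pdb_ids))


results = run_jobs(["1ubq", "2lzm", "1crn"], max_workers=2)
for pdb_id, final_energy in results.items():
    print(pdb_id, round(final_energy, 6))
```

Jobs beyond the `max_workers` limit simply wait in the pool's queue until a worker becomes free.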

Data Sources, Security and Compute Requirements

The subnet uses real data from the RCSB Protein Data Bank, which is updated weekly. With around 10 million existing proteins and an infinite space of variations, they currently select from around 65,000 natural proteins and filter out those requiring preprocessing, leaving 35,000 addressable specimens. They perform a hyperparameter search to ensure stable simulations, which increases the dataset size by 15x. They plan to expand the dataset further by adding preprocessing, perturbing input configurations, supporting non-default environmental factors, and increasing the hyperparameter space.

Miners could theoretically look up solutions to folded proteins, but practical use of such databases is difficult due to the sensitivity of outputs to environmental factors and initial configurations. A productive approach involves ‘pre-mining’ protein solutions, which requires vast disk space. This is beneficial for the network, akin to data scraping.

The base miner has lightweight requirements, but miners can increase rewards by using more powerful hardware. Most miners have upgraded to GPU-enabled simulations to remain competitive. A GPU-enabled base miner is expected to be released soon. Multiprocessing is essential, and top miners are running machines with many cores, scaling up to support hundreds of processes.

A simple dashboard displaying key indicators will be released next week, with a more comprehensive version in development at Macrocosmos. Validators log events to an open Weights & Biases (wandb) project, accessible via the web UI or the Python API. A demo notebook is available in the repo.

Ultimately, they want researchers to use their subnet for free as an alternative to traditional HPC clusters, providing key analysis tools for understanding results. Inspired by the AlphaFold3 server, their app will allow researchers to specify folding jobs with full control over input parameters to achieve state-of-the-art results. App development will begin after the first ‘stable’ subnet version, expected in about a month.

They are also outlining commercialization boundaries for decentralized protein research. This could develop into an on-demand, community-driven protein-folding service, selling the subnet’s bandwidth, or directing capacity towards altruistic community goals.

Will Squires – CEO and Co-Founder

Will has dedicated his career to navigating complexity, spanning from designing and constructing significant infrastructure to spearheading the establishment of an AI accelerator. With a background in engineering, he made notable contributions to transport projects such as Crossrail and HS2. Will’s expertise led to an invitation to serve on the Mayor of London’s infrastructure advisory panel and to lecture at UCL’s Centre for Advanced Spatial Analysis (CASA). He was appointed by AtkinsRéalis to develop an AI accelerator, which expanded to encompass over 60 staff members globally. At XYZ Reality, a company specializing in augmented reality headsets, Will played a pivotal role in product and software development, focusing on holographic technology. Since 2023, Will has provided advisory services for the Opentensor Foundation, contributing to the launch of Revolution.

Steffen Cruz – CTO and Co-Founder

Steffen earned his PhD in subatomic physics from the University of British Columbia, Canada, focusing on developing software to enhance the detection of extremely rare events (10^-7). His groundbreaking research contributed to the identification of novel exotic states of nuclear matter and has been published in prestigious scientific journals. As the founding engineer of SolidState AI, he pioneered innovative techniques for physics-informed machine learning (PIML). Steffen was subsequently appointed as the Chief Technology Officer of the Opentensor Foundation, where he played a pivotal role as a core developer of Subnet 1, the foundation’s flagship subnet. In this capacity, he enhanced the adoption and accessibility of Bittensor by authoring technical documentation, tutorials, and collaborating on the development of the subnet template.

Pedro Ferreira – Machine Learning Engineer

Kalei Brady – Data Scientist

Sergio Champoux – Data Scientist

Brian McCrindle – Machine Learning Researcher

Elena Nesterova – Lead Technical Program Manager

Richard Hudson – Communications Lead

Alex Williams – Recruitment Lead