Subnet 25
Folding
Macrocosmos
A protein folding subnet that enables free, on-demand academic research for solving complex protein structures

SN25 : Protein Folding
Subnet | Description | Category | Company |
---|---|---|---|
SN25 : Protein Folding | Protein folding | DeSci | Macrocosmos |
The protein folding subnet, developed and managed by Macrocosmos AI, marks Bittensor’s first foray into academic applications. In a landscape that has so far focused mainly on AI and web-scraping subnets, it highlights Bittensor’s adaptability to very different challenges.
This subnet is tailored to facilitate valuable academic research within Bittensor, enabling researchers and universities to solve complex protein structures on demand and without cost. They envision this subnet empowering researchers to conduct pioneering studies, publish impactful findings in prestigious journals, and demonstrate that decentralized systems offer cost-effective and efficient alternatives to traditional methodologies. Through this initiative, they aim to highlight not only Bittensor’s scalability and resilience but also its potential to revolutionize academic research practices.
Macrocosmos aims to raise the standard of subnet creation, with a particular focus on crafting incentives and mechanisms for the Bittensor network. Proteins are fundamental biological molecules that carry out a wide range of biochemical functions: they act as enzymes that break down food, transport oxygen in the blood (hemoglobin), and enable muscle contraction (actin filaments). Proteins are long chains of amino acids whose sequence is encoded in DNA. To function, however, this linear chain must adopt a specific 3D structure, and that transition is critical to the protein’s ability to perform its tasks.
The process by which these linear chains fold into stable 3D shapes is known as protein folding. The folded structure has significantly lower free energy than the unfolded chain. As with Lego bricks, it is not enough to know the building blocks; what matters is how they fit together. This idea that “form defines function” is a central concept in biochemistry, which is why understanding and simulating protein folding is essential for uncovering what proteins do.
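To make “lower energy” concrete, here is a deliberately tiny, illustrative sketch: two particles interacting through a Lennard-Jones potential (the kind of nonbonded term molecular force fields use), with their separation relaxed by plain gradient descent. It minimizes potential energy, not free energy, and it is a one-variable toy, not how GROMACS minimizes a real protein with thousands of atoms. The function names are hypothetical.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Analytic derivative dV/dr of the Lennard-Jones pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r0: float, step: float = 0.01, tol: float = 1e-10, max_iter: int = 100_000) -> float:
    """Plain gradient descent on the separation r until the update becomes negligible."""
    r = r0
    for _ in range(max_iter):
        r_new = r - step * lj_gradient(r)
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r


if __name__ == "__main__":
    r_min = minimize(1.5)
    # Analytic minimum: r = 2**(1/6) * sigma (about 1.1225), where V = -epsilon.
    print(f"r_min = {r_min:.4f}, energy = {lj_energy(r_min):.4f}")
```

Starting from a stretched separation, the pair slides “downhill” to the configuration of lowest energy; protein folding is the same principle applied to an enormously larger and more rugged energy landscape.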
Why is Protein Folding an Ideal Subnet Concept?
Protein folding is a notoriously difficult research problem, which is why they chose it: to demonstrate that Bittensor can tackle the world’s hardest research problems and motivate academics and universities to build research subnets. They aim to show that Bittensor can handle computationally complex problems with significant market value. Protein engineering is a $2.6 billion/year market, set to grow significantly by 2030.
Subnet 25, dedicated to solving protein folding, is Bittensor’s first venture into academic use cases, demonstrating the network’s efficacy and flexibility. It uses the industry-standard GROMACS software to simulate the molecular dynamics of proteins. This involves taking an initial 3D protein structure, placing it in a cell-like environment, and applying the laws of physics to evolve the system over time. The simulation accurately predicts the final 3D structure, aiding researchers in understanding the biological function of the protein.
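The workflow described above (start from a 3D structure, place it in a solvated, cell-like environment, then let the physics engine evolve it) corresponds to a conventional GROMACS command sequence. The sketch below only assembles those command lines; the force field (amber99sb-ildn), water model (tip3p), box settings, file names and the `em.mdp` parameter file are illustrative assumptions, not the subnet’s actual configuration, and the usual ion-addition step (`gmx genion`) is omitted for brevity.

```python
import shlex
from typing import List


def gromacs_pipeline(pdb_file: str, mdp_file: str = "em.mdp") -> List[List[str]]:
    """Build (but do not run) the commands for a standard GROMACS setup + minimization.

    Each returned list can be passed to subprocess.run(cmd, check=True).
    """
    return [
        # 1. Convert the PDB structure into GROMACS coordinates and a topology.
        ["gmx", "pdb2gmx", "-f", pdb_file, "-o", "processed.gro", "-p", "topol.top",
         "-ff", "amber99sb-ildn", "-water", "tip3p", "-ignh"],
        # 2. Center the protein in a cubic box with 1.0 nm clearance.
        ["gmx", "editconf", "-f", "processed.gro", "-o", "boxed.gro",
         "-c", "-d", "1.0", "-bt", "cubic"],
        # 3. Fill the box with water: the "cell-like environment".
        ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "spc216.gro",
         "-o", "solvated.gro", "-p", "topol.top"],
        # 4. Combine run parameters, coordinates and topology into a run input file.
        ["gmx", "grompp", "-f", mdp_file, "-c", "solvated.gro", "-p", "topol.top", "-o", "em.tpr"],
        # 5. Run the simulation engine itself.
        ["gmx", "mdrun", "-deffnm", "em"],
    ]


if __name__ == "__main__":
    for cmd in gromacs_pipeline("1ubq.pdb"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting problems when file names come from an external database.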
Protein folding is computationally intensive and can take days or weeks to fully simulate complex structures, limiting research to large, well-funded institutions. Subnet 25 aims to democratize protein folding, enabling efficient and accessible study of proteins. Researchers and universities are invited to use the subnet to solve the structure of any protein, on demand and for free. This will empower researchers to conduct world-class research and publish in top journals, demonstrating that decentralized systems are an economic and efficient alternative to traditional approaches.
Incentive Mechanism
Physical systems like proteins tend to minimize their energy, providing a succinct, exploit-resistant, and highly sensitive quality measure. Miners compete to provide protein configurations with the lowest energy, aligning the network’s optimization metric with desired biological outcomes.
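A minimal sketch of the core comparison, assuming the validator has already recomputed one energy per miner from the returned files (the function name and the dictionary shape are hypothetical). Lower, more negative energies win; missing or non-finite values are ignored.

```python
import math
from typing import Dict, Optional


def select_winner(energies: Dict[int, Optional[float]]) -> Optional[int]:
    """Return the uid of the miner whose configuration has the lowest energy.

    Miners that returned nothing (None) or a non-finite value (nan/inf) are skipped.
    """
    valid = {uid: e for uid, e in energies.items() if e is not None and math.isfinite(e)}
    if not valid:
        return None
    return min(valid, key=valid.get)


if __name__ == "__main__":
    # Illustrative numbers only (GROMACS reports energies in kJ/mol).
    print(select_winner({1: -1200.5, 2: -1300.0, 3: None, 4: float("nan")}))
```

Because the score is a single physical quantity that the validator recomputes itself, there is little room for miners to game the metric.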
Validators select proteins at random from a large public database (the RCSB Protein Data Bank) and download the input files for preprocessing and simulation. Validators compute energies from the files attached to the FoldingSynapse and perform additional verification steps to ensure file integrity. Each validator maintains a queue of 10 concurrent protein folding jobs, and each job is simulated in parallel by multiple miners; with 30 validators active, this results in about 3000 simultaneous simulations at all times. They plan to increase this by an order of magnitude, which will require miners to handle hundreds of concurrent simulations.
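The job selection and the concurrency figure can be sketched as follows. Note that 10 jobs × 30 validators is 300 jobs; the quoted 3000 simulations corresponds to each job being run by about 10 miners, so `miners_per_job=10` below is inferred from those figures rather than a documented setting. The download URL uses RCSB’s public file service; the subnet’s own download code may differ, and all function names are hypothetical.

```python
import random
from typing import List

RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def sample_jobs(pdb_ids: List[str], queue_size: int, seed: int = 0) -> List[str]:
    """Pick queue_size distinct PDB IDs uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(pdb_ids, queue_size)


def download_url(pdb_id: str) -> str:
    """URL of the PDB-format file for one entry on the RCSB file server."""
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper())


def concurrent_simulations(validators: int, queue_size: int, miners_per_job: int) -> int:
    """Total simulations running at once across the whole subnet."""
    return validators * queue_size * miners_per_job


if __name__ == "__main__":
    print(concurrent_simulations(validators=30, queue_size=10, miners_per_job=10))
    print([download_url(p) for p in sample_jobs(["1ubq", "2jof", "1l2y"], 2)])
```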
Miners use separate random seeds for their simulations, which ensures diverse exploration of the folding space. To reduce the bias this randomization introduces, they will add a rebase mechanism that lets miners copy the winning coordinates, which should intensify competition and improve results. This creates a tradeoff between exploration (independent searches) and exploitation (building on the current best solution).
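The exploration/exploitation tradeoff can be illustrated with a toy, hypothetical model: each “miner” runs a greedy random search on a rugged one-dimensional energy landscape using its own seed, and an optional rebase step periodically copies the best coordinates to every miner. This is random search, not molecular dynamics, and the rebase schedule (`rebase_every`) is an assumption made for illustration; in the subnet, copying is described as something miners are allowed to do rather than forced to do.

```python
import math
import random


def energy(x: float) -> float:
    """Toy rugged 1-D energy landscape with many local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


class ToyMiner:
    def __init__(self, seed: int, x0: float = 5.0) -> None:
        self.rng = random.Random(seed)  # separate seed per miner: different trajectories
        self.x = x0
        self.e = energy(x0)

    def step(self, scale: float = 0.5) -> None:
        """Propose a random move and keep it only if it lowers the energy."""
        candidate = self.x + self.rng.gauss(0.0, scale)
        e = energy(candidate)
        if e < self.e:
            self.x, self.e = candidate, e


def run(n_miners: int = 8, rounds: int = 200, rebase_every: int = 0) -> float:
    """Return the best energy found; rebase_every=0 disables rebasing (pure exploration)."""
    miners = [ToyMiner(seed) for seed in range(n_miners)]
    for r in range(1, rounds + 1):
        for m in miners:
            m.step()
        if rebase_every and r % rebase_every == 0:
            best = min(miners, key=lambda m: m.e)
            for m in miners:  # everyone restarts from the current winner (exploitation)
                m.x, m.e = best.x, best.e
    return min(m.e for m in miners)


if __name__ == "__main__":
    print("independent:", run(rebase_every=0))
    print("with rebase:", run(rebase_every=20))
```

Frequent rebasing concentrates effort around the current best basin, while independent seeds keep searching elsewhere; which wins depends on how rugged the landscape is.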
They use an exponential incentive curve to reward miners who innovate and provide more value. Currently, the winning miner receives 80% of the available rewards at each step, but this will shift to a winner-takes-all model. Weights are based on an exponential moving average (EMA) of each miner’s scores, averaged across its assigned jobs, so miners must perform well consistently. Most miners have upgraded to GPU-enabled simulations, increasing productivity by 20-50x. Early stopping and epsilon-bounded scoring prevent wasted effort and ensure meaningful progress.
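A minimal sketch of how such a reward scheme could fit together, under stated assumptions: rewards decay exponentially with rank (a ratio of 0.2 gives the winner roughly the quoted 80%, and a ratio of 0 is winner-takes-all), each job keeps a per-miner EMA, and weights are the per-miner average of those EMAs. The ratio, the smoothing factor `alpha`, and all function names are hypothetical; early stopping and epsilon-bounded scoring are not modelled.

```python
from typing import Dict, List


def rank_rewards(energies: Dict[int, float], ratio: float = 0.2) -> Dict[int, float]:
    """Exponentially decaying reward by rank (lowest energy = rank 0).

    Rank k gets a share proportional to ratio**k, normalized to sum to 1.
    ratio=0.2 gives the winner roughly 80%; ratio=0.0 is winner-takes-all
    (Python evaluates 0.0**0 as 1.0).
    """
    order = sorted(energies, key=energies.get)
    raw = [ratio ** k for k in range(len(order))]
    total = sum(raw)
    return {uid: share / total for uid, share in zip(order, raw)}


def update_ema(ema: Dict[int, float], rewards: Dict[int, float], alpha: float = 0.1) -> Dict[int, float]:
    """ema <- alpha * reward + (1 - alpha) * ema, for every miner scored on this job."""
    out = dict(ema)
    for uid, r in rewards.items():
        out[uid] = alpha * r + (1.0 - alpha) * ema.get(uid, 0.0)
    return out


def weights_from_jobs(job_emas: List[Dict[int, float]]) -> Dict[int, float]:
    """Average each miner's EMA over the jobs it was assigned, then normalize to sum to 1."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for ema in job_emas:
        for uid, value in ema.items():
            sums[uid] = sums.get(uid, 0.0) + value
            counts[uid] = counts.get(uid, 0) + 1
    avg = {uid: sums[uid] / counts[uid] for uid in sums}
    total = sum(avg.values())
    return {uid: v / total for uid, v in avg.items()} if total > 0 else avg
```

Averaging EMAs across jobs means a miner cannot earn a high weight from one lucky result; it has to win repeatedly on many proteins.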
Running the Subnet
Protein folding operations utilize GROMACS, a standard package requiring:
A Linux-based machine
Multiple high-performance CPU cores
A GPU-enabled GROMACS build is not required to start mining. For detailed hardware specifications, refer to min_compute.yml.
IMPORTANT: GROMACS is a large package, requiring 1 to 1.5 hours for download.
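Before a long installation, a quick pre-flight check of the requirements listed above can save time. The sketch below is hypothetical: the 8-core threshold is a placeholder (the real figures are in min_compute.yml), and it only checks for the standard `gmx` binary on the PATH (MPI builds may install it as `gmx_mpi` instead).

```python
import os
import platform
import shutil


def check_host(min_cores: int = 8) -> dict:
    """Report whether this machine looks ready for GROMACS-based mining.

    min_cores is a placeholder threshold, not the official requirement.
    """
    cores = os.cpu_count() or 1
    return {
        "linux": platform.system() == "Linux",
        "cores": cores,
        "enough_cores": cores >= min_cores,
        "gmx_on_path": shutil.which("gmx") is not None,
    }


if __name__ == "__main__":
    for key, value in check_host().items():
        print(f"{key}: {value}")
```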
How Does the Subnet Operate?
In this subnet, validators assign protein folding challenges to miners, who then conduct simulations using GROMACS to achieve stable protein configurations. GROMACS relies on many input files to configure simulations, specifying protein coordinates, biological environments, particle interactions, and the core simulator engine. Their FoldingSynapse, based on the Bittensor synapse, facilitates the exchange of serialized files between validators and miners. These files, a mixture of raw text and binary formats, are small but crucial. Only essential output files for verification and rewards are sent back, usually a few MB but potentially up to a few GB. They aim to further reduce file sizes by suppressing unnecessary logs.
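Because the exchanged files mix plain text (such as `.mdp` and `.top`) and binary formats (such as `.tpr`), one simple way to carry them inside a synapse is to base64-encode each file into a name-to-string mapping. The helpers below are a hypothetical sketch of that idea; they are not the actual FoldingSynapse fields or code, and the Bittensor transport itself is not shown.

```python
import base64
from pathlib import Path
from typing import Dict, Iterable, List


def pack_files(paths: Iterable[Path]) -> Dict[str, str]:
    """Read each file as raw bytes and base64-encode it, keyed by file name.

    Base64 handles text and binary GROMACS files identically.
    """
    return {p.name: base64.b64encode(p.read_bytes()).decode("ascii") for p in paths}


def unpack_files(payload: Dict[str, str], out_dir: Path) -> List[Path]:
    """Decode a payload back into files inside out_dir and return their paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, blob in payload.items():
        target = out_dir / Path(name).name  # drop any directory parts sent by the peer
        target.write_bytes(base64.b64decode(blob))
        written.append(target)
    return written
```

Stripping directory components on receipt is a cheap safeguard against a peer writing outside the intended folder; note that base64 inflates payloads by about a third, which matters for the multi-GB outputs mentioned above.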
Validators act as job schedulers: they assign jobs to miners and query them for progress updates every five minutes, which uses approximately 50% of validator bandwidth. Files are currently not encrypted, although adding encryption in the future would be straightforward. At a high level, each role breaks down into the following responsibilities:
Validation
Validators generate a specified number of protein folding tasks (neuron.queue_size).
These tasks are distributed among a designated number of miners (neuron.sample_size batches per PDB).
Validators maintain records of all distributed tasks (sample_size * queue_size) and manage result querying based on a predefined timer (neuron.update_interval).
For more detailed insights, refer to validation.md.
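The validator responsibilities above can be sketched as a small scheduler. This is a hypothetical toy, not the code described in validation.md: `queue_size`, `sample_size` and `update_interval` mirror the `neuron.*` settings named above, the 300-second default reflects the five-minute polling interval, `sample_size=10` is an assumed default, and the network calls to miners are left out (the caller passes in whatever energies came back).

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    pdb_id: str
    miner_uids: List[int]
    last_queried: float
    best_energy: Dict[int, float] = field(default_factory=dict)


class ToyScheduler:
    """Keeps queue_size jobs alive, each sent to sample_size miners, polled every update_interval s."""

    def __init__(self, pdb_ids, miner_uids, queue_size=10, sample_size=10,
                 update_interval=300.0, seed=0):
        self.rng = random.Random(seed)
        self.pdb_ids = list(pdb_ids)
        self.miner_uids = list(miner_uids)
        self.queue_size = queue_size
        self.sample_size = sample_size
        self.update_interval = update_interval
        self.jobs: List[Job] = []

    def fill_queue(self, now: float) -> None:
        """Top the queue back up to queue_size jobs (sample_size * queue_size miner tasks)."""
        while len(self.jobs) < self.queue_size:
            pdb_id = self.rng.choice(self.pdb_ids)
            uids = self.rng.sample(self.miner_uids, self.sample_size)
            self.jobs.append(Job(pdb_id, uids, last_queried=now))

    def due_for_query(self, now: float) -> List[Job]:
        """Jobs whose miners have not been queried for at least update_interval seconds."""
        return [j for j in self.jobs if now - j.last_queried >= self.update_interval]

    def record(self, job: Job, energies: Dict[int, float], now: float) -> None:
        """Store each assigned miner's best (lowest) energy; ignore unassigned uids."""
        job.last_queried = now
        for uid, e in energies.items():
            if uid in job.miner_uids:
                job.best_energy[uid] = min(e, job.best_energy.get(uid, float("inf")))
```

Passing `now` explicitly, instead of reading the clock inside the class, keeps the scheduling logic deterministic and easy to test.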
Mining
Miners execute multiple concurrent processes, each performing an energy minimization routine for a given pdb_id. The number of protein jobs handled by a miner is determined by the config.neuron.max_workers parameter.
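A minimal sketch of that concurrency model, assuming a process pool whose size plays the role of `config.neuron.max_workers`. The `minimize_energy` function is a hypothetical stand-in: a real miner would launch a GROMACS run for the `pdb_id`, whereas this toy just performs a seeded random-search minimization of x² so the example runs anywhere.

```python
import random
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, List, Tuple


def minimize_energy(pdb_id: str, seed: int, steps: int = 2000) -> Tuple[str, float]:
    """Stand-in for one energy-minimization job (a real miner would invoke gmx here)."""
    rng = random.Random(seed)
    x, e = 5.0, 25.0  # toy starting point and its energy e = x**2
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, 0.1)
        if candidate * candidate < e:
            x, e = candidate, candidate * candidate
    return pdb_id, e


def run_jobs(pdb_ids: List[str], max_workers: int = 4) -> Dict[str, float]:
    """Run one job per PDB ID across at most max_workers processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(minimize_energy, pid, i) for i, pid in enumerate(pdb_ids)]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    print(run_jobs(["1UBQ", "2JOF", "1L2Y"], max_workers=2))
```

Processes rather than threads are used because each simulation is CPU-bound; on many-core machines, raising `max_workers` is what lets top miners run hundreds of jobs at once.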
Data Sources, Security and Compute Requirements
The subnet uses real data from the RCSB protein data bank, updated weekly. With around 10 million existing proteins and an infinite space of variations, they currently select from around 65,000 natural proteins, filtering out those requiring preprocessing, leaving 35,000 addressable specimens. They perform a hyperparameter search to ensure stable simulations, increasing the dataset size by 15x. They plan to expand the dataset size further by adding preprocessing, perturbing input configurations, supporting non-default environmental factors, and increasing the hyperparameter space.
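The dataset arithmetic works out as 35,000 addressable proteins × 15 ≈ 525,000 simulation configurations. The sketch below computes that figure and shows how a hyperparameter grid multiplies the job space. The grid values are real GROMACS option names (force fields, water models, box types) but are purely illustrative, not the subnet’s actual search space, and the stability filtering described above is not modelled.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

ADDRESSABLE = 35_000     # natural proteins left after dropping those needing preprocessing
HYPERPARAM_FACTOR = 15   # dataset growth from the hyperparameter search (quoted above)


def dataset_size(n_proteins: int = ADDRESSABLE, factor: int = HYPERPARAM_FACTOR) -> int:
    return n_proteins * factor


# Hypothetical search space: valid GROMACS option values, NOT the subnet's real grid.
SEARCH_SPACE: Dict[str, List[str]] = {
    "ff": ["amber99sb-ildn", "charmm27"],
    "water": ["tip3p", "spce"],
    "box": ["cubic", "dodecahedron"],
}


def configurations(pdb_ids: List[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Lazily yield every (pdb_id, hyperparameters) combination."""
    keys = list(SEARCH_SPACE)
    for pdb_id in pdb_ids:
        for values in product(*(SEARCH_SPACE[k] for k in keys)):
            yield pdb_id, dict(zip(keys, values))


if __name__ == "__main__":
    print(dataset_size())
    print(sum(1 for _ in configurations(["1UBQ", "2JOF"])))
```

Generating combinations lazily matters at this scale: the full product of proteins and hyperparameters never needs to be held in memory at once.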
Miners could theoretically look up solutions to folded proteins, but practical use of such databases is difficult due to the sensitivity of outputs to environmental factors and initial configurations. A productive approach involves ‘pre-mining’ protein solutions, which requires vast disk space. This is beneficial for the network, akin to data scraping.
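Why lookups are hard, and why pre-mining needs so much storage, can be shown with a hypothetical cache keyed on every input that affects the outcome. Any change to the environment (here an illustrative `temperature_K` parameter) produces a different key, so a stored answer only helps if the validator’s job matches it exactly. The class and parameter names are assumptions for illustration.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def job_key(pdb_id: str, params: Dict[str, Any]) -> str:
    """Hash every input that influences the simulation outcome into one cache key."""
    blob = json.dumps({"pdb_id": pdb_id, "params": params}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class PremineCache:
    """In-memory stand-in for a large on-disk store of pre-computed results."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, pdb_id: str, params: Dict[str, Any], result: bytes) -> None:
        self._store[job_key(pdb_id, params)] = result

    def get(self, pdb_id: str, params: Dict[str, Any]) -> Optional[bytes]:
        return self._store.get(job_key(pdb_id, params))
```

`sort_keys=True` makes the key independent of dictionary ordering, but every distinct environment still needs its own entry, which is exactly why the approach demands vast disk space.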
The base miner has lightweight requirements, but miners can increase rewards by using more powerful hardware. Most miners have upgraded to GPU-enabled simulations to remain competitive. A GPU-enabled base miner is expected to be released soon. Multiprocessing is essential, and top miners are running machines with many cores, scaling up to support hundreds of processes.
A simple dashboard displaying key indicators will be released next week, with a more comprehensive version in development at Macrocosmos. Validators log events to a public Weights & Biases (wandb) project, accessible via the web UI or the Python API. A demo notebook is available in the repo.
Ultimately, they want researchers to use their subnet for free as an alternative to traditional HPC clusters, providing key analysis tools for understanding results. Inspired by the AlphaFold3 server, their app will allow researchers to specify folding jobs with full control over input parameters to achieve state-of-the-art results. App development will begin after the first ‘stable’ subnet version, expected in about a month.
They are also defining commercialization boundaries for decentralized protein research. This could develop into an on-demand, community-driven protein-folding service that sells the subnet’s bandwidth, or the capacity could be directed toward altruistic community goals.
Will Squires – CEO and Co-Founder
Will has dedicated his career to navigating complexity, spanning from designing and constructing significant infrastructure to spearheading the establishment of an AI accelerator. With a background in engineering, he made notable contributions to transport projects such as Crossrail and HS2. Will’s expertise led to an invitation to serve on the Mayor of London’s infrastructure advisory panel and to lecture at UCL’s Centre for Advanced Spatial Analysis (CASA). He was appointed by AtkinsRéalis to develop an AI accelerator, which expanded to encompass over 60 staff members globally. At XYZ Reality, a company specializing in augmented reality headsets, Will played a pivotal role in product and software development, focusing on holographic technology. Since 2023, Will has provided advisory services for the Opentensor Foundation, contributing to the launch of Revolution.
Steffen Cruz – CTO and Co-Founder
Steffen earned his PhD in subatomic physics from the University of British Columbia, Canada, focusing on developing software to enhance the detection of extremely rare events (10^-7). His groundbreaking research contributed to the identification of novel exotic states of nuclear matter and has been published in prestigious scientific journals. As the founding engineer of SolidState AI, he pioneered innovative techniques for physics-informed machine learning (PIML). Steffen was subsequently appointed as the Chief Technology Officer of the Opentensor Foundation, where he played a pivotal role as a core developer of Subnet 1, the foundation’s flagship subnet. In this capacity, he enhanced the adoption and accessibility of Bittensor by authoring technical documentation and tutorials, and by collaborating on the development of the subnet template.
Pedro Ferreira – Machine Learning Engineer
Kalei Brady – Data Scientist
Sergio Champoux – Data Scientist
Brian McCrindle – Machine Learning Researcher
Elena Nesterova – Lead Technical Program Manager
Richard Hudson – Communications Lead
Alex Williams – Recruitment Lead