This subnet is dedicated to advancing Retrieval-Augmented Generation (RAG) by incentivizing the development and provision of advanced chunking solutions. Its goal is to create, host, and deploy an intelligent chunking system that maximizes similarity within chunks and dissimilarity between chunks.
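
To make that objective concrete, the sketch below scores a candidate chunking as mean intra-chunk similarity minus mean inter-chunk similarity over sentence embeddings. This is an illustrative formalization only, not the subnet's actual scoring code; `chunk_quality` and its inputs are hypothetical names.

```python
# Illustrative only -- not the subnet's scoring code. One way to formalize
# "high similarity within chunks, low similarity between chunks", given the
# embedding vector of every sentence in every chunk.
from itertools import combinations
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def chunk_quality(chunks: list[list[np.ndarray]]) -> float:
    """chunks[i] holds the sentence embeddings of chunk i. Returns mean
    intra-chunk similarity minus mean inter-chunk similarity; higher is
    better under the objective stated above."""
    intra = [cosine(a, b) for sents in chunks for a, b in combinations(sents, 2)]
    inter = [cosine(a, b) for s1, s2 in combinations(chunks, 2) for a in s1 for b in s2]
    if not intra or not inter:  # degenerate chunkings get a neutral score
        return 0.0
    return float(np.mean(intra) - np.mean(inter))
```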

VectorChat is building a vertically integrated solution: as both a consumer and a leading provider of intelligent Retrieval-Augmented Generation, it creates the full demand loop for the Chunking subnet.

Chunking involves breaking data into smaller, manageable “chunks” to facilitate processing and analysis. The technique is crucial in natural language processing (NLP) and especially beneficial for large language models (LLMs). Examples include dividing an article into sections, a screenplay into scenes, or a musical recording into movements.
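
As a minimal illustration, the sketch below groups an article's sentences into chunks capped at a character budget. `naive_chunk` is a hypothetical helper; production systems would count tokens and respect semantic boundaries instead.

```python
# A minimal sketch of the simplest kind of chunking: pack sentences into
# chunks of at most `max_chars` characters, never splitting a sentence.
import re

def naive_chunk(text: str, max_chars: int = 500) -> list[str]:
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        # Start a new chunk when adding this sentence would exceed the budget.
        # (A single sentence longer than the budget becomes its own chunk.)
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```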

Why Use Chunking?

For LLMs to deliver accurate responses, they must have access to the relevant information. When that information lies outside the model’s training data, it must be included in the query to avoid inaccuracies. Including the entire dataset with every query, however, is impractical due to inference costs.

Chunking addresses this by dividing data into smaller, meaningful chunks, which are then converted into vectors and stored in a vector database. When a query is made, it is also converted into a vector, and the system retrieves the most relevant chunks from the database. This approach allows the model to process only the pertinent sections of text, significantly reducing the number of tokens processed per query.
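
Here is a minimal sketch of that retrieval step, assuming a toy hashed bag-of-words embedding in place of a real embedding model and a plain array in place of a vector database; `embed` and `top_k_chunks` are illustrative names, not any particular library's API.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hashed bag-of-words, normalized to unit length.
    A real system would use a trained embedding model instead."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def top_k_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    index = np.stack([embed(c) for c in chunks])  # stands in for the vector DB
    scores = index @ embed(query)                 # cosine similarity (unit vectors)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```

Only the returned chunks are then passed to the LLM alongside the query, which is what keeps the token count per query small.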

Benefits of Chunking

Chunking makes querying more efficient and cost-effective by restricting each query to the relevant portions of text. It is a vital step in many machine learning tasks that involve large datasets, such as:

  • Retrieval-Augmented Generation (RAG): By maintaining a database of relevant documents, RAG provides LLMs with the necessary context for accurate query processing. Effective chunking ensures that only the most relevant texts are included, enhancing response quality.
  • Classification: Chunking helps in organizing texts into similar sections for better classification and labeling, improving task accuracy and efficiency.
  • Semantic Search: Improved chunking enhances semantic search algorithms, which focus on meaning rather than simple keyword matching, resulting in more accurate and reliable search results.

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a framework that enhances a generative AI application’s large language model (LLM) by supplying it with the most relevant and contextually important proprietary, private, or dynamic data at query time, improving the model’s accuracy and effectiveness across tasks.

VectorChat is dedicated to delivering the most immersive conversational AI experience. Their upcoming platform, Toffee, utilizes Retrieval-Augmented Generation (RAG) to provide users with expansive memory, extended conversation lengths, and domain-specific knowledge.

In developing Toffee, they found that while RAG itself has seen many improvements, chunking solutions lagged significantly behind. Existing methods were either too simplistic or too resource-intensive, making the RAG pipeline either less accurate or more costly. Traditional approaches (e.g., chunking every X tokens with Y tokens of overlap, as sketched below) were cheap to compute but inflated runtime costs by injecting unnecessary context into LLM queries, while advanced semantic chunking solutions like Unstructured.io were prohibitively expensive and slow, forcing limits on how many files users could upload.
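
For reference, the traditional baseline mentioned above takes only a few lines; this is a generic sketch, not VectorChat's code, and it tokenizes on whitespace where a real pipeline would use the LLM's tokenizer.

```python
def fixed_size_chunks(text: str, x: int = 256, y: int = 32) -> list[str]:
    """Split `text` into chunks of x tokens, each overlapping the previous
    chunk by y tokens. Whitespace tokenization for simplicity."""
    assert x > y >= 0
    tokens = text.split()
    return [" ".join(tokens[i:i + x]) for i in range(0, len(tokens), x - y)]
```

Because the boundaries ignore meaning, each retrieved chunk tends to carry extra, irrelevant context, which is exactly the runtime overhead described above.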

To address these issues and realize Toffee’s vision, VectorChat’s team created an algorithm that outperforms current industry solutions. Rather than keeping such models proprietary, they are capitalizing on the underdeveloped state of the field: the documentation provided includes all the information needed to develop solutions that match or exceed their current model.

Their aim is to reduce costs, enhance accuracy, and unlock new possibilities. As LLMs expand to diverse data modalities (e.g., audio, images, video), intelligent chunking becomes increasingly crucial.

The subnet is designed with a clear, transparent, and fair incentive system to push beyond current achievements. They are eager to see how miners will advance this technology.