Subnet 05

Open-Kaito

Kaito AI

Open-Kaito decentralizes web search by using community-driven indexing and validation, enhancing transparency and trust

SN5 : OpenKaito

Subnet | Description | Category | Company
SN5 : OpenKaito | Text embedding model | Model development | Kaito

Open-Kaito aims to decentralize web searches by allowing a community of users and developers to manage search results instead of a single corporation, promoting transparency and democratization of information. It ensures that access to information is not controlled by a single entity, empowering the community and enhancing trust. The system involves a cooperative effort between miners who collect and propose information and validators who verify its accuracy, akin to librarians and meticulous editors, respectively.

Kaito AI aims to democratize access to Web3 information through its established platform. However, its current in-house methods for data collection, indexing, AI training, and ranking create operational challenges and limit public innovation.

Search engines are intricate systems that go beyond being just a database or a ranking algorithm. They require low latency, which complicates efforts to decentralize them. Subnet Open-Kaito represents Kaito AI’s venture into addressing these technical challenges. By leveraging Bittensor’s built-in Yuma consensus, Kaito AI redefines search indexing as a miner-validator problem, where index relevance is assessed by an AI-based nDCG evaluator that learns from real user engagement feedback.

Additionally, Kaito AI plans to introduce a seamless search and analytics product based on this decentralized search layer, featuring intelligent coordination and caching mechanisms on validator nodes.

The goal is to build a decentralized indexing layer that powers smart search and analytics.

Currently, Open-Kaito concentrates mainly on Twitter, which lends itself to social network analysis and influence tracking. The intention is for Open-Kaito to serve as an infrastructure layer that empowers users to build diverse applications rather than dictating specific content. Open-Kaito employs proprietary AI models to extract sentiment, narratives, and topics in real time from platforms like Twitter and Discord. When new content arrives, an AI pipeline processes it immediately and stores the resulting insights in the database.

Distinguishing Open-Kaito’s Approach

  • Open-Kaito differentiates itself by emphasizing indexing and offering a unique incentive mechanism.
  • It positions itself as an indexing layer enabling quick access to diverse data categories and empowering user-driven content interpretation.

Inverted Index

An inverted index is essential for a search engine, functioning as a reverse lookup table that links keywords to documents containing those keywords. Advanced search engines enhance keyword extraction using NLP techniques (e.g., tokenization, stemming) and content understanding models (e.g., classification, tagging, categorization). Indexing plays a crucial role in providing fast access to vast data repositories, akin to creating a library catalog, enabling quick retrieval of specific information by categorizing and tagging content.

Kaito AI’s indexing focuses on influential figures and governance forums within Web3, utilizing AI and content understanding tools to navigate and analyze data efficiently. Search queries typically impose logical constraints on keywords and are fulfilled by operations on the inverted index, which can be partitioned by keywords or documents for distribution.
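To make the reverse lookup concrete, here is a minimal Python sketch of an inverted index with AND/OR keyword constraints. The corpus, the function names, and the simple regex tokenizer are illustrative assumptions, not Open-Kaito's actual implementation, which the text says uses richer NLP and content understanding models.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search_and(index, keywords):
    """Return ids of documents containing every keyword (AND semantics)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()

def search_or(index, keywords):
    """Return ids of documents containing at least one keyword (OR semantics)."""
    return set().union(*(index.get(k.lower(), set()) for k in keywords))

# Hypothetical toy corpus, for illustration only.
docs = {
    1: "Governance proposal for staking rewards",
    2: "Staking guide for new validators",
    3: "Governance forum weekly summary",
}
index = build_inverted_index(docs)
print(sorted(search_and(index, ["governance", "staking"])))  # [1]
print(sorted(search_or(index, ["governance", "staking"])))   # [1, 2, 3]
```

Partitioning by keyword would place different tokens of `index` on different nodes, while partitioning by document would give each node a complete index over its own subset of `docs`.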

Search Ranking

Search ranking involves two main processes:

  1. Retrieval Ranking: This ranks documents based on criteria like term frequency (TF) and inverse document frequency (IDF). It prioritizes simple, indexable signals to optimize speed and relevance.
  2. Re-ranking: This refines a smaller set of candidates using advanced techniques, such as deep learning algorithms that analyze user interaction data. In a decentralized system, this can be optimized through collective intelligence, with network participants contributing to the re-ranking process.
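The two stages can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the smoothed IDF variant, the whitespace tokenizer, the toy documents, and the `engagement` scores that stand in for a learned re-ranking model are all hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, docs):
    """Score each document by summing TF * IDF over the query terms."""
    n_docs = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)
            # The +1 is an assumed smoothing so common terms still contribute.
            idf = math.log(n_docs / df[term]) + 1.0
            score += tf * idf
        scores[doc_id] = score
    return scores

def retrieve(query_terms, docs, k):
    """Stage 1: return the top-k document ids by TF-IDF score."""
    scores = tf_idf_scores(query_terms, docs)
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:k]

def rerank(candidates, rerank_score):
    """Stage 2: reorder the small candidate set with a costlier scorer."""
    return sorted(candidates, key=rerank_score, reverse=True)

# Hypothetical toy data.
docs = {"a": "staking rewards staking", "b": "staking guide", "c": "forum summary"}
candidates = retrieve(["staking"], docs, k=2)
print(candidates)  # ['a', 'b']

# Stand-in for a model trained on user interaction data.
engagement = {"a": 0.1, "b": 0.9}
print(rerank(candidates, engagement.get))  # ['b', 'a']
```

The first stage touches every document but only with cheap signals; the second stage applies an expensive scorer to just the `k` survivors.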

Knowledge Graph

A Knowledge Graph structures real-world entities and their relationships, enhancing search queries and document understanding. In Web3, it contextualizes relationships between projects, influencers, and other entities, making it well-suited for collective intelligence solutions.

Towards a Decentralized Web3 Search

Rather than decentralizing every component of a search engine, the focus is on framing search relevance as a miner-validator problem. This approach incentivizes miners to innovate in data acquisition, indexing, ranking, and knowledge graph development. Miners are encouraged to optimize the components with the highest ROI, much as search engineering teams do through A/B testing and failure analysis.

Validator Role

Validators can access the indexing layer by validating on multiple subnets, which gives them both raw data access and indexing capabilities that they can combine for applications. The infrastructure’s modular nature allows for different approaches, whether centralized or decentralized, providing flexibility in data processing and application development.

Validators issue search queries and expect ranked results from miners. They use a simple format supporting basic functionalities like keywords, AND/OR semantics, sorting, and filtering.

  • AI-based nDCG: Validators use an ML-based nDCG rater, which leverages large language models (LLMs) to evaluate result relevance and ranking. The evaluator model is fine-tuned with real user engagement data and regularly updated on HuggingFace. This model is open-sourced with potential for full decentralization.
  • Result Correctness: Validators verify the URLs of search results to ensure they match original sources, preventing fabricated results.
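For reference, nDCG (normalized discounted cumulative gain) compares the ranking a miner returned against the ideal ordering of the same results. The sketch below uses the common linear-gain form, DCG = sum of rel_i / log2(i + 1) over ranks i starting at 1. The relevance labels here are hypothetical integers; in the subnet they would come from the LLM-based rater, whose exact scale and gain function are not specified in this text.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each relevance is discounted by log2 of its rank + 1."""
    # enumerate() is zero-based, so rank position i (1-based) equals index + 1.
    return sum(rel / math.log2(idx + 2) for idx, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalize DCG by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels for a miner's ranked results (higher is better).
print(ndcg([3, 2, 1]))        # 1.0, already in ideal order
print(ndcg([1, 2, 3]) < 1.0)  # True, best result was ranked last
```

Because the score is normalized, results from queries with different numbers of relevant documents can be compared on the same 0 to 1 scale.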

Miner Role

There are many ways to mine on the subnet, with about 50 different methods available depending on how effective a contribution a miner wants to make. Joining the Bittensor ecosystem as a miner presents the opportunity to both educate oneself and contribute to different objectives, earning rewards along the journey. As this subnet evolves, it may become more challenging to contribute, but branching off into new directions allows for the introduction of fresh talent and methods for contribution.

Miners respond to search requests from validators with ranked results. They are urged to utilize efficient filters, such as implementing exact match requirements and capitalization standards, to enhance the quality of indexed content. They are incentivized to improve result quality through various means:

  • Search Index: Miners can use local ElasticSearch instances with a basic schema for supported sources (e.g., Twitter, governance forums). Search requests are translated into ElasticSearch queries by default.
  • Crawler: A basic Apify-based crawler is provided, but node owners are encouraged to develop their own more cost-effective crawler stacks.
  • Ranking Algorithm: The default algorithm is BM25, supported by ElasticSearch, which relies on TF and IDF within the search index.
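As an illustration of how a simple search request might be translated into an ElasticSearch query, the Python sketch below builds a query body using standard query DSL clauses (`bool`, `must`, `should`, `filter`, `sort`). The function name, its parameters, and the field names `text`, `author`, and `created_at` are assumptions made for this example; the subnet's actual schema and translation code may differ.

```python
def to_elasticsearch_query(keywords, operator="AND", sort_by=None, author=None, size=10):
    """Build an ElasticSearch query DSL body from a simple search request."""
    # One full-text match clause per keyword (field name "text" is assumed).
    clauses = [{"match": {"text": kw}} for kw in keywords]
    if operator == "AND":
        bool_query = {"must": clauses}  # every keyword must match
    else:
        # At least one keyword must match.
        bool_query = {"should": clauses, "minimum_should_match": 1}
    if author is not None:
        # Filters restrict the result set without affecting the relevance score.
        bool_query["filter"] = [{"term": {"author": author}}]
    body = {"query": {"bool": bool_query}, "size": size}
    if sort_by is not None:
        body["sort"] = [{sort_by: {"order": "desc"}}]
    return body

# Hypothetical request: either keyword, newest first.
body = to_elasticsearch_query(["bittensor", "subnet"], operator="OR", sort_by="created_at")
print(body)
```

When no `sort` is given, ElasticSearch orders hits by relevance score, which is where the default BM25 ranking applies.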

Reward Model

Miners receive rewards based on several criteria:

  • Truthfulness: Rewards are given for providing authentic results from specified sources, with penalties for fabricated data.
  • Relevance: Rewards reflect the content and contextual relevance of results, as measured by nDCG.
  • Recency: Rewards increase with the timeliness of results.
  • Diversity: Rewards account for the diversity of sources and content, assessed using content clustering methods.
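One way to picture how these criteria could combine is the Python sketch below. It is purely illustrative: the weights, the choice to zero out the reward for fabricated results, and the exponential recency decay with a 24-hour half-life are assumptions made for this example, not the subnet's published formula.

```python
def recency_score(age_hours, half_life_hours=24.0):
    """Decay from 1.0 toward 0.0 as content ages (assumed exponential half-life)."""
    return 0.5 ** (age_hours / half_life_hours)

def miner_reward(truthful, relevance, recency, diversity, weights=(0.6, 0.2, 0.2)):
    """Combine per-response scores (each in [0, 1]) into a single reward.

    Assumption: fabricated results forfeit the whole reward; otherwise the
    reward is a weighted sum of relevance (e.g. nDCG), recency, and diversity.
    """
    if not truthful:
        return 0.0
    w_rel, w_rec, w_div = weights
    return w_rel * relevance + w_rec * recency + w_div * diversity

# Hypothetical response: relevant, one day old, moderately diverse.
reward = miner_reward(True, relevance=0.8, recency=recency_score(24), diversity=0.5)
print(round(reward, 2))  # 0.68
```

Treating truthfulness as a gate rather than a weighted term reflects the idea that no amount of relevance should compensate for fabricated data.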

Building a Model for Content Understanding

To truly understand content, a model needs to be built based on the content individuals engage with on platforms like Twitter. This model simplifies complexities and allows for broader application beyond niche cases. By focusing on author groups rather than scraping keywords, comprehensive content analysis can be achieved. This approach streamlines the process by targeting specific sources instead of exhaustively gathering all possible queries. Understanding the source of content is crucial for effectively extracting information, similar to Google’s emphasis on ranking websites over specific keywords.

Kaito is developed by a team with unparalleled expertise in hedge funds, machine learning, and blockchain technology. As Web3 researchers, builders, and investors, they possess a deep understanding of the current pain points in Web3 search. Their technology-driven solution for sourcing, sorting, and curating information leverages advanced data science, cutting-edge machine learning, and their extensive experience with large, complex distributed data systems.

Yu Hu – Founder

Hao L – Head of Engineering

YuZhi Wei – Blockchain Data Scientist

Alex W – Marketing Manager

Boyang LI – Founding Engineer

Desti Susilawati – Production Operator

Hongjie Wang – Web Crawler Specialist

Simone L – Product Designer and Technologist

Sandra Leow – Research Partner

Zhenghao Zhang – Data Scientist

Rong Zilin – Senior Frontend Developer