The New Landscape of AI+Web3 Integration: Long-Tail Distributed Incentives and the Open-Source Model Market

AI+Web3: Towers and Squares

Key Points Summary

  1. Web3 projects with AI concepts have become targets for capital attraction in both primary and secondary markets.

  2. The opportunities for Web3 in the AI industry lie in using distributed incentives to coordinate potential supply across the long tail (in data, storage, and computing), while establishing an open-source model market and a decentralized market for AI Agents.

  3. AI is mainly used in the Web3 industry for on-chain finance (crypto payments, trading, and data analysis) and for assisting development.

  4. The utility of AI + Web3 is reflected in the complementarity of the two: Web3 is expected to counteract AI centralization, while AI is expected to help Web3 break boundaries.

Introduction

In the past two years, AI development has accelerated as if someone pressed the fast-forward button. The butterfly effect set off by ChatGPT has not only opened up a new world of generative artificial intelligence but has also stirred a wave in the Web3 field.

Buoyed by the AI concept, fundraising in the crypto market has rebounded markedly from its earlier slowdown. Media statistics show that in the first half of 2024 alone, 64 Web3+AI projects completed financing, with the AI-based operating system Zyber365 raising the largest round: $100 million in its Series A.

The secondary market is even more buoyant. Data from cryptocurrency aggregation sites shows that in just over a year, the total market capitalization of the AI sector reached $48.5 billion, with 24-hour trading volume close to $8.6 billion. The boost from mainstream AI breakthroughs is evident: after OpenAI released its Sora text-to-video model, the average price of the AI sector rose by 151%. The AI effect also radiated to Meme, one of crypto's fundraising sectors: GOAT, the first MemeCoin built on the AI Agent concept, quickly gained popularity, reached a $1.4 billion valuation, and successfully set off the AI Meme craze.

Research and discussion around AI+Web3 are equally heated, running from AI+DePIN to AI Memecoin and on to the current AI Agent and AI DAO; FOMO sentiment can hardly keep up with the speed at which new narratives rotate.

AI+Web3, a pairing laden with hot money, opportunities, and visions of the future, is inevitably seen by some as a marriage arranged by capital. It is hard for us to tell whether beneath this splendid robe lies a playground for speculators or the eve of an explosive dawn.

To answer this question, a key consideration is whether each side actually gets better with the other: can each benefit from the other's model? In this article, standing on the shoulders of our predecessors, we attempt to examine this pattern: how can Web3 play a role at each layer of the AI technology stack, and what new vitality can AI bring to Web3?

![AI+Web3: Towers and Squares](https://img-cdn.gateio.im/webp-social/moments-25bce79fdc74e866d6663cf31b15ee55.webp)

Part 1: What opportunities does Web3 have in the AI stack?

Before we dive into this topic, we need to understand the technology stack of AI large models:

In simpler terms, the whole process goes like this: a "large model" is like the human brain. In its early stages, this brain belongs to a newborn baby who must observe and absorb vast amounts of information from its surroundings to make sense of the world. This is the "collection" phase of data. Since computers lack the multiple human senses of sight and hearing, before training, the large volumes of unlabeled information from the outside world must be transformed through "preprocessing" into a format that computers can understand and use.

After inputting data, the AI builds a model with understanding and predictive capabilities through "training", which can be viewed as the process of a baby gradually understanding and learning about the outside world. The model's parameters are like the language abilities that the baby continuously adjusts during the learning process. When the content of learning begins to specialize, or when communication with others provides feedback and corrections, it enters the "fine-tuning" phase of the large model.

As children grow up and learn to speak, they can understand meaning and express their feelings and thoughts in new conversations. This stage resembles the "inference" of AI large models, where the model can predict and analyze new language and text inputs. Children use language to express feelings, describe objects, and solve various problems, much as trained AI large models are deployed for inference on specific tasks such as image classification and speech recognition.
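To make the collect, preprocess, train, and infer stages concrete, here is a minimal self-contained sketch in Python. The data, labels, and the tiny logistic-regression "model" are all invented for illustration; they stand in for the vastly larger pipelines behind real large models:

```python
import numpy as np

# "Collection": toy raw data (height, weight); some values are missing
raw = [[170, 65], [None, 72], [182, None], [165, 55], [190, 90], [158, 50]]
y = np.array([0, 1, 1, 0, 1, 0])  # invented labels

# "Preprocessing": impute missing values, then normalize
X = np.array([[np.nan if v is None else v for v in row] for row in raw], dtype=float)
X = np.where(np.isnan(X), np.nanmean(X, axis=0), X)   # fill gaps with column means
mu, sigma = X.mean(axis=0), X.std(axis=0)
X = (X - mu) / sigma                                   # z-score normalization

# "Training": fit the model's parameters by gradient descent
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))                 # current predictions
    w -= 0.1 * X.T @ (p - y) / len(y)                  # adjust the "parameters"
    b -= 0.1 * np.mean(p - y)

# "Inference": the trained model scores an unseen input
x_new = (np.array([175.0, 70.0]) - mu) / sigma
print("P(class 1):", 1 / (1 + np.exp(-(x_new @ w + b))))
```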

The AI Agent points toward the next form of large models: one capable of independently executing tasks and pursuing complex goals, possessing not only the ability to think but also to remember, to plan, and to use tools to interact with the world.
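A minimal sketch of that agent loop follows. `call_llm` is a hypothetical placeholder for any large-model API, not a real library call, and the two tools are toy stand-ins for real-world interaction:

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: a real agent would call an LLM API here."""
    raise NotImplementedError

# "Tools": the agent's hands for interacting with the world (toy versions)
TOOLS = {
    "search": lambda q: f"(top search results for {q!r})",
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy sandbox
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    memory: list[str] = []  # "memory": observations accumulated across steps
    for _ in range(max_steps):
        # "planning": ask the model to pick the next action given goal + memory
        plan = json.loads(call_llm(json.dumps(
            {"goal": goal, "memory": memory, "tools": list(TOOLS)})))
        if plan.get("done"):                                # goal judged complete
            return plan["answer"]
        observation = TOOLS[plan["tool"]](plan["input"])    # act through a tool
        memory.append(f"{plan['tool']}({plan['input']}) -> {observation}")
    return "stopped after max_steps without finishing"
```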

Currently, in response to the pain points of AI across various stacks, Web3 has initially formed a multi-layered, interconnected ecosystem that covers all stages of the AI model process.

![AI+Web3: Towers and Squares](https://img-cdn.gateio.im/webp-social/moments-cc3bf45e321f9b1d1280bf3bb827d9f4.webp)

1. Basic Layer: The Airbnb of Computing Power and Data

Computing Power

Currently, one of the highest costs of AI is the computing power and energy required for training and inference models.

One example: training a large language model from a tech giant required 16,000 high-end GPUs from a certain GPU manufacturer for 30 days. A single 80GB card is priced between $30,000 and $40,000, implying a computing hardware investment of $400 million to $700 million (GPUs plus network chips); meanwhile, training consumes 1.6 billion kilowatt-hours per month, with energy expenditures of nearly $20 million per month.
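These figures can be sanity-checked with quick arithmetic. The sketch below uses only the numbers quoted above, which are the article's own estimates, not measured data:

```python
# Hardware: 16,000 cards at $30k-40k each
gpus = 16_000
hw_low, hw_high = gpus * 30_000, gpus * 40_000
print(hw_low, hw_high)   # 480_000_000 640_000_000 USD, in the same ballpark as
                         # the $400M-700M quoted for GPUs plus network chips

# Energy: the implied electricity price from the quoted monthly figures
kwh_per_month, energy_bill = 1.6e9, 20e6
print(energy_bill / kwh_per_month)   # ~0.0125 USD/kWh implied by those figures
```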

Releasing AI computing power is precisely where Web3 first intersected with AI: DePIN (Decentralized Physical Infrastructure Networks). Relevant data sites currently list more than 1,400 such projects, among which representative GPU computing-power-sharing projects include several well-known platforms.

The main logic is this: the platform lets individuals or entities with idle GPU resources contribute their computing power permissionlessly and in a decentralized way, via an online marketplace of buyers and sellers similar to Uber or Airbnb. This raises the utilization rate of underused GPUs, and end users gain more cost-effective and efficient computing resources; a staking mechanism meanwhile ensures that resource providers face penalties if they violate quality-control rules or suffer network interruptions.
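This marketplace-plus-staking logic can be sketched in a few lines of Python. Everything here, including the names, the rates, and the 20% slash, is illustrative and not any specific protocol's design:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    stake: float          # collateral locked by the GPU owner
    hourly_rate: float    # asking price per GPU-hour
    earned: float = 0.0

class GPUMarket:
    """Toy order matching with staking and slashing, as described above."""
    def __init__(self, slash_fraction: float = 0.2):
        self.providers: list[Provider] = []
        self.slash_fraction = slash_fraction

    def list_gpu(self, p: Provider) -> None:
        self.providers.append(p)

    def rent_cheapest(self, hours: float) -> Provider:
        p = min(self.providers, key=lambda x: x.hourly_rate)  # price discovery
        p.earned += hours * p.hourly_rate
        return p

    def report_failure(self, p: Provider) -> None:
        p.stake -= p.stake * self.slash_fraction  # downtime/QC violation penalty

market = GPUMarket()
market.list_gpu(Provider("home-rig", stake=100.0, hourly_rate=0.4))
market.list_gpu(Provider("small-dc", stake=500.0, hourly_rate=0.6))
winner = market.rent_cheapest(hours=10)  # "home-rig" wins at $0.40/hour
market.report_failure(winner)            # its stake is slashed from 100.0 to 80.0
```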

The characteristics of this model are:

  • Aggregating idle GPU resources: the supply side consists mainly of independent small and medium-sized data centers, excess computing power from operators such as cryptocurrency mining farms, and mining hardware from networks with PoS consensus mechanisms, such as FileCoin and ETH mining rigs. Some projects also aim to lower the device entry threshold, for example by building computing networks for large-model inference out of local devices such as MacBooks, iPhones, and iPads.

  • Facing the long-tail market of AI computing power:

a. "From a technical perspective," decentralized computing power markets are more suitable for inference processes. Training relies more on the data processing capabilities provided by large-scale GPU clusters, while inference has relatively lower requirements for GPU computational performance, as some platforms focus on low-latency rendering tasks and AI inference applications.

b. "From the demand side perspective", small to medium computing power demanders will not independently train their own large models, but will only choose to optimize and fine-tune around a few leading large models, and these scenarios are naturally suitable for distributed idle computing power resources.

  • Decentralized Ownership: The technological significance of blockchain lies in the fact that resource owners always retain control over their resources, allowing for flexible adjustments based on demand while also generating profits.

Data

Data is the foundation of AI. Without data, computation is as useless as duckweed without roots, and the relationship between data and models echoes the saying "garbage in, garbage out": the quantity of data and the quality of its input determine the final output quality of the model. For today's AI model training, data determines the model's language ability, comprehension, and even its values and human-like behavior. Currently, AI's data-demand dilemma centers on the following four aspects:

  • Data hunger: AI model training relies on a large amount of data input. Public information shows that the parameter count of a well-known AI company's trained large language model has reached the trillion level.

  • Data Quality: With the integration of AI and various industries, the timeliness of data, diversity of data, specialization of vertical data, and the incorporation of emerging data sources such as social media sentiment have raised new demands for its quality.

  • Privacy and compliance issues: Currently, various countries and enterprises are gradually recognizing the importance of high-quality datasets and are imposing restrictions on dataset scraping.

  • High data processing costs: data volumes are large and processing is complex. Public information shows that more than 30% of AI companies' R&D costs go to basic data collection and processing.

Currently, Web3's solutions are reflected in the following four aspects:

  1. Data Collection: The supply of freely scraped real-world data is rapidly drying up, and AI companies' spending on data grows year by year. Yet this spending does not flow back to the data's true contributors; platforms alone enjoy the value it creates. For example, a certain social platform earned a total of $203 million through data-licensing agreements with AI companies.

The vision of Web3 is to allow users who genuinely contribute to participate in the value creation brought by data, and to obtain more private and valuable data from users in a cost-effective manner through distributed networks and incentive mechanisms.

  • Some platforms serve as a decentralized data layer and network, where users can run nodes, contribute idle bandwidth, and relay traffic to capture real-time data from across the internet, receiving token rewards in return;

  • Some platforms have introduced the unique concept of data liquidity pools (DLP), where users can upload their private data (such as shopping records, browsing habits, and social media activity) to specific DLPs and flexibly choose whether to authorize specific third parties to use it (a minimal sketch of this authorization model appears after this list);

  • On certain platforms, users can use specific tags and @ the platform on social media to enable data collection.
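As promised above, here is a minimal conceptual sketch of a data liquidity pool: only a hash of the data is registered, and the owner flips per-consumer authorization flags. This illustrates the idea only and is not any platform's actual API:

```python
from dataclasses import dataclass, field
import hashlib

@dataclass
class DataRecord:
    owner: str
    payload_hash: str                    # only a hash is registered, not raw data
    authorized: set[str] = field(default_factory=set)

class DataLiquidityPool:
    def __init__(self):
        self.records: list[DataRecord] = []

    def contribute(self, owner: str, raw_data: bytes) -> DataRecord:
        rec = DataRecord(owner, hashlib.sha256(raw_data).hexdigest())
        self.records.append(rec)
        return rec

    def grant(self, rec: DataRecord, third_party: str) -> None:
        rec.authorized.add(third_party)  # the owner opts a specific consumer in

    def readable_by(self, third_party: str) -> list[DataRecord]:
        return [r for r in self.records if third_party in r.authorized]

dlp = DataLiquidityPool()
rec = dlp.contribute("alice", b"browsing history ...")
dlp.grant(rec, "model-trainer-42")                 # flexible, per-party consent
assert dlp.readable_by("model-trainer-42") == [rec]
```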

  2. Data Preprocessing: In the AI data pipeline, the collected data is often noisy and error-ridden, so it must be cleaned and converted into a usable format before model training, involving repetitive tasks such as normalization, filtering, and handling missing values. This stage is one of the few manual steps left in the AI industry and has given rise to the profession of data annotation. As models' demands on data quality rise, so does the bar for data annotators, and this task is naturally suited to Web3's decentralized incentive mechanisms.
  • Currently, multiple platforms are considering joining this key step of data annotation.

  • Some projects have proposed the concept of "Train2earn", emphasizing data quality: users earn rewards by providing annotated data, comments, or other forms of input (a payout sketch appears after this list).

  • Some data-labeling projects gamify the labeling tasks and let users stake points to earn more points.
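Below is a minimal sketch of how a Train2earn-style payout with staking might work: annotators stake points, the consensus label is treated as ground truth, agreeing annotators split the reward pool, and dissenters are slashed. The quality metric and all numbers are invented for illustration:

```python
from collections import Counter

def settle_labels(submissions: dict[str, str], stakes: dict[str, float],
                  reward_pool: float, slash_fraction: float = 0.5):
    # Majority vote stands in for a real quality-control mechanism
    consensus = Counter(submissions.values()).most_common(1)[0][0]
    correct = [a for a, lbl in submissions.items() if lbl == consensus]
    payouts = {}
    for annotator, lbl in submissions.items():
        if lbl == consensus:
            payouts[annotator] = stakes[annotator] + reward_pool / len(correct)
        else:
            payouts[annotator] = stakes[annotator] * (1 - slash_fraction)
    return consensus, payouts

label, payouts = settle_labels(
    submissions={"ann1": "cat", "ann2": "cat", "ann3": "dog"},
    stakes={"ann1": 10, "ann2": 10, "ann3": 10},
    reward_pool=6.0,
)
# consensus "cat": ann1/ann2 each end with 13.0; ann3 is slashed to 5.0
```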

  3. Data Privacy and Security: It is important to distinguish data privacy from data security. Data privacy concerns the handling of sensitive data, while data security protects data from unauthorized access, destruction, and theft. The advantages of Web3 privacy technology, and its potential application scenarios, show up in two areas: (1) training on sensitive data; (2) data collaboration: multiple data owners can jointly participate in AI training without sharing their raw data (a minimal sketch of this pattern follows the list of technologies below).

The current commonly used privacy technologies in Web3 include:

  • Trusted Execution Environment (TEE)

  • Fully Homomorphic Encryption (FHE)

  • Zero-knowledge proofs (ZK): for example, certain protocols use zkTLS technology to generate zero-knowledge proofs of HTTPS traffic, allowing users to securely import activity, reputation, and identity data from external websites without exposing sensitive information.
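To make the "data collaboration" scenario concrete, here is a minimal sketch of federated averaging: each owner computes model updates on its own data and shares only those updates, never the raw records. This illustrates the basic pattern only; real systems layer TEE, FHE, or ZK proofs on top, and the data here is synthetic:

```python
import numpy as np

def local_gradient(w, X, y):
    p = 1 / (1 + np.exp(-(X @ w)))      # local logistic-regression predictions
    return X.T @ (p - y) / len(y)       # gradient computed on private data

rng = np.random.default_rng(0)
# Three data owners, each holding a private dataset that never leaves home
owners = [(rng.normal(size=(50, 3)), rng.integers(0, 2, 50)) for _ in range(3)]

w = np.zeros(3)
for _ in range(200):
    grads = [local_gradient(w, X, y) for X, y in owners]  # raw data stays local
    w -= 0.5 * np.mean(grads, axis=0)                     # only updates are shared
```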

However, the field is still in its early stages, and most projects remain exploratory. One current dilemma is that computational costs are far too high. Some examples:

  • A certain zkML framework takes about 80 minutes to generate a proof for a 1M-nanoGPT model.

  • According to data from a certain laboratory, the overhead of zkML is more than 1000 times higher than pure computation.

  4. Data Storage: Once data is available, it also needs somewhere on-chain to be stored, along with the LLMs produced from that data. With data availability (DA) as the core issue, Ethereum's throughput before the Danksharding upgrade was roughly 0.08 MB per second, while AI model training and real-time inference typically require 50 to 100 GB of data throughput per second. This order-of-magnitude gap leaves existing on-chain solutions struggling with "resource-intensive AI applications."
  • Some platforms are representative projects in this category, offering decentralized storage solutions designed for the high-performance requirements of AI.
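The throughput gap cited above is easy to quantify. A quick calculation using only the quoted figures:

```python
da_mb_per_s = 0.08          # Ethereum DA throughput pre-Danksharding (MB/s)
ai_gb_per_s = (50, 100)     # typical AI training / real-time inference I/O
for gb in ai_gb_per_s:
    print(f"{gb} GB/s is {gb * 1000 / da_mb_per_s:,.0f}x the on-chain throughput")
# -> 625,000x to 1,250,000x: a gap of roughly six orders of magnitude
```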