ByteDance and USTC jointly propose DocPedia, a large multimodal document model
DocPedia, a large multimodal document model jointly developed by ByteDance and the University of Science and Technology of China (USTC), has broken through the resolution limit and handles images at resolutions up to 2560×2560. By comparison, leading large multimodal models such as LLaVA and MiniGPT-4 process images at 336×336, which is too low to parse high-resolution document images. To address this shortcoming of existing models, the research team adopted a new approach.
DocPedia is reported to not only accurately recognize the information in an image, but also draw on its knowledge base to answer questions according to user needs, demonstrating an ability to understand high-resolution multimodal documents.