Researchers from ByteDance, Tsinghua University, and the University of Hong Kong have released an open-source system for AI reinforcement learning that they say outperforms a reasoning system from DeepSeek.
The DAPO (Decoupled Clip and Dynamic Sampling Policy Optimisation) system is designed to provide reinforcement-learning (RL) techniques for large language models (LLMs) that other researchers can reuse.
AI companies often release only partial details of their RL methods, the researchers said, making the techniques difficult to reproduce.
In a new research paper, they said they tried to reproduce DeepSeek’s GRPO (group relative policy optimisation) method, but their results trailed DeepSeek’s by 17 points on the AIME benchmark, “suggesting that critical training details may have been omitted in the R1 paper”.
R1 is DeepSeek’s latest “reasoning” AI model.
Reasoning models deliberately “think” longer before delivering an answer, double-checking their responses and reducing the potential for errors.
In the interests of transparency and reproducibility, the DAPO team released the algorithmic details, training procedures and datasets used in their research.
The project includes training code and a prepared dataset called DAPO-Math-17K for mathematical reasoning tasks.
The team said DAPO outperformed DeepSeek’s GRPO on the American Invitational Mathematics Examination (AIME) 2024 benchmark, scoring 50 points with Alibaba’s open-source Qwen2.5-32B base model, compared with 47 points for GRPO.
DAPO achieved the score with half the training steps of GRPO, underscoring its efficiency, the team said.
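For readers unfamiliar with the "group relative" idea behind GRPO, the sketch below illustrates it under stated assumptions. It follows the general public description of GRPO rather than anything in the DAPO code release: several answers are sampled for one prompt, each is scored, and each score is normalised against the mean and spread of its own group. The function names, the `eps` constant and the use of a population standard deviation are illustrative choices, not details taken from either team.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise each reward against its own group's mean and spread.

    `rewards` holds one score per sampled answer to the same prompt.
    `eps` (an assumed constant) avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_learning_signal(rewards):
    """Hypothetical helper: a group where every answer scores the same
    yields all-zero advantages, so it carries no training signal."""
    return len(set(rewards)) > 1


# Two correct and two incorrect answers to one prompt.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Every answer correct: all advantages collapse to zero.
uniform = group_relative_advantages([1.0, 1.0, 1.0])
```

In this sketch, a group whose answers are all right or all wrong contributes nothing to the update, which is one plausible reading of why a method with "dynamic sampling" in its name would care about which prompts end up in a training batch.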
The project is led by ByteDance intern Yu Qiying, a doctoral student at Tsinghua. The other participants include a Tsinghua undergraduate and a University of Hong Kong doctoral student, reflecting the company’s efforts to work with top-level AI researchers before they graduate.
The TikTok parent has invested heavily in AI. Its Doubao chatbot has become China’s most popular since its launch last May and ranks as the world’s second most popular after OpenAI’s ChatGPT.