Categories: Artificial Intelligence, Innovation, Research

ByteDance Researchers Publish High-Performance AI Training Method

The headquarters of TikTok parent ByteDance. Image credit: ByteDance

Researchers from ByteDance, Tsinghua University, and the University of Hong Kong have released an open-source system for AI reinforcement learning that they say outperforms a reasoning system from DeepSeek.

The DAPO (Decoupled Clip and Dynamic Sampling Policy Optimisation) system is designed to provide reinforcement-learning techniques for large language models (LLMs) that other researchers can reuse.

AI companies often release only partial details of their reinforcement-learning (RL) methods, making the techniques difficult to reproduce, the researchers said.

Open approach

In a new research paper, they said they had tried to reproduce DeepSeek’s GRPO (Group Relative Policy Optimisation) method, but their results trailed DeepSeek’s by 17 points on the AIME benchmark, “suggesting that critical training details may have been omitted in the R1 paper”.

R1 is DeepSeek’s latest “reasoning” AI model.

Reasoning models deliberately “think” longer before delivering an answer, double-checking their responses and reducing the potential for errors.

In the interests of transparency and reproducibility, the DAPO team released the algorithmic details, training procedures and datasets used in their research.

The project includes training code and a prepared dataset called DAPO-Math-17K for mathematical reasoning tasks.

The team said DAPO outperformed DeepSeek’s GRPO on the American Invitational Mathematics Examination (AIME) 2024 benchmark, scoring 50 points with Alibaba’s open-source Qwen2.5-32B base model, against 47 points for GRPO.
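The researchers’ description of the algorithms is not reproduced here. As a rough illustration only, and not the team’s released code, the Python sketch below shows two ideas associated with these methods. The first is GRPO’s group-relative advantage, in which each sampled answer to a prompt is scored against the mean and spread of all answers sampled for that same prompt. The second is a simplified filter in the spirit of DAPO’s dynamic sampling, which skips prompts where every answer was right or every answer was wrong, since such groups produce zero advantage and therefore no learning signal. The 0/1 correctness reward, the function names and the sample batch are hypothetical; DAPO’s published method reportedly also keeps sampling to refill the batch, which this sketch omits.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalise each reward by its group's mean and spread.

    Uses the population standard deviation plus a small eps (both assumptions)
    so that a group of identical rewards yields all-zero advantages instead of
    a division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def keep_informative_groups(groups: dict[str, list[bool]]) -> dict[str, list[bool]]:
    """Simplified dynamic-sampling filter (hypothetical helper): keep mixed outcomes only.

    If every sampled answer is correct, or every one is wrong, all rewards in
    the group are equal and every advantage is zero, so the prompt contributes
    nothing to the policy update.
    """
    return {p: c for p, c in groups.items() if any(c) and not all(c)}


# Hypothetical batch: four sampled answers per prompt, marked correct/incorrect.
batch = {
    "prompt_a": [True, False, False, True],    # mixed outcomes: kept
    "prompt_b": [True, True, True, True],      # all correct: dropped
    "prompt_c": [False, False, False, False],  # all wrong: dropped
}

for prompt, correct in keep_informative_groups(batch).items():
    rewards = [1.0 if ok else 0.0 for ok in correct]  # assumed 0/1 reward
    print(prompt, [round(a, 3) for a in group_relative_advantages(rewards)])
# prints: prompt_a [1.0, -1.0, -1.0, 1.0]
```

Correct answers in a mixed group receive a positive advantage and incorrect ones a negative advantage, which is what pushes the model toward the answers that scored better than their peers.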

Efficiency

DAPO achieved the score with half the training steps of GRPO, underscoring its efficiency, the team said.

The project is led by Yu Qiying, a ByteDance intern and doctoral student at Tsinghua; the other participants are a Tsinghua undergraduate and a University of Hong Kong doctoral student. ByteDance is seeking to work with top-level AI researchers before they graduate.

The TikTok parent has invested heavily in AI, and its Doubao chatbot has become China’s most popular chatbot since its launch last May, ranking as the world’s second most popular after OpenAI’s ChatGPT.

Matthew Broersma

Matt Broersma is a long-standing tech freelancer who has worked for Ziff-Davis, ZDNet and other leading publications.
