US researchers have achieved a fresh breakthrough in training a high-performing AI model at low cost, after inexpensively trained models from China’s DeepSeek gained worldwide attention last month.
The S1 reasoning model, developed using open-source technology from Alibaba Group, was trained for under $50 (£40) and outperformed a recent OpenAI model on certain tasks, researchers from Stanford University and the University of Washington said.
As a base model the researchers used Alibaba's Qwen2.5-32B-Instruct. Qwen was the most-downloaded model last year on AI community platform Hugging Face, replacing Meta Platforms' Llama as the top choice for researchers and developers.
Both Qwen and Llama models are open source, unlike closed-source competitors from the likes of Microsoft-backed OpenAI and Amazon-backed Anthropic.
S1 was developed by researchers from Stanford, the University of Washington and Seattle's Allen Institute for AI (Ai2), including Li Fei-Fei, known as the "Godmother of AI", according to research published last week.
After being trained on answers to 1,000 curated questions, along with the "thinking process" distilled from Google's Gemini Thinking Experimental model, S1 outperformed OpenAI's o1-preview on maths and programming tasks, the research paper said.
The paper says the model was trained for 26 minutes on 16 Nvidia H100 GPUs, which can be rented for as little as $2 per hour.
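For readers curious what that fine-tuning step might look like in practice, here is a minimal sketch using the team's publicly released s1K dataset on Hugging Face and the TRL library. The dataset column names, hyperparameters and text formatting below are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal supervised fine-tuning sketch for an S1-style model.
# Assumptions: the "simplescaling/s1K" dataset with "question",
# "thinking_trajectories" and "attempt" columns; toy hyperparameters.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

ds = load_dataset("simplescaling/s1K", split="train")

def to_text(example):
    # Concatenate question, distilled reasoning trace and final answer
    # into one training string (a simplification of a chat template).
    return {
        "text": example["question"] + "\n"
        + example["thinking_trajectories"][0] + "\n"
        + example["attempt"]
    }

ds = ds.map(to_text)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # base model named in the article
    train_dataset=ds,
    args=SFTConfig(
        output_dir="s1-sft",
        num_train_epochs=5,
        per_device_train_batch_size=1,
        bf16=True,
    ),
)
trainer.train()
```

With only 1,000 examples, the training set is tiny by LLM standards, which is why a single short run on 16 GPUs suffices; most of the model's capability comes from the pre-trained Qwen base.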
Alibaba's cloud unit introduced the Qwen2.5 series last September, with sizes ranging from 500 million to 72 billion parameters; parameter count is a rough indicator of a model's sophistication.
The researchers’ approach, called “test-time scaling”, involves making an AI model take more time to “consider” before delivering a response.
If the model starts to respond to a query too early, it is forced to continue processing, which lets it refine and improve its answer, often avoiding mistakes in the output.
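The S1 paper implements this with a technique it calls "budget forcing": when the model tries to stop reasoning before a token budget is spent, its end-of-generation token is suppressed and the word "Wait" is appended, nudging it to keep thinking. Below is a minimal sketch of that loop with Hugging Face transformers; the budget, number of rounds and the exact "Wait" string are illustrative assumptions.

```python
# Minimal "budget forcing" test-time scaling sketch with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B-Instruct"  # base model named in the article
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

def generate_with_budget(prompt: str, min_new_tokens: int = 512,
                         max_rounds: int = 4) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids.to(model.device)
    generated = 0
    for _ in range(max_rounds):
        out = model.generate(
            ids,
            max_new_tokens=min_new_tokens - generated,
            do_sample=False,
            pad_token_id=tok.eos_token_id,
        )
        generated += out.shape[1] - ids.shape[1]
        ids = out
        if generated >= min_new_tokens:
            break  # thinking budget spent; let the answer stand
        # The model stopped early: drop its end-of-sequence token and
        # append "Wait" so it is forced to keep reasoning.
        if ids[0, -1].item() == tok.eos_token_id:
            ids = ids[:, :-1]
        wait_ids = tok(" Wait", return_tensors="pt",
                       add_special_tokens=False).input_ids.to(model.device)
        ids = torch.cat([ids, wait_ids], dim=-1)
    return tok.decode(ids[0], skip_special_tokens=True)
```

The appeal of the approach is that it trades inference-time compute for accuracy without any further training: the same fine-tuned model simply "thinks" for longer on harder questions.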
The researchers have open-sourced S1 and made it available on GitHub.
DeepSeek’s AI models attracted worldwide attention late last month after achieving similar performance to rivals, while costing a fraction of the price to train.
The massive computing power required to train and deploy generative AI systems has come into the spotlight recently. Oracle, OpenAI and SoftBank have announced a spending plan of up to $500bn for US AI infrastructure, and a similar 100bn euro (£83bn) project for France was announced at the AI Action Summit this week.