US researchers have achieved a fresh breakthrough in training a high-performing AI model at low cost, after inexpensively trained models from China’s DeepSeek gained worldwide attention last month.
The S1 reasoning model, developed using open-source technology from Alibaba Group, was trained for under $50 (£40) and outperformed a recent OpenAI model on certain tasks, researchers from Stanford University and the University of Washington said.
As a base model the researchers used Alibaba’s Qwen2.5-32B-Instruct, part of the Qwen family that last year overtook Meta Platforms’ Llama as the most-downloaded model series on AI community Hugging Face, making it the top choice for researchers and developers.
Both Qwen and Llama models are open source, unlike closed-source competitors from the likes of Microsoft-backed OpenAI and Amazon-backed Anthropic.
S1 was developed by researchers from Stanford, the University of Washington and Seattle’s Allen Institute for AI (Ai2), among them Li Fei-Fei, known as the “Godmother of AI”, according to research published last week.
After being trained on answers to 1,000 curated questions and the “thinking process” distilled from Google’s Gemini Thinking Experimental model, S1 outperformed OpenAI’s o1-preview on maths and coding tests, the research paper said.
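For readers curious what distilling reasoning traces into a base model looks like in practice, here is a minimal sketch of the supervised fine-tuning step, assuming the Hugging Face trl library. The record format, output directory and epoch count are illustrative assumptions rather than the authors’ actual configuration, and fine-tuning a 32-billion-parameter model would in practice require a multi-GPU node.

```python
# Minimal sketch: supervised fine-tuning on question/thinking/answer records,
# assuming the Hugging Face trl library. Record format and hyperparameters
# are illustrative assumptions, not the s1 authors' setup.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# In the paper's setup there were 1,000 curated questions, each paired with
# a "thinking" trace distilled from a stronger model plus the final answer.
records = [
    {"text": "Question: What is 17 * 24?\n"
             "Thinking: 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.\n"
             "Answer: 408"},
    # ... the remaining curated examples
]
dataset = Dataset.from_list(records)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # base model named in the article
    train_dataset=dataset,              # uses the default "text" field
    args=SFTConfig(output_dir="s1-sft", num_train_epochs=5),
)
trainer.train()
```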
The paper says the model was trained for 26 minutes on 16 Nvidia H100 GPUs, which can be rented for as little as $2 per hour.
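At those rates the compute bill is small: 16 GPUs running for 26 minutes (about 0.43 hours) at $2 per GPU-hour works out to roughly 16 × 0.43 × $2 ≈ $14, comfortably inside the sub-$50 figure quoted for the project.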
Alibaba’s cloud unit introduced the Qwen2.5 series last September, with sizes ranging from 500 million to 72 billion parameters, the parameter count serving as a rough indication of a model’s sophistication.
The researchers’ approach, called “test-time scaling”, involves making an AI model take more time to “consider” before delivering a response.
By forcing the model to continue processing if it starts to respond to a query too early, the model can refine and improve its answer, often avoiding mistakes in the output.
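The s1 paper describes one simple way to do this, a trick it calls “budget forcing”: if the model tries to end its chain of thought before a token budget is spent, the word “Wait” is appended and decoding resumes, prompting the model to re-examine its answer. Below is a minimal sketch of that idea, assuming the Hugging Face transformers API; the function name, budget value and prompt are illustrative assumptions, and running a 32B model this way would need several GPUs.

```python
# Minimal sketch of test-time scaling via "budget forcing", assuming the
# Hugging Face transformers API. The loop structure is an illustrative
# assumption, not the s1 authors' exact implementation.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-32B-Instruct"  # base model named in the article

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

def generate_with_budget(prompt: str, min_thinking_tokens: int = 512) -> str:
    """Force the model to keep reasoning until a minimum token budget is spent."""
    text = prompt
    spent = 0
    while spent < min_thinking_tokens:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        output = model.generate(**inputs,
                                max_new_tokens=min_thinking_tokens - spent)
        spent += output.shape[-1] - inputs["input_ids"].shape[-1]
        text = tokenizer.decode(output[0], skip_special_tokens=True)
        if spent < min_thinking_tokens:
            # The model stopped early; appending "Wait" nudges it to
            # reconsider and refine its answer on the next pass.
            text += "\nWait"
    return text

print(generate_with_budget("How many prime numbers are there below 100?"))
```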
The researchers have open-sourced S1 and made it available on GitHub.
DeepSeek’s AI models attracted worldwide attention late last month after achieving similar performance to rivals, while costing a fraction of the price to train.
The massive computing power required to train and deploy generative AI systems has come into the spotlight recently, with Oracle, OpenAI and SoftBank announcing a spending plan of up to $500bn for US AI infrastructure, and a similar €100bn (£83bn) project for France announced at the AI Action Summit this week.