Researchers Deliver High-Performance AI Model For Under $50

US researchers have achieved a fresh breakthrough in training a high-performing AI model at low cost, after inexpensively trained models from China’s DeepSeek gained worldwide attention last month.

The S1 reasoning model, developed using open-source technology from Alibaba Group, was trained for under $50 (£40) and outperformed a recent OpenAI model on certain tasks, researchers from Stanford University and the University of Washington said.

As a base model, the researchers used Alibaba’s Qwen2.5-32B-Instruct, which was the most-downloaded model last year on the AI community platform Hugging Face, displacing Meta Platforms’ Llama as the top choice for researchers and developers.

Stanford University AI researcher Li Fei-Fei pictured in May 2024. Image credit: Li Fei-Fei

Open source AI

Both Qwen and Llama models are open source, unlike closed-source competitors from the likes of Microsoft-backed OpenAI and Amazon-backed Anthropic.

S1 was developed by researchers from Stanford, the University of Washington and Seattle’s Allen Institute for AI (Ai2), including Li Fei-Fei, known as the “Godmother of AI”, according to research published last week.

After being trained on answers to 1,000 curated questions, along with the “thinking process” distilled from Google’s Gemini Thinking Experimental model, S1 outperformed OpenAI’s o1-preview at maths and programming tasks, the research paper said.
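
As a rough illustration of what such distillation data might look like, the sketch below assembles (question, teacher reasoning trace, answer) triples into a fine-tuning file. The field names, the `<think>` delimiter and the file name are illustrative assumptions, not the paper’s exact format.

```python
# A minimal sketch of assembling distillation data: each example folds a
# reasoning trace produced by a stronger "teacher" model (here, hypothetically,
# Gemini) into the target answer for supervised fine-tuning.
import json

curated_triples = [
    ("What is 12 * 13?",
     "12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.",
     "156"),
    # ... roughly 1,000 such curated triples in the S1 setup
]

def to_training_example(question, teacher_trace, answer):
    """Fold the distilled reasoning trace into a single chat transcript."""
    return {"messages": [
        {"role": "user", "content": question},
        {"role": "assistant",
         "content": f"<think>{teacher_trace}</think>\n{answer}"},
    ]}

with open("s1k_train.jsonl", "w") as f:  # hypothetical output file name
    for q, trace, a in curated_triples:
        f.write(json.dumps(to_training_example(q, trace, a)) + "\n")
```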

The paper says the model was trained for 26 minutes on 16 Nvidia H100 GPUs, which can be rented for as little as $2 per hour.
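
A quick back-of-the-envelope check of those figures puts the GPU rental bill comfortably under the headline $50:

```python
# Rough training cost from the figures quoted above.
gpus = 16        # Nvidia H100s
hours = 26 / 60  # 26 minutes of training
rate = 2.00      # USD per GPU-hour, low-end rental price

print(f"Approximate cost: ${gpus * hours * rate:.2f}")  # ~$13.87
```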

Alibaba’s cloud unit introduced the Qwen2.5 series last September, with sizes ranging from 500 million to 72 billion parameters; parameter count is a rough indicator of a model’s sophistication.

The researchers’ approach, called “test-time scaling”, involves making an AI model take more time to “consider” before delivering a response.

Computing power

By forcing the model to continue processing when it tries to respond to a query too early, the researchers found it could refine and improve its answer, often avoiding mistakes in the output. A minimal sketch of this idea follows.
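
The sketch below illustrates one way such “budget forcing” could work: whenever the model emits an end-of-thinking marker before spending a minimum token budget, the marker is stripped and “Wait” is appended so generation continues. The prompt format, the `</think>` marker and the token budget are assumptions for illustration, not the S1 paper’s exact mechanism.

```python
# A minimal sketch of test-time scaling via "budget forcing", assuming a
# simple <think>...</think> reasoning format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B-Instruct"  # the base model named above
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "How many primes are there below 100? Think step by step.\n<think>"
min_thinking_tokens = 512  # the "budget" to spend before answering

ids = tok(prompt, return_tensors="pt").to(model.device)["input_ids"]
thought_tokens = 0

while thought_tokens < min_thinking_tokens:
    out = model.generate(ids, max_new_tokens=256)
    new_text = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
    thought_tokens += out.shape[1] - ids.shape[1]
    if "</think>" in new_text:
        # Model tried to stop reasoning early: remove the marker,
        # append "Wait" and force it to keep thinking.
        full = tok.decode(out[0], skip_special_tokens=True)
        full = full.split("</think>")[0] + "\nWait,"
        ids = tok(full, return_tensors="pt").to(model.device)["input_ids"]
    else:
        ids = out  # still thinking; continue from where it left off

# Budget spent: let the model close its reasoning and answer normally.
final = model.generate(ids, max_new_tokens=512)
print(tok.decode(final[0], skip_special_tokens=True))
```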

The researchers have open-sourced S1 and made it available on GitHub.

DeepSeek’s AI models attracted worldwide attention late last month after achieving similar performance to rivals, while costing a fraction of the price to train.

The massive computing power required to train and deploy generative AI systems has come into the spotlight recently. Oracle, OpenAI and SoftBank have announced a spending plan of up to $500bn for US AI infrastructure, while a similar 100bn euro (£83bn) project for France was announced at the AI Action Summit this week.

Matthew Broersma

Matt Broersma is a long-standing technology freelancer who has worked for Ziff-Davis, ZDNet and other leading publications.
