DeepSeek Open-Sources AI Model Training Details

Chinese AI start-up DeepSeek has released a series of open-source projects on GitHub, revealing details of how it trained the low-cost, high-performance models that shocked international markets earlier this year.

The company released two models in January, the V3 large language model and the R1 reasoning model, which it said achieved performance on par with major Western competitors but were trained for a fraction of the cost.

That, and the fact that the models were released as open source, forced investors to re-evaluate the hundreds of billions of dollars that companies have poured into AI development over the last two years.

Open source code. Image credit: Unsplash

AI optimisation

The popularity of Hangzhou-based DeepSeek initially wiped $1 trillion (£800bn) from international markets, and over the past few weeks has spurred new investor interest in mainland Chinese tech stocks.

The eight new open-source projects the start-up released last week mark the first time it has disclosed details of the techniques it used to gain optimal performance from compute, communications and storage, three key aspects of model training.

The company’s developers, who are mostly young university graduates, said they were disclosing the company’s “battle-tested building blocks” to share “our small-but-sincere progress with full transparency”.

vLLM, an open-source AI project originating from the University of California, Berkeley, said it had already used one of DeepSeek’s techniques to improve efficiency by up to 16 percent.

DeepSeek’s open-source approach contrasts with the closed-source model of Microsoft-backed OpenAI, whose ChatGPT ignited the current AI craze when it was released in November 2022.

Low-cost approach

OpenAI chief executive Sam Altman said in February that the company “needs to figure out a different open-source strategy”.

The company on Thursday launched GPT-4.5, which costs users $150 per 1 million output tokens, compared to 55 cents for DeepSeek’s V3 and R1 at off-peak times.

DeepSeek last week cut prices by up to 75 percent for access to its models via its application programming interface (API) during off-peak hours.
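To put the quoted prices in perspective, the short Python sketch below compares the two per-token rates. It uses only the figures cited above ($150 and 55 cents per 1 million output tokens); the function names and the 20-million-token workload are hypothetical, chosen purely for illustration, and real bills would also depend on input-token charges and time of day.

```python
# Prices quoted in the article, in US dollars per 1 million output tokens.
GPT_45_PRICE = 150.00
DEEPSEEK_OFF_PEAK_PRICE = 0.55


def output_cost(tokens: int, price_per_million: float) -> float:
    """Return the dollar cost of generating `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_million


def price_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more expensive one rate is than another."""
    return expensive / cheap


if __name__ == "__main__":
    workload = 20_000_000  # hypothetical monthly output-token volume
    print(f"GPT-4.5:  ${output_cost(workload, GPT_45_PRICE):,.2f}")
    print(f"DeepSeek: ${output_cost(workload, DEEPSEEK_OFF_PEAK_PRICE):,.2f}")
    print(f"Ratio:    {price_ratio(GPT_45_PRICE, DEEPSEEK_OFF_PEAK_PRICE):.0f}x")
```

On these figures alone, the GPT-4.5 output rate is roughly 270 times DeepSeek’s off-peak rate.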

DeepSeek recently resumed account top-ups for developers after suspending them for more than two weeks due to overwhelming demand.

Matthew Broersma

Matt Broersma is a long-standing tech freelancer who has worked for Ziff-Davis, ZDNet and other leading publications.
