The hardware DeepSeek uses to train its open-source AI model is still generating some mistrust. According to information released by the Chinese company, the infrastructure for training DeepSeek R1 includes 2,048 Nvidia H800 chips. Additionally, DeepSeek says training the model, which has 671 billion parameters, cost $5.6 million. However, some analysts are contesting these figures.
A revealing report by SemiAnalysis suggests that DeepSeek’s AI model training infrastructure comprises around 50,000 GPUs based on Nvidia’s Hopper microarchitecture. Analysts Dylan Patel, AJ Kourabi, Doug O’Laughlin, and Reyk Knuhtsen claim that at least 10,000 of these GPUs are Nvidia H100 models, while another 10,000 are H800 GPUs. According to their analysis, the remaining chips are H20 GPUs.
If this information is accurate, the true cost of training DeepSeek R1 is likely much higher than the $5.6 million the company reported. Analysts estimate that DeepSeek’s total investment in servers is around $1.6 billion. The company uses this infrastructure to train AI models, perform financial modeling, and conduct research across multiple locations.
Huawei Is Strengthening Its Position in the Inference Process
DeepSeek has two significant advantages that shouldn’t be overlooked. First, the Chinese company operates its own processing infrastructure. This sets it apart from other startups in the same field, which often rely on large cloud service providers. By having its own hardware, DeepSeek can be highly efficient in developing and optimizing its AI models.
Second, DeepSeek has a distinctive talent acquisition strategy. Unlike many similar Chinese companies, it exclusively recruits engineers from mainland China and doesn’t seek talent from the U.S. or Taiwan. Moreover, DeepSeek offers competitive salaries, paying its top researchers more than $1.3 million per year. This focus on talent has enabled the company to achieve significant innovations in AI, prioritizing efficiency over brute-force improvements.
Additionally, it’s important to note that DeepSeek isn’t entirely dependent on Nvidia hardware. AI GPUs developed by Huawei, such as the Ascend 910C chips, have proven to be very efficient at inference. Broadly speaking, inference is the computation a trained language model performs each time it generates a response to a request, as opposed to the one-time cost of training the model. Along with SiliconFlow (another Chinese company focused on infrastructure deployment), Huawei plays a crucial role in making DeepSeek’s V3 and R1 models widely available to users worldwide.
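To make the term concrete, here’s a minimal sketch of inference using the Hugging Face transformers library and one of the small distilled R1 checkpoints DeepSeek has published. It’s only an illustration of the concept, not a depiction of DeepSeek’s production serving stack on Huawei or SiliconFlow hardware:

```python
# Illustrative sketch only: a small distilled DeepSeek R1 checkpoint run
# locally with Hugging Face transformers. This shows what "inference" means
# at the code level, not how DeepSeek actually serves its models.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Inference: the trained model reads the prompt tokens and generates new
# tokens one at a time until it has produced a complete response.
prompt = "In one sentence, what does 'inference' mean for a language model?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Every request a user sends to a deployed model triggers a computation like this generate step, which is why efficient inference hardware (such as Huawei’s Ascend chips) matters so much for serving models at scale.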
Image | Cristiano Firmani