China’s DeepSeek Sends Shockwaves Through the Tech Community – 2 Articles

❈ ❈ ❈

DeepSeek’s Geopolitical Impacts

Hongda Jiang

Ever since ChatGPT was popularized among consumers, prominent tech giants across the world have been working on their own LLMs—be it Meta’s Llama, xAI’s Grok, Anthropic’s Claude, or Beijing-based 01.AI’s Yi. What makes DeepSeek’s models tower above the aforementioned competitors is that they achieve comparable or superior performance across benchmarked categories while spending a fraction of the time & money required by the next best competitor. For reference, DeepSeek reportedly spent roughly $6M USD to train its base model, using about 2.8M GPU-hours on a cluster of roughly 2,048 Nvidia H800 GPUs (export-compliant chips that are deliberately performance-capped relative to Nvidia’s most advanced H100 series), completing the run in under 2 months. This is less than 10% of the cost of the next cheapest model—Meta’s Llama 3 (at least $70M spent)—& less than 6% of the GPU-hours spent by the next closest non-Chinese competitor—OpenAI’s GPT-4 (approximately 50-60M GPU-hours)—despite the latter’s access to leading-edge Nvidia GPUs that far outperform the H800. Better yet, DeepSeek’s models & much of its tooling are open-source under the MIT license—meaning that anyone can copy, modify, & distribute the associated software & documentation free of charge, with only minimal conditions attached.
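
As a quick sanity check of those reported figures (treating the 2,048-GPU cluster size & the ~$6M price tag as rough, reported numbers rather than audited ones), the arithmetic does hold together:

    # Back-of-envelope check of DeepSeek's reported training figures.
    gpu_hours = 2.8e6    # reported H800 GPU-hours (approximate)
    num_gpus = 2048      # reported cluster size
    total_cost = 6e6     # reported training cost in USD (approximate)

    wall_clock_days = gpu_hours / num_gpus / 24
    cost_per_gpu_hour = total_cost / gpu_hours

    print(f"Wall-clock time: {wall_clock_days:.0f} days")      # ~57 days, i.e. under 2 months
    print(f"Implied rate: ${cost_per_gpu_hour:.2f}/GPU-hour")  # ~$2.14 per GPU-hour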

This is a revolutionary milestone in the still-nascent LLM industry, & it carries a few obvious strategic implications:

  1. US semiconductor sanctions against China have decisively failed. Ever since the first Trump administration’s export actions against ZTE in 2018 (& against Huawei in 2019), the U.S. has imposed ever stricter bans on semiconductor exports to China. These sanctions prohibit not only sales of advanced semiconductor end-products to China, but also sales of semiconductor manufacturing equipment, so as to prevent China from building the latest chips itself—& thereby keep it behind the U.S. in access to the latest advances in AI. These years of ever stricter sanctions not only compelled Chinese enterprises to pursue self-reliance across the entire semiconductor value chain (which would be a first for any single country), but also to use their limited computing power far more efficiently than their U.S. counterparts, so as to get outsized results—as demonstrated by DeepSeek’s latest achievement. While the original DeepSeek model was trained using U.S.-designed Nvidia H800s, it is plausible that subsequent models can use domestically produced counterparts such as Huawei’s Ascend 910C. While the Ascend series lacks access to the most advanced manufacturing processes, it is a good-enough platform to serve the DeepSeek R1 model at scale. In fact, DIY enthusiasts have already demonstrated that small, distilled versions of DeepSeek’s open-source models can run on low-end computers such as the Raspberry Pi (albeit without the full 671-billion-parameter model), with power consumption as little as that of an ordinary smartphone.
  2. Valuations of U.S. tech giants must be revised sharply downwards. As recently as last year, it was assumed that any company wanting to build an LLM needed hundreds of millions of dollars in sophisticated hardware (which only a few suppliers such as Nvidia can provide) & tens of millions of GPU-hours. This meant that only the richest tech companies in the world—Google, Meta, Microsoft, etc.—could afford to build, maintain, & offer the services of an LLM. Consequently, the profits associated with LLM services were expected to be concentrated in the hands of a few companies commanding multi-trillion-dollar valuations (e.g. Nvidia). The release of DeepSeek R1 shattered this assumption. It demonstrated that a startup with less than $10M can build & train a competitive model, using older hardware well behind the leading edge. With such a low financial barrier to entry, small companies can profitably offer services at pennies on the dollar. Consequently, the profits (& therefore the overall company valuations) forecast for the U.S. tech oligopoly must now be revised downwards significantly, with potentially perilous consequences for U.S. financial markets.
  3. The global south can now enjoy the fruits of generative AI. The most transformative impact of DeepSeek relates not to China or the U.S., but to the rest of the world—particularly the global south. Now that everyone in the world has access to a top-performing, open-source LLM with relatively modest hardware requirements, the financial & hardware barriers that kept the global south out of the AI game have all but been eliminated. Moreover, no country can any longer keep advanced AI technology out of the hands of any other country, big or small, over geopolitical differences. The new bottlenecks to the application of AI are education & imagination. That said, even education is becoming less & less of a barrier, since DeepSeek users have already demonstrated the ability to produce working software (including AI code) without manually writing a single line of it—a minimal sketch follows this list. DeepSeek’s free, open-source LLM will unleash the imaginative & innovative abilities of over 6 billion people in the global south.
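
DeepSeek also exposes its models through an OpenAI-compatible hosted API. As a minimal, hedged illustration of “developing code without writing code”, the sketch below asks the hosted R1 model to generate a program; it assumes the openai Python package, an API key in the DEEPSEEK_API_KEY environment variable, & model/endpoint names as published in DeepSeek’s documentation at the time of writing:

    # Minimal sketch: asking DeepSeek's hosted R1 model to write code.
    # Assumes the `openai` package & a DeepSeek API key; the endpoint &
    # model name reflect DeepSeek's public docs at time of writing.
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",  # OpenAI-compatible endpoint
    )

    response = client.chat.completions.create(
        model="deepseek-reasoner",  # the hosted R1 "reasoning" model
        messages=[{"role": "user",
                   "content": "Write a Python script that plots a sine wave."}],
    )
    print(response.choices[0].message.content)  # generated code, ready to run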

DeepSeek’s accomplishment is undoubtedly a great boost to China in the Sino-US technology race. Its benefits go well beyond simply mitigating the impact of U.S. semiconductor export prohibitions; its bigger potential value-add comes from 2 other sources:

  1. Expanded semiconductor export opportunities. DeepSeek has made it possible to run a scalable, high-performing LLM on relatively affordable but performance-constrained hardware platforms. Consequently, the addressable market for small-scale enterprise & government AI infrastructure with targeted use cases is greatly expanded in global south markets. As the world’s leading manufacturer of legacy semiconductors, China is in the ideal position to sell relatively low-end AI chips & backend infrastructure—or cloud-based services built on them—to developing countries that previously could not afford to deploy or use high-performance computing infrastructure for AI use cases.
  2. Expanded mindshare in the AI developer ecosystem. As DeepSeek becomes the LLM of choice for app developers, researchers, & enthusiasts from developed & developing countries alike, its rapid adoption will drive faster improvements, more available services, accelerated innovation, & broader community support—making DeepSeek an even more attractive alternative for a larger number of people in the future. The fact that it is mostly open-source makes it nearly impossible for any government to restrict or prohibit the use & proliferation of these improvements, making it far more resistant to geopolitical upheaval.

Despite the numerous upsides for China, this accomplishment also carries significant, uncontrollable risks. First & foremost on this author’s mind is the possibility that DeepSeek may prompt the U.S. to loosen semiconductor export controls upon witnessing their relative ineffectiveness. Such a move could have the detrimental effect of luring Chinese enterprises back into dependency on higher-performing U.S. technology, shifting revenue & R&D dollars away from local Chinese upstarts in the ICT value chain. Contrary to popular belief, the sustainability of China’s technological progress is far more vulnerable to a “friendlier” U.S. than to a more “hostile” one. Another possible, perhaps inevitable, side effect is that DeepSeek’s accomplishment adds to a litany of other recent “Sputnik moments”—be it the “Great American RedNote Migration”, the test flights of two 6th-generation fighter platforms, or the EAST tokamak sustaining a fusion reaction for over 1,000 seconds—that might galvanize the American public & elites alike into a more coordinated, whole-of-society effort to maintain a technology lead over the PRC. Unfortunately for China, there are no practical means available to mitigate either of these risks.

In sum, the release of DeepSeek R1 marks a pivotal moment in the evolution of AI & its geopolitical ramifications. By achieving state-of-the-art performance at a fraction of the cost and time required by its competitors, DeepSeek has not only demonstrated China’s growing technological prowess but also reshaped the global AI landscape. The failure of U.S. semiconductor sanctions to stifle Chinese innovation, the potential devaluation of U.S. tech giants, and the democratization of AI for the global south are just the beginning of the transformative changes ushered in by this breakthrough. As DeepSeek’s open-source model proliferates, it will empower billions of people worldwide, accelerate global innovation, and challenge the existing technological and economic order. In this new era, the winners will be those who can harness the power of AI to mitigate humanity’s greatest challenges—regardless of their geographic or economic starting point.

(Hongda Jiang is an editor at The China Academy. Courtesy: The China Academy. The China Academy provides up-to-date, in-depth and contextual information about contemporary China.)

❈ ❈ ❈

DeepSeek: How a Small Chinese AI Company Is Shaking up US Tech Heavyweights

Tongliang Liu

Chinese artificial intelligence (AI) company DeepSeek has sent shockwaves through the tech community, with the release of extremely efficient AI models that can compete with cutting-edge products from US companies such as OpenAI and Anthropic.

Founded in 2023, DeepSeek has achieved its results with a fraction of the cash and computing power of its competitors.

DeepSeek’s “reasoning” R1 model, released last week, provoked excitement among researchers, shock among investors, and responses from AI heavyweights. The company followed up on January 28 with a model that can work with images as well as text.

So what has DeepSeek done, and how did it do it?

What DeepSeek Did

In December, DeepSeek released its V3 model. This is a very powerful “standard” large language model that performs at a similar level to OpenAI’s GPT-4o and Anthropic’s Claude 3.5.

While these models are prone to errors and sometimes make up their own facts, they can carry out tasks such as answering questions, writing essays and generating computer code. On some tests of problem-solving and mathematical reasoning, they score better than the average human.

V3 was trained at a reported cost of about US$5.58 million. This is dramatically cheaper than GPT-4, for example, which cost more than US$100 million to develop.

DeepSeek also claims to have trained V3 using around 2,000 specialised computer chips, specifically H800 GPUs made by NVIDIA. This is again far fewer than other companies, some of which may have used up to 16,000 of the more powerful H100 chips.

On January 20, DeepSeek released another model, called R1. This is a so-called “reasoning” model, which tries to work through complex problems step by step. These models seem to be better at many tasks that require context and have multiple interrelated parts, such as reading comprehension and strategic planning.

The R1 model is a tweaked version of V3, modified with a technique called reinforcement learning. R1 appears to work at a similar level to OpenAI’s o1, released last year.
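
In plain terms, reinforcement learning here means letting the model try answers, rewarding it when a final answer can be verified as correct, and nudging its behaviour towards whatever earned reward. DeepSeek’s actual method (a policy-optimisation scheme its papers call GRPO) is considerably more involved, but a toy REINFORCE-style loop on a made-up three-answer problem shows the core idea:

    # Toy REINFORCE loop: learn to prefer the action that earns reward.
    # Purely illustrative; real LLM RL fine-tuning optimises whole token
    # sequences against a reward signal or answer checker.
    import numpy as np

    rng = np.random.default_rng(0)
    logits = np.zeros(3)   # "policy" over 3 candidate answers
    correct = 2            # pretend answer 2 is verifiably right
    lr = 0.5

    for step in range(200):
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax policy
        action = rng.choice(3, p=probs)                # sample an answer
        reward = 1.0 if action == correct else 0.0     # verifiable reward
        # Policy-gradient update: raise log-prob of rewarded actions.
        grad = -probs
        grad[action] += 1.0
        logits += lr * reward * grad

    print(np.round(np.exp(logits) / np.exp(logits).sum(), 3))
    # The policy ends up putting almost all probability on answer 2.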

DeepSeek also used the same technique to make “reasoning” versions of small open-source models that can run on home computers.
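
To make that concrete, the following is a minimal sketch of running one of those small distilled models locally with the Hugging Face transformers library. The checkpoint name is one of the distillations DeepSeek published; a few gigabytes of free memory are assumed:

    # Minimal sketch: run a small distilled DeepSeek-R1 model locally.
    # Assumes `pip install transformers torch` and a few GB of RAM;
    # the checkpoint is one of DeepSeek's published distillations.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    prompt = "What is 17 * 24? Think step by step."
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))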

This release has sparked a huge surge of interest in DeepSeek, driving up the popularity of its V3-powered chatbot app and triggering a massive price crash in tech stocks as investors re-evaluate the AI industry. At the time of writing, chipmaker NVIDIA has lost around US$600 billion in value.

How DeepSeek Did It

DeepSeek’s breakthroughs have been in achieving greater efficiency: getting good results with fewer resources. In particular, DeepSeek’s developers have pioneered two techniques that may be adopted by AI researchers more broadly.

The first has to do with a mathematical idea called “sparsity”. AI models have a lot of parameters that determine their responses to inputs (V3 has around 671 billion), but only a small fraction of these parameters is used for any given input.

However, predicting which parameters will be needed isn’t easy. DeepSeek used a new technique to do this, and then trained only those parameters. As a result, its models needed far less computing power to train than a conventional approach would require.
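
The design behind this sparsity is known as a “mixture of experts”: a small gating network scores many expert sub-networks for each input and routes the computation through only the top few. The sketch below shows top-k routing with made-up sizes, nothing like V3’s real dimensions:

    # Stripped-down mixture-of-experts routing: only k of the experts
    # actually run for a given input, so most parameters stay idle.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts, k = 8, 16, 2                    # toy sizes
    gate_w = rng.normal(size=(d, n_experts))      # gating network
    experts = rng.normal(size=(n_experts, d, d))  # one weight matrix each

    def moe_forward(x):
        scores = x @ gate_w                       # score every expert
        top_k = np.argsort(scores)[-k:]           # pick the k best
        weights = np.exp(scores[top_k])
        weights /= weights.sum()                  # softmax over chosen experts
        # Only the chosen experts' parameters are touched at all.
        return sum(w * (x @ experts[i]) for i, w in zip(top_k, weights))

    x = rng.normal(size=d)
    print(moe_forward(x).shape)  # (8,): output from just 2 of 16 experts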

The other trick has to do with how V3 stores information in computer memory. During generation, a model must keep previously computed intermediate data on hand for every token it has seen. DeepSeek has found a clever way to compress this data, so it is cheaper to store and quick to access.
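
DeepSeek’s papers refer to this as “multi-head latent attention”: rather than caching the full “key” and “value” vectors that attention layers reuse for every generated token, the model caches one much smaller latent vector per token and reconstructs keys and values from it on demand. A toy illustration of the memory saving, with made-up dimensions:

    # Toy sketch of latent KV compression: cache a small latent vector
    # per token, rebuild full keys/values from it when needed.
    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_latent, seq_len = 512, 64, 1000   # made-up sizes

    down = rng.normal(size=(d_model, d_latent))  # compress projection
    up_k = rng.normal(size=(d_latent, d_model))  # rebuild keys
    up_v = rng.normal(size=(d_latent, d_model))  # rebuild values

    hidden = rng.normal(size=(seq_len, d_model)) # one vector per token
    cache = hidden @ down                        # store ONLY this latent cache

    keys = cache @ up_k                          # reconstructed on demand
    values = cache @ up_v

    naive = seq_len * d_model * 2                # naive K+V cache entries
    print(f"cache entries: {cache.size} vs naive {naive}")
    # 64,000 vs 1,024,000: a 16x smaller cache in this toy setup.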

What It Means

DeepSeek’s models and techniques have been released under the free MIT License, which means anyone can download and modify them.

While this may be bad news for some AI companies – whose profits might be eroded by the existence of freely available, powerful models – it is great news for the broader AI research community.

At present, a lot of AI research requires access to enormous amounts of computing resources. Researchers like myself who are based at universities (or anywhere except large tech companies) have had limited ability to carry out tests and experiments.

More efficient models and techniques change the situation. Experimentation and development may now be significantly easier for us.

For consumers, access to AI may also become cheaper. More AI models may be run on users’ own devices, such as laptops or phones, rather than running “in the cloud” for a subscription fee.

For researchers who already have a lot of resources, more efficiency may have less of an effect. It is unclear whether DeepSeek’s approach will help to make models with better performance overall, or simply models that are more efficient.

(Tongliang Liu is Associate Professor of Machine Learning and Director of the Sydney AI Centre at University of Sydney. Courtesy: The Conversation, an Australia-based nonprofit, independent global news organization dedicated to unlocking the knowledge of experts for the public good.)

Janata Weekly does not necessarily adhere to all of the views conveyed in articles republished by it. Our goal is to share a variety of democratic socialist perspectives that we think our readers will find interesting or useful. —Eds.
