For a long time there was a (perhaps comforting) trope that China was adept at copying technology, but not at innovating. It's been impossible to take that idea seriously for some time; new evidence to the contrary keeps surfacing, particularly in AI. A recent and important example is PanGu, a Chinese variation on OpenAI's GPT-3 language model (which made a big splash last year and which we've discussed several times before - see TiB 119). Jeff Ding and Jack Clark have excellent commentary.
As Jeff notes, the team behind PanGu claims that it surpasses GPT-3 on a number of dimensions, particularly "few-shot" learning tasks, where the model is given only a handful of examples of a specific task. There are a few important and familiar threads here. First, it's another example of the continued efficacy of scale in machine learning (see TiB 152): PanGu is a ~200bn parameter model trained on over a terabyte of text (compared to GPT-3's 175bn parameters trained on ~0.5TB of text).
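To make "few-shot" concrete, here's a minimal sketch of what such a task looks like from the model's point of view. The `complete` function and the translation task are hypothetical stand-ins, not PanGu's or OpenAI's actual API; the point is that the "training" consists entirely of a couple of worked examples placed directly in the prompt.

```python
# A minimal sketch of few-shot prompting. `complete` is a hypothetical
# stand-in for a text-completion API call (GPT-3- or PanGu-style);
# only the prompt format matters here.

def complete(prompt: str) -> str:
    """Hypothetical stand-in for a large language model call.

    Returns a canned continuation so the sketch runs end-to-end;
    a real model would generate this from the prompt alone.
    """
    return " loutre de mer"

# The "few shots": two worked examples of the task, embedded directly
# in the prompt. The model receives no gradient updates -- it must
# infer the pattern from context and continue it for the final query.
FEW_SHOT_PROMPT = """\
Translate English to French.

English: cheese
French: fromage

English: good morning
French: bonjour

English: sea otter
French:"""

if __name__ == "__main__":
    print(FEW_SHOT_PROMPT + complete(FEW_SHOT_PROMPT))
```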
Second, models of this scale require vast, nation-state- or Big Tech-level resources to train, which makes them hard for startups and academics to replicate or audit (see TiB 159). Jeff points out that PanGu was a collaboration between Huawei and PCL, a government-owned and -operated research lab. Third, semiconductor politics are never far from the surface. As Jack notes, the model was trained on Ascend processors, which - though currently manufactured by TSMC - Huawei is trying to disentangle from supply chains the US can control. PanGu will be worth keeping an eye on.