Retro Computing Meets Modern AI: Running Llama 2 on DOS Machines

In a surprising twist that blends retro computing with cutting-edge artificial intelligence, Yeo Kheng Meng has managed to run a stripped-down version of the Llama 2 large language model (LLM) on various DOS computers. Many would scoff at the idea of a 486 processor running such an advanced application, but Yeo's results challenge that skepticism.
Yeo Kheng Meng is using a compact implementation of Meta's Llama 2 model known as llama2.c. The library is notable not only for what it does but for its elegance, consisting of merely 700 lines of modern C code. Adapting that code to work on DOS 6.22 and the aging Intel i386 architecture, however, posed significant challenges, requiring a meticulous approach and a deep understanding of both the limitations of retro hardware and contemporary programming techniques.
Yeo has documented his efforts extensively, showcasing benchmarks from several retrocomputers. It may be hard for some to accept, but as he notes, a 486 or even a Pentium 1 can now legitimately be classified as retro. It is a telling point in the evolution of computing that machines once deemed obsolete can still perform remarkable tasks.
The models being run on these older systems are not particularly large: the TinyStories-trained model takes up only 260 kB and still produces an impressive 2.08 tokens per second on a standard 486 machine. In a rather ironic twist, a 21-year-old Pentium M ThinkPad T42 can actually run a larger 110 MB model that Yeo's modern Ryzen 5 desktop could not. The anomaly stems from a memory allocation error that prevented the model from executing on the modern CPU, illustrating the idiosyncratic nature of programming and hardware compatibility.
As it stands, this port of Llama 2 is compatible with any 32-bit i386 hardware. However, Yeo is setting his sights on an even loftier goal: tackling the 16-bit environment. The community is now left to wonder whether someone might take on the challenge of running a Llama 2 model locally on an Intel 286 or perhaps a 68000-based machine. If that happens, the traditional question of "Does it run DOOM?" might need to be replaced with a more intriguing one: "Will it run an LLM?"