Today’s large language models (LLMs) are amazing, but use one for even a short time and the weaknesses of current LLM approaches start to show. While their ability to write recipes in the style of your favourite band’s lyrics is astonishing, logic and planning remain areas of poorer performance.

There’s been a lot of speculation about potential new AI methods recently. Just search for “Q*” or “Q star” if you want to disappear Alice-like into a Wonderland of speculation[1], with suggestions that Q* is a newly discovered cross-pollination of Q-learning[2] and A* graph search[3] that gives a huge boost to AI’s ability to engage in logic and planning. It is easy to find any number of commentators wondering aloud whether some combination of Q-learning’s optimised exploration of possible actions and A*’s pathfinding techniques has resulted in a new, as yet undisclosed, breakthrough. In the more breathless accounts, it might even lead to machines capable of metacognition (or thinking about how they think) and the capacity to improve their own algorithms.
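For readers who have not met Q-learning before, the sketch below shows the textbook tabular update rule on its own. It says nothing about what Q* may or may not be. The state names, action names, learning rate and discount factor are all made-up values chosen for illustration.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(state, action) part of the way towards
    reward + gamma * (best estimated value available from next_state).
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical two-action world; unseen state/action pairs default to a value of 0.
q_table = {}
actions = ["left", "right"]

first = q_update(q_table, "s0", "right", 1.0, "s1", actions)
second = q_update(q_table, "s0", "right", 1.0, "s1", actions)
print(first, second)
```

Repeating the same rewarded step moves the estimate closer to the reward each time, which is the sense in which Q-learning gradually learns which actions are worth taking.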

The potential for AI improvements shouldn’t surprise us – after all, the sole current example of ‘human equivalent intelligence’ we have is our own brains.

In the burgeoning era of artificial intelligence, the human brain stands as a paragon of efficiency on many fronts. When we consider the brain as a neural network with an astounding 100 trillion trainable parameters (in round numbers), its efficiency becomes even more pronounced.

Brains win out

Firstly, the brain’s volumetric efficiency is unparalleled. Occupying a mere 1,400 cubic centimetres or so, it encompasses a parametric learning capacity that would require a data centre-sized GPU-based supercomputer to replicate[4]. This compactness is not just about size; it represents an incredible density of processing power in a 3D structure, which our (largely) 2D CPU and GPU designs cannot match.

Moreover, the energy efficiency of the brain is nothing short of extraordinary. To run a full GPU, TPU or IPU based AI with anything approaching parametric equivalency with a brain would demand megawatts of power, accompanied by substantial heat generation. In stark contrast, the human brain operates on just a few tens of watts at a balmy 37 degrees Celsius. In environmental terms, the brain has far lower energy requirements per computation (and given the scarcity of carbon-free generation, likely a lower carbon footprint) than our current GPU-based AI hardware.
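To put rough numbers on that gap, here is a back-of-envelope sketch. The 100 trillion parameter figure is the round number used above. The 20 W brain figure and the 1 MW cluster figure are my own assumptions, picked as single points within “a few tens of watts” and “megawatts”. Power per parameter is a cruder yardstick than energy per computation, so treat the result as an order-of-magnitude illustration only.

```python
BRAIN_PARAMETERS = 100e12    # ~100 trillion, the round number used in this article
BRAIN_POWER_WATTS = 20.0     # assumption: one value within "a few tens of watts"
CLUSTER_POWER_WATTS = 1e6    # assumption: 1 MW, the low end of "megawatts"


def watts_per_parameter(power_watts, parameters):
    """Average power budget available to each trainable parameter."""
    return power_watts / parameters


brain = watts_per_parameter(BRAIN_POWER_WATTS, BRAIN_PARAMETERS)
cluster = watts_per_parameter(CLUSTER_POWER_WATTS, BRAIN_PARAMETERS)
power_ratio = CLUSTER_POWER_WATTS / BRAIN_POWER_WATTS

print(f"brain:   {brain:.1e} W per parameter")
print(f"cluster: {cluster:.1e} W per parameter")
print(f"the cluster draws {power_ratio:,.0f} times the power of the brain")
```

Under these assumptions the hardware needs tens of thousands of times more power for the same parameter count, and a larger cluster only widens the gap.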

The apparent efficiency of the brain’s learning algorithms further illustrates its superiority. While current LLMs require vast troves of text to train, often using terabytes from the Common Crawl or some similar archive, the human brain achieves linguistic proficiency with a remarkably small dataset. While claims that LLMs train using ‘all of human knowledge’ are hyperbolic, the data used to train LLMs dwarfs the material that any human could absorb in an entire lifetime. We learn to speak, comprehend, and interact using a tiny fraction of the data that AI systems require, demonstrating an innate ability to learn from comparatively limited information.

Lastly, the integrated learning model of the brain sets it apart. Unlike AI systems – which require distinct phases for training and inference – the brain appears to combine learning and inference in a seamless, ongoing process. This simultaneous operation not only saves time and resources but also allows for a more dynamic and adaptive form of learning.

All this raises the question: what progress can we expect in AI research to close these gaps?

From grey cells to silicon wafers and optic fibres

Significant hardware advances would be required to move brain-scale AI from the datacentre to anything even vaguely portable.

One promising avenue of research is Compute in Memory (CIM). The von Neumann architecture, suggested by polymath John von Neumann at the dawn of the electronic computing age, separates processing (logic) from data storage (memory)[5]. The need to shuttle data from memory into the tiny caches used within logic units, and then back to memory, creates the von Neumann bottleneck[6], something which still constrains performance in many of today’s cutting-edge hardware designs. By integrating processing and memory, CIM addresses a critical inefficiency in current AI systems: the energy- and time-intensive process of shuttling data between memory and processors. Several CIM-based AI accelerators exist, although they tend to be aimed at running pre-trained models in environments where cloud connectivity is problematic. If CIM hardware can be improved and used for training as well as inference, it could herald a significant leap towards mimicking the compact, efficient nature of our brains, where processing and memory are not just in close proximity but are fundamentally interwoven.

Another area ripe for development is analogue computing. In stark contrast to digital systems, analogue circuits require far fewer components for operations like addition and multiplication, potentially leading to more space- and power-efficient designs. This efficiency is bought at the cost of easy interrogation, with analogue designs rarely allowing for the complete read out of intermediate steps in an analogue process, where digital processes allow the recall of precise values for each variable at every step in an algorithm. While superficially a downside of analogue compute hardware when it comes to explainability, in practice this might not be the issue it first appears to be. Since the operation of neural networks (whether digital or analogue) is relatively opaque, this loss of capacity for a precise read out of neurons within hidden layers might not necessarily add to the real-world explainability challenge with AI.
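One common way analogue hardware gets multiplication and addition almost for free is the resistive crossbar, and the sketch below models an idealised one. Each weight is stored as a conductance, Ohm’s law performs the multiplication (current = conductance × voltage), and Kirchhoff’s current law performs the addition as currents merge on a shared output wire. The crossbar layout is my assumption about how such a chip might work, not a description of any particular product, and the numbers are illustrative rather than realistic device values.

```python
def crossbar_output_currents(conductances, voltages):
    """Idealised analogue crossbar.

    conductances[i][j] is the conductance (siemens) linking input row i to output column j.
    voltages[i] is the voltage (volts) applied to input row i.
    Returns the current (amps) collected on each output column.
    """
    n_rows = len(voltages)
    n_columns = len(conductances[0])
    return [
        sum(conductances[i][j] * voltages[i] for i in range(n_rows))
        for j in range(n_columns)
    ]


# Three inputs feeding two outputs: the conductances play the role of network weights.
weights = [
    [1.0, 0.5],
    [2.0, 0.0],
    [0.5, 1.5],
]
inputs = [2.0, 1.0, 4.0]

currents = crossbar_output_currents(weights, inputs)
print(currents)
```

In this idealised model the only quantities available to read are the column currents. The individual products exist only as currents flowing through the array, which is the loss of intermediate read-out discussed above.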

New players in the silicon space, like Rain AI[7] and Mythic[8], are developing products that combine CIM and analogue elements, pursuing AI accelerators that could offer significant power, complexity and size efficiencies for similar performance levels when compared to digital AI accelerators. This isn’t purely a field being investigated by startups, however. Research is also being pursued by giants of high-performance computing like IBM[9] and Intel[10].

While our brains use electrical (or electro-chemical) processing, we may be able to leapfrog them in performance terms using a completely different approach. A dazzling development in AI hardware is to harness the speed of light through photonic computing. By using light for computational tasks, photonic computing offers high-speed and energy-efficient solutions, especially in performing operations pivotal to AI algorithms, like matrix multiplication. Innovations in this field, driven by companies like NTT[11] and Lightmatter[12], showcase the potential of photonic computing to revolutionize AI hardware. These systems could use different frequencies or colours of light simultaneously within the processor, allowing multiple different computations to be carried out at the same time within the same hardware to further increase computational density.

Photonic computing could have another significant advantage – heat, or rather the lack of heat. Computing using tiny Mach-Zehnder interferometers generates no appreciable heat from the light passing through the system, whereas any silicon semiconductor substrate will generate heat as electricity passes through it. This is important because heat extraction and dissipation is one important factor preventing silicon wafers being stacked within a small volume – without a huge additional volume of cooling hardware, a 1,400 cm³ volume of GPU wafers would fry themselves. In contrast, it is feasible that many photonic processors could occupy a brain-sized volume without generating an insurmountable level of heat.
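As a rough illustration of how a Mach-Zehnder interferometer can act as a tunable weight, the sketch below models an idealised, lossless device: a 50:50 beamsplitter, a phase shift on one arm, then a second 50:50 beamsplitter. The lossless assumption and the particular beamsplitter convention are mine. Real devices have some optical loss, and the lasers and phase shifters around them draw power.

```python
import cmath
import math


def mzi_output_powers(theta, input_amplitudes=(1.0, 0.0)):
    """Idealised Mach-Zehnder interferometer.

    theta is the phase shift (radians) applied to the upper arm.
    Returns the optical power leaving the (upper, lower) output ports.
    """
    s = 1 / math.sqrt(2)

    def beamsplitter(a, b):
        # Symmetric 50:50 beamsplitter: the crossing path picks up a factor of i.
        return s * (a + 1j * b), s * (1j * a + b)

    a, b = input_amplitudes
    a, b = beamsplitter(a, b)
    a = a * cmath.exp(1j * theta)  # phase shifter on the upper arm only
    a, b = beamsplitter(a, b)
    return abs(a) ** 2, abs(b) ** 2


for theta in (0.0, math.pi / 2, math.pi):
    upper, lower = mzi_output_powers(theta)
    print(f"theta={theta:.2f}  upper={upper:.3f}  lower={lower:.3f}  total={upper + lower:.3f}")
```

Changing the phase steers light between the two outputs, so the fraction arriving at one port behaves like an adjustable multiplier. In this ideal model the total output power always equals the input power: the light is redirected rather than absorbed, which is the intuition behind the low-heat argument.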

So we’re some way away from matching the brain in AI hardware, but there are lots of areas of exploration. What about the software side and our AI models?

Model Village

The intriguing parallels between the evolution of the brain and the development of AI models offer a fresh perspective on advancing AI technologies. The human brain, a marvel of natural evolution, exemplifies complexity that has emerged from what we can hypothesize to be deep-rooted simplicity. Think of how often simple fractals underpin amazing complexity in nature. This has suggested to some in the field that the algorithms responsible for intelligence and consciousness may themselves be relatively straightforward[13], with their sophistication being an emergent property from a simple yet extensively replicated structure.

When we consider the brain’s 100 trillion parameters, it’s important to note that only a fraction of these are dedicated to language and reasoning[14]. This realization opens up the possibility that large language models (LLMs) could achieve comparable levels of performance without needing to match the brain’s full parameter count. The implications are profound, suggesting that we may not be as far from replicating certain aspects of human intelligence in AI as previously thought.

Developments in multimodal ‘generalist’ AI models, such as DeepMind’s Gato[15], illustrate the potential of such models to excel across diverse domains. Other models implement ‘mixture of experts’ structures, enabling larger systems to have specific components trained for distinct tasks at little additional inference cost[16]. This approach could lead to sophisticated AI models that integrate various capabilities, such as the spatial awareness and route / action planning seen in self-driving car AI, the emergent conceptual reasoning present in the most massive LLMs, and the creative virtuosity of diffusion models.
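The ‘little additional inference cost’ point is easier to see in a toy example. The sketch below routes an input to only the highest-scoring expert, so the others are never run. The expert names, the toy functions and the hand-set gate scores are all hypothetical. In a real model the experts are neural sub-networks and the scores come from a learned router.

```python
import math

calls = []  # records which experts actually ran


def make_expert(name, fn):
    """Wrap a toy function so that each call is logged by name."""
    def expert(x):
        calls.append(name)
        return fn(x)
    return expert


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_scores, top_k=1):
    """Run only the top_k experts by gate score and blend their outputs."""
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:top_k]
    weights = softmax([gate_scores[i] for i in ranked])
    output = sum(w * experts[i](x) for w, i in zip(weights, ranked))
    return output, ranked


experts = [
    make_expert("language", lambda x: 2 * x),
    make_expert("planning", lambda x: x + 10),
    make_expert("vision", lambda x: -x),
]
gate_scores = [0.1, 2.0, -1.0]  # hand-set here; learned in a real model

output, used = moe_forward(3.0, experts, gate_scores, top_k=1)
print(output, used, calls)
```

Adding more experts increases the model’s total capacity, but each input still pays only for the experts it is routed to.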

Current LLM architectures primarily function as input/output systems. However, I would argue that the future of AI software lies in transcending this limitation. Ideally, future developments should pave the way for AIs with a form of continuous consciousness, a concept explored in greater depth in my recent article “In Two Minds”[17]. Such an advancement would represent a monumental leap in AI’s ability to mimic the dynamic, ongoing thought processes of the human mind.

Lastly, the training / inference divide remains a weakness of current AI. While training by backpropagation and stochastic gradient descent has proved extremely effective, it does mean that AI models don’t learn from data in live use. Instead, that data must be stored for a separate training run, which will update the underlying weights and biases of the model. The brain seems to be capable of both learning and doing, with no clear equivalent to the training / inference separation. Alternatives to backpropagation have been proposed, including Geoffrey Hinton’s ‘forward-forward’ approach, but as yet, none of these approaches seem as effective as training by backpropagation[18]. Anyone who has watched a child at play will have seen human learning-by-doing in action, and inculcating a similar capability in our machines remains on our collective AI ‘to do’ list.
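To make the idea of learning while doing concrete, here is a deliberately tiny sketch of a model that updates itself every time it is used. It is a single linear unit trained with the classic delta rule, so it is still gradient-based. It is not the forward-forward algorithm and it is not a claim about how the brain learns. The class name, learning rate and data stream are all invented for illustration.

```python
class OnlineLinearModel:
    """A single linear unit that learns from every example it is asked about."""

    def __init__(self, n_inputs, learning_rate=0.1):
        self.weights = [0.0] * n_inputs
        self.learning_rate = learning_rate

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))

    def predict_and_learn(self, x, target):
        """Answer first, then immediately adjust the weights using the observed error."""
        prediction = self.predict(x)
        error = target - prediction
        self.weights = [
            w + self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        return prediction


# Hypothetical live data stream: two input patterns with fixed correct answers.
stream = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)] * 50

model = OnlineLinearModel(n_inputs=2)
errors = [abs(target - model.predict_and_learn(x, target)) for x, target in stream]

print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.3f}")
```

There is no separate training run here: every prediction is also a learning step. Scaling this behaviour up to deep networks, without losing the effectiveness of backpropagation, is the open problem.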

Thinking simple and complex

All this is to say there are many exciting avenues being pursued within the fields of AI hardware and software, all at an incredible pace.

It is clear that the journey towards replicating or even surpassing the human brain’s capabilities is both challenging and exhilarating. On the hardware front, innovative technologies like Compute in Memory, analogue computing, and photonic computing are pushing the boundaries, potentially bringing us closer to achieving the compact efficiency of the brain. In software, the evolution of LLMs and multimodal models hints at a future where AI can integrate diverse functionalities, mirroring the human brain’s multifaceted abilities.

Beyond that, the true essence of AI advancement may lie in simplicity – the notion that complex intelligence and consciousness could emerge from basic, repetitive structures. This perspective not only redefines our approach to AI development but also underscores the extraordinary nature of the human brain as a model for technological inspiration.

With continued exploration and innovation, the convergence of these advancements may soon yield AI systems that not only emulate but comprehensively surpass human cognitive capabilities, ushering in a new era of artificial general intelligence.