OpenAI Has Embarked on a Challenging and Ambitious Journey: Understanding AI’s ‘Black Box’

  • The company aims to shed light on the inner workings of neural networks.

  • So-called “sparse autoencoders” are a promising tool for tackling this challenge.


Artificial intelligence has proven incredibly useful in a wide range of applications. It powers driver-assistance systems like Tesla’s Autopilot and conversational chatbots like ChatGPT. Yet despite its widespread use, we still don’t fully understand how these systems work internally, and that gap makes it harder to ensure that the models we use every day are safe.

On Thursday, OpenAI announced new methods for understanding how GPT-4 works. The company led by Sam Altman is using “sparse autoencoders” to identify features, recurring patterns in the model’s internal activity that can help us make sense of it. So far, it has extracted 16 million features, and that number is expected to grow as the research continues.

Understanding AI’s “Black Box”

In the field of AI, experts work with well-defined concepts and use extensive datasets to train the neural networks behind large language models (LLMs). As these models grow too large and costly to run efficiently, developers turn to techniques like Mixture of Experts (MoE), which splits the model into specialized subnetworks, or “experts,” and activates only a few of them for each input.
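To make the MoE idea more concrete, here is a minimal sketch of generic top-k routing, assuming a simple linear router and toy experts. The names `moe_forward`, `router_weights`, and `experts` are hypothetical, and this is not how GPT-4, Gemini, or any other specific model implements MoE; it only illustrates that a router scores every expert but only the chosen few actually run.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, router_weights, experts, top_k=2):
    """Route input x to its top_k experts and mix their outputs.

    router_weights: one weight vector per expert (hypothetical learned router).
    experts: callables that each map a vector to a vector of the same size.
    Returns the mixed output and the indices of the experts that ran.
    """
    # Router score for each expert: dot product of its weights with the input.
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in router_weights]
    # Keep only the top_k highest-scoring experts; the others are never executed.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize the chosen experts' scores into mixing weights (gates).
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for g, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen

# Usage: four toy "experts" that just scale their input by different factors.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
output, used = moe_forward([1.0, 2.0], router_weights, experts)
print(used)  # indices of the two experts that ran
```

Because only `top_k` experts execute per input, the total parameter count can grow without the per-input compute growing at the same rate, which is the main appeal of the technique.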

Developers can also build multimodal models such as Gemini 1.5 or GPT-4o, which can process text, audio, and image inputs. Despite these advances, the inner workings of these models remain a mystery. A user can ask an AI to summarize a chapter of a book, but cannot observe or understand what happens inside the model’s so-called “black box.”


It can be described not only as a black box but also as a “closed box,” which makes it difficult to understand what’s happening inside. This is because developers don’t design the internal workings of a neural network by hand; they train it using algorithms, and its behavior emerges from that training. It’s a complex technology that we don’t fully understand and that often surprises even experts.


OpenAI explains that the patterns of neural activations inside these models are unpredictable, which makes them hard to study. One way to gain insight into them is to use sparse autoencoders, which can extract millions of features from a model. While many of these features may be abstract or unimportant, some could help improve the safety and overall quality of the models.
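In broad terms, a sparse autoencoder learns to rewrite each internal activation vector as a combination of just a few “features” drawn from a much larger dictionary; because only a handful are active at once, each one is easier to inspect on its own. The sketch below shows the forward pass and reconstruction loss of such a model in plain Python. It is an illustrative toy, not OpenAI’s code: the class and function names are hypothetical, the weights are random and untrained, and the TopK sparsity rule (keep only the k largest activations) is one common design choice used here for simplicity.

```python
import random

def topk_relu(pre_acts, k):
    """Keep the k largest pre-activations (clamped at zero); zero out the rest."""
    top = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)[:k]
    out = [0.0] * len(pre_acts)
    for i in top:
        out[i] = max(pre_acts[i], 0.0)  # ReLU applied to the survivors
    return out

class SparseAutoencoder:
    """Toy sparse autoencoder over d-dimensional activation vectors.

    In practice n_features is far larger than d (the article cites 16 million
    features for GPT-4); here both are tiny and the weights are random, untrained.
    """

    def __init__(self, d, n_features, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Encoder maps d -> n_features; decoder maps n_features -> d (hypothetical init).
        self.w_enc = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        self.w_dec = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(d)]
        self.b_dec = [0.0] * d

    def encode(self, x):
        # Subtract the decoder bias, score every feature, then keep only k of them.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = [sum(w * c for w, c in zip(row, centered)) + b
               for row, b in zip(self.w_enc, self.b_enc)]
        return topk_relu(pre, self.k)

    def decode(self, f):
        # Reconstruction = decoder bias + weighted sum of the active feature directions.
        return [sum(w * fj for w, fj in zip(row, f)) + b
                for row, b in zip(self.w_dec, self.b_dec)]

    def loss(self, x):
        # Mean squared reconstruction error; training would minimize this.
        x_hat = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Usage: encode one fake 8-dimensional "activation" into 64 candidate features.
sae = SparseAutoencoder(d=8, n_features=64, k=4)
rng = random.Random(1)
activation = [rng.gauss(0, 1) for _ in range(8)]
features = sae.encode(activation)
print([i for i, v in enumerate(features) if v > 0])  # at most 4 active features
```

Training (omitted here) would adjust the weights so that these few active features reconstruct the original activations well; researchers then study which inputs make each feature fire to guess what concept it represents.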

It’s important to note that there’s still a lot of work to be done, and the use of sparse autoencoders is still in its early stages. For now, OpenAI expects the initial results of this approach to be used to monitor and adjust the behavior of its advanced models. ChatGPT’s developers aren’t the only ones working on this: Anthropic, another AI company, is also working on improving sparse autoencoders.

Image | Xataka using Bing Image Creator

