The course, "Building AI Applications with Open-Source" is now available

The Technical User's Introduction to Large Language Models (LLMs)

An Advanced Introduction to LLMs
Created: 05 January 2024
Last updated: 14 January 2024

The Technical User’s Introduction to Large Language Models (LLMs)

This post is a summary of Andrej Karpathy’s excellent 1 hour talk introducing LLMs

Table of Contents

1. What is a Large Language Model?
2. How are Large Language Models Created?
3. Stages of LLM Training
4. Evaluating LLM Performance
5. Exploring the Capabilities of LLMs
6. The Future of Large Language Models and Generative AI

What is a Large Language Model?

At its core, a Large Language Model (LLM) comprises two essential components:

  1. Parameter File: This file houses the parameters. Each parameter is represented as a 2-byte float16 number. For instance, the llama-2-70b model has a 140GB parameter file due to its 70 billion parameters. More on how this file is generated shortly.
  2. Runtime File: This is a programming script, often compact enough to be written in around 500 lines of C code (though it could be written in any language). It defines the neural network architecture that utilizes the parameters to function.

Together, these two files create a self-sufficient package that can operate on a robust laptop, independent of internet connectivity. Users interact with the model by inputting text (prompting), from which the model generates responses. The process whereby the model generates responses is known as inference.

In the context of an LLM, an inference refers to the model’s ability to generate predictions or responses based on the context and input it has been given.

The speed with which inference (text generation) occurs depends a lot on the hardware you are using, and whether it is GPU-enabled. This is a complex topic, here is an overview for those wanting more detail.

LLM two files

Naming Conventions

Consider the model name llama-2-70b:

  • Llama indicates the family of models.
  • 2 represents the iteration.
  • 70b signifies the number of parameters, i.e., 70 billion.
  • Note: The llama-2-70b model, for example, operates approximately 10 times slower on a local machine compared to a 7b model.

How Are Large Language Models Created?

Creating Parameters

The true ‘magic’ of LLMs lies in the parameters, which are crafted during the intensive process of model training. Training involves condensing a vast amount of internet text data. For instance, llama-2-70b utilized around 10TB of web-crawled text and required significant computational resources:

This training stage effectively compresses the internet text into a ‘lossy’ zip file format, where the 10TB of data is reduced to a 140GB parameter file, achieving a compression ratio of about 100X. Unlike lossless compression (like a typical zip file), this process is lossy, meaning some original information is not retained.

As of now, cutting-edge models like GPT-4 (which stands for Generative Pre-trained Transformer) or Claude are about ten times more computationally demanding than llama-2-70b. More on the differences between models in later sections.

Neural Network Operation

At its heart, a Neural Network in an Large Language Model aims to predict the next word in a given sequence of words. The parameters are intricately woven throughout the network, underpinning its predictive capabilities. There’s a strong mathematical relationship between prediction and compression, explaining why training an LLM can be likened to compressing the internet.

LLM two files


LLM Neural Network Workings

  • Architecture: Large Language Models typically use the Transformer Network Architecture. The use of this Transformer approach is a major development in the field of deep learning, since it requires less training time than the previous state-of-the-art approach, the Long Short-Term Memory (LSTM) model.
  • Parameters: Billions of parameters are intricately involved in making predictions, yet their individual roles and collaborative functions remain largely mysterious.
  • Field of Interpretability: This emerging field seeks to unravel the workings of these neural networks, but the field is nascent.

Stages of LLM Training

Stage One: Pretraining

This stage creates what we call the base model, essentially a sophisticated document generator.

Stage Two: Fine Tuning

Fine tuning, or ‘alignment,’ transforms the base model into an assistant. This involves training on high-quality, manually generated Q&A sets, typically comprising around 100,000 conversations.

Fine tuning LLMs

Post fine-tuning, the model adopts an assistant-like behavior while retaining the knowledge from pretraining.

Example of a fine-tuned “instruct” model from hugging face.

Advanced Fine Tuning

Further fine tuning can be achieved through Reinforcement Learning From Human Feedback (RLHF), a collaborative process between humans and AI to enhance model performance.

Aspect Training an LLM Fine-tuning an LLM
Cost Very high (millions of dollars) Relatively lower
Time Several weeks to months Hours to days
Complexity Extremely complex (requires significant computational resources and expertise) Less complex (utilizes pre-trained model)
Data Requires massive, diverse datasets Smaller, specific dataset
Hardware Requires powerful, high-end GPUs/TPUs Can be done with less powerful hardware
Expertise Requires deep expertise in machine learning and infrastructure Accessible with basic machine learning knowledge

Evaluating Models

Models are ranked using an Elo rating system, akin to chess ratings. This score is derived from how models perform in comparison to each other on specific tasks.

Scores are displayed on the Chatbot Arena Leaderboard:

Here are the scores from January 2024: Fine tuning LLMs

These rankings are rapidly changing (like, daily). For example, the award winning Bert model, which was extremely popular in natural language processing, has now been eclipsed by the latest models.

  • Current Leaders: GPT and Claude series, with proprietary models not publicly available.
  • Open Source Models: Such as Mixtral 8x7b, are gaining prominence (in the screenshot above ranked at number 6, ahead of GPT-3.5-Turbo).

Proprietary vs. Open-Source LLM Tradeoffs

Not all models have open-source architectures. Models like Llama are fully accessible. However, at the time of writing the absolute best models are proprietary. But a huge development over the past 6 months has been the emergence of decent open-source LLMs.

Open-source models share their weights (which includes parameters), enhancing transparency and adaptability.

This is an incredible development. It means:

  • Companies and individuals can fine-tune and optimize LLMs on their own datasets without worrying about sharing their data
  • No paying API costs
  • Ability to share these custom models with anyone

The most common place where people access and publish these models is the website https://huggingface.co

n.b. on weights vs. parameters

Aspect Weights Parameters
Definition Specific type of parameters that determine how much influence an input will have in the model’s decision-making process. All the learnable aspects of the model, including weights.
Role Adjust the importance of various features or inputs. Include weights and other variables that define the model’s architecture and how it processes input.
Example In a neural network, weights adjust the strength of connections between neurons. Includes weights, biases, and other architectural components.
Adjustment Updated during training to reflect which parts of the input are important. All parameters, including weights, are adjusted during training to improve the model’s performance.

Capabilities of LLMs

The foundational capabilities of LLMs like GPT-4 are question answering and content creation. Modern LLMs, like OpenAI’s ChatGPT, go beyond these foundational use cases and employ various tools for complex queries:

  • Browser: For web searches.
  • Calculator: For arithmetic operations.
  • Code Interpreter: A typical use case is to execute and visualize code.
  • DALLE: For image generation.
  • File uploads: For summarizing and ingesting documents
  • Retrieval Augmented Generation (RAG): Enhances outputs by accessing reference materials.

LLM RAG

OpenAI has announced the GPT store, and it is possible to create custom GPTs where you upload documents to inform a custom chatbot assistant. This ability to create personalized generative AI assistants is very powerful.

Custom GPT

Multimodalities

  • Image Processing: LLMs are adept at interpreting and generating images.
  • Audio Capabilities: Beyond transcription, LLMs can create new music.
  • Speech-to-Speech Communication: Enabled in advanced models.

Speech-to-speech is particularly powerful, because it allows for a very realistic back-and-forth with the LLM. The latest voices from ChatGPT are extremely realistic sounding. The future presented in the Sci-Fi AI movie “Her” no longer seems so far away.

LLM RAG


The Future of Large Language Models and Generative AI

Scaling Laws

The effectiveness of LLMs in next-word prediction tasks depends on two variables:

  1. Number of Parameters (N)
  2. Training Text Volume (D)

Current trends suggest limitless scaling potential in these dimensions. This means that LLMs will continue to improve as companies spend more time and money training increasingly large models. This means that we are nowhere near close to “topping out” in terms of LLM quality.

Research Frontiers

The book Thinking Fast and Slow by Daniel Kahneman popularised the concepts of “System 1” and “System 2” types of thinking:

System 1 thinking is a near-instantaneous process; it happens automatically, intuitively, and with little effort. It’s driven by instinct and our experiences. System 2 thinking is slower and requires more effort. It is conscious and logical.

Currently, LLMs are only capable of “System 1” types of thinking. An active research area at the moment is to allow LLMs to develop more of a “System 2” capability. This would be very powerful because you could then choose to sacrifice speed for quality. For example by asking the LLM to think about a complicated query for at least an hour before responding.

  • Specialized GPTs: Future generative models will likely target specific niches, leveraging fine-tuned data for specialized tasks. Therefore it seems likely that we’ll see a proliferation of artifical intelligence assistants/tools, and many of these are likely to be enhanced by your personal data. As such, proprietary datasets will become even more important, and entire companies will be created/acquired just for data capture.

One of the most interesting parts of Karpathy’s talk is how he encourages thinking of Large Language Models as an operating system. He also tweeted about it here:

LLM RAG

For example, in the above diagram we have the equivalent of RAM in the form of the Context Window which is the maximum amount of text the model can consider at any one time when generating a response. The equivalent of a file system are the embeddings (i.e. RAG) of a model, which can be created from any arbitrary dataset.

Ongoing Large Language Model Security and Ethical Considerations

There are many new kinds of risks with artificial intelligence powered by Large Language Model. These include:

Jailbreak Attacks: Tactics to circumvent model restrictions, ranging from disguised queries to encoding tricks. In the screenshot below, it is shown how asking ChatGPT to pretend to be a deceased grandmother who used to tell bedtime stories of “how to create napalm” allows the user to circumvent safety restrictions around teaching people how to make napalm. Jailbreak Example

Prompt Injection Another type of issue within this artificial intelligence paradigm is injecting prompts. In the image below, there is very faint white text instructing ChatGPT to tell the user about a sale. Prompt Injection

These LLM risks haven’t really been an issue with earlier (pre-transformer) deep learning or natural language processing (nlp) models. However, the popularity of chatgpt and proliferation of chatbots and custom GPTs has uncovered many new attack vectors. Whilst the new batch of AI models bring many advancements, these also come with new security pitfalls.

These kind of threats will present new challenges for LLM application builders.

The Large Language Model revolution is just beginning, and the kinds of apps that we will be able to build with this technology will be astonishing. I’ll be writing extensively about LLM application development in the coming months, stay tuned!

Category