This post is just an intro and the building blocks for staying up to date in the field of LLMs. Follow and subscribe to my newsletter to get updates on each ML topic, summarized along with code snippets and explanations.
What does LLM (Large Language Model) refer to, and what is its origin?
Language modeling started with RNNs (Recurrent Neural Networks), which were extended into LSTMs (Long Short-Term Memory) to capture long-term dependencies in sentences and paragraphs. Then, in 2018, came a breakthrough: OpenAI introduced GPT (Generative Pretrained Transformer) and Google introduced BERT, widening the scope of language modeling for tasks such as text generation (next-word prediction), text summarization, and Q/A.
Roadmap and Forward learning path!!
Start with RNN (Recurrent Neural Network) and LSTMs (Long Short-Term Memory) to understand word processing, memory storage, and information flow across time steps.
Explore the workings of the different gates, including the input gate, forget gate, and output gate (a minimal sketch of one LSTM step follows this list).
Differentiate between what the input gate stores and what the attention mechanism does.
There are numerous resources on this!!
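To make the gates concrete, here is a minimal NumPy sketch (not a full implementation) of a single LSTM time step; the parameter names and stacked gate layout are my own assumptions for illustration.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    #pre-activations for all four gates, stacked in one matrix multiply (assumed layout)
    z = W @ x_t + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  #input, forget, and output gates
    g = np.tanh(g)                                #candidate cell state
    c_t = f * c_prev + i * g                      #forget old information, store new information
    h_t = o * np.tanh(c_t)                        #output gate filters what the cell exposes
    return h_t, c_t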
Attention Mechanism: Attention Is All You Need is the paper that changed everything (a minimal sketch of the core computation follows the resource list below).
Best resources to understand attention and intro to transformers
To understand the mechanism → here
OG paper: Attention is all you need
https://e2eml.school/transformers.html by Brandon Rohrer
More detail on attention here
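As a quick reference before the transformer section, here is a minimal NumPy sketch of the scaled dot-product attention described in the paper; it assumes single-head, unbatched inputs.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    #Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        #query-key similarity, scaled
    scores = scores - scores.max(axis=-1, keepdims=True)   #for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  #softmax over keys
    return weights @ V                                     #weighted sum of the values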
Once attention is clear, you can move on and try to understand the Transformer architecture and its different components.
There are two types of blocks involved in this Transformer architecture: the encoder block and the decoder block.
Understand the following internal components of the transformer:
Input Embeddings
Position Encoding
Single-head self-attention
Multi-head attention
what are query, key, and value matrices, and their dimensions
Layer Normalization (the Add & Norm layer)
Masked multi-head attention on the decoder side (and why it is masked there!) - a small masking sketch follows this list
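To illustrate the decoder-side masking, here is a small NumPy sketch of a causal (look-ahead) mask applied to attention scores; the sequence length and head dimension are arbitrary and chosen only for demonstration.
import numpy as np

seq_len, d_k = 4, 8
Q = np.random.randn(seq_len, d_k)
K = np.random.randn(seq_len, d_k)

scores = Q @ K.T / np.sqrt(d_k)
#causal mask: position i may only attend to positions <= i, so the decoder
#cannot peek at future tokens while predicting the next word
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[mask] = -1e9                                   #masked scores get ~zero softmax weight
scores = scores - scores.max(axis=-1, keepdims=True)  #for numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)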
Best Resources that I would suggest
Jay Alammar's insightful transformer walkthrough
Sebastian's lecture on the transformer architecture here
Andrej Karpathy's seminar on transformers here
Transformer Implementation in Python here
Difference between GPT and GPT-3!!
Each version is scaled up roughly 10x in size (number of parameters and amount of data used for training). Scale, along with methodologies like RLHF (Reinforcement Learning from Human Feedback), is key to the incremental improvements.
How are they trained??
The largest GPT-3 model (175B) uses 96 attention layers, each with 96 heads of dimension 128.
Data used → 300 billion tokens collected from a weighted combination of sources such as Wikipedia, WebText, Common Crawl, and multiple book corpora.
Memory → For the 175 billion parameters in GPT-3, 175 × 4 = 700 GB of memory is needed to store the weights in FP32/float32 (each parameter takes 4 bytes).
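The same back-of-the-envelope arithmetic in code, extended to lower precisions (the FP16 and INT8 numbers are simply the same calculation at 2 and 1 bytes per parameter):
params = 175e9                                       #GPT-3 parameter count
bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1}
for dtype, nbytes in bytes_per_param.items():
    #fp32: 700 GB, fp16: 350 GB, int8: 175 GB
    print(f"{dtype}: {params * nbytes / 1e9:.0f} GB")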
Can they be fine-tuned?
Yes. OpenAI provides fine-tuning, which you need to do on a paid plan, and you can use Weights & Biases to keep track of experiments.
Steps for fine-tuning:
Data Preparation
Data Augmentation
Convert to the expected format (a sketch follows below)
more details about fine-tuning here
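As a rough sketch of the conversion step, assuming the JSONL prompt/completion format that OpenAI fine-tuning has used (check the current docs for the exact schema); the example records and the "###" separator below are hypothetical:
import json

#hypothetical training examples; replace with your own labeled data
examples = [
    {"prompt": "Classify the sentiment: I loved this product!\n\n###\n\n",
     "completion": " positive"},
    {"prompt": "Classify the sentiment: The update broke everything.\n\n###\n\n",
     "completion": " negative"},
]

#write one JSON object per line, as expected for a JSONL training file
with open("train_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")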
Existing large language models (LLMs) suffer from several issues:
Hallucinations: LLMs often generate inaccurate or fabricated information and present it as fact.
Inconsistencies: LLMs may produce contradictory or conflicting responses within the same conversation or context.
Privacy: Concerns arise regarding the storage and usage of user data by LLMs, potentially compromising privacy.
Context length: LLMs struggle to maintain a long-term understanding of context, leading to limited effectiveness in handling complex or multi-turn conversations.
Let's delve into an overview of some tools to help you get started with building real-world applications. Get ready for exploration!
1) LangChain - A toolkit with modular components that are easy to implement
Model I/O - handles input, prompt engineering, language model selection, and output parsers (you can structure your output format, by the way, which is cool!!)
Data Connection - This helps to load documents, transform them into vectors, and store them in a vector DB (Pinecone, Weaviate, Chroma DB, etc.).
Chains - This is the core! Here we chain multiple simple components together to build complex applications.
Let us walk through an example of a chain for QA retrieval:
#importing all modules required for this QA retrieval
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
loader = TextLoader("../../state_of_the_union.txt")
documents = loader.load()
#text splitter helps to standardize and limit text by chunking, so no more token-limit issues
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
#This helps store and retrieve our text embeddings
docsearch = Chroma.from_documents(texts, embeddings)
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=docsearch.as_retriever())
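To actually query the chain, you pass it a question; the one below is just a hypothetical example against the loaded document.
#run the retrieval + answer chain on a question (hypothetical query)
query = "What did the president say about the economy?"
print(qa.run(query))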
The chain_type argument highlighted above controls how the retrieved documents are passed to the LLM:
stuff → stuffs all the retrieved document text into a single prompt and passes it in one call (straightforward)
refine → takes each document's text, gets an intermediate answer, and passes that answer along with the next document, iterating through the documents.
There are other methods like map_reduce and map_rerank; we will discuss these in the future!
resource:https://python.langchain.com/docs/modules/chains/popular/vector_db_qa
Agents - These help the LLM determine which actions to take and in what order.
Memory - helps in storing previous queries and interactions, e.g., chat history can be stored in buffer memory or Python dictionaries (see the sketch after this list).
Callbacks - These are used for logging at each stage (i.e., when the LLM starts, ends, or errors, when a chain starts or ends, when an agent starts or ends, etc.).
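As a minimal sketch of buffer memory, assuming LangChain's ConversationBufferMemory and ConversationChain (the example inputs are hypothetical):
from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

#the buffer memory keeps the raw chat history and feeds it back into every prompt
conversation = ConversationChain(llm=OpenAI(), memory=ConversationBufferMemory())
conversation.predict(input="Hi, my name is Sam.")
conversation.predict(input="What is my name?")  #answered from the stored chat history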
2) Pinecone - This is a vector DB that stores millions of embeddings and can perform similarity search across them with minimal latency.
Alternatives: Weaviate, Chroma DB, Qdrant
3) LlamaIndex
It is a high-level API that wraps LLM application boilerplate into a few lines of code, and it has built-in data connectors to handle different input formats (PDFs, docs, SQL, etc.).
It allows easy integration with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, or anything else). A short sketch follows.
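A rough sketch of the typical LlamaIndex flow, assuming a version where SimpleDirectoryReader and VectorStoreIndex are the entry points (the API has changed across releases, so check the current docs); the "data" folder and the query are hypothetical:
from llama_index import SimpleDirectoryReader, VectorStoreIndex

#load whatever files live in the hypothetical "data" folder (PDFs, docs, etc.)
documents = SimpleDirectoryReader("data").load_data()
#build a vector index over the documents (embeddings are created under the hood)
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What are these documents about?"))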
We will discuss each one in detail individually in upcoming posts!! In the next post, I will dive deeper into LangChain and provide hands-on notebooks as well!
If you liked this content, please like and share it with your friends!