What are Large Language Models (LLMs)?
Learn about Large Language Models (LLMs) and why they are fascinating. Discover the key concepts and terms associated with LLM frameworks. Explore how LLMs can handle images and videos. A captivating read for software enthusiasts.
Ah, that's a very good question!
At IllumiDesk, like most companies around the globe, we have been taking deep dives into Large Language Models (LLMs). The space is fast-moving and ephemeral — as Tobias Macey put it, a yet-to-be-cemented framework for dealing with huge amounts of semi-structured text. These new capabilities are fascinating and remind us why we love the software landscape. (And no, I don't think LLMs will threaten us Skynet style.)
The Things We Love about LLMs
Why do we ("we" in this context is the IllumiDesk Team) enjoy learning about and using LLMs? There are multiple reasons, but for the most part, we like them because:
- We spend more time with our families and friends instead of writing boilerplate text. A lot of text works as glue to tie one idea to another. I doubt anyone (in this context, "anyone" is the world) gets excited about writing or's, and's, and the's ... or my favorite, looking for synonyms for "nevertheless" and "furthermore" over and over and over again.
- Get our creative juices flowing. As a very amateur artist (charcoal drawing, in my case), I sometimes get blocked from thinking of new things to sketch, developing new shading ideas, and the like. So what do we do, regardless of our creative hobbies? We get inspired by what others have done in the past and present. In the past, we would visit our local museum, listen to classic songs, and read books from authors we admire ... this list is endless. Generative A.I. helps synthesize ideas and create new bodies of work in a flash. Humans, in turn, can use these outputs to inspire our creative efforts. Blockage be squashed!
- Learn by teaching. In my humble opinion, teaching is one of the best ways to learn. We can't all be experts in all subjects. But we can know enough to teach our kids or others who are curious about a particular subject. For example, I have a graduate degree in computer science. But do I consider myself an "expert" in Computer Science? Heck no! Case in point: one of the best programmers I ever met was a History major. She ran circles around me during pair programming sessions. Her secret was a passion for explaining how things work, and she wasn't afraid of saying she didn't know a particular topic. LLMs have helped her take these skills to the next level. She can now synthesize new topics faster than ever and teach others.
LLM Terms: A Primer
First, we need to know the terms around Large Language Model frameworks. This will help us frame the similarities and differences between LLMs and other well-known A.I. pipelines. Let's split the LLM concepts into two buckets: general/abstract terms and terms associated with the nuts and bolts of the LLM framework itself. Let's get started!
High-Level LLM Concepts
Here are some of the general concepts that you should have a basic handle on when dealing with LLM frameworks:
- Transformer: A neural network architecture used in most large language models. Transformers are known for their ability to handle long-range dependencies in text, which allows them to understand the context of words and generate more coherent text.
- Self-attention: A mechanism transformers use to weigh the importance of different parts of an input sequence. Self-attention allows transformers to focus on the most important words in a sentence, regardless of their order.
- Vocabulary: The set of unique tokens (words or sub-words) recognized by a large language model. The vocabulary size determines how many words a language model can understand and generate.
- Context window: The maximum number of tokens a language model can consider from the input text when generating a response or prediction. The context window determines how much context a language model has to work with when understanding and generating text.
- Zero-shot learning: The ability of a large language model to perform a task without any additional fine-tuning or task-specific training. Zero-shot learning allows language models to be applied to a wider range of tasks without collecting and labeling large amounts of training data.
I'm sure you have encountered some or all of the terms above, implicitly or explicitly.
For example, if you add too much text to a request to one of the more popular LLMs on the market, such as GPT-4, Bard, or Claude, the model will likely respond with an error stating that there's too much text — in other words, that the context window isn't big enough to support that many tokens in the request. (A token can be a word or part of a word; packages such as tiktoken can help calculate the number of tokens in a given text.)
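The context-window check above can be sketched in a few lines of Python. This is a rough approximation only: real token counts depend on the model's tokenizer (for example, OpenAI's tiktoken package), and the one-token-per-word heuristic below understates the true count. The function names and the 8,192-token window are illustrative assumptions, not part of any model's API.

```python
def approximate_token_count(text: str) -> int:
    """Very rough token estimate: one token per whitespace-separated word.

    Real tokenizers (e.g. tiktoken) split words into sub-word pieces,
    so the true count is usually higher than this estimate.
    """
    return len(text.split())


def fits_in_context_window(text: str, context_window: int = 8192) -> bool:
    """Return True if the (approximate) token count fits the window."""
    return approximate_token_count(text) <= context_window


prompt = "Summarize the following document in three bullet points: ..."
print(fits_in_context_window(prompt))            # a short prompt fits
print(fits_in_context_window("word " * 10000))   # 10,000 words do not
```

When the check fails, typical remedies are truncating the input, summarizing it first, or splitting it into chunks and processing each chunk separately.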
LLM Framework Concepts
LLM frameworks help users build Large Language Model pipelines. Many of these frameworks are open source, which allows you to run them on your local laptop. A whole new class of vendors and solutions, ranging from scrappy open-source projects to new products from the cloud giants, has arisen. Regardless, here are some of the terms used with LLM Frameworks that you will probably encounter when exploring these solutions:
- Schema: Merriam-Webster defines a schema as "a structured framework or plan". In computing, a schema defines the shape of a system's inputs and outputs. Large Language Models differ from other data pipelines in that their inputs and outputs are free-form text, rather than more traditional schemas such as the pre-defined JSON schemas used with a REST API.
- Models: there are basically two types of models used with an LLM Framework: embedding models and generative models.
- Embedding models take text as input and return a vector of floating-point numbers as output. (The indexes term defined below uses the mathematical relationships between these vectors to fetch other chunks of text that are semantically related.)
- Generative models, on the other hand, take text as input and return text as output. The key is that the text output is coherent and human readable.
- Prompts: prompts are used to program, or "guide," generative models. Vendors have put a lot of time and effort into blocking text inputs and outputs that may be harmful, such as text that is criminal in nature.
- Indexes: this term is similar to the classic index in traditional databases, such as SQL. With LLM Frameworks, however, an index is built over text documents. The text should be semi-structured before it is stored so that retrievers can fetch it more efficiently. If you were building a Q&A bot over email, for example, structuring each email in a Q&A format would improve the accuracy of the retrieval process. In the email example, the text may be represented as: question 1: who is the author of this email? answer 1: Jane Doe, and so on.
- Memory: this is the process of storing and retrieving a conversation. Memory can be short-term or long-term. Short-term memory sends all of the text in the current conversation as input to improve the generative model's context. Long-term memory stores and retrieves text between conversations, usually in a Vector Database (which, by the way, is a type of index). For example, when chatting with a generative model, the LLM Framework can store the conversation history and restore that context even after the conversation has been idle for a long period of time.
- Chains: chains are sequences of operations that achieve a certain outcome. A chain usually combines a prompt template and a model to take user input (text) and fix or validate the model's output. For example, a chain might post-process the model's response so that the source of the original text is included alongside the generated answer.
- Agents: agents are chains on steroids. Agents build chains dynamically using tools, such as retrievals from a database, RESTful APIs, or an index of URLs, to enhance the desired output. Agents are fascinating because the desired outcome might be reached through chains that are unknown to the user when they send the original text input as a prompt. Projects such as AutoGPT and GPT Engineer rely on agents to build web pages, automate Twitter feeds, and conduct research and build detailed reports, among many other tasks.
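To make the embedding and index terms above concrete, here is a toy sketch of how an index compares embedding vectors to find related text. The three-dimensional vectors below are made up for the example — real embedding models return vectors with hundreds or thousands of dimensions — but the cosine-similarity math is the same.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hand-made toy embeddings; a real embedding model would produce these.
embeddings = {
    "the dog chased the ball": [0.9, 0.1, 0.1],
    "a puppy played fetch":    [0.8, 0.2, 0.1],
    "quarterly revenue grew":  [0.1, 0.1, 0.9],
}

query = embeddings["the dog chased the ball"]
# An index ranks stored chunks by similarity to the query vector.
for text, vec in embeddings.items():
    print(f"{cosine_similarity(query, vec):.2f}  {text}")
```

The dog/puppy pair scores much higher than the dog/revenue pair, which is exactly the signal a vector index uses to fetch "mathematically related" chunks of text.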
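A chain — prompt template, model call, then output validation — can be sketched in plain Python. The `call_model` function below is a hypothetical stand-in for a real generative model API (frameworks such as LangChain provide production versions of each step), and the template wording is our own.

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context: {context}\n"
    "Question: {question}\n"
)


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; canned for the demo."""
    return "Jane Doe"


def qa_chain(context: str, question: str) -> dict:
    # Step 1: fill the prompt template with the user's input.
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    # Step 2: call the generative model.
    answer = call_model(prompt).strip()
    # Step 3: fix/validate the output before returning it.
    if not answer:
        answer = "I don't know."
    # Attach the source text, as described in the Chains entry above.
    return {"answer": answer, "source": context}


result = qa_chain("Email signed by Jane Doe.", "Who wrote this email?")
print(result["answer"])  # Jane Doe
```

Each step is a plain function, which is what makes chains composable: a longer pipeline is just more steps wired together in sequence.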
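Short-term memory, as described above, amounts to replaying the whole conversation history as input on every call. A minimal sketch, again using a hypothetical `call_model` stand-in:

```python
class Conversation:
    """Short-term memory: replay all prior turns on every model call."""

    def __init__(self):
        self.history: list[tuple[str, str]] = []  # (role, text) pairs

    def ask(self, user_text: str) -> str:
        self.history.append(("user", user_text))
        # Replay the whole history so the model sees prior turns.
        prompt = "\n".join(f"{role}: {text}" for role, text in self.history)
        reply = call_model(prompt)
        self.history.append(("assistant", reply))
        return reply


def call_model(prompt: str) -> str:
    """Hypothetical stand-in: reports how much context it received."""
    lines = prompt.count("\n") + 1
    return f"(model saw {lines} lines of context)"


chat = Conversation()
chat.ask("What is a context window?")
chat.ask("And how does memory relate to it?")
print(len(chat.history))  # 4 entries: two user turns, two replies
```

This is also why long conversations eventually hit the context window: the replayed history grows with every turn, which is where long-term memory in a vector database takes over.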
Do LLMs Handle Images and Videos?
You bet! As mentioned above, LLM Frameworks are text-in and text-out. But that doesn't mean the generative model can't use the text to create outputs in other formats.
Generative models such as DALL·E from OpenAI or Stable Diffusion can create incredible images from a given text input.
This is an incredible and exciting time to be involved with software and LLM Frameworks. We believe LLM Frameworks will improve our lives and can't wait to see what's next.