Quick Start Guide to Large Language Models - Second Edition

Get your copy today and please leave a rating/review to tell me what you thought! ⭐⭐⭐⭐⭐

Welcome to the GitHub repository for the "Quick Start Guide to Large Language Models - Second Edition". This repository contains the code snippets and notebooks used in the book, which demonstrate applications and advanced techniques for working with Transformer models and large language models (LLMs). The code for the First Edition is available separately.

Repository Structure

Directories

notebooks: Contains Jupyter notebooks for each chapter in the book.
data: Contains the datasets used in the notebooks.
images: Contains images and graphs used in the notebooks.

Notebooks

Below is a list of the notebooks included in the notebooks directory, organized by the chapters in the book.

Part I - Introduction to Large Language Models

Chapter 2: Semantic Search with LLMs
- 02_semantic_search.ipynb: An introduction to semantic search using OpenAI and open-source models.
Chapter 3: First Steps with Prompt Engineering
- 03_prompt_engineering.ipynb: A guide to effective prompt engineering for instruction-aligned LLMs.
Chapter 4: The AI Ecosystem: Putting the Pieces Together
- 04_rag_retrieval.ipynb: Building a Retrieval-Augmented Generation (RAG) pipeline.
- 04_agent.ipynb: Constructing an AI agent using LLMs and other tools.

Part II - Getting the Most Out of LLMs

Chapter 5: Optimizing LLMs with Customized Fine-Tuning
- 05_bert_app_review.ipynb: Fine-tuning a BERT model for app review classification.
- 05_openai_app_review_fine_tuning.ipynb: Fine-tuning OpenAI models for app review classification.
Chapter 6: Advanced Prompt Engineering
- 06_adv_prompt_engineering.ipynb: Advanced techniques in prompt engineering, including output validation and semantic few-shot learning.
Chapter 7: Customizing Embeddings and Model Architectures
- 07_recommendation_engine.ipynb: Building a recommendation engine using custom fine-tuned LLMs and embeddings.

Part III - Advanced LLM Usage

Chapter 9: Moving Beyond Foundation Models
- 09_constructing_a_vqa_system.ipynb: Step-by-step guide to constructing a Visual Question Answering (VQA) system using GPT-2 and Vision Transformer.
- 09_using_our_vqa.ipynb: Using the VQA system built in the previous notebook.
- 09_flan_t5_rl.ipynb: Using Reinforcement Learning (RL) to improve FLAN-T5 model outputs.
Chapter 10: Advanced Open-Source LLM Fine-Tuning
- 10_SAWYER_LLAMA_SFT.ipynb: Fine-tuning the Llama-3 model to create the SAWYER bot.
- 10_SAWYER_Reward_Model.ipynb: Training a reward model from human preferences for the SAWYER bot.
- 10_SAWYER_RLF.ipynb: Applying Reinforcement Learning from Human Feedback (RLHF) to align the SAWYER bot.
- 10_SAWYER_USE_SAWYER.ipynb: Using the SAWYER bot.
- 10_anime_category_classification_model_freezing.ipynb: Fine-tuning a BERT model for anime category classification, comparing layer freezing techniques.
- 10_latex_gpt2.ipynb: Fine-tuning GPT-2 to generate LaTeX formulas.
- 10_optimizing_fine_tuning.ipynb: Best practices for optimizing fine-tuning of transformer models.
Chapter 11: Moving LLMs into Production
- 11_distillation_example_1.ipynb: Exploring knowledge distillation techniques for transformer models.
- 11_distillation_example_2.ipynb: Advanced distillation methods and applications.
- 11_llama_quantization.ipynb: Quantizing Llama models for efficient deployment.
Chapter 12: Evaluating LLMs
- 12_llm_calibration.ipynb: Techniques for calibrating LLM outputs.
- 12_llm_gen_eval.ipynb: Methods for evaluating the generative capabilities of LLMs.
- 12_cluster.ipynb: Clustering techniques for analyzing LLM outputs.
- Probing - There are over a dozen notebooks for Probing so I will only share a few key ones here:

How to Use

To use this repository:

Clone the repository to your local machine:

git clone https://github.com/sinanuozdemir/quick-start-guide-to-llms.git

Move into the repository directory:

cd quick-start-guide-to-llms

Install the necessary libraries:

pip install -r requirements.txt

Then open the Jupyter notebook of your choice from the notebooks directory.

Note: Some notebooks may require specific datasets, which can be found in the data directory.
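
After cloning, it can help to confirm that the notebooks and data directories are where the steps above expect them. The following is a minimal sketch, not part of the repository: the helper names `group_by_chapter` and `summarize_clone` are hypothetical, and it assumes notebook filenames start with a chapter number followed by an underscore, as in the list above (for example, 04_agent.ipynb). It only looks at notebooks directly inside the notebooks directory.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: notebook filenames begin with a chapter number and an
# underscore, e.g. "04_agent.ipynb", matching the list in this README.
CHAPTER_PREFIX = re.compile(r"^(\d+)_")


def group_by_chapter(filenames):
    """Map chapter number -> sorted notebook names; skip files without a prefix."""
    chapters = defaultdict(list)
    for name in filenames:
        match = CHAPTER_PREFIX.match(name)
        if match and name.endswith(".ipynb"):
            chapters[int(match.group(1))].append(name)
    return {chapter: sorted(names) for chapter, names in sorted(chapters.items())}


def summarize_clone(repo_root="."):
    """Print notebooks per chapter and whether the data directory exists."""
    root = Path(repo_root)
    # Only top-level notebooks are listed; subdirectories are not searched.
    notebooks = [path.name for path in (root / "notebooks").glob("*.ipynb")]
    for chapter, names in group_by_chapter(notebooks).items():
        print(f"Chapter {chapter}: {', '.join(names)}")
    print("data directory found:", (root / "data").is_dir())


if __name__ == "__main__":
    # Run from the repository root after cloning.
    summarize_clone()
```

If the script lists no chapters, or reports that the data directory was not found, it was most likely run from somewhere other than the repository root.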

Contributing

Contributions are welcome! If you have any additions, corrections, or enhancements, feel free to submit a pull request.

Disclaimer

This repository is for educational purposes and is meant to accompany the "Quick Start Guide to Large Language Models - Second Edition" book. Please refer to the book for in-depth explanations and discussions of the topics covered in the notebooks.

More From Sinan

Check out Sinan's Newsletter AI Office Hours for more AI/LLM content!
Sinan has a podcast called Practically Intelligent where he chats about the latest and greatest in AI!
Follow the Getting Started with Data, LLMs and ChatGPT Playlist on O'Reilly for a curated list of Sinan's work!

sinanuozdemir/quick-start-guide-to-llms
