Run llama.cpp distributed over MPI
-
CFP open for 2023 Linux Plumbers Conference (November 13-15, Richmond VA, USA)
-
StableLM released today.
-
Rust foundation meltdown - Primeagen stream - Oxide Computer discussion
-
What is a large language model (LLM)?
-
What can I do with an LLM?
-
An LLM is, at heart, a data structure: given the preceding k tokens, it outputs a probability distribution over the next token.
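This "data structure" view can be sketched with a toy bigram model (k = 1): a table mapping each token to a probability distribution over the next token. Real LLMs condition on many tokens via a neural network; this counting version is purely illustrative.

```python
from collections import Counter, defaultdict

# Toy k=1 "LLM": count which token follows which, then normalize
# the counts into per-token next-token probability distributions.
def train_bigram(tokens):
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    probs = {}
    for prev, ctr in counts.items():
        total = sum(ctr.values())
        probs[prev] = {tok: n / total for tok, n in ctr.items()}
    return probs

model = train_bigram(["the", "cat", "sat", "on", "the", "mat"])
# After "the", the model has seen "cat" and "mat" once each,
# so each gets probability 0.5.
```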
-
ChatGPT-4's training data is approaching the size of the Library of Congress. We are running out of data.
-
New data will probably come from reflection - prompting an LLM with its own output for deeper insight.
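A reflection loop can be sketched as follows. `ask` is a hypothetical stand-in for a real LLM call (e.g. an HTTP API); the point is only the shape of the loop: the model's answer is fed back in for critique.

```python
# Hypothetical LLM call - a real version would hit a model API.
def ask(prompt):
    return f"[model response to: {prompt}]"

# Reflection: ask the question, then repeatedly prompt the model
# with its own previous answer, asking it to critique and improve.
def reflect(question, rounds=2):
    answer = ask(question)
    for _ in range(rounds):
        answer = ask(f"Critique and improve this answer: {answer}")
    return answer
```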
-
BabyGPT - a three-bit LLM
-
Tokenize - Train - Infer trained model - Quantize model to shrink size
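The final "quantize to shrink size" step can be sketched as symmetric int8 quantization of a weight vector: store one float scale plus small integers instead of full floats. llama.cpp's real formats are block-wise (e.g. Q4_0); this is a simplified stand-in.

```python
# Symmetric int8 quantization: map floats in [-max, max] to [-127, 127].
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.1, -0.5, 0.25, 1.0]
q, s = quantize_int8(w)
w2 = dequantize(q, s)
# Each dequantized weight is within one quantization step of the original.
```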
-
OpenAI tiktoken - high-performance tokenizer.
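The algorithm behind tokenizers like tiktoken is byte-pair encoding (BPE): repeatedly merge the most frequent adjacent pair of tokens into a new token. tiktoken itself is a fast Rust implementation; this toy Python version shows only one merge step.

```python
from collections import Counter

# One BPE merge step: find the most frequent adjacent token pair
# and replace every occurrence of that pair with a merged token.
def bpe_merge_once(tokens):
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens
    (a, b), _ = pairs.most_common(1)[0]
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)  # merge the pair into one token
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

toks = bpe_merge_once(list("banana"))  # "a","n" is the first most-common pair
```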
-
LLM training can cost millions - OpenAI burned spare GPUs at Azure WDM after the Bitcoin crash as a tax write-off.
-
LLM training and inference involve mostly tensor (matrix) operations.
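The core tensor operation is matrix multiplication - attention and MLP layers are built from it, which is why GPUs dominate. A minimal pure-Python version, just to show the shape of the computation:

```python
# Naive matrix multiply: C[i][j] = sum over k of A[i][k] * B[k][j].
# Real inference runs this on GPU/SIMD over huge matrices.
def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

C = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])  # → [[19, 22], [43, 50]]
```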
-
Once you have an LLM, fine-tuning can cost as little as $3.
-
ChatGPT4 Demo - GitHub Copilot demo if you want
-
whisper.cpp - uses OpenAI's Whisper to transcribe audio to text - WANTED: live transcription for meetings.
-
AutoGPT - BabyAGI - use GPT and scripts to drive other GPTs and scripts.
-
Reddit GPT has good weekly briefings.
-
llama.cpp - a fork of whisper.cpp - the most widely used C++ code for hosting your own LLM.
-
Hugging Face - stores open models in Git LFS.
-
web-llm - uses WebGPU to run the LLM in your browser.
-
Linux 7 will ship LLMs of various sizes and an SMT solver to prove responses correct.
-
CGROUPS3 - closer to AWS Zelkova and AWS IAM
-
The kernel LLM will be used as a dictionary for data compression.
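The idea of a model as a compression dictionary can be sketched with predictive rank coding: if a predictor ranks likely next tokens well, you only need to store each token's rank, and good predictions produce mostly small numbers that compress well. This toy version uses a bigram predictor, not a real LLM.

```python
from collections import Counter, defaultdict

# Build a predictor: for each token, the list of possible next
# tokens sorted from most to least frequent.
def build_predictor(tokens):
    counts = defaultdict(Counter)
    for p, n in zip(tokens, tokens[1:]):
        counts[p][n] += 1
    return {p: [t for t, _ in c.most_common()] for p, c in counts.items()}

# Compress: store the first token plus the rank of each actual next
# token in the predictor's sorted candidate list.
def compress(tokens, pred):
    ranks = [pred[p].index(n) for p, n in zip(tokens, tokens[1:])]
    return tokens[0], ranks

# Decompress: replay the ranks through the same predictor.
def decompress(first, ranks, pred):
    out = [first]
    for r in ranks:
        out.append(pred[out[-1]][r])
    return out

data = list("banana")
pred = build_predictor(data)
first, ranks = compress(data, pred)  # a perfect predictor yields all-zero ranks
```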
-
Oxide Computer-sized racks will have distributed Linux schedulers. Kubernetes goes extinct.
-
More systems code like compilers will run on GPU.