
A list of free LLM inference resources accessible via API.


cheahjs/free-llm-api-resources


Free LLM API resources

This lists various services that provide free access or credits towards API-based LLM usage.

Note

Please don't abuse these services, else we might lose them.

Warning

This list explicitly excludes any services that are not legitimate (e.g. services that reverse-engineer an existing chatbot).

Free Providers

| Provider | Provider Limits/Notes | Model Name | Model Limits |
| --- | --- | --- | --- |
| Groq | | Distil Whisper Large v3 | 7,200 audio-seconds/minute; 2,000 requests/day |
| | | Gemma 2 9B Instruct | 14,400 requests/day; 15,000 tokens/minute |
| | | Llama 3 70B | 14,400 requests/day; 6,000 tokens/minute |
| | | Llama 3 70B - Groq Tool Use Preview | 14,400 requests/day; 15,000 tokens/minute |
| | | Llama 3 8B | 14,400 requests/day; 30,000 tokens/minute |
| | | Llama 3 8B - Groq Tool Use Preview | 14,400 requests/day; 15,000 tokens/minute |
| | | Llama 3.1 70B | 14,400 requests/day; 6,000 tokens/minute |
| | | Llama 3.1 8B | 14,400 requests/day; 20,000 tokens/minute |
| | | Llama 3.2 11B Vision | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 1B | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 3B | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 90B Vision | 3,500 requests/day; 7,000 tokens/minute |
| | | Llama 3.3 70B | 1,000 requests/day; 6,000 tokens/minute |
| | | Llama 3.3 70B (Speculative Decoding) | 1,000 requests/day; 6,000 tokens/minute |
| | | Llama Guard 3 8B | 14,400 requests/day; 15,000 tokens/minute |
| | | Mixtral 8x7B | 14,400 requests/day; 5,000 tokens/minute |
| | | Whisper Large v3 | 7,200 audio-seconds/minute; 2,000 requests/day |
| | | Whisper Large v3 Turbo | 7,200 audio-seconds/minute; 2,000 requests/day |
| OpenRouter | 20 requests/minute; 200 requests/day | Gemini 2.0 Flash Experimental | |
| | | Gemini Experimental 1114 | |
| | | Gemini Experimental 1206 | |
| | | Gemma 2 9B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3.1 405B Instruct | |
| | | Llama 3.1 70B Instruct | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | Llama 3.2 90B Vision Instruct | |
| | | Mistral 7B Instruct | |
| | | Mythomax L2 13B | |
| | | OpenChat 7B | |
| | | Phi-3 Medium 128k Instruct | |
| | | Phi-3 Mini 128k Instruct | |
| | | Qwen 2 7B Instruct | |
| | | Toppy M 7B | |
| | | Zephyr 7B Beta | |
| Google AI Studio | Data is used for training (when used outside of the UK/CH/EEA/EU). | Gemini 2.0 Flash | 4,000,000 tokens/minute; 10 requests/minute |
| | | Gemini 1.5 Flash | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.5 Flash (Experimental) | 1,000,000 tokens/minute; 1,500 requests/day; 5 requests/minute |
| | | Gemini 1.5 Flash-8B | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.5 Flash-8B (Experimental) | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.5 Pro | 32,000 tokens/minute; 50 requests/day; 2 requests/minute |
| | | Gemini 1.5 Pro (Experimental) | 1,000,000 tokens/minute; 100 requests/day; 5 requests/minute |
| | | LearnLM 1.5 Pro (Experimental) | 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.0 Pro | 32,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | text-embedding-004 | 150 batch requests/minute; 1,500 requests/minute; 100 content/batch |
| | | embedding-001 | |
| Mistral (La Plateforme) | Free tier (Experiment plan) requires opting into data training and phone number verification. | Open and Proprietary Mistral models | 1 request/second; 500,000 tokens/minute; 1,000,000,000 tokens/month |
| Mistral (Codestral) | Currently free to use, monthly subscription based, requires phone number verification. | Codestral | 30 requests/minute; 2,000 requests/day |
| HuggingFace Serverless Inference | Limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB. | Various open models | 1,000 requests/day (with an account) |
| SambaNova Cloud | | Llama 3.1 8B | 30 requests/minute |
| | | Llama 3.1 70B | 20 requests/minute |
| | | Llama 3.1 405B | 10 requests/minute |
| | | Llama 3.2 1B | 30 requests/minute |
| | | Llama 3.2 3B | 30 requests/minute |
| | | Llama 3.2 11B | 10 requests/minute |
| | | Llama 3.2 90B | 1 request/minute |
| | | Llama 3.3 70B | 20 requests/minute |
| | | Llama Guard 3 8B | 30 requests/minute |
| | | Qwen 2.5 72B | 20 requests/minute |
| | | Qwen 2.5 Coder 32B | 20 requests/minute |
| | | QwQ 32B Preview | 10 requests/minute |
| Cerebras | Waitlist. Free tier restricted to 8K context. | Llama 3.1 8B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| | | Llama 3.1 70B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| | | Llama 3.3 70B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| GitHub Models | Waitlist. Rate limits dependent on Copilot subscription tier. | AI21-Jamba-Instruct | |
| | | Cohere Command R | |
| | | Cohere Command R+ | |
| | | Cohere Embed v3 English | |
| | | Cohere Embed v3 Multilingual | |
| | | Meta-Llama-3-70B-Instruct | |
| | | Meta-Llama-3-8B-Instruct | |
| | | Meta-Llama-3.1-405B-Instruct | |
| | | Meta-Llama-3.1-70B-Instruct | |
| | | Meta-Llama-3.1-8B-Instruct | |
| | | Mistral Large | |
| | | Mistral Large (2407) | |
| | | Mistral Nemo | |
| | | Mistral Small | |
| | | OpenAI GPT-4o | |
| | | OpenAI GPT-4o mini | |
| | | OpenAI Text Embedding 3 (large) | |
| | | OpenAI Text Embedding 3 (small) | |
| | | Phi-3-medium instruct (128k) | |
| | | Phi-3-medium instruct (4k) | |
| | | Phi-3-mini instruct (128k) | |
| | | Phi-3-mini instruct (4k) | |
| | | Phi-3-small instruct (128k) | |
| | | Phi-3-small instruct (8k) | |
| | | Phi-3.5-mini instruct (128k) | |
| OVH AI Endpoints (Free Beta) | | CodeLlama 13B Instruct | 12 requests/minute |
| | | Codestral Mamba 7B v0.1 | 12 requests/minute |
| | | Llama 2 13B Chat | 12 requests/minute |
| | | Llama 3 70B Instruct | 12 requests/minute |
| | | Llama 3 8B Instruct | 12 requests/minute |
| | | Llama 3.1 70B Instruct | 12 requests/minute |
| | | Mathstral 7B v0.1 | 12 requests/minute |
| | | Mistral 7B Instruct | 12 requests/minute |
| | | Mistral Nemo 2407 | 12 requests/minute |
| | | Mixtral 8x22B Instruct | 12 requests/minute |
| | | Mixtral 8x7B Instruct | 12 requests/minute |
| Scaleway Generative APIs (Free Beta) | | BGE-Multilingual-Gemma2 | 600 requests/minute; 1,000,000 tokens/minute |
| | | Llama 3.1 70B Instruct | 300 requests/minute; 100,000 tokens/minute |
| | | Llama 3.1 8B Instruct | 300 requests/minute; 100,000 tokens/minute |
| | | Mistral Nemo 2407 | 300 requests/minute; 100,000 tokens/minute |
| | | Pixtral 12B (2409) | 300 requests/minute; 100,000 tokens/minute |
| | | Qwen2.5 Coder 32B Instruct | |
| | | sentence-t5-xxl | 600 requests/minute; 1,000,000 tokens/minute |
| Cloudflare Workers AI | 10,000 tokens/day | Deepseek Coder 6.7B Base (AWQ) | |
| | | Deepseek Coder 6.7B Instruct (AWQ) | |
| | | Deepseek Math 7B Instruct | |
| | | Discolm German 7B v1 (AWQ) | |
| | | Falcon 7B Instruct | |
| | | Gemma 2B Instruct (LoRA) | |
| | | Gemma 7B Instruct | |
| | | Gemma 7B Instruct (LoRA) | |
| | | Hermes 2 Pro Mistral 7B | |
| | | Llama 2 13B Chat (AWQ) | |
| | | Llama 2 7B Chat (FP16) | |
| | | Llama 2 7B Chat (INT8) | |
| | | Llama 2 7B Chat (LoRA) | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.1 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct (FP8) | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | Llama 3.3 70B Instruct (FP8) | |
| | | LlamaGuard 7B (AWQ) | |
| | | Mistral 7B Instruct v0.1 | |
| | | Mistral 7B Instruct v0.1 (AWQ) | |
| | | Mistral 7B Instruct v0.2 | |
| | | Mistral 7B Instruct v0.2 (LoRA) | |
| | | Neural Chat 7B v3.1 (AWQ) | |
| | | OpenChat 3.5 0106 | |
| | | OpenHermes 2.5 Mistral 7B (AWQ) | |
| | | Phi-2 | |
| | | Qwen 1.5 0.5B Chat | |
| | | Qwen 1.5 1.8B Chat | |
| | | Qwen 1.5 14B Chat (AWQ) | |
| | | Qwen 1.5 7B Chat (AWQ) | |
| | | SQLCoder 7B 2 | |
| | | Starling LM 7B Beta | |
| | | TinyLlama 1.1B Chat v1.0 | |
| | | Una Cybertron 7B v2 (BF16) | |
| | | Zephyr 7B Beta (AWQ) | |
| Together | | Llama 3.2 11B Vision Instruct | Free for 2024 |
| Cohere | 20 requests/minute; 1,000 requests/month | Command-R | Shared limit |
| | | Command-R+ | Shared limit |
| Google Cloud Vertex AI | Very stringent payment verification for Google Cloud. | Llama 3.1 70B Instruct | Llama 3.1 API Service free during preview; 60 requests/minute |
| | | Llama 3.1 8B Instruct | Llama 3.1 API Service free during preview; 60 requests/minute |
| | | Llama 3.2 90B Vision Instruct | Llama 3.2 API Service free during preview; 30 requests/minute |
| | | Gemini 2.0 Flash Experimental | Experimental Gemini model; 10 requests/minute |
| | | Gemini Flash Experimental | |
| | | Gemini Pro Experimental | |
| glhf.chat (Free Beta) | | Any model on Hugging Face that runs on vLLM and fits on an A100 node (~640GB VRAM), including Llama 3.1 405B at FP8 | 480 requests/8 hours |
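
Several providers above stack more than one limit on the same key (for example, OpenRouter lists 20 requests/minute and 200 requests/day), so a client has to stay under all of them at once. Below is a minimal sketch of a client-side throttle in Python. It assumes a sliding-window reading of each limit; providers may count differently (fixed calendar days, token-based limits), so treat it as a starting point. The class name `MultiWindowRateLimiter`, its parameters, and the `OPENROUTER_LIMITS` constant are hypothetical names introduced for illustration and are not part of any provider's SDK.

```python
import time
from collections import deque


class MultiWindowRateLimiter:
    """Hypothetical helper: enforce several (max_calls, window_seconds) limits at once."""

    def __init__(self, limits, clock=time.monotonic, sleep=time.sleep):
        # Sort by window length so the longest window is last.
        self.limits = sorted(limits, key=lambda lim: lim[1])
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.events = deque()  # timestamps of recorded calls, oldest first

    def _wait_time(self, now):
        """Seconds to wait until one more call fits inside every window."""
        wait = 0.0
        for max_calls, window in self.limits:
            recent = [t for t in self.events if now - t < window]
            if len(recent) >= max_calls:
                # A new call fits once the max_calls-th most recent call
                # has aged out of this window.
                blocking = recent[-max_calls]
                wait = max(wait, blocking + window - now)
        return wait

    def acquire(self):
        """Block until a call is allowed under every limit, then record it."""
        longest = self.limits[-1][1]
        while True:
            now = self.clock()
            # Forget calls older than the longest window; they no longer count.
            while self.events and now - self.events[0] >= longest:
                self.events.popleft()
            wait = self._wait_time(now)
            if wait <= 0:
                self.events.append(now)
                return
            self.sleep(wait)


# Example values taken from the OpenRouter row: 20 requests/minute, 200 requests/day.
OPENROUTER_LIMITS = [(20, 60), (200, 24 * 60 * 60)]

# Usage sketch: call limiter.acquire() immediately before each API request.
# limiter = MultiWindowRateLimiter(OPENROUTER_LIMITS)
# limiter.acquire()
```

The limiter only tracks request counts. Token-per-minute limits, such as those listed for Groq or Cerebras, would need a similar window keyed on tokens used rather than on calls.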

Providers with trial credits

| Provider | Credits | Requirements | Models |
| --- | --- | --- | --- |
| Together | $1 | | Various open models |
| Fireworks | $1 | | Various open models |
| Unify | $5 when you add a payment method | | Routes to other providers, various open models and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc) |
| NVIDIA NIM | 1,000 API calls for 1 month | | Various open models |
| Baseten | $30 | | Any supported model - pay by compute time |
| xAI | $25/month until end of 2024 | | Grok |
| Nebius | $1 | | Various open models |
| Hyperbolic | $10 | | DeepSeek V2.5, Hermes 3 Llama 3.1 70B, Llama 3 70B Instruct, Llama 3.1 405B Base, Llama 3.1 405B Base (FP8), Llama 3.1 405B Instruct, Llama 3.1 405B Instruct Virtuals, Llama 3.1 70B Instruct, Llama 3.1 8B Instruct, Llama 3.2 3B Instruct, Llama 3.3 70B Instruct, Pixtral 12B (2409), Qwen QwQ 32B Preview, Qwen2-VL 72B Instruct, Qwen2-VL 7B Instruct, Qwen2.5 72B Instruct, Qwen2.5 Coder 32B Instruct |
| AI21 | $10 for 3 months | | Jamba/Jurassic-2 |
| Upstage | $10 for 3 months | | Solar Pro/Mini |
| NLP Cloud | $15 | Phone number verification | Various open models |