This page lists services that provide free access or credits for API-based LLM usage.
> **Note**: Please don't abuse these services, or we might lose them.

> **Warning**: This list explicitly excludes any services that are not legitimate (e.g. services that reverse-engineer an existing chatbot).
| Provider | Provider Limits/Notes | Model Name | Model Limits |
|---|---|---|---|
| Groq | | Distil Whisper Large v3 | 7,200 audio-seconds/minute; 2,000 requests/day |
| | | Gemma 2 9B Instruct | 14,400 requests/day; 15,000 tokens/minute |
| | | Llama 3 70B | 14,400 requests/day; 6,000 tokens/minute |
| | | Llama 3 70B - Groq Tool Use Preview | 14,400 requests/day; 15,000 tokens/minute |
| | | Llama 3 8B | 14,400 requests/day; 30,000 tokens/minute |
| | | Llama 3 8B - Groq Tool Use Preview | 14,400 requests/day; 15,000 tokens/minute |
| | | Llama 3.1 70B | 14,400 requests/day; 6,000 tokens/minute |
| | | Llama 3.1 8B | 14,400 requests/day; 20,000 tokens/minute |
| | | Llama 3.2 11B Vision | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 1B | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 3B | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 90B Vision | 3,500 requests/day; 7,000 tokens/minute |
| | | Llama 3.3 70B | 1,000 requests/day; 6,000 tokens/minute |
| | | Llama 3.3 70B (Speculative Decoding) | 1,000 requests/day; 6,000 tokens/minute |
| | | Llama Guard 3 8B | 14,400 requests/day; 15,000 tokens/minute |
| | | Mixtral 8x7B | 14,400 requests/day; 5,000 tokens/minute |
| | | Whisper Large v3 | 7,200 audio-seconds/minute; 2,000 requests/day |
| | | Whisper Large v3 Turbo | 7,200 audio-seconds/minute; 2,000 requests/day |
| OpenRouter | 20 requests/minute; 200 requests/day | Gemini 2.0 Flash Experimental | |
| | | Gemini Experimental 1114 | |
| | | Gemini Experimental 1206 | |
| | | Gemma 2 9B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3.1 405B Instruct | |
| | | Llama 3.1 70B Instruct | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | Llama 3.2 90B Vision Instruct | |
| | | Mistral 7B Instruct | |
| | | Mythomax L2 13B | |
| | | OpenChat 7B | |
| | | Phi-3 Medium 128k Instruct | |
| | | Phi-3 Mini 128k Instruct | |
| | | Qwen 2 7B Instruct | |
| | | Toppy M 7B | |
| | | Zephyr 7B Beta | |
| Google AI Studio | Data is used for training (when used outside of the UK/CH/EEA/EU). | Gemini 2.0 Flash | 4,000,000 tokens/minute; 10 requests/minute |
| | | Gemini 1.5 Flash | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.5 Flash (Experimental) | 1,000,000 tokens/minute; 1,500 requests/day; 5 requests/minute |
| | | Gemini 1.5 Flash-8B | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.5 Flash-8B (Experimental) | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.5 Pro | 32,000 tokens/minute; 50 requests/day; 2 requests/minute |
| | | Gemini 1.5 Pro (Experimental) | 1,000,000 tokens/minute; 100 requests/day; 5 requests/minute |
| | | LearnLM 1.5 Pro (Experimental) | 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.0 Pro | 32,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | text-embedding-004 | 150 batch requests/minute; 1,500 requests/minute; 100 content/batch |
| | | embedding-001 | |
| Mistral (La Plateforme) | Free tier (Experiment plan) requires opting in to data training and phone number verification. | Open and proprietary Mistral models | 1 request/second; 500,000 tokens/minute; 1,000,000,000 tokens/month |
| Mistral (Codestral) | Currently free to use; monthly subscription-based; requires phone number verification. | Codestral | 30 requests/minute; 2,000 requests/day |
| HuggingFace Serverless Inference | Limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB. | Various open models | 1,000 requests/day (with an account) |
| SambaNova Cloud | | Llama 3.1 8B | 30 requests/minute |
| | | Llama 3.1 70B | 20 requests/minute |
| | | Llama 3.1 405B | 10 requests/minute |
| | | Llama 3.2 1B | 30 requests/minute |
| | | Llama 3.2 3B | 30 requests/minute |
| | | Llama 3.2 11B | 10 requests/minute |
| | | Llama 3.2 90B | 1 request/minute |
| | | Llama 3.3 70B | 20 requests/minute |
| | | Llama Guard 3 8B | 30 requests/minute |
| | | Qwen 2.5 72B | 20 requests/minute |
| | | Qwen 2.5 Coder 32B | 20 requests/minute |
| | | QwQ 32B Preview | 10 requests/minute |
| Cerebras | Waitlist; free tier restricted to 8K context | Llama 3.1 8B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| | | Llama 3.1 70B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| | | Llama 3.3 70B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| GitHub Models | Waitlist; rate limits dependent on Copilot subscription tier | AI21-Jamba-Instruct | |
| | | Cohere Command R | |
| | | Cohere Command R+ | |
| | | Cohere Embed v3 English | |
| | | Cohere Embed v3 Multilingual | |
| | | Meta-Llama-3-70B-Instruct | |
| | | Meta-Llama-3-8B-Instruct | |
| | | Meta-Llama-3.1-405B-Instruct | |
| | | Meta-Llama-3.1-70B-Instruct | |
| | | Meta-Llama-3.1-8B-Instruct | |
| | | Mistral Large | |
| | | Mistral Large (2407) | |
| | | Mistral Nemo | |
| | | Mistral Small | |
| | | OpenAI GPT-4o | |
| | | OpenAI GPT-4o mini | |
| | | OpenAI Text Embedding 3 (large) | |
| | | OpenAI Text Embedding 3 (small) | |
| | | Phi-3-medium instruct (128k) | |
| | | Phi-3-medium instruct (4k) | |
| | | Phi-3-mini instruct (128k) | |
| | | Phi-3-mini instruct (4k) | |
| | | Phi-3-small instruct (128k) | |
| | | Phi-3-small instruct (8k) | |
| | | Phi-3.5-mini instruct (128k) | |
| OVH AI Endpoints (Free Beta) | | CodeLlama 13B Instruct | 12 requests/minute |
| | | Codestral Mamba 7B v0.1 | 12 requests/minute |
| | | Llama 2 13B Chat | 12 requests/minute |
| | | Llama 3 70B Instruct | 12 requests/minute |
| | | Llama 3 8B Instruct | 12 requests/minute |
| | | Llama 3.1 70B Instruct | 12 requests/minute |
| | | Mathstral 7B v0.1 | 12 requests/minute |
| | | Mistral 7B Instruct | 12 requests/minute |
| | | Mistral Nemo 2407 | 12 requests/minute |
| | | Mixtral 8x22B Instruct | 12 requests/minute |
| | | Mixtral 8x7B Instruct | 12 requests/minute |
| Scaleway Generative APIs (Free Beta) | | BGE-Multilingual-Gemma2 | 600 requests/minute; 1,000,000 tokens/minute |
| | | Llama 3.1 70B Instruct | 300 requests/minute; 100,000 tokens/minute |
| | | Llama 3.1 8B Instruct | 300 requests/minute; 100,000 tokens/minute |
| | | Mistral Nemo 2407 | 300 requests/minute; 100,000 tokens/minute |
| | | Pixtral 12B (2409) | 300 requests/minute; 100,000 tokens/minute |
| | | Qwen2.5 Coder 32B Instruct | |
| | | sentence-t5-xxl | 600 requests/minute; 1,000,000 tokens/minute |
| Cloudflare Workers AI | 10,000 tokens/day | Deepseek Coder 6.7B Base (AWQ) | |
| | | Deepseek Coder 6.7B Instruct (AWQ) | |
| | | Deepseek Math 7B Instruct | |
| | | DiscoLM German 7B v1 (AWQ) | |
| | | Falcon 7B Instruct | |
| | | Gemma 2B Instruct (LoRA) | |
| | | Gemma 7B Instruct | |
| | | Gemma 7B Instruct (LoRA) | |
| | | Hermes 2 Pro Mistral 7B | |
| | | Llama 2 13B Chat (AWQ) | |
| | | Llama 2 7B Chat (FP16) | |
| | | Llama 2 7B Chat (INT8) | |
| | | Llama 2 7B Chat (LoRA) | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.1 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct (FP8) | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | Llama 3.3 70B Instruct (FP8) | |
| | | LlamaGuard 7B (AWQ) | |
| | | Mistral 7B Instruct v0.1 | |
| | | Mistral 7B Instruct v0.1 (AWQ) | |
| | | Mistral 7B Instruct v0.2 | |
| | | Mistral 7B Instruct v0.2 (LoRA) | |
| | | Neural Chat 7B v3.1 (AWQ) | |
| | | OpenChat 3.5 0106 | |
| | | OpenHermes 2.5 Mistral 7B (AWQ) | |
| | | Phi-2 | |
| | | Qwen 1.5 0.5B Chat | |
| | | Qwen 1.5 1.8B Chat | |
| | | Qwen 1.5 14B Chat (AWQ) | |
| | | Qwen 1.5 7B Chat (AWQ) | |
| | | SQLCoder 7B 2 | |
| | | Starling LM 7B Beta | |
| | | TinyLlama 1.1B Chat v1.0 | |
| | | Una Cybertron 7B v2 (BF16) | |
| | | Zephyr 7B Beta (AWQ) | |
| Together | | Llama 3.2 11B Vision Instruct | Free for 2024 |
| Cohere | 20 requests/minute; 1,000 requests/month | Command-R | Shared limit |
| | | Command-R+ | |
| Google Cloud Vertex AI | Very stringent payment verification for Google Cloud. | Llama 3.1 70B Instruct | Llama 3.1 API Service free during preview; 60 requests/minute |
| | | Llama 3.1 8B Instruct | Llama 3.1 API Service free during preview; 60 requests/minute |
| | | Llama 3.2 90B Vision Instruct | Llama 3.2 API Service free during preview; 30 requests/minute |
| | | Gemini 2.0 Flash Experimental | Experimental Gemini model; 10 requests/minute |
| | | Gemini Flash Experimental | |
| | | Gemini Pro Experimental | |
| glhf.chat (Free Beta) | | Any Hugging Face model that is runnable on vLLM and fits on an A100 node (~640GB VRAM), including Llama 3.1 405B at FP8 | 480 requests/8 hours |
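
Many of the chat providers above (Groq, OpenRouter, Cerebras, SambaNova, Together, and others) expose OpenAI-compatible endpoints, so the standard `openai` Python client usually works once you swap the base URL and API key. Below is a minimal sketch against Groq's documented OpenAI-compatible base URL; the model ID is only an example and the environment variable name is just a convention:

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at Groq's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],  # key from the provider's console
)

# Example model ID; check the provider's current model list before use.
response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

The same pattern applies to OpenRouter with `base_url="https://openrouter.ai/api/v1"` and an OpenRouter key; providers with their own APIs (such as Google AI Studio and Cohere) ship first-party SDKs instead.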

The providers below offer trial credits rather than an ongoing free tier.

| Provider | Credits | Requirements | Models |
|---|---|---|---|
| Together | $1 | | Various open models |
| Fireworks | $1 | | Various open models |
| Unify | $5 when you add a payment method | | Routes to other providers; various open and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc.) |
| NVIDIA NIM | 1,000 API calls for 1 month | | Various open models |
| Baseten | $30 | | Any supported model (pay by compute time) |
| xAI | $25/month until the end of 2024 | | Grok |
| Nebius | $1 | | Various open models |
| Hyperbolic | $10 | | DeepSeek V2.5 |
| | | | Hermes 3 Llama 3.1 70B |
| | | | Llama 3 70B Instruct |
| | | | Llama 3.1 405B Base |
| | | | Llama 3.1 405B Base (FP8) |
| | | | Llama 3.1 405B Instruct |
| | | | Llama 3.1 405B Instruct Virtuals |
| | | | Llama 3.1 70B Instruct |
| | | | Llama 3.1 8B Instruct |
| | | | Llama 3.2 3B Instruct |
| | | | Llama 3.3 70B Instruct |
| | | | Pixtral 12B (2409) |
| | | | Qwen QwQ 32B Preview |
| | | | Qwen2-VL 72B Instruct |
| | | | Qwen2-VL 7B Instruct |
| | | | Qwen2.5 72B Instruct |
| | | | Qwen2.5 Coder 32B Instruct |
| AI21 | $10 for 3 months | | Jamba/Jurassic-2 |
| Upstage | $10 for 3 months | | Solar Pro/Mini |
| NLP Cloud | $15 | Phone number verification | Various open models |
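
The requests-per-minute and requests-per-day quotas in both tables are easy to hit from scripts, so it helps to back off when a provider returns HTTP 429. Here is a minimal sketch using the `openai` client's `RateLimitError`; the base URL, environment variable, and retry counts are placeholders rather than any provider's recommended settings:

```python
import os
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # swap in whichever OpenAI-compatible provider you use
    api_key=os.environ["PROVIDER_API_KEY"],     # placeholder environment variable name
)


def chat_with_backoff(messages, model, max_retries=5):
    """Retry a chat completion with exponential backoff on 429 responses."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            # Per-minute quotas recover quickly; wait 1, 2, 4, ... seconds and retry.
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate limited after retries")
```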