JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

AI-Hypercomputer/JetStream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices.

JetStream Engine Implementation

Currently, there are two reference engine implementations available: one for JAX models and another for PyTorch models.

JAX

PyTorch

Documentation

JetStream Standalone Local Setup

Getting Started

Setup

make install-deps

Run a local server and test it

Use the following commands to run a server locally:

# Start a server
python -m jetstream.core.implementations.mock.server

# Test local mock server
python -m jetstream.tools.requester

# Load test local mock server
python -m jetstream.tools.load_tester
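
The server and the requester normally run in two separate terminals. For a single-terminal smoke test, the following is a minimal sketch of one way to combine them. The helper name `run_with_server` and the 5-second default startup wait are assumptions made for illustration, not part of JetStream; tune the wait for your machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of JetStream): start a server command in the
# background, run a client command against it, then stop the server.
# Usage: run_with_server SERVER_CMD CLIENT_CMD [STARTUP_WAIT_SECONDS]
run_with_server() {
  local server_cmd="$1"
  local client_cmd="$2"
  local wait_s="${3:-5}"  # assumed default; the server needs time to come up

  # "exec" makes the server replace the wrapper shell, so that killing the
  # recorded PID stops the server itself.
  bash -c "exec ${server_cmd}" &
  local server_pid=$!

  sleep "${wait_s}"

  # Bail out if the server has already died (for example, on an import error).
  if ! kill -0 "${server_pid}" 2>/dev/null; then
    echo "server exited before the client could run" >&2
    return 1
  fi

  # Run the client and remember its exit status.
  local status=0
  bash -c "${client_cmd}" || status=$?

  # Stop the server and reap it; ignore the non-zero status caused by the kill.
  kill "${server_pid}" 2>/dev/null || true
  wait "${server_pid}" 2>/dev/null || true

  return "${status}"
}
```

With the function defined, `run_with_server "python -m jetstream.core.implementations.mock.server" "python -m jetstream.tools.requester"` starts the mock server, sends a test request, and shuts the server down again. The exit status is that of the requester.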

Test core modules

# Test JetStream core orchestrator
python -m unittest -v jetstream.tests.core.test_orchestrator

# Test JetStream core server library
python -m unittest -v jetstream.tests.core.test_server

# Test mock JetStream engine implementation
python -m unittest -v jetstream.tests.engine.test_mock_engine

# Test mock JetStream token utils
python -m unittest -v jetstream.tests.engine.test_token_utils
python -m unittest -v jetstream.tests.engine.test_utils
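
To run all of the modules above in one pass and get a single pass/fail result, a small wrapper can loop over them. This is a sketch only: the function name `run_test_modules` and the `PYTHON` environment variable are hypothetical conveniences introduced here, not something JetStream provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of JetStream): run each unittest module in
# turn, keep going after a failure, and report how many modules failed.
# Usage: run_test_modules MODULE [MODULE...]
run_test_modules() {
  # PYTHON is an assumed override for the interpreter; defaults to "python".
  local python_bin="${PYTHON:-python}"
  local failed=0
  local module

  for module in "$@"; do
    echo "== ${module}"
    if ! "${python_bin}" -m unittest -v "${module}"; then
      echo "FAILED: ${module}" >&2
      failed=$((failed + 1))
    fi
  done

  echo "${failed} of $# module(s) failed"
  # Return success only if every module passed.
  [ "${failed}" -eq 0 ]
}
```

For example, `run_test_modules jetstream.tests.core.test_orchestrator jetstream.tests.core.test_server jetstream.tests.engine.test_mock_engine jetstream.tests.engine.test_token_utils jetstream.tests.engine.test_utils` runs the five modules listed above. `python -m unittest` also accepts several module names in one invocation; the loop is used here so that each module gets its own header and failure line.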