# Superpipe Studio

Superpipe Studio is a free and open-source observability and experimentation app for the Superpipe SDK. It can help you:

- **Log and monitor results of your Superpipe pipelines in dev or production**
- **Manage datasets and build golden sets for ground truth labeling and evaluation**
- **Track experiments and grid searches and compare them on accuracy, latency and cost**

Superpipe Studio is a Next.js app that you can run locally or self-host on Vercel.

**Demo**

<div style="position: relative; padding-bottom: 67.5%; height: 0;"><iframe src="https://www.loom.com/embed/fba6211c77204f35a70a50090d1e7001?sid=8f228e4c-27be-4dfa-a6ec-8befa6a55f54" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>

## Running Superpipe Studio

To run Superpipe Studio locally or self-host it on Vercel, follow the instructions in the README of the [Studio GitHub repository](https://github.com/villagecomputing/studio).

## Usage with Superpipe

1. Install the `superpipe-studio` Python library with `pip install superpipe-studio`. Also make sure you're on the latest version of Superpipe (`pip install superpipe-py -U`).
2. Set the following environment variables:
   1. `SUPERPIPE_STUDIO_URL`: the URL where your Studio instance is hosted
   2. `SUPERPIPE_API_KEY`: your Superpipe API key, if your instance runs with authentication (see the [Authentication](https://github.com/villagecomputing/studio) section of the Studio README)

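If you'd rather set the environment variables listed above from Python, for example at the top of a notebook, the standard library's `os.environ` works. This is a minimal sketch: the URL assumes a Studio instance at `http://localhost:3000` (the default Next.js dev server address) and the API key is a placeholder, so adjust both to your setup. Setting them before you import `superpipe-studio` avoids depending on when the library reads them.

```python
import os

# Placeholder values (assumptions): point these at your own Studio instance.
# setdefault only fills in a variable that isn't already set, so values
# exported in your shell take precedence.
os.environ.setdefault("SUPERPIPE_STUDIO_URL", "http://localhost:3000")

# Only needed if your Studio instance runs with authentication.
os.environ.setdefault("SUPERPIPE_API_KEY", "your-superpipe-api-key")

# Fail early if the URL is present but empty.
if not os.environ["SUPERPIPE_STUDIO_URL"]:
    raise RuntimeError("SUPERPIPE_STUDIO_URL must be set")
```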
### Logging

To log a pipeline run to Studio, pass `enable_logging=True` when calling `pipeline.run`.

```python
data = {
    ...  # your pipeline's input fields
}
pipeline.run(data=data, enable_logging=True)
```

It’s helpful to set the pipeline’s `name` field when initializing it, so you can identify its logs in Studio.

### Datasets

Creating a Studio dataset uploads the data to Studio, where you can browse it in a convenient interface. It also lets you reuse the same dataset across experiments.

To create a dataset, call the `Dataset` constructor and pass in a pandas dataframe.

```python
from studio import Dataset
import pandas as pd

df = pd.DataFrame(...)
dataset = Dataset(data=df, name="furniture", ground_truths=["brand_name"])
```

**Ground truth columns**

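Pass the names of the dataframe columns that hold ground truth labels in the `ground_truths` argument (`brand_name` in the example above).
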
You can also download a dataset that already exists in Studio by passing its `id` to the constructor.

```python
dataset_id = dataset.id
dataset_copy = Dataset(id=dataset_id)
```

To add data to an existing dataset, call its `add_data` method and pass in a pandas dataframe.

```python
df = pd.DataFrame(...)  # new rows to add
dataset.add_data(data=df)
```

### Experiments

Experiments in Studio help you log the results of running a pipeline or a grid search on a dataset, so you can evaluate their accuracy, cost and speed and compare them objectively.

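To make the three metrics concrete, here is a minimal sketch of how accuracy, latency and cost could be aggregated from per-row results. `RowResult` and `summarize` are hypothetical names, exact-match accuracy and mean latency are assumptions, and this is an illustration rather than Studio's implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RowResult:
    """Hypothetical record of one pipeline run on one dataset row."""
    prediction: str
    ground_truth: str
    latency_s: float  # wall-clock seconds for this row
    cost_usd: float  # model spend for this row, in US dollars


def summarize(results: list[RowResult]) -> dict[str, float]:
    """Aggregate per-row results into experiment-level accuracy, speed and cost."""
    if not results:
        raise ValueError("no results to summarize")
    # Exact-match accuracy against the ground truth label (an assumption).
    correct = sum(r.prediction == r.ground_truth for r in results)
    return {
        "accuracy": correct / len(results),
        "mean_latency_s": mean(r.latency_s for r in results),
        "total_cost_usd": sum(r.cost_usd for r in results),
    }


# Toy data for illustration only.
results = [
    RowResult(prediction="Acme", ground_truth="Acme", latency_s=1.2, cost_usd=0.002),
    RowResult(prediction="Globex", ground_truth="Initech", latency_s=0.8, cost_usd=0.001),
]
print(summarize(results))
```

Exact match suits categorical fields such as `brand_name`; free-text outputs would need a more forgiving comparison.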
To run a pipeline experiment, define your pipeline as usual and call `pipeline.run_experiment`, passing in a pandas dataframe, a Studio dataset object or a dataset id string. If you pass in a dataframe, a Studio dataset is created from it and can be reused in future experiments. If you pass in a dataset id, the corresponding dataset is downloaded from Studio.

```python
import pandas as pd

df = pd.DataFrame(...)
pipeline.run_experiment(data=df)
```

To run a grid search experiment, define your grid search as usual and call `grid_search.run_experiment`. Everything else works the same as for a pipeline experiment, except that Studio creates one experiment for each set of parameters in the grid search.

```python
grid_search.run_experiment(data=df)
```

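Since Studio creates one experiment per parameter set, the number of experiments is the product of the number of values for each parameter, which grows quickly. The sketch below enumerates a hypothetical grid with `itertools.product` to count the runs before you launch them; it does not call Superpipe's grid search API, and the parameter names and values are made up.

```python
from itertools import product

# Hypothetical grid: parameter names and values are illustrative only.
param_grid = {
    "model": ["model-a", "model-b"],
    "temperature": [0.0, 0.7],
    "max_tokens": [256, 512, 1024],
}

# Each combination of one value per parameter is one parameter set.
keys = list(param_grid)
param_sets = [dict(zip(keys, values)) for values in product(*param_grid.values())]

# One experiment per parameter set: 2 * 2 * 3 = 12 experiments.
print(len(param_sets))
```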
**Experiment groups**

Studio automatically groups pipeline experiments so you can compare them more easily. Two experiments are added to the same group if:

1. they have the same name, and
2. they have the same steps (the steps' params can differ).

All experiments created by a grid search are automatically placed in the same experiment group.

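The two grouping rules above can be expressed as a group key made of the experiment name and its ordered list of step names, with step params left out. This sketch uses hypothetical `Step` and `Experiment` classes rather than Superpipe's own, and it assumes steps are compared by name and in order; it illustrates the rules, not Studio's implementation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    """Hypothetical stand-in for a pipeline step (not Superpipe's class)."""
    name: str
    params: dict[str, Any] = field(default_factory=dict)


@dataclass
class Experiment:
    """Hypothetical stand-in for a logged experiment."""
    name: str
    steps: list[Step]


def group_key(exp: Experiment) -> tuple[str, tuple[str, ...]]:
    # Rule 1: same experiment name. Rule 2: same ordered steps; params are ignored.
    return (exp.name, tuple(step.name for step in exp.steps))


def group_experiments(experiments: list[Experiment]) -> dict[tuple, list[Experiment]]:
    groups: dict[tuple, list[Experiment]] = {}
    for exp in experiments:
        groups.setdefault(group_key(exp), []).append(exp)
    return groups


experiments = [
    Experiment("furniture", [Step("extract", {"model": "a"}), Step("classify")]),
    Experiment("furniture", [Step("extract", {"model": "b"}), Step("classify")]),
    Experiment("furniture", [Step("extract")]),
]
groups = group_experiments(experiments)
print(len(groups))
```

Because params are excluded from the key, the first two experiments, which differ only in the `model` param, land in the same group, while the third has different steps and gets its own.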
## Usage without Superpipe

Studio can be used even if you don't build your LLM pipelines with Superpipe.

You can interact with Studio directly through its REST API; see the [API docs](https://superpipe.vercel.app/api-doc).