In this lesson, we'll take a look at how to use a remotely hosted LLM.
- What is Replicate?
- Installing Replicate
- Getting your own Replicate API token
- Setting the Replicate API token
- Run an LLM model
- Summary
Replicate is an online platform that lets users run machine learning models with a few lines of code, without needing to understand how machine learning works under the hood. This is particularly helpful for integrating machine learning functionality into any website without the overhead of model training and maintenance.
Such models are accessible via a simple API call, and Replicate's model page provides code snippets (available in Node.js, Python, and HTTP) to get users started on their own projects.
In addition to code snippets, another notable feature of the model page is the Demo, which allows you to play with the LLM interactively. Go ahead and try adjusting the prompt and model parameters to see how the model responds.
Of the above-mentioned methods of using Replicate, we're going to use it via the Python library.
Let's install the `replicate` library via `pip` as follows:

```bash
pip install replicate
```
Now, we're good to go!
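If you'd like to confirm that the installation worked, a quick sanity check like the following should print the installed version (this sketch uses the standard library's `importlib.metadata`; any recent version of `replicate` will do for this lesson):

```python
from importlib.metadata import version

# If this prints a version number, the library is installed correctly
print(version("replicate"))
```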
Watch the following screencast to get your own Replicate API token.
To allow our Python script to access the LLM, we'll need to assign the Replicate API token to an environment variable so that it's available for subsequent authentication with the Replicate platform.
```python
import os

# Set the Replicate API token as an environment variable
os.environ["REPLICATE_API_TOKEN"] = "r8_xxxxxxxxxxxxxxxxxxx"
```
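Hardcoding the token is fine for a quick experiment, but it's best not to commit secrets to source control. As a minimal alternative sketch (using the standard library's `getpass`), you could prompt for the token only when it isn't already set:

```python
import os
from getpass import getpass

# Avoid hardcoding secrets: prompt for the token at runtime
# if it isn't already present in the environment
if "REPLICATE_API_TOKEN" not in os.environ:
    os.environ["REPLICATE_API_TOKEN"] = getpass("Replicate API token: ")
```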
To generate an LLM response, we'll need to:
- Import the `replicate` library
- Define our system prompt (herein defined as the `pre_prompt` variable) and prompt input
- Generate the LLM response by calling the `replicate.run()` method, specifying the LLM model to use, the prompt input, and the model parameters (as shown below)
```python
import replicate

# Prompts
pre_prompt = "You are a helpful assistant. You do not respond as 'User' or pretend to be 'User'. You only respond once as 'Assistant'."
prompt_input = "What is Streamlit?"

# Generate LLM response
output = replicate.run(
    "a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5",  # LLM model
    input={
        "prompt": f"{pre_prompt} {prompt_input} Assistant: ",  # Prompts
        "temperature": 0.1,  # Model parameters
        "top_p": 0.9,
        "max_length": 128,
        "repetition_penalty": 1,
    },
)
```
In this lesson, you've learned how to use a remotely hosted LLM via Replicate.