
Adds guidance extension #2554

Closed
wants to merge 5 commits into from

Conversation

paolorechia

What is this
Adds a small API wrapper around the guidance library (https://github.com/microsoft/guidance), using the model loaded by oobabooga's UI.

Use cases in mind
Implementation of chain-of-thought flows (and more complex flows) with guidance, using oobabooga as the model loader, so people can easily load GPTQ and other model types.
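As an illustration of the chain-of-thought use case mentioned above, a guidance prompt template might look like the following. This is a hypothetical template written for this description, not code from the PR; `question` is an input variable and `reasoning`/`answer` are generated fields.

```python
# Hypothetical chain-of-thought prompt in guidance's handlebars-style syntax.
# {{question}} would be filled in by the caller; the two {{gen ...}} slots
# are produced by the model.
cot_template = """Question: {{question}}
Let's think step by step.
Reasoning: {{gen 'reasoning' temperature=0 max_tokens=200}}
Final answer: {{gen 'answer' temperature=0 max_tokens=30}}"""

# A second prompt could then consume the generated 'answer' as its input,
# chaining several guidance calls into a larger flow.
```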

Why

I tried adding support for oobabooga's existing API to the guidance library (https://github.com/paolorechia/local-guidance); however, not all features could be supported, since guidance depends on logprobs and other data fields from the Hugging Face API for certain features.

Limitations
Only works for Hugging Face and GPTQ models so far. Someone has already opened a PR in guidance to add support for the llama-cpp Python bindings: guidance-ai/guidance#70. Once that lands, we'll be able to support LLaMA GGML models as well.

How to use
The exposed API endpoint can easily be used through the thin wrapper andromeda-chain:

pip install andromeda-chain

Repository: https://github.com/ChuloAI/andromeda-chain
Example code:

from andromeda_chain import AndromedaChain, AndromedaPrompt, AndromedaResponse

chain = AndromedaChain("http://0.0.0.0:9000/guidance_api/v1/generate")

prompt = AndromedaPrompt(
    name="hello",
    prompt_template="""Howdy: {{gen 'expert_names' temperature=0 max_tokens=300}}""",
    input_vars=[],
    output_vars=["expert_names"]
)

response: AndromedaResponse = chain.run_guidance_prompt(prompt)

Alternatively, the extension can be used by implementing a simple HTTP client directly.
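As a sketch of such a client using only the standard library: the JSON field names below mirror the AndromedaPrompt fields from the example above, but the exact request schema the endpoint expects is an assumption, not documented in this PR.

```python
import json
import urllib.request

def build_payload(prompt_template, input_vars, output_vars):
    """Assemble a request body mirroring the AndromedaPrompt fields.
    The field names the endpoint actually expects are an assumption."""
    return {
        "prompt_template": prompt_template,
        "input_vars": input_vars,
        "output_vars": output_vars,
    }

def call_guidance_api(url, payload):
    """POST the JSON payload to the extension's endpoint and decode the reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_payload(
    "Howdy: {{gen 'expert_names' temperature=0 max_tokens=300}}",
    {},
    ["expert_names"],
)
# With the extension running locally, the call would then be:
# result = call_guidance_api("http://0.0.0.0:9000/guidance_api/v1/generate", payload)
```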

@bilwis

bilwis commented Jun 10, 2023

EDIT: Okay, nvm, it's late here. Turns out I just wasn't loading the guidance extension 🤦. I'm going to leave this up for posterity.

One small suggestion, though: the help text in parser.add_argument('--guidance-device', type=str, default='cuda', help='The listening port for the blocking guidance API.') should probably say something else. I'm definitely blaming it for my thinking it'd be the network device...
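For reference, a corrected flag definition might read as follows; the help wording here is a suggestion, not text from the PR:

```python
import argparse

parser = argparse.ArgumentParser()
# Suggested help string: the flag selects the torch device guidance
# loads the model on, not a network setting.
parser.add_argument(
    '--guidance-device', type=str, default='cuda',
    help='Device on which guidance runs the model (e.g. cuda, cuda:0, cpu).',
)

args = parser.parse_args(['--guidance-device', 'cuda:0'])
print(args.guidance_device)  # cuda:0
```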

Thanks again for your work! Can't wait to try it out when I've had some sleep 😄


When running your fork with python server.py --listen-port 7500 --xformers --api --guidance --guidance-port 8000 --guidance-device 0.0.0.0 --listen, trying to connect to the API with

from andromeda_chain import AndromedaChain, AndromedaPrompt, AndromedaResponse

chain = AndromedaChain("http://localhost:8000/guidance_api/v1/generate")

prompt = AndromedaPrompt(
    name="hello",
    prompt_template="""Howdy: {{gen 'expert_names' temperature=0 max_tokens=300}}""",
    input_vars=[],
    output_vars=["expert_names"]
)

response: AndromedaResponse = chain.run_guidance_prompt(prompt)

results in the following error:

ConnectionError: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /guidance_api/v1/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f53cc622800>: Failed to establish a new connection: [Errno 111] Connection refused'))

Reinstalling dependencies, loading the model before the extensions, accessing from different ports, etc., all didn't solve the issue.

Not sure what the problem is here, and sadly my coding knowledge doesn't reach far enough to tackle it. Hopefully you can shine a light on it. Thanks for the work on the repo, by the way! I'd love to get guidance running without having to (re)load the model in my notebooks all the time.
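For anyone hitting the same "Connection refused" error: one generic way to check whether anything is actually listening on the guidance port is a small stdlib probe (a diagnostic sketch, not specific to this extension):

```python
import socket

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# If this prints False, nothing is listening on the guidance port, which
# would match the 'Connection refused' error above (e.g. the extension
# never started).
print(is_port_open("localhost", 8000))
```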

I'm running on WSL2 (Ubuntu 22.04.2 LTS), installed packages are as follows:

accelerate 0.20.3
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 4.2.2
andromeda-chain 0.2
annoy 1.17.2
anyio 3.6.2
asciitree 0.3.3
asttokens 2.2.1
async-timeout 4.0.2
attrs 22.2.0
auto-gptq 0.2.2+cu117
backcall 0.2.0
backoff 2.2.1
beautifulsoup4 4.12.2
bitsandbytes 0.39.0
blinker 1.6.2
blis 0.7.9
boto3 1.26.137
botocore 1.29.137
brotlipy 0.7.0
cachetools 5.3.1
catalogue 2.0.8
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 2.0.4
chromadb 0.3.18
click 8.1.3
clickhouse-connect 0.5.24
cmake 3.26.3
colorama 0.4.6
confection 0.0.4
contourpy 1.0.7
cryptography 39.0.1
cycler 0.11.0
cymem 2.0.7
datasets 2.10.1
decorator 5.1.1
deepspeed 0.8.2
dill 0.3.6
diskcache 5.6.1
docopt 0.6.2
duckdb 0.8.0
einops 0.6.1
en-core-web-sm 3.5.0
encodec 0.1.1
entrypoints 0.4
exceptiongroup 1.1.1
executing 1.2.0
fastapi 0.95.0
fasteners 0.18
ffmpy 0.3.0
filelock 3.9.0
Flask 2.3.2
flask-cloudflared 0.0.12
flexgen 0.1.7
flit_core 3.6.0
fonttools 4.39.2
frozenlist 1.3.3
fsspec 2023.3.0
funcy 2.0
gmpy2 2.1.2
gptcache 0.1.30
gptq-llama 0.2.2
gradio 3.33.1
gradio_client 0.2.5
guidance 0.0.61
h11 0.14.0
hjson 3.1.0
hnswlib 0.7.0
httpcore 0.16.3
httptools 0.5.0
httpx 0.23.3
huggingface-hub 0.14.1
idna 3.4
iniconfig 2.0.0
ipython 8.13.2
itsdangerous 2.1.2
jedi 0.18.2
Jinja2 3.1.2
jmespath 1.0.1
joblib 1.2.0
jsonschema 4.17.3
kiwisolver 1.4.4
langcodes 3.3.0
linkify-it-py 2.0.0
lit 16.0.5
llama-cpp-python 0.1.57
lz4 4.3.2
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.1
matplotlib 3.7.1
matplotlib-inline 0.1.6
mdit-py-plugins 0.3.3
mdurl 0.1.2
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
monotonic 1.6
mpmath 1.2.1
msal 1.22.0
multidict 6.0.4
multiprocess 0.70.14
murmurhash 1.0.9
mypy-extensions 1.0.0
nest-asyncio 1.5.6
networkx 2.8.4
ninja 1.11.1
nltk 3.8.1
num2words 0.5.12
numcodecs 0.11.0
numpy 1.24.2
openai 0.27.8
orjson 3.8.7
packaging 23.0
pandas 1.5.3
parsimonious 0.10.0
parso 0.8.3
pathy 0.10.1
peft 0.4.0.dev0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 23.1.2
platformdirs 3.5.3
pluggy 1.0.0
posthog 2.4.2
preshed 3.0.8
prompt-toolkit 3.0.38
protobuf 3.20.2
psutil 5.9.4
ptyprocess 0.7.0
PuLP 2.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 11.0.0
pycparser 2.21
pycryptodome 3.17
pydantic 1.10.6
pydub 0.25.1
Pygments 2.15.1
pygtrie 2.5.0
PyJWT 2.7.0
pyOpenSSL 23.0.0
pyparsing 3.0.9
pyre-extensions 0.0.29
pyrsistent 0.19.3
PySocks 1.7.1
pytest 7.2.2
python-dateutil 2.8.2
python-dotenv 1.0.0
python-multipart 0.0.6
pytz 2022.7.1
PyYAML 6.0
regex 2022.10.31
requests 2.28.2
responses 0.18.0
rfc3986 1.5.0
rouge 1.0.1
rwkv 0.7.3
s3transfer 0.6.1
safetensors 0.3.1
scikit-learn 1.2.2
scipy 1.10.1
semantic-version 2.10.0
sentence-transformers 2.2.2
sentencepiece 0.1.97
setuptools 67.8.0
six 1.16.0
smart-open 6.3.0
sniffio 1.3.0
soupsieve 2.4.1
spacy 3.5.3
spacy-legacy 3.0.12
spacy-loggers 1.0.4
srsly 2.4.6
stack-data 0.6.2
starlette 0.26.1
suno-bark 0.0.1a0
sympy 1.11.1
texttable 1.6.7
thinc 8.1.10
threadpoolctl 3.1.0
tiktoken 0.4.0
tokenizers 0.13.3
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
torch 2.0.0
torchaudio 2.0.0
torchvision 0.15.0
tqdm 4.65.0
traitlets 5.9.0
transformers 4.30.0
triton 2.0.0
typer 0.7.0
typing_extensions 4.5.0
typing-inspect 0.8.0
uc-micro-py 1.0.1
urllib3 1.26.14
uvicorn 0.21.1
uvloop 0.17.0
wasabi 1.1.1
watchfiles 0.19.0
wcwidth 0.2.6
websockets 11.0.2
Werkzeug 2.3.6
wheel 0.40.0
xformers 0.0.19
xxhash 3.2.0
yarl 1.8.2
zarr 2.14.2
zstandard 0.21.0

@paolorechia
Author

Hi @bilwis, thanks for trying it out and finding this issue. I was going to point out that the argument looked weird, but you found it yourself first :)

I used the API extension code as a base, so I forgot to update the description; sorry about the confusion. I'll fix the description hopefully tomorrow or during the week.

Let me know how it goes for you.

@bilwis

bilwis commented Jun 11, 2023

Here we are again, well rested but still stupid. I've got a question about passing input variables. You've got the input_vars field in AndromedaPrompt, but does this actually do anything? From what I've figured out, you have to pass the variables to the run_guidance_prompt command.

input_vars = {'word': 'Howdy'}

prompt = AndromedaPrompt(
    name = 'test',
    prompt_template = """{{word}}: {{gen 'response'}}""",
    input_vars = [], #What do I put here?
    output_vars = ['response']
)

response: AndromedaResponse = chain.run_guidance_prompt(prompt, input_vars)

Again, thanks for your work, I hope it'll be merged soon, and that more people get to play around with guidance.

@paolorechia
Author

> I've got a question about passing input variables. You've got the input_vars field in AndromedaPrompt, but does this actually do anything? From what I've figured out, you have to pass the variables to the run_guidance_prompt command.

My fault that the documentation is not clear. You should pass a dictionary of values for the variables that are not generated. You already have it defined as {"word": "Howdy"} in the example.
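To illustrate what that dictionary does, here is a toy sketch of the substitution step the server performs on plain (non-generated) variables. This is an illustration written for this thread, not the extension's actual code, and declaring the variable name in input_vars as shown in the comment is an assumption about the intended usage:

```python
import re

def fill_input_vars(template, values):
    """Toy illustration: substitute plain {{name}} variables from a dict.
    {{gen ...}} slots are left untouched for the model to generate."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(values[m.group(1)]), template)

template = "{{word}}: {{gen 'response'}}"
filled = fill_input_vars(template, {"word": "Howdy"})
print(filled)  # Howdy: {{gen 'response'}}

# With andromeda-chain, the equivalent would presumably be:
#   prompt = AndromedaPrompt(..., input_vars=["word"], output_vars=["response"])
#   response = chain.run_guidance_prompt(prompt, {"word": "Howdy"})
```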

@oobabooga
Owner

Could you please submit the extension to https://github.com/oobabooga/text-generation-webui-extensions? I am not familiar enough with guidance to properly maintain the extension in the future, and would prefer to have something integrated with the UI rather than an additional API.
