
json: refine whitespace rules to avoid runaways #7866


Merged
merged 1 commit into from
Jun 11, 2024

Collaborator

@ochafik ochafik commented Jun 11, 2024

Quick follow-up to #7841 (@HanClinto, I took the bait of your "good start" 🤪)

Defining whitespace as ws ::= | " " | "\n" [ \t]{0,20} allows compact inline JSON ({"a": 1}), spaced inline JSON ({ "a" : 1 }), and indented JSON, but disallows multiple empty lines, runs of multiple spaces outside an indentation context, etc.
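To make the rule concrete, here is a minimal sketch that transcribes a single expansion of ws into a Python regular expression and checks a few whitespace strings against it. The regex and the helper name is_valid_ws are illustrative assumptions, not part of this PR; llama.cpp evaluates the GBNF grammar directly and does not use a regex.

```python
import re

# Regex transcription of ONE expansion of the rule (illustrative assumption):
#   ws ::= | " " | "\n" [ \t]{0,20}
# i.e. nothing, a single space, or a newline followed by at most 20 spaces/tabs.
WS = re.compile(r"(?: |\n[ \t]{0,20})?")


def is_valid_ws(s: str) -> bool:
    """Return True if s is fully matched by a single ws expansion."""
    return WS.fullmatch(s) is not None


if __name__ == "__main__":
    samples = ["", " ", "\n", "\n    ", "\n\t\t", "  ", "\n\n", "\n" + " " * 21]
    for s in samples:
        print(repr(s), is_valid_ws(s))
```

Under this transcription, the empty string, a single space, and a newline plus indentation are accepted, while two consecutive spaces, two consecutive newlines, and indentation longer than 20 characters are rejected. It only models one occurrence of ws, not how the full JSON grammar sequences them.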

For instance, this removes the trailing spaces generated by the following call:

./main --log-disable --seed 1133 \
  --grammar-file grammars/json.gbnf \
  -mu https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q8_0.gguf \
  -p "Tell me a love story with a JSON structure:
"
Output from master:
{
  "characters": [
    {"name": "Alex", "age": 25, "occupation": "Software Engineer"},
    {"name": "Maya", "age": 28, "occupation": "Graphic Designer"}
  ],
  "story": {
    "beginning": "Alex and Maya met at a coffee shop in the heart of the city.",
    "middle": "They bonded over their shared love of art and technology, and soon became inseparable.",
    "end": "After a year of dating, Alex proposed to Maya with a custom-made ring and a romantic sunset view."
  }
}





  





  



Performance seems similar to master in the couple of runs I've tried.

Benchmark commands:
hyperfine \
  --warmup 1 --runs 5 \
  -L branch master,json-ws \
  --prepare 'git checkout {branch} && make clean && make -j LLAMA_CURL=1 main' \
  'branch={branch} \
    ./main --log-disable --seed 1133 \
      --grammar-file grammars/json.gbnf \
      -mu https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q8_0.gguf \
      -p "Tell me a love story with a JSON structure:
      "'

Note: also tested ws ::= | " " | "\n" (" "{0,10} | "\t"{0,10}) but it's slower, probably because of the extra stack / alternatives overhead.
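As a rough illustration of how the two candidate rules differ in what they accept (not in speed), the sketch below transcribes each into a Python regex. The regexes and the names MERGED, ALTERNATIVE, and accepts are assumptions made for this example; they say nothing about the grammar sampler's actual performance.

```python
import re

# Rule adopted in this PR:  ws ::= | " " | "\n" [ \t]{0,20}
MERGED = re.compile(r"(?: |\n[ \t]{0,20})?")

# Alternative also tested:  ws ::= | " " | "\n" (" "{0,10} | "\t"{0,10})
ALTERNATIVE = re.compile(r"(?: |\n(?: {0,10}|\t{0,10}))?")


def accepts(pattern: re.Pattern, s: str) -> bool:
    """Return True if s is fully matched by one expansion of the rule."""
    return pattern.fullmatch(s) is not None


if __name__ == "__main__":
    samples = ["\n    ", "\n\t\t", "\n \t", "\n" + " " * 12]
    for s in samples:
        print(repr(s), accepts(MERGED, s), accepts(ALTERNATIVE, s))
```

In this transcription both rules accept pure-space or pure-tab indentation of up to 10 characters, but the alternative rejects mixed space/tab indentation and anything longer than 10 characters, while the adopted rule allows up to 20 characters in any mix.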

@github-actions github-actions bot added the testing, examples, python, and server labels Jun 11, 2024
@ochafik ochafik changed the title json: refine whitespace rules to avoid runaways json: refine whitespace rules to avoid runaways Jun 11, 2024
@ochafik ochafik marked this pull request as ready for review June 11, 2024 01:09
Collaborator

@HanClinto HanClinto left a comment


Looks good to me! 👍

@ochafik ochafik merged commit b61eb96 into ggml-org:master Jun 11, 2024
43 of 55 checks passed
Labels
examples, python, server, testing
2 participants