Skip to content

json: document schema conversion in GBNF readme, align manual grammar examples & converters #7841

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 4 commits into from
Jun 11, 2024

Conversation

ochafik
Copy link
Collaborator

@ochafik ochafik commented Jun 10, 2024

cc/ @HanClinto @ExtReMLapin

@github-actions github-actions bot added testing Everything test related examples python python script changes server labels Jun 10, 2024
@@ -57,7 +57,7 @@ std::unordered_map<std::string, BuiltinRule> PRIMITIVE_RULES = {
{"object", {"\"{\" space ( string \":\" space value (\",\" space string \":\" space value)* )? \"}\" space", {"string", "value"}}},
{"array", {"\"[\" space ( value (\",\" space value)* )? \"]\" space", {"value"}}},
{"uuid", {"\"\\\"\" [0-9a-fA-F]{8} \"-\" [0-9a-fA-F]{4} \"-\" [0-9a-fA-F]{4} \"-\" [0-9a-fA-F]{4} \"-\" [0-9a-fA-F]{12} \"\\\"\" space", {}}},
{"char", {"[^\"\\\\] | \"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F]{4})", {}}},
{"char", {"[^\"\\\\\\x7F\\x00-\\x1F] | [\\\\] ([\"\\\\bfnrt] | \"u\" [0-9a-fA-F]{4})", {}}},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these possibly be updated to use the new . operator, or is now not the time for that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, we can't do that, because we need to exclude backslashes from this list. Nevermind, carry on! :)


# Optional space: by convention, applied in this grammar after literal chars when allowed
ws ::= ([ \t\n] ws)?
ws ::= [ \t\n]{,20}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a good change -- I like constraining the output in this way. Could even consider limiting it to something more restrictive like {,4} or {,8}, but this is a good start.

Copy link
Collaborator Author

@ochafik ochafik Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've drafted an updated space rule in #7866. No matter what the bound is with this current syntax, models like Llama-3-8B & Phi-3-mini seem keen to misuse it. But given near-unlimited indent space only (and only 1 newline at a time), they're very sensible.

Copy link
Collaborator

@HanClinto HanClinto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! 👍

@ochafik ochafik marked this pull request as ready for review June 10, 2024 23:59
@ochafik ochafik changed the title [WIP] json: document schema conversion in GBNF readme, align manual grammar examples & converters json: document schema conversion in GBNF readme, align manual grammar examples & converters Jun 10, 2024
@ochafik ochafik merged commit 396b18d into ggml-org:master Jun 11, 2024
38 of 55 checks passed
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
examples python python script changes server testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants