https://github.com/invisibleXML/ixml/blob/master/samples/URI/rfc-3987.ixml #139
Also unused rules: |
Regarding "a"-"f": the ABNF doesn't include the lowercase versions, but the relevant part of RFC 2234 (section 2.3) states that ABNF quoted strings are case-insensitive.
So every quoted string in the ABNF has to be changed in the ixml to accept mixed case. |
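That change can be made mechanically. Here is a minimal sketch, assuming the quoted strings are plain US-ASCII. The helper name `abnf_literal_to_ixml` is hypothetical and not part of any ixml tool. It turns each letter of an ABNF literal into an ixml character set that accepts both cases, and leaves every other character as a quoted ixml string.

```python
def abnf_literal_to_ixml(literal):
    """Rewrite a case-insensitive ABNF quoted string as an ixml sequence.

    Hypothetical helper: each ASCII letter becomes a character set holding
    its lower- and upper-case forms; every other character stays a quoted
    ixml string, with any double quote doubled as ixml strings require.
    """
    parts = []
    for ch in literal:
        if ch.isascii() and ch.isalpha():
            parts.append(f'["{ch.lower()}{ch.upper()}"]')
        else:
            parts.append('"' + ch.replace('"', '""') + '"')
    return ", ".join(parts)


if __name__ == "__main__":
    # prints: ["hH"], ["tT"], ["tT"], ["pP"]
    print(abnf_literal_to_ixml("http"))
```

Emitting one set per letter keeps the output simple; merging runs of non-letters into a single string would be a cosmetic refinement.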
Removing unused rules, more rules become unused: |
One source of ambiguity is the ihost rule, whose alternatives include both IPv4address and ireg-name. "192.168.0.0" matches ireg-name anyway, because "192.168.0.org" is a valid ireg-name and they don't bother to discriminate. And that is because they are lazy and don't distinguish subdomains, just allowing a host to be any mixture of ALPHA | DIGIT | "-" | "." | "_" | "~" | ucschar (which I believe isn't syntactically valid). |
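The overlap is easy to demonstrate with regular expressions. This is a rough transcription, assuming ASCII input: dec-octet and IPv4address follow RFC 3986 (which RFC 3987 reuses), but the ireg-name pattern leaves out ucschar, so it only approximates the real rule. The function `ihost_alternatives` is hypothetical.

```python
import re

# dec-octet and IPv4address as defined in RFC 3986 (reused by RFC 3987)
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4ADDRESS = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")

# Approximate ireg-name: *( iunreserved / pct-encoded / sub-delims ),
# restricted to ASCII here (ucschar is left out of this sketch).
IREG_NAME = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")


def ihost_alternatives(host):
    """Return which of these two ihost alternatives match the whole of host."""
    matches = []
    if IPV4ADDRESS.fullmatch(host):
        matches.append("IPv4address")
    if IREG_NAME.fullmatch(host):
        matches.append("ireg-name")
    return matches


if __name__ == "__main__":
    for host in ["192.168.0.0", "192.168.0.org", "256.1.1.1"]:
        print(host, ihost_alternatives(host))
```

For "192.168.0.0" both alternatives match, which is the ambiguity described above; "192.168.0.org" and "256.1.1.1" match only ireg-name.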
I believe all other parts of the grammar already support mixed case. |
It might be useful to write test cases against the sample grammars. My processor in |
Commenting out the use of ipv4 in ihost makes all my test examples (not a huge number) unambiguous. |
Some of the suggestions in this issue seem to me to make sense; others do not. Our judgement may depend on what we think the purpose of the exercise is. My goal was an ixml translation of the grammar in the RFC, with marks to make the XML nicer (for some subjective judgement of 'niceness'). I did not think the goal was to suggest improvements to the normative grammar in the RFC. I don't object in principle to a sample grammar that deviates in well-defined ways from the normative grammar for the language in question, but I think it needs to be strongly motivated and the deviations clearly explained. If we think, for example, that the ixml grammar would be more useful if we made host and ihost unambiguous, or if ireg-name were defined as
ireg-name = label ++ ".".
label = ...
or as
ireg-name = (sub-domain ** ".", ".")?, TLD.
sub-domain = label.
TLD = label.
-label = ...
then we can do so, but we need to explain (first to each other and then to the public) why we think that's more helpful and what class of domain names will be grammatical in the normative grammar but ungrammatical in ours, or vice versa, and why we think deviating from the normative spec for those domain names will probably not matter in practice. So far, I haven't seen any reason to change my understanding of the goal of these grammars.
My apologies if anything in this comment seems terse or ungenerous; my ego seems to be reacting with less equanimity than one could wish to some of the wording in the comments on this issue. |
My apologies if anything in this comment seems terse or ungenerous; my ego
seems to be reacting with less equanimity than one could wish to some of
the wording in the comments on this issue.
Oh, I'm sorry if I offended you. Recognising the grammar as a direct
transliteration of the RFC 3987 grammar, I didn't think you would feel any
personal ownership, otherwise I would have tempered my language.
Any criticism that there was was entirely directed at Messrs Duerst and
Suignard (both of whom I know personally) for the inconsistencies in their
grammar, even though I am entirely grateful that they produced such a
grammar. Try to find one for internationalised email addresses and you end
up in a twisty maze of passages all alike.
(I should point out that I was forced to turn RFC 3987 into a regular
expression for the XForms spec.)
But we should recognise that the purpose of RFC 3987 is to define the
syntax of a correct IRI; our purpose on the other hand is to reveal the
structure.
For that reason I personally would prefer, to take an example, the
(sub-)grammar for IPv6 to have the form:
IPv6: h4**":";
h4**":", zeros, h4**":".
h4: h;
h, h;
h, h, h;
h, h, h, h.
zeros: "::".
h: ["0"-"9"; "A"-"F"; "a"-"f"].
rather than the hoops that they have to jump through to ensure that there
are no more than 8 colons in an IPv6 address.
|
Some of the suggestions in this issue seem to me to make sense; others do
not.
Our judgement may depend on what we think the purpose of the exercise is.
My goal was an ixml translation of the grammar in the RFC, with marks to
make the XML nicer (for some subjective judgement of 'niceness'). I did not
think the goal was to suggest improvements to the normative grammar in the
RFC.
Absolutely understood. But as I also said elsewhere, published syntaxes
typically define what is correct, while our aim is to expose structure.
The imperfect syntax of ihost in RFC 3987 is a case in point.
I don't object in principle to a sample grammar that deviates in well
defined ways from the normative grammar for the language in question, but I
think it needs to be strongly motivated and the deviations clearly
explained. If we think, for example, that the ixml grammar would be more
useful if we made host and ihost unambiguous, or if ireg-name were defined
as
ireg-name = label ++ ".".
label = ...
or as
ireg-name = (sub-domain ** ".", ".")?, TLD.
sub-domain = label.
TLD = label.
-label = ...
then we can do so, but we need to explain (first to each other and then to
the public) why we think that's more helpful and what class of domain names
will be grammatical in the normative grammar but ungrammatical in ours, or
vice versa, and why we think deviating from the normative spec for those
domain names will probably not matter in practice. So far, I haven't seen
any reason to change my understanding of the goal of these grammars.
I think one principle of ixml-supplied grammars should be: you don't need
to reparse any subtrees.
It would probably be better to make IRI-reference the start symbol for the
IRI grammar.
Sounds good; then several other nonterminals become reachable (but still
not absolute-IRI, ipath, reserved, gen-delims, CR, DQUOTE, LF and SP).
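Finding the rules that a start symbol cannot reach is a simple graph search. A minimal sketch, assuming a hand-made and heavily trimmed excerpt of the RFC 3987 rules in which each right-hand side lists only the nonterminals it references (terminals and most rules are left out). `GRAMMAR`, `reachable` and `unreachable` are illustrative names, not part of any ixml processor.

```python
# Hypothetical, heavily trimmed excerpt of the RFC 3987 rules: each entry lists
# only the nonterminals its right-hand side refers to. Names with no entry
# here (scheme, iquery, port, ...) are treated as leaves.
GRAMMAR = {
    "IRI": ["scheme", "ihier-part", "iquery", "ifragment"],
    "IRI-reference": ["IRI", "irelative-ref"],
    "absolute-IRI": ["scheme", "ihier-part", "iquery"],
    "irelative-ref": ["irelative-part", "iquery", "ifragment"],
    "irelative-part": ["iauthority", "ipath-abempty", "ipath-absolute",
                       "ipath-noscheme", "ipath-empty"],
    "ihier-part": ["iauthority", "ipath-abempty", "ipath-absolute",
                   "ipath-rootless", "ipath-empty"],
    "iauthority": ["iuserinfo", "ihost", "port"],
    "ihost": ["IP-literal", "IPv4address", "ireg-name"],
    "ireg-name": ["iunreserved", "pct-encoded", "sub-delims"],
    "ipath": ["ipath-abempty", "ipath-absolute", "ipath-noscheme",
              "ipath-rootless", "ipath-empty"],
    "reserved": ["gen-delims", "sub-delims"],
    "gen-delims": [],
}


def reachable(grammar, start):
    """Return every nonterminal reachable from start (iterative depth-first search)."""
    seen = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(grammar.get(name, []))
    return seen


def unreachable(grammar, start):
    """Return the rules defined in grammar that start can never reach."""
    return set(grammar) - reachable(grammar, start)


if __name__ == "__main__":
    for start in ["IRI", "IRI-reference"]:
        print(start, sorted(unreachable(GRAMMAR, start)))
```

With this excerpt, starting from IRI-reference instead of IRI brings irelative-ref and irelative-part into reach, while absolute-IRI, ipath, reserved and gen-delims remain unreachable.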
Steven
|
(Sorry: ctrl-return sends the message, so if I take my finger off the ctrl
key too late, it sends. Here is the message as intended.)
My apologies if anything in this comment seems terse or ungenerous; my ego
seems to be reacting with less equanimity than one could wish to some of
the wording in the comments on this issue.
Oh, I'm sorry if I offended you. Recognising the grammar as a direct
transliteration of the RFC 3987 grammar, I didn't think you would feel any
personal ownership, otherwise I would have tempered my language.
Any criticism that there was was entirely directed at Messrs Duerst and
Suignard (both of whom I know personally)
for the inconsistencies in their grammar, even though I am entirely
grateful that they produced such a grammar. Try to find one for
internationalised email addresses and you end up in a twisty maze of
passages all alike.
(I should point out that I was forced to turn RFC 3987 into a regular
expression for the XForms spec.)
But we should recognise that the purpose of RFC 3987 is to define the
syntax of a correct IRI; our purpose on the other hand is to reveal the
structure.
For that reason I personally would prefer, to take an example, the
(sub-)grammar for IPv6 to have a form like:
IPv6: h4**":";
h4**":", zeros, h4**":".
h4: h;
h, h;
h, h, h;
h, h, h, h.
zeros: "::".
-h: ["0"-"9"; "A"-"F"; "a"-"f"].
rather than the hoops that they have to jump through to ensure
that there are no more than 8 colons in an IPv6 address.
This makes our grammar easier to manage, and easier to read, at the expense
of allowing more than 8 colons in an IPv6 address. Is that good or bad? It
depends.
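To make the trade-off concrete, the simplified IPv6 rule can be transcribed into a regular expression. This is a sketch under assumptions: `h4**":"` is read as zero or more h4 separated by colons, and, like the ixml above, it leaves out the embedded IPv4 (ls32) form that RFC 3986 allows. The names are hypothetical.

```python
import re

# h4: one to four hex digits, either case
H4 = r"[0-9A-Fa-f]{1,4}"
# h4**":": zero or more h4, separated by ":"
H4_LIST = rf"(?:{H4}(?::{H4})*)?"
# IPv6: h4**":"; h4**":", zeros, h4**":".
SIMPLE_IPV6 = re.compile(rf"(?:{H4_LIST}|{H4_LIST}::{H4_LIST})")


def is_simple_ipv6(text):
    """True if text matches the simplified IPv6 rule in full."""
    return SIMPLE_IPV6.fullmatch(text) is not None


if __name__ == "__main__":
    for addr in ["2001:db8::1", "1:2:3:4:5:6:7:8:9:10", "1:::2"]:
        print(addr, addr.count(":"), is_simple_ipv6(addr))
```

The second address has ten groups and nine colons, which the RFC grammar rejects but the simplified rule accepts; "1:::2" is still rejected. Whether that looseness matters depends on whether the grammar is used to validate addresses or only to expose their structure.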
Steven
In reply to Steven Pemberton's message of Friday 12 August 2022 22:26:00 (+02:00).
|
Although this is straight out of the RFC, it is not good enough for proper use:
HEXDIG should include "a"-"f".
ipchar, iunreserved and ucschar should have a "-" before the rule, so that they don't produce elements of their own in the XML output.
The grammar is also ambiguous, but investigating that needs more work (I'm on it).