Support simple data types? #239

Open
ndw opened this issue Mar 17, 2024 · 0 comments

ndw commented Mar 17, 2024

I was toying about with a grammar that included a nonterminal to match an ISO 8601 date.

I started with this:

  ixml version "1.1".

  date = year, -'-', month, -'-', day .
  year = d+ .
  month = d, d.
  day = d, d .
  -d = ["0" - "9"].

That’s probably fine most of the time. But I was noodling about for fun so, naturally, I wanted to do better than that.

I quickly moved on to:

  ixml version "1.1".

  date = year, -'-', month, -'-', day .
  year = d+ .
  month = ["0"-"1"], d.
  day = ["0"-"2"], d | "30" | "31" .
  -d = ["0" - "9"].

That’s better and only marginally more complicated. It still lets every month have 31 days though.
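
For illustration, here is a rough regex transliteration of the second grammar in Python (not ixml), to show what it admits. The names `SECOND_GRAMMAR` and `accepts` are made up for this sketch. Besides 31-day Februaries, it also lets through months 00 and 13 to 19, and day 00.

```python
import re

# Transliteration of the second grammar:
#   year = d+ ; month = ["0"-"1"], d ; day = ["0"-"2"], d | "30" | "31"
SECOND_GRAMMAR = re.compile(r"[0-9]+-[01][0-9]-(?:[0-2][0-9]|30|31)")

def accepts(text: str) -> bool:
    """True if the whole string matches the second grammar's shape."""
    return SECOND_GRAMMAR.fullmatch(text) is not None

# The first three samples are accepted; the last two are rejected.
for sample in ["2024-03-17", "2024-02-31", "2024-19-00", "2024-20-01", "2024-01-32"]:
    print(sample, accepts(sample))
```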

The next level of improvement is obvious, if not especially tidy:

  ixml version "1.1".

  date = year, -'-', monthDay .
  year = d+ .
 -monthDay = month30, -'-', day30 | month31, -'-', day31 | month28, -'-', day28 .
  month31>month = "01" | "03" | "05" | "07" | "08" | "10" | "12" .
  day31>day = ["0"-"2"], d | "30" | "31" .
  month30>month = "04" | "06" | "09" | "11" .
  day30>day = ["0"-"2"], d | "30" .
  month28>month = "02" .
  day28>day = ["0"-"2"], d .
  -d = ["0" - "9"] .

You will perhaps be relieved † to hear that I wasn’t sufficiently interested in the intellectual challenge of working out if I could constrain year such that it would be possible to handle February in non-leap years correctly.

All of this is a bit tedious, especially because there's a dead easy way to specify a date type: xs:date. And there's lots of software that will validate that your input conforms to xs:date. That got me thinking. What if I could write this:

  date!date = year, -'-', month, -'-', day .

Where the !date part means that the nonterminal must be a valid xs:date in addition to satisfying the grammatical constraints expressed on the right-hand side.

If we allow implementations to provide their own data type libraries (as, for example, RELAX NG does for simple data types), it would open up a wide range of possibilities for solving otherwise very difficult problems.

I had a proof of concept implemented (in my Earley parser, anyway) in the space of about half an hour. FWIW.
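
To make the idea concrete, here is a minimal sketch in Python (not ixml, and not the proof of concept mentioned above) of the two-step check: first the grammatical shape, then a data type validator looked up by the name after the `!`. Everything named here (`DATATYPES`, `is_valid_date`, `parse_date`) is hypothetical, and `datetime.date` is only an approximation of xs:date: it covers years 1 to 9999 and ignores time zone suffixes.

```python
import re
from datetime import date

# Stand-in for the grammar rule: date = year, -'-', month, -'-', day .
DATE_SHAPE = re.compile(r"([0-9]+)-([0-9]{2})-([0-9]{2})")

def is_valid_date(year: str, month: str, day: str) -> bool:
    """Hypothetical validator approximating xs:date with datetime.date."""
    try:
        date(int(year), int(month), int(day))
    except ValueError:
        return False
    return True

# Hypothetical data type library, keyed by the name that follows "!".
DATATYPES = {"date": is_valid_date}

def parse_date(text: str, datatype: str = "date"):
    """Return (year, month, day) if text passes both checks, else None."""
    m = DATE_SHAPE.fullmatch(text)
    if m is None:
        return None  # fails the grammatical constraint
    parts = m.groups()
    if not DATATYPES[datatype](*parts):
        return None  # fails the data type constraint
    return parts

print(parse_date("2024-02-29"))
print(parse_date("2023-02-29"))
```

The grammar stays as simple as the first version; the calendar knowledge lives entirely in the validator.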


† I lied

  ixml version "1.1".

  date = leapYear, -'-', leapMonthDay | nonLeapYear, -'-', nonLeapMonthDay .

  leapYear>year = d+, ("00" | "04" | "08" | "12" | "16" | "20" | "24" | "28" | "32" | "36" | "40" |
                       "44" | "48" | "52" | "56" | "60" | "64" | "68" | "72" | "76" | "80" | "84" |
                       "88" | "92" | "96") .

  nonLeapYear>year = d+, ("01" | "02" | "03" | "05" | "06" | "07" | "09" | "10" | "11" | "13" | "14" |
                          "15" | "17" | "18" | "19" | "21" | "22" | "23" | "25" | "26" | "27" | "29" |
                          "30" | "31" | "33" | "34" | "35" | "37" | "38" | "39" | "41" | "42" | "43" |
                          "45" | "46" | "47" | "49" | "50" | "51" | "53" | "54" | "55" | "57" | "58" |
                          "59" | "61" | "62" | "63" | "65" | "66" | "67" | "69" | "70" | "71" | "73" |
                          "74" | "75" | "77" | "78" | "79" | "81" | "82" | "83" | "85" | "86" | "87" |
                          "89" | "90" | "91" | "93" | "94" | "95" | "97" | "98" | "99") .

 -leapMonthDay = month30, -'-', day30 | month31, -'-', day31 | month28, -'-', day29 .
 -nonLeapMonthDay = month30, -'-', day30 | month31, -'-', day31 | month28, -'-', day28 .
  month31>month = "01" | "03" | "05" | "07" | "08" | "10" | "12" .
  day31>day = ["0"-"2"], d | "30" | "31" .
  month30>month = "04" | "06" | "09" | "11" .
  day30>day = ["0"-"2"], d | "30" .
  month28>month = "02" .
  day28>day = ["0"-"1"], d | "2", ["0"-"8"] .
  day29>day = ["0"-"2"], d .
  -d = ["0" - "9"] .

But I am genuinely going to leave the 400 year cycle problem to someone else.
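
For reference, the full Gregorian rule is short when written as arithmetic rather than as grammar alternatives. The sketch below, in Python, compares it with a function that mirrors the leapYear rule above, which only looks at the last two digits. Both function names are made up for this illustration.

```python
import calendar

def is_leap(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def grammar_says_leap(year: int) -> bool:
    """Mirrors the leapYear rule above: only the last two digits are examined."""
    return (year % 100) % 4 == 0

# The two agree except on century years that are not multiples of 400.
for y in (1900, 2000, 2023, 2024, 2100):
    print(y, is_leap(y), grammar_says_leap(y))

# Sanity check against the standard library.
assert all(is_leap(y) == calendar.isleap(y) for y in range(1, 3000))
```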
