-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
WIP: checkpoint on preserve CR issue #1244
base: main
Are you sure you want to change the base?
Conversation
checkpoint - possibly unnecessary files
Does Daffodil ever really need the CRL -> LF canonicalization? If CR is part of an actual field and is not used as a delimiter or something, it seems we should never do this lossy mapping. Maybe we should just always disable CR -> LF canonicalization and convert CR to PUA. It makes infosets messier, but enures we don't lose data. |
I agree. We should not, in my view, ever convert CR to LF. We should preserve it. I think we could do better than to convert it to the PUA though. See last comment on DAFFODIL-1559. I think we could map CRLF to U+202B + LF, and isolated CR to NEL(U+2028), and back for unparsing. This would be more palatable for such data as the data would not look polluted by wierd glyphs that get assigned to the PUA characters. No matter what we need to supply several variations here, so we need to settle on the scheme for how users parameterize the InfosetOutputters and Inputters needs to be agreed upon. The default scheme needs to be exactly what we do today. Maybe Daffodil v4.0.0 can change the default behavior for XML mappings. |
I thought mapping CR to LF was a regression that should be fixed? Or have we always done this mapping?
Agreed. Interesting idea to have more complex but more human friendly mappings. It does potentially cause issues with validation since changing the mapping requires different validation pattern facets and thus a different schema. |
Actually, I am not sure about this. We need to look at the Daffodil release before Jan 2023 and test it there. |
I tested as far back as 3.1.0 and all of them did the CR -> LF mapping. So this is not a regression. Maybe not the preferred behavior, but probably a change we wait for 4.0.0 to change. |
Thanks for doing that. Fixing this functionally is very easy. We will need a switch to control it, and putting in a switch/param to enable/disable/choose the behavior is the stumbling block. I believe you suggested we do this with a new properties-file method, as opposed to extending the current tunables system to include this behavior. If you can write that up for the dev list then it's probably pretty easy to agree on and we can implement it with this as the driving use case. |
This PR is the reawakening of this issue, which a bunch of work was done on, but the PR never got completed:
See: #912
DAFFODIL-1559