The editor or API should prevent bad character encodings #867

Closed
mmo opened this issue Jul 5, 2022 · 1 comment · Fixed by #908
Labels
bug (Breaks something but is not blocking) · f: data (About data model, importation, transformation, exportation of data, specific for bibliographic data) · p-High (To set a high priority!)

Comments

@mmo
Collaborator

mmo commented Jul 5, 2022

How it works

When a document record contains character encoding problems (for instance, when the cataloguer enters an abstract or other metadata by copying and pasting from a PDF file), OAI-PMH breaks: every PMH request whose response includes that record fails with a server error:

All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
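A minimal Python sketch of the check behind this error, assuming the offending metadata values are Python `str` objects. It lists every character outside the XML 1.0 `Char` production, which excludes NULL and the C0 control characters other than tab, line feed and carriage return. `find_invalid_xml_chars` is a hypothetical helper, not part of SONAR or lxml.

```python
import re

# XML 1.0 only allows tab, LF, CR and code points from U+0020 upward (minus surrogates,
# U+FFFE and U+FFFF). Anything else, e.g. U+0002 or U+000C pasted from a PDF, matches here.
_INVALID_XML_CHARS = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def find_invalid_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (offset, code point) pairs for each character XML 1.0 does not allow."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in _INVALID_XML_CHARS.finditer(text)
    ]


abstract = "Ab\x02stract copied\x0cfrom a PDF"
print(find_invalid_xml_chars(abstract))  # [(2, 'U+0002'), (16, 'U+000C')]
```

Such a check can report the exact offending characters to the cataloguer instead of a generic server error.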

Improvement suggestion

The editor should prevent the submission of unauthorised characters or, ideally, correct them automatically.

Alternative:

  • OAI-PMH requests should not fail due to character encoding problems in a single record. Records should be checked for character encoding problems. Possible approaches are (1=worst ... 4=best):
    1. During the OAI-PMH response: check each record for encoding problems and exclude it from the response, if needed
    2. During the OAI-PMH response: check each record for encoding problems and automatically sanitize it, if needed, before including it in the response
    3. During record creation: automatically sanitize the record before saving
    4. During record creation: issue an error and prevent the record from being created (check server-side/client-side implications)
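Approaches 3 and 4 can be sketched together in Python, assuming the record is the JSON-like dict/list structure submitted by the editor. The field names in the example record and the helpers `find_bad_fields` and `sanitize_record` are hypothetical; a real implementation would hook into the record's create/update path.

```python
import re
from typing import Any

# Characters not allowed by XML 1.0 (NULL, most C0 controls, surrogates, U+FFFE/U+FFFF).
_INVALID_XML_CHARS = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def find_bad_fields(data: Any, path: str = "") -> list[str]:
    """Approach 4: return the JSON path of every string containing invalid characters."""
    if isinstance(data, str):
        return [path or "/"] if _INVALID_XML_CHARS.search(data) else []
    if isinstance(data, dict):
        return [p for key, value in data.items() for p in find_bad_fields(value, f"{path}/{key}")]
    if isinstance(data, list):
        return [p for i, value in enumerate(data) for p in find_bad_fields(value, f"{path}/{i}")]
    return []  # numbers, booleans and None cannot hold invalid characters


def sanitize_record(data: Any) -> Any:
    """Approach 3: return a copy with invalid characters removed from every string."""
    if isinstance(data, str):
        return _INVALID_XML_CHARS.sub("", data)
    if isinstance(data, dict):
        return {key: sanitize_record(value) for key, value in data.items()}
    if isinstance(data, list):
        return [sanitize_record(value) for value in data]
    return data


# Hypothetical record, not SONAR's actual schema.
record = {
    "title": [{"mainTitle": [{"value": "A title"}]}],
    "abstracts": [{"value": "Ab\x02stract", "language": "eng"}],
}
print(find_bad_fields(record))                   # ['/abstracts/0/value']
print(find_bad_fields(sanitize_record(record)))  # []
```

Rejecting (4) tells the cataloguer exactly which field to fix; sanitizing (3) is transparent to the user but silently alters the submitted text.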
@mmo added the enhancement label Jul 5, 2022
@pronguen added the bug, f: data and p-High labels and removed the enhancement label Jul 5, 2022
@pronguen changed the title from "Make OAI-PMH responses more robust against bad character encodings" to "The editor should prevent bad character encodings" Aug 8, 2022
@pronguen
Contributor

pronguen commented Aug 8, 2022

Similar to #861

@PascalRepond changed the title from "The editor should prevent bad character encodings" to "The editor or API should prevent bad character encodings" Aug 10, 2022
jma added a commit to jma/sonar that referenced this issue Nov 16, 2022
* Adds new `safety` exceptions.
* Removes control chars when the dublin core xml file is produced.
* Closes rero#867.

Co-Authored-by: Johnny Mariéthoz <Johnny.Mariethoz@rero.ch>
@jma closed this as completed in ac86d20 Nov 22, 2022
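The referenced commit removes control characters when the Dublin Core XML is produced, i.e. approach 2. Below is a minimal sketch of that idea using the standard library's `xml.etree.ElementTree`; it is not the code of the actual commit. `to_oai_dc` and its `{element name: [values]}` input format are hypothetical; the two namespace URIs are the standard `oai_dc` and Dublin Core ones.

```python
import re
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Characters not allowed by XML 1.0 (NULL, most C0 controls, surrogates, U+FFFE/U+FFFF).
_INVALID_XML_CHARS = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def to_oai_dc(fields: dict[str, list[str]]) -> bytes:
    """Serialize {Dublin Core element: [values]} as an oai_dc record, dropping invalid characters."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            # ElementTree does not check for these characters and would write them as-is,
            # so strip them here to keep the harvested XML well-formed.
            element.text = _INVALID_XML_CHARS.sub("", value)
    return ET.tostring(root, encoding="utf-8")


record_xml = to_oai_dc({"title": ["A title"], "description": ["Ab\x02stract\x0c text"]})
print(record_xml.decode("utf-8"))
```

Sanitizing at serialization time keeps the OAI-PMH endpoint working even for records saved before any editor-side or API-side check existed.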