Bug in longturtle serialization #2767

Closed
mschiedon opened this issue Apr 15, 2024 · 6 comments · Fixed by #2997

@mschiedon

mschiedon commented Apr 15, 2024

The longturtle serializer fails to emit a whitespace separator between a predicate and a list of objects if one of these objects is a blank node (and the blank node cannot be 'inlined', i.e. is used more than once). The problem can be reproduced using this Python code:

from rdflib import Graph

input = '''\
@prefix ex: <https://example.org/> .

ex:1 a ex:Thing ;
    ex:relatedTo ex:3, _:bnode0 .

ex:2 a ex:Thing ;
    ex:relatedTo _:bnode0 .

_:bnode0 a ex:Thing .
'''

graph = Graph().parse(data=input, format='turtle')
output = graph.serialize(format='longturtle')
print(output)
assert output.find('relatedTo_:') == -1, \
    'Missing whitespace separation between predicate' \
    ' and the first blank node of a list of objects.'

The resulting Turtle, with the bug, is shown below. Note the missing space between the predicate ex:relatedTo and the blank node _:n40fef3a41a034be9a7116df126afd613b1 in the ex:1 case. The ex:2 case is serialized correctly with a space separator because it has a single object rather than a list.

PREFIX ex: <https://example.org/>

ex:1
    a ex:Thing ;
    ex:relatedTo_:n40fef3a41a034be9a7116df126afd613b1 ,
        ex:3 ;
.

ex:2
    a ex:Thing ;
    ex:relatedTo _:n40fef3a41a034be9a7116df126afd613b1 ;
.

_:n40fef3a41a034be9a7116df126afd613b1
    a ex:Thing ;
.
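
The assertion in the reproduction script only looks for the literal string relatedTo_:. A slightly more general check could scan the serialized text for any blank node label that is fused to the preceding token. The sketch below is only a heuristic under stated assumptions: the function name and the regular expression are hypothetical, it uses nothing from rdflib, and it does not skip string literals or prefixes that end in an underscore, so it can report false positives.

```python
import re

# A name or IRI character immediately followed by a blank node label,
# with no whitespace in between. Heuristic only: string literals and
# prefixes ending in "_" are not treated specially.
FUSED_BNODE = re.compile(r"[A-Za-z0-9_>]_:[A-Za-z0-9]")


def fused_bnode_lines(turtle_text: str) -> list[str]:
    """Return the lines in which a blank node label follows a token without a space."""
    return [line for line in turtle_text.splitlines() if FUSED_BNODE.search(line)]
```

An empty result for a serialized graph would suggest that the separator is present everywhere a blank node label appears.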

I believe the issue might be solved by giving the first_nl = True line in objectList (longturtle.py) one additional level of indentation, as shown in the code below.

    def objectList(self, objects):
        count = len(objects)
        if count == 0:
            return
        depthmod = (count == 1) and 0 or 1
        self.depth += depthmod
        first_nl = False
        if count > 1:
            if not isinstance(objects[0], BNode):
                self.write("\n" + self.indent(1))
                # Proposed fix: the line below was given an extra indent.
                first_nl = True
        self.path(objects[0], OBJECT, newline=first_nl)
        for obj in objects[1:]:
            self.write(" ,")
            if not isinstance(obj, BNode):
                self.write("\n" + self.indent(1))
            self.path(obj, OBJECT, newline=True)
        self.depth -= depthmod
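
To see why the indentation matters, here is a minimal standalone model of the control flow above. It is a sketch, not the rdflib implementation: terms are plain strings, is_bnode, emit and object_list are hypothetical helpers, the indentation is fixed rather than depth-dependent, and emit encodes the assumption that self.path(...) writes a leading space only when newline is False.

```python
INDENT = "    "


def is_bnode(term: str) -> bool:
    """Stand-in for isinstance(term, BNode): blank node labels start with '_:'."""
    return term.startswith("_:")


def emit(term: str, newline: bool) -> str:
    # Assumed behaviour of self.path(...): a leading space is written
    # only when the caller has not just started a new line.
    return term if newline else " " + term


def object_list(objects: list[str], indent_fix: bool) -> str:
    """Return the text emitted after the predicate for a list of objects."""
    if not objects:
        return ""
    out = []
    first_nl = False
    if len(objects) > 1:
        if not is_bnode(objects[0]):
            out.append("\n" + INDENT * 2)
            if indent_fix:
                first_nl = True  # proposed: only after a real line break
        if not indent_fix:
            first_nl = True  # original: set for every multi-object list
    out.append(emit(objects[0], first_nl))
    for obj in objects[1:]:
        out.append(" ,")
        if not is_bnode(obj):
            out.append("\n" + INDENT * 2)
        out.append(emit(obj, True))
    return "".join(out)


# Original control flow: the predicate is fused to the blank node label.
print("ex:relatedTo" + object_list(["_:b0", "ex:3"], indent_fix=False))
# With the extra indent: a space separates predicate and blank node.
print("ex:relatedTo" + object_list(["_:b0", "ex:3"], indent_fix=True))
```

In this model the only difference between the two variants is whether first_nl is set when no line break was written, which is exactly the case where the first object is a blank node.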
@nicholascar
Member

I think this issue has been addressed by PR #2700 but that fix is currently only in the HEAD of this repo, not an RDFlib release yet. It should appear in 7.0.1 or 7.1.0 in the next few weeks when we make that release which will fix a bunch of small things.

@mschiedon
Author

mschiedon commented May 23, 2024

> I think this issue has been addressed by PR #2700 but that fix is currently only in the HEAD of this repo, not an RDFlib release yet. It should appear in 7.0.1 or 7.1.0 in the next few weeks when we make that release which will fix a bunch of small things.

Excellent, thank you! I can confirm this addresses the issue. Looking forward to the next rdflib release then 👍

@danielpcampagna

Are there any updates on this issue?

mschiedon reopened this Oct 21, 2024
@mschiedon
Author

@nicholascar The release notes of rdflib 7.1.0 mention this:

2024-03-12 - Fix LongTurtle multi-BN object serialization bug #2700

The code changes from that fix ought to include something like this:

[image: screenshot of the code change, not preserved]

But I don't think I see that fix reflected in the code of the 7.1.0 release.

Could you please comment/investigate?

@edmondchuc
Contributor

I think the changes made in that PR were inadvertently reverted by beda4fa in PR #2731.

From memory, mypy was failing and complaining that the else block was unreachable code. We may have to re-submit the PR again and add some annotations to instruct mypy to ignore the error.

@mschiedon
Author

> I think the changes made in that PR were inadvertently reverted by beda4fa in PR #2731.
>
> From memory, mypy was failing and complaining that the else block was unreachable code. We may have to re-submit the PR again and add some annotations to instruct mypy to ignore the error.

Yes, that may be the commit where the fix regressed. If the mypy complaint was about the elif in the code below, it was probably legitimate: anything that is not a BNode, including an IRI or a Literal, is already handled by the first branch, so the elif can never trigger for a Literal.

            if not isinstance(objects[0], BNode):
                self.write("\n" + self.indent(1))
            elif isinstance(objects[0], Literal):
                self.write(" ")

Note that the code above would not have fixed the missing space separator between the predicate and the blank node identifier. The code from #2700 would, and I would hope (and expect) that mypy has no issue with it:

            if not isinstance(objects[0], BNode):
                self.write("\n" + self.indent(1))
            else:
                self.write(" ")
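
The difference between the two variants can be illustrated at runtime with a small standalone sketch. The classes below are hypothetical stand-ins for the rdflib term types, not the real ones, and the functions return the separator instead of writing it, with a bare newline standing in for the newline plus indentation.

```python
class Node:
    """Hypothetical base class standing in for an RDF term."""


class URIRef(Node):
    pass


class Literal(Node):
    pass


class BNode(Node):
    pass


def separator_with_elif(first: Node) -> str:
    if not isinstance(first, BNode):
        return "\n"  # IRIs and Literals both end up here
    elif isinstance(first, Literal):
        return " "  # never taken for a Literal: it was handled above
    return ""  # a BNode falls through without any separator


def separator_with_else(first: Node) -> str:
    if not isinstance(first, BNode):
        return "\n"
    else:
        return " "  # a BNode gets a space after the predicate


for term in (URIRef(), Literal(), BNode()):
    name = type(term).__name__
    print(name, repr(separator_with_elif(term)), repr(separator_with_else(term)))
```

Only the else variant produces a separator for a blank node, which matches the observation that the elif variant would not have fixed the missing space.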

edmondchuc added a commit that referenced this issue Nov 28, 2024
edmondchuc added a commit that referenced this issue Jan 15, 2025
* feat: sort longturtle blank nodes in the object position by their cbd string

* fix: #2767
edmondchuc added a commit that referenced this issue Jan 16, 2025
* feat: sort longturtle blank nodes in the object position by their cbd string

* fix: #2767
nicholascar added a commit that referenced this issue Jan 16, 2025
* 7.1.1 post release (#2953)

* Fix Black formatting in ./admin/get_merged_prs.py (#2954)

* build(deps-dev): bump ruff from 0.7.0 to 0.7.1 (#2955)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.0 to 0.7.1.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.7.0...0.7.1)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ashley Sommer <ashleysommer@gmail.com>

* Fix defined namespace warnings (#2964)

* Fix defined namespace warnings

Current docs-generation tests are polluted by lots of warnings that occur when Sphinx tries to read various parts of DefinedNamespace.

* Fix tests that no longer need incorrect exceptions handled.

* fix black formatting in test file

* Undo typing changes, so this works on current pre-3.9 branch

* better handling for any/all double-underscore properties

* Don't include __slots__ in dir().

* test: earl test passing

* Annotate Serializer.serialize and descendants (#2970)

This patch aligns the type signatures on `Serializer` subclasses,
including renaming the arbitrary-keywords dictionary to always be
`**kwargs`.  This is in part to prepare for the possibility of adding
`*args` as a positional-argument delimiter.

References:
* #1890 (comment)

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>

* build(deps): bump orjson from 3.10.10 to 3.10.11 (#2966)

Bumps [orjson](https://github.com/ijl/orjson) from 3.10.10 to 3.10.11.
- [Release notes](https://github.com/ijl/orjson/releases)
- [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md)
- [Commits](ijl/orjson@3.10.10...3.10.11)

---
updated-dependencies:
- dependency-name: orjson
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps-dev): bump ruff from 0.7.1 to 0.7.2 (#2969)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.1 to 0.7.2.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.7.1...0.7.2)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps-dev): bump ruff from 0.7.2 to 0.7.3 (#2979)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.2 to 0.7.3.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.7.2...0.7.3)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps-dev): bump ruff from 0.7.3 to 0.8.0 (#2994)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.3 to 0.8.0.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.7.3...0.8.0)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump orjson from 3.10.11 to 3.10.12 (#2991)

Bumps [orjson](https://github.com/ijl/orjson) from 3.10.11 to 3.10.12.
- [Release notes](https://github.com/ijl/orjson/releases)
- [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md)
- [Commits](ijl/orjson@3.10.11...3.10.12)

---
updated-dependencies:
- dependency-name: orjson
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* added Node as an exported name from the root package location. Updated linting commands section in the developer section to use ruff check. (#2981)

* build(deps-dev): bump wheel from 0.45.0 to 0.45.1 (#2992)

Bumps [wheel](https://github.com/pypa/wheel) from 0.45.0 to 0.45.1.
- [Release notes](https://github.com/pypa/wheel/releases)
- [Changelog](https://github.com/pypa/wheel/blob/main/docs/news.rst)
- [Commits](pypa/wheel@0.45.0...0.45.1)

---
updated-dependencies:
- dependency-name: wheel
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Car <nick@kurrawong.net>

* feat: sort longturtle blank nodes (#2997)

* feat: sort longturtle blank nodes in the object position by their cbd string

* fix: #2767

* build(deps-dev): bump pytest from 8.3.3 to 8.3.4 (#2999)

Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.3.3 to 8.3.4.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@8.3.3...8.3.4)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps-dev): bump poetry from 1.8.4 to 1.8.5 (#3001)

Bumps [poetry](https://github.com/python-poetry/poetry) from 1.8.4 to 1.8.5.
- [Release notes](https://github.com/python-poetry/poetry/releases)
- [Changelog](https://github.com/python-poetry/poetry/blob/1.8.5/CHANGELOG.md)
- [Commits](python-poetry/poetry@1.8.4...1.8.5)

---
updated-dependencies:
- dependency-name: poetry
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps-dev): bump ruff from 0.8.0 to 0.8.2 (#3003)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.0 to 0.8.2.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.8.0...0.8.2)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps-dev): bump ruff from 0.8.2 to 0.8.3 (#3010)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.2 to 0.8.3.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.8.2...0.8.3)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump berkeleydb from 18.1.11 to 18.1.12 (#3009)

Bumps [berkeleydb](https://www.jcea.es/programacion/pybsddb.htm) from 18.1.11 to 18.1.12.

---
updated-dependencies:
- dependency-name: berkeleydb
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Conflicts:
#	poetry.lock

* build(deps): bump orjson from 3.10.12 to 3.10.13 (#3018)

Bumps [orjson](https://github.com/ijl/orjson) from 3.10.12 to 3.10.13.
- [Release notes](https://github.com/ijl/orjson/releases)
- [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md)
- [Commits](ijl/orjson@3.10.12...3.10.13)

---
updated-dependencies:
- dependency-name: orjson
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps-dev): bump ruff from 0.8.4 to 0.8.6 (#3025)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.4 to 0.8.6.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.8.4...0.8.6)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat: deterministic longturtle serialisation using RDF canonicalization + n-triples sort (#3008)

* feat: use the RGDA1 canonicalization algorithm + lexical n-triples sort to produce deterministic longturtle serialisation

* chore: normalise usage of format

* chore: apply black

* fix: double up of semicolons when subject is a blank node

* fix: lint

* jsonld: Do not merge nodes with different invalid URIs (#3011)

When parsing JSON-LD with invalid URIs in the `@id`, the
`generalized_rdf: True` option allows parsing these nodes as blank nodes
instead of outright rejecting the document.

However, all nodes with invalid URIs were mapped to the same blank node,
resulting in incorrect data. For example, without this patch, the new test
fails with:

```
AssertionError: Expected:
@prefix schema: <https://schema.org/> .

<https://example.org/root-object> schema:author [ schema:familyName "Doe" ;
            schema:givenName "Jane" ;
            schema:name "Jane Doe" ],
        [ schema:familyName "Doe" ;
            schema:givenName "John" ;
            schema:name "John Doe" ] .

Got:
@prefix schema: <https://schema.org/> .

<https://example.org/root-object> schema:author <> .

<> schema:familyName "Doe" ;
    schema:givenName "Jane",
        "John" ;
    schema:name "Jane Doe",
        "John Doe" .
```

* Fixed incorrect ASK behaviour for dataset with one element (#2989)

* Pass base uri to serializer when writing to file. (#2977)

Co-authored-by: Nicholas Car <nick@kurrawong.net>

* Dataset documentation improvements (#3012)

* example printout improvements

* added BN graph creation

* updated tests var names & added one subtest

* typos & improved formatting

* updated Graph & Dataset docco

* typo fix

* fix code-in-comment syntax

* fix code-in-comment syntax 2

* fix code-in-comment syntax - ellipses

* fix code-in-comment syntax - sort print loop output

* blacked

* ruff fixes

* Poetry 2.0.0 pyproject.toml file

* move to PEP621 (Poetry 2.0.0) pyproject.toml

* require poetry 2.0.0

* require poetry 2.0.0

* add in requirement for poetry-plugin-export

* change from --sync to sync command

* further pyproject.toml format updates

* add poetry plugin to requirements-poetry.in

* fix pre-commit poetry version to 2.0.0

* remove testing artifact

* update license to 2025

* add me to contributors

* remove outdated --check arg

* typo

* test add back in precommit args

* test remove precommit args

* match ruff version to pre-commit autoupdate PR #3026; add back in --check

* re-remove --check

* add David to CONTRIBUTORS

* ruff in pyproject.toml to match pre-commit

* updates for David's comments

* fix Dataset docc ReST formatting

* remove ConjunctiveGraph example; add Dataset example; add JSON-LD serialization example

* Add RDFLib Path to SHACL path utility and corresponding tests (#2990)

* shacl path parser: Add additional test case

* shacl utilities: Add new SHACL path building utility with corresponding tests

---------

Co-authored-by: Nicholas Car <nick@kurrawong.net>
# Conflicts:
#	rdflib/extras/shacl.py

* fix: typing and import issues

* fix: line length as int

* fix: ruff version conflict

* fix: berkeleydb pin to 18.1.10 for python 3.8 compatibility

* 3a not 2a

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Co-authored-by: Nicholas Car <nick@kurrawong.net>
Co-authored-by: Ashley Sommer <ashleysommer@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alex Nelson <alexander.nelson@nist.gov>
Co-authored-by: joecrowleygaia <142864129+joecrowleygaia@users.noreply.github.com>
Co-authored-by: Val Lorentz <vlorentz@softwareheritage.org>
Co-authored-by: jcbiddle <114963309+jcbiddle@users.noreply.github.com>
Co-authored-by: Sander Van Dooren <sandervd@users.noreply.github.com>
Co-authored-by: Nicholas Car <nick@kurrawong.ai>
Co-authored-by: Matt Goldberg <59745812+mgberg@users.noreply.github.com>