Valency: do not show duplicate instances #955

Open
myrix opened this issue Mar 27, 2023 · 0 comments
Labels: backend, enhancement

myrix commented Mar 27, 2023

Based on a discussion with J. Normanskaya, we should hide duplicate instances on the valency instance approval page, /valency.

Consider the use case of a user who slightly changes a source document from the corpus, adds the changed version to the corpus, parses it and then updates the valency data. Depending on the extent of the changes, a considerable number of the new instances may be duplicates of already existing instances. Even if the user deletes the previous version of the document, we still have to show any approved instances sourced from it, so there will be duplicate instances; see #775 (comment).

Instances are considered duplicates if they are in the same position in the same sentence of the same source, where a sentence is identified by its sequence of tokens together with their parsed attributes, including grammar.
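
As a rough illustration of this definition, an instance could be reduced to a hash of the source, the sentence's tokens with their parsed attributes, and the position. This is only a sketch: the function name `instance_hash`, the `source_key` argument (whatever is taken to identify "the same source"), and the token dictionary layout are assumptions made for the example, not the existing data model.

```python
import hashlib
import json


def instance_hash(source_key, sentence_tokens, position):
    """Return a hex digest identifying an instance for duplicate detection.

    source_key      -- hypothetical value identifying the source
    sentence_tokens -- list of dicts, one per token, holding the token text
                       and its parsed attributes (including grammar)
    position        -- index of the instance's token within the sentence
    """
    # sort_keys makes the serialization independent of attribute order,
    # so equal sentences always produce the same byte string.
    payload = json.dumps(
        [source_key, sentence_tokens, position],
        sort_keys=True,
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two instances would then count as duplicates exactly when their hashes are equal, up to the negligible chance of a SHA-256 collision.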

A possible solution is to store instance hashes and filter out duplicates when querying the database, preferring earlier instances from earlier sources.
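
A minimal sketch of such a filter, using an in-memory SQLite table purely for illustration. The table `instance`, its columns `id`, `source_id` and `hash`, and the assumption that smaller ids mean earlier sources and earlier instances are all hypothetical; the real schema and ordering criteria may differ.

```python
import sqlite3

# Keep an instance only if no instance with the same hash comes from an
# earlier source, or from the same source with a smaller instance id.
DEDUP_QUERY = """
    SELECT i.id
    FROM instance AS i
    WHERE NOT EXISTS (
        SELECT 1
        FROM instance AS e
        WHERE e.hash = i.hash
          AND (e.source_id < i.source_id
               OR (e.source_id = i.source_id AND e.id < i.id))
    )
    ORDER BY i.id
"""


def unique_instance_ids(conn):
    """Return ids of the instances that survive duplicate filtering."""
    return [row[0] for row in conn.execute(DEDUP_QUERY)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instance (id INTEGER PRIMARY KEY, source_id INTEGER, hash TEXT)"
    )
    conn.executemany(
        "INSERT INTO instance (id, source_id, hash) VALUES (?, ?, ?)",
        [
            (1, 1, "a"),
            (2, 1, "b"),
            (3, 2, "a"),  # duplicate of instance 1, from a later source
            (4, 2, "c"),  # duplicate of instance 5, which has an earlier source
            (5, 1, "c"),
        ],
    )
    return unique_instance_ids(conn)
```

With the sample rows, `demo()` returns `[1, 2, 5]`. An index on the hash column would probably be needed for this kind of query to stay fast on a large instance table.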

myrix added the enhancement and backend labels Mar 27, 2023
myrix self-assigned this Mar 27, 2023