Some assumptions about additive indexes are not in line with real-world usage #14

Open
nvdk opened this issue Feb 13, 2020 · 4 comments

Comments

@nvdk
Member

nvdk commented Feb 13, 2020

When searching, mu-search currently assumes that it suffices to find or create an index for each group specified as an allowed group (based on my understanding of get_request_indexes).

Unfortunately, this does not always hold. A very common case is information from the public allowed group and a private group being combined in the RDF properties of one document. In this case both the public index and the private index lack information, whereas a non-additive index for both groups would hold this information.

An example config.json demonstrating this case:

{
  "type": "case",
  "on_path": "case",
  "rdf_type": "http://dbpedia.org/ontology/Case",
  "properties": {
    "mandatees": [
      "http://data.vlaanderen.be/ns/besluitvorming#heeftBevoegde",
      "http://data.vlaanderen.be/ns/mandaat#isBestuurlijkeAliasVan",
      "http://xmlns.com/foaf/0.1/name"
    ]
  }
}

and the following quads:

<http://mu.semte.ch/graphs/public> <http://example.org/mandatee1> <http://data.vlaanderen.be/ns/mandaat#isBestuurlijkeAliasVan> <http://example.org/person1>.
<http://mu.semte.ch/graphs/public> <http://example.org/person1> <http://xmlns.com/foaf/0.1/name> "John Doe".
<http://mu.semte.ch/graphs/private> <http://example.org/case1> a <http://dbpedia.org/ontology/Case>.
<http://mu.semte.ch/graphs/private> <http://example.org/case1> <http://data.vlaanderen.be/ns/besluitvorming#heeftBevoegde> <http://example.org/mandatee1>.

would result in documents without mandatees if additive indexes are enabled.
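
The effect can be reproduced with a small simulation. This is only a sketch under assumptions: it does not use mu-search code, and the helper names (follow_path, build_index, additive_lookup) are hypothetical. It assumes each index is built by following the property path using only the graphs its group can read, and that an additive lookup simply merges the per-group results.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CASE_TYPE = "http://dbpedia.org/ontology/Case"
PUBLIC = "http://mu.semte.ch/graphs/public"
PRIVATE = "http://mu.semte.ch/graphs/private"
CASE1 = "http://example.org/case1"

HEEFT_BEVOEGDE = "http://data.vlaanderen.be/ns/besluitvorming#heeftBevoegde"
ALIAS_VAN = "http://data.vlaanderen.be/ns/mandaat#isBestuurlijkeAliasVan"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# The four quads from the issue, as (graph, subject, predicate, object).
QUADS = [
    (PUBLIC, "http://example.org/mandatee1", ALIAS_VAN, "http://example.org/person1"),
    (PUBLIC, "http://example.org/person1", FOAF_NAME, "John Doe"),
    (PRIVATE, CASE1, RDF_TYPE, CASE_TYPE),
    (PRIVATE, CASE1, HEEFT_BEVOEGDE, "http://example.org/mandatee1"),
]

# The "mandatees" property path from the example config.json.
MANDATEES_PATH = [HEEFT_BEVOEGDE, ALIAS_VAN, FOAF_NAME]


def follow_path(start, path, graphs):
    """Follow a property path, reading only quads in the given graphs."""
    nodes = {start}
    for predicate in path:
        nodes = {
            o for (g, s, p, o) in QUADS
            if g in graphs and s in nodes and p == predicate
        }
    return sorted(nodes)


def build_index(graphs):
    """Build one document per Case that is visible in the given graphs."""
    docs = {}
    for g, s, p, o in QUADS:
        if g in graphs and p == RDF_TYPE and o == CASE_TYPE:
            docs[s] = {"mandatees": follow_path(s, MANDATEES_PATH, graphs)}
    return docs


def additive_lookup(indexes):
    """Merge per-group indexes, as an additive lookup is assumed to do."""
    merged = {}
    for index in indexes:
        for uri, doc in index.items():
            merged.setdefault(uri, {"mandatees": []})["mandatees"] += doc["mandatees"]
    return merged


public_index = build_index({PUBLIC})
private_index = build_index({PRIVATE})
combined_index = build_index({PUBLIC, PRIVATE})
additive_result = additive_lookup([public_index, private_index])

print("additive:", additive_result)
print("combined:", combined_index)
```

In this sketch the public index contains no case at all, the private index contains the case with an empty mandatees list, and only the index built over both graphs resolves the name.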

We should at the very least address this issue in the README, with a note where we describe additive indexes that recommends not using them if this case occurs. A combination of eager indexing, with a proper configuration of the eager indexing groups, and additive indexes might work and should also be documented if so.

We can also change the assumption and change our understanding of additive indexes. We should then define what an additive index is and which indexes are expected to be created when indexes are created non-eagerly. If we create all possible permutations, this could cause the creation of a lot of indexes.
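
To give a rough sense of that growth, here is a minimal sketch that enumerates every non-empty combination of allowed groups. It assumes one index per unordered set of groups (order does not matter, so these are combinations rather than permutations); the group names and the group_combinations helper are hypothetical.

```python
from itertools import combinations


def group_combinations(groups):
    """Return every non-empty subset of the given allowed groups."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


# Hypothetical allowed groups, for illustration only.
groups = ["public", "organization-read", "admin"]
combos = group_combinations(groups)
for combo in combos:
    print(combo)

# n groups give 2**n - 1 non-empty subsets.
print(len(combos), 2 ** len(groups) - 1)
```

With 10 groups this already amounts to 1023 candidate indexes per document type, which is why creating them all up front is unlikely to be practical.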

@madnificent
Member

This seems to be so as per the intended meaning of additive_indexes. It should indicate that these indexes can be considered disconnected from each other. Data in one will not affect the other. Hence you can reuse them when calculating information for a certain set of allowed_groups.

@x-m-el

x-m-el commented May 31, 2024

When reading the documentation, I came across a similar misinterpretation. Maybe this gives some guidance on where the docs might be confusing.

The question that arose after reading the docs was:

For mu-search, given that there are additive access rights, why would it be better to create an eager index group with multiple objects?
For example [{ "variables": ["company-y"], "name": "organization-read" }, { "variables": [], "name": "public" }], instead of putting organization-read and public in separate eager groups, so they can be automatically added together.

The answer was:

If data that is needed to build documents of the search index is stored across public and organization-read, they need to be in the same eager group. Otherwise you will have only 'partial' documents.
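
The two alternatives from the question can be written out side by side. This is a sketch, not mu-search code: it assumes the groups are listed under an eager_indexing_groups key in config.json and that one index is built per eager group; the indexes_built helper is hypothetical.

```python
import json

# Option A: both access groups in ONE eager group, so a single index is
# built from data readable by organization-read and public together.
combined_eager_groups = [
    [
        {"variables": ["company-y"], "name": "organization-read"},
        {"variables": [], "name": "public"},
    ]
]

# Option B: each access group in its OWN eager group, so two separate
# indexes are built and later added together.
separate_eager_groups = [
    [{"variables": ["company-y"], "name": "organization-read"}],
    [{"variables": [], "name": "public"}],
]


def indexes_built(eager_groups):
    """List the group-name sets that would each get their own index
    (assumption: one index per eager group)."""
    return [tuple(sorted(g["name"] for g in group)) for group in eager_groups]


print(indexes_built(combined_eager_groups))
print(indexes_built(separate_eager_groups))

# Assumed config.json fragment for option A.
print(json.dumps({"eager_indexing_groups": combined_eager_groups}, indent=2))
```

Option A is the one to pick when a document's properties span both groups; option B only yields complete documents when each group's data is self-contained.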

This answer is quite clear, but this case is not mentioned (as far as I can see) anywhere in the docs.

In the documentation, the following line was the main reason for confusion:

If a user is granted access to multiple groups, indexes will be combined to calculate the response. Therefore, it's strongly adviced the indexes contain non-overlapping data.

Mentioning when it is acceptable to break the advice against overlapping data would make it clear that doing so is okay in these cases.

@erikap
Member

erikap commented Jun 24, 2024

@x-m-el Updated the README with the suggestions you gave. Can you have another look?

@x-m-el

x-m-el commented Jul 15, 2024

Yes, the extra text makes it clear for me 👍
