Expected behavior of facets on array of string #617
Comments
Conceptually they are the same. In the example, because it is always the same field of
This, because it is the most useful and common statistic and is easy for a client program to parse.
This is interesting too, but it is more specialized and not easy for a client to parse. It is better handled by adding a filter to the facet query, i.e. (A and B and D).
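To make the "add a filter to the facet query" suggestion concrete, here is a minimal sketch of what such a request body might look like. It assumes the `op`/`content` filter layout of the ADC API and that `=` on an array field matches when any element equals the value; the helper name `keyword_equals` is hypothetical, and the field `study.keywords_study` is the one used in this issue.

```python
import json


def keyword_equals(value):
    """Hypothetical helper: one equality filter on study.keywords_study."""
    return {
        "op": "=",
        "content": {"field": "study.keywords_study", "value": value},
    }


# Assumed request layout: restrict to repertoires whose keywords include
# A and B and D, then facet on the same field.
request_body = {
    "filters": {
        "op": "and",
        "content": [keyword_equals(v) for v in ("A", "B", "D")],
    },
    "facets": "study.keywords_study",
}

print(json.dumps(request_body, indent=2))
```

The count for the combination is then simply the count reported for any of the three values, since every record that passes the filter contains all of them.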
Where is the collapsing performed? What happens if there are multiple
According to the GDC documentation, this is not possible (see limitation 2): https://docs.gdc.cancer.gov/API/Users_Guide/Search_and_Retrieval/#facets Otherwise, I am fine with the behavior described in option 1.
Let me explain with an SQL example, if that helps. It depends how you store
I'm ignoring the specifics about the id for joining the tables and restricting to a specific repertoire/sample. If this query returns no records for a specific repertoire/sample, then there is no TRB locus for the sample. If this query returns one or more records, then there is a TRB locus for the sample. Combining a query like that with
I was never sure why the GDC had that limitation, but the ADC does not have it; you should be able to have any filter on a facets query. All the filter does is restrict to a subset of repertoire records, so it's essentially independent of the facets operation. In the SQL world, that may mean you need to chain SELECT statements, i.e. one SELECT to do the filtering and another SELECT which operates on the first's results to do the facets.
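As a rough illustration of chaining the two SELECT statements, here is a small sketch using an in-memory SQLite database. The table and column names (`repertoire`, `keyword`, `study_id`) are made up for the example and are not taken from any actual repository schema; storing the string array as one row per element is also just an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the string array is stored one row per element.
    CREATE TABLE repertoire (repertoire_id TEXT PRIMARY KEY, study_id TEXT);
    CREATE TABLE keyword (repertoire_id TEXT, keyword TEXT);
    INSERT INTO repertoire VALUES ('r1', 's1'), ('r2', 's1'), ('r3', 's2');
    INSERT INTO keyword VALUES
        ('r1', 'A'), ('r1', 'B'), ('r1', 'D'),
        ('r2', 'A'),
        ('r3', 'B');
""")

# Inner SELECT does the filtering; outer SELECT does the facets
# (one count per distinct keyword) on the filtered repertoires only.
query = """
    SELECT k.keyword, COUNT(DISTINCT k.repertoire_id) AS n
    FROM keyword AS k
    WHERE k.repertoire_id IN (
        SELECT repertoire_id FROM repertoire WHERE study_id = ?
    )
    GROUP BY k.keyword
    ORDER BY k.keyword
"""
facets = conn.execute(query, ("s1",)).fetchall()
print(facets)
```

The filter can be swapped for any other condition without touching the outer aggregation, which is the sense in which the two operations are independent.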
@bussec can this be closed?
No, this information needs to be included in the docs (especially the difference to GDC).
I have updated the docs to reflect the GDC difference. Closing this issue.
* Update facet docs. As per #617
* Removal/deprecation of is and not operators
* New release notes file for ADC API. Added deprecation of is and not.
* Error codes, repository loading changes. As per #431 and #487
* Add 408 and 413 errors
* Added 408 and 413 errors
* Add docs for AA/nt case discussion. As per #528
* Update data loading recommendation
* Remove docs about deprecated not operator
* Update to array query docs.
* Typo fix
What is the expected behavior when an ADC API query requests aggregation (via the `facets` request parameter) on a field that holds an array of strings, e.g., `study.keywords_study`? For example, if such a field holds the array `['A', 'B', 'D']`, should the aggregation be `{'A':1, 'B':1, 'D':1}`, or `{'A,B,D':1}`?

Note that the example provided in the docs does not match this case 1:1, as `pcr_target` is an array of objects, and `pcr_target_locus` contains only a single string.
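To spell out the two candidate behaviors, here is a minimal sketch in plain Python. It is only an illustration of the two counting strategies, not how any repository implements facets; the function names are hypothetical, and counting a value at most once per record is an assumption.

```python
from collections import Counter


def facet_per_element(records, field):
    """One bucket per distinct string found in the array."""
    counts = Counter()
    for record in records:
        # Assumption: a value repeated inside one record counts once for it.
        counts.update(set(record.get(field, [])))
    return dict(counts)


def facet_whole_array(records, field):
    """One bucket per distinct array, with the elements joined by commas."""
    counts = Counter(",".join(record.get(field, [])) for record in records)
    return dict(counts)


records = [{"keywords_study": ["A", "B", "D"]}]
per_element = facet_per_element(records, "keywords_study")
whole_array = facet_whole_array(records, "keywords_study")
print(per_element)
print(whole_array)
```

For the single record above, the first function yields a count of 1 for each of `A`, `B` and `D`, while the second yields a single bucket `A,B,D` with a count of 1.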