ADC API junction_aa (and other AA) queries - force upper case? #528
Comments
Maybe the better way to think of this is to make it case-independent... not force one or the other?
This makes things somewhat more complicated for the repositories, as string searches become more complex, but perhaps not really. The repositories can force this one way or the other internally (store AAs in caps, convert any queries into caps) and can then search efficiently. Although we don't force this at the moment in the iReceptor Turnkey, we could, and I believe this issue would then go away. @schristley any objections to not forcing anything, realizing that we might want to optimize this on the repository side (if we aren't already)?
I'm tempted to force uppercase for the repository but allow flexibility for the interface. By that I mean the repositories store the data as uppercase and return it in uppercase, but queries can be mixed case and are converted to uppercase when the query is performed. What I don't want is for data to be stored and/or returned in mixed case, because that might cause problems down the road. The convention is uppercase, but there's bound to be some tool sooner or later that doesn't follow it. Furthermore, you do see mixed case a lot as you get closer to genomic data.
100% agree. That is what I think we would do as well. There is an impact on data loading into a repository, as the AAs would need to be converted, but that is a one-time cost and makes query optimization much easier. The repository then returns what it has. The repository also converts any AA queries it receives that are not in the repository's case into the correct case. I am not 100% sure, but I think the Gateway does this translation already, so any queries from the Gateway will be uppercase, even if the user types lower or mixed case.
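A minimal sketch of the approach discussed above (convert once at load time, convert every incoming query, return what is stored), assuming a hypothetical in-memory repository. The names `normalize_aa`, `load_record`, and `query_junction_aa` are illustrative only and are not part of the ADC API, the Gateway, or the iReceptor Turnkey:

```python
def normalize_aa(seq: str) -> str:
    """Return an amino acid string in the repository's canonical (upper) case."""
    return seq.upper()


def load_record(repository: list, record: dict) -> None:
    """Store a copy of the record with junction_aa upper-cased (one-time cost)."""
    stored = dict(record)
    if stored.get("junction_aa") is not None:
        stored["junction_aa"] = normalize_aa(stored["junction_aa"])
    repository.append(stored)


def query_junction_aa(repository: list, substring: str) -> list:
    """Substring query: only the query is converted, the stored data is used as-is."""
    needle = normalize_aa(substring)
    return [r for r in repository if needle in (r.get("junction_aa") or "")]


# Hypothetical example data, not taken from any real repository.
repo = []
load_record(repo, {"sequence_id": "s1", "junction_aa": "CASSLGTDTQYF"})
load_record(repo, {"sequence_id": "s2", "junction_aa": "cawsvgqgaf"})

# A lower-case query still finds the upper-case record.
hits = query_junction_aa(repo, "cass")
```

Because the stored values are already in one case, a real repository could keep using its existing exact-substring index and only pay for an `upper()` call per query.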
So I think there is no change required to the spec. Although I don't think this needs to be documented for the end user, this decision should probably be documented for repository and API builders. I will add a comment in the docs...
Interestingly enough, LinkML allows patterns to be defined, and this is a situation where that would be useful because besides wanting to enforce the uppercase convention, only certain letters should be in ... JSON Schema supports patterns as well, so we could technically add a pattern to the spec to define this constraint. We've not been using them in the AIRR spec though, so we might want to be cautious about adding them before we do some testing with our tool chains. I can think of two scenarios that we need to think more ...
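As a rough illustration of what such a pattern constraint could look like, here is a sketch that checks a value against a JSON Schema style `pattern`. The allowed alphabet below (the 20 standard amino acid letters plus `X` and `*`) is an assumption made for the example; this thread does not state which letters should be allowed, and the AIRR spec does not currently define this pattern. The schema fragment and `is_valid_aa` are hypothetical names:

```python
import re

# ASSUMPTION: 20 standard amino acid letters plus X (unknown) and * (stop).
# The actual allowed set is not specified in this discussion.
AA_PATTERN = r"^[ACDEFGHIKLMNPQRSTVWYX*]*$"

# Hypothetical schema fragment; not part of the current AIRR spec.
junction_aa_schema = {"type": "string", "pattern": AA_PATTERN}


def is_valid_aa(value: str) -> bool:
    """Check a value against the schema's pattern.

    JSON Schema patterns are not implicitly anchored, so the pattern
    carries explicit ^ and $ and is applied with re.search.
    """
    return re.search(junction_aa_schema["pattern"], value) is not None
```

With this pattern, an uppercase string such as `CASSLGTDTQYF` passes, while a lower-case or mixed-case string fails, so the same rule enforces both the case convention and the alphabet.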
Putting my AKC hat on, this falls under both the data modeling work and the validation work. We will want these stricter validation rules in the AKC as part of the iterative process, and we can push whatever makes sense back into the AIRR spec. Also the use case of querying with
@bcorrie this issue is specifically about uppercase and documenting the API, and what you've done is probably sufficient for V2 and for closing it. I'd like not to lose some of my thoughts though; what would you suggest as a process? I'm thinking that we create AK issues to track them and later create new AIRR issues when we are ready to push back proposed changes.
Docs added, closing this issue.
* Update facet docs
  As per #617
* Removal/deprecation of is and not operators
* New release notes file for ADC API
  Added deprecation of is and not.
* Error codes, repository loading changes
  As per #431 and #487
* Add 408 and 413 errors
* Added 408 and 413 errors
* Add docs for AA/nt case discussion
  As per #528
* Update data loading recommendation
* Remove docs about deprecated not operator
* Update to array query docs.
* Typo fix
Currently the ADC API does not (nor does the AIRR spec) say anything about AA and case. Maybe it shouldn't, but we have seen some users typing lower case AA strings. They don't find anything because they are searching for an exact substring match and our AA fields are upper case.
Is it reasonable to force/suggest a standard here? We load AA strings as the annotation tools provide them, which I think is always upper case, so I was going to make the Gateway convert all queries to upper case, but I am not sure that is the right thing to do...
We don't comment on this here: https://docs.airr-community.org/en/stable/datarep/rearrangements.html
Thoughts?