Case insensitive mapping not working. #22

Closed
bevankoopman opened this issue Aug 2, 2018 · 2 comments

@bevankoopman

I'm getting case sensitive mapping results even though I'm using an index built with --lowercase.

To reproduce:

  • build an index from UMLS 2018AA with -L (lowercase)
  • run this test code:
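```python
from quickumls import QuickUMLS

matcher = QuickUMLS('/quickumls-data-2018AA')
text = 'Name.'
# match() returns a list of candidate groups; each group is a list of match dicts
for p in matcher.match(text, best_match=True, ignore_syntax=False):
    for q in p:
        print(q)
```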

The result is:

{'start': 0, 'end': 4, 'ngram': 'Name', 'term': 'name', 'cui': 'C0027365', 'similarity': 1.0, 'semtypes': {'T170'}, 'preferred': 1}
{'start': 0, 'end': 4, 'ngram': 'Name', 'term': 'name', 'cui': 'C1547383', 'similarity': 1.0, 'semtypes': {'T170'}, 'preferred': 1}
{'start': 0, 'end': 4, 'ngram': 'Name', 'term': 'name', 'cui': 'C1554107', 'similarity': 1.0, 'semtypes': {'T170'}, 'preferred': 1}
{'start': 0, 'end': 4, 'ngram': 'Name', 'term': 'name', 'cui': 'C2599456', 'similarity': 1.0, 'semtypes': {'T201'}, 'preferred': 1}

But if `text = 'Name.'` is changed to `text = 'name.'`, the result is empty.

Similarly, if `text = 'Patient Name.'`, the result is:

{'start': 0, 'end': 12, 'ngram': 'Patient Name', 'term': 'patient name', 'cui': 'C1299487', 'similarity': 1.0, 'semtypes': {'T033'}, 'preferred': 1}

And if `text = 'Patient name.'`, the result is:

{'start': 0, 'end': 7, 'ngram': 'Patient', 'term': 'patient', 'cui': 'C1550655', 'similarity': 1.0, 'semtypes': {'T031'}, 'preferred': 1}
{'start': 0, 'end': 7, 'ngram': 'Patient', 'term': 'patient', 'cui': 'C1578485', 'similarity': 1.0, 'semtypes': {'T170'}, 'preferred': 1}
{'start': 0, 'end': 7, 'ngram': 'Patient', 'term': 'patient', 'cui': 'C1578486', 'similarity': 1.0, 'semtypes': {'T170'}, 'preferred': 1}

(Note that the "name" part is ignored.)
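To check whether an index built with `-L` really treats casing variants alike, it can help to run several variants through the matcher and compare what comes back. The sketch below assumes only the `match(text, best_match=True, ignore_syntax=False)` call and the list-of-lists-of-dicts result shape seen in the reproduction code; `matched_ngrams` and `compare_casings` are hypothetical helpers, not part of QuickUMLS.

```python
from typing import Dict, Iterable, List


def matched_ngrams(matcher, text: str) -> List[str]:
    """Return the sorted, lowercased n-grams that `matcher.match` reports for `text`.

    Assumes `matcher` behaves like the QuickUMLS object in the reproduction:
    `match(...)` returns a list of candidate groups, each a list of dicts
    with an 'ngram' key.
    """
    groups = matcher.match(text, best_match=True, ignore_syntax=False)
    # Lowercase so that 'Name' and 'name' compare equal across variants.
    return sorted({cand["ngram"].lower() for group in groups for cand in group})


def compare_casings(matcher, variants: Iterable[str]) -> Dict[str, List[str]]:
    """Map each input variant to the n-grams matched in it."""
    return {text: matched_ngrams(matcher, text) for text in variants}
```

Called as `compare_casings(QuickUMLS('/quickumls-data-2018AA'), ['Name.', 'name.'])`, the results reported above would give `{'Name.': ['name'], 'name.': []}`; with a case-insensitive pipeline, both entries should be equal.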

@bevankoopman bevankoopman changed the title from "Case insensitive not working." to "Case insensitive mapping not working." on Aug 2, 2018
@soldni soldni self-assigned this Aug 3, 2018
@soldni soldni added bug and removed bug labels Aug 3, 2018
@soldni
Member

soldni commented Aug 3, 2018

Hi Bevan,

That's strange. Will take a look and see if I can reproduce. Thanks for the bug report!

-Luca

@soldni
Member

soldni commented Sep 3, 2018

Turns out the spaCy folks got a bit too aggressive with the definition of stopwords, as "name" is now included in their list. I switched to NLTK's stopword list in QuickUMLS 1.2.4 to fix this issue. Sorry it took me a while to get to this!
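The reported pattern ("Name" and "Patient Name" matched, "name" dropped, "Patient name" reduced to "Patient") is consistent with a stopword filter that checks tokens against a lowercase list without normalizing case. The sketch below is a hypothetical simplification, not QuickUMLS's actual code: it assumes a candidate n-gram is discarded when its first or last token is a stopword, and that the membership test is case-sensitive. The stopword lists are made up for illustration.

```python
from typing import List, Set


def candidate_ngrams(tokens: List[str], stopwords: Set[str], max_len: int = 3) -> List[str]:
    """Enumerate contiguous n-grams of up to `max_len` tokens.

    Hypothetical rule: an n-gram is discarded if its first or last token
    is in `stopwords`. The test is a raw, case-sensitive `in` check, so
    "name" is filtered while "Name" is not.
    """
    candidates = []
    for i in range(len(tokens)):
        # j is the exclusive end index, so each n-gram has length j - i <= max_len
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            gram = tokens[i:j]
            if gram[0] in stopwords or gram[-1] in stopwords:
                continue
            candidates.append(" ".join(gram))
    return candidates


# Hypothetical stopword lists (not the real spaCy or NLTK lists).
LIST_WITH_NAME = {"a", "the", "of", "name"}
LIST_WITHOUT_NAME = {"a", "the", "of"}

# Tokens are given pre-split, with the trailing "." already removed.
print(candidate_ngrams(["Name"], LIST_WITH_NAME))             # ['Name']
print(candidate_ngrams(["name"], LIST_WITH_NAME))             # []
print(candidate_ngrams(["Patient", "name"], LIST_WITH_NAME))  # ['Patient']
print(candidate_ngrams(["Patient", "name"], LIST_WITHOUT_NAME))
# ['Patient', 'Patient name', 'name']
```

Under these assumptions, making the check case-insensitive alone (for example `tok.lower() in stopwords`) would drop "Name" as well, so it is using a list without "name" that restores the missing candidates for both casings.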

@soldni soldni closed this as completed Sep 3, 2018