threshold parameter #9

Closed
Gracelek opened this issue Oct 20, 2017 · 3 comments

Comments

@Gracelek

After the installation, I am playing with QuickUMLS.

from quickumls import QuickUMLS

# Default settings; the threshold parameter defaults to 0.7
matcher_thres_default = QuickUMLS('/users/gracelee/documents/quickumls_data')
text = 'the authors studied the diagnostic and prognostic factors by using stepwise logistic regression analysis.'
results = matcher_thres_default.match(text, best_match=True, ignore_syntax=False)
for res in results:
    print(res)

The code above returns three matched terms, with similarities of 1.0, 1.0, and 0.6 respectively.
Shouldn't the threshold parameter (default 0.7) filter out matches whose similarity falls below it?

[{'start': 85, 'end': 104, 'ngram': 'regression analysis', 'term': 'regression analysis', 'cui': 'C0034980', 'similarity': 1.0, 'semtypes': {'T170'}, 'preferred': 1}]
[{'start': 39, 'end': 49, 'ngram': 'prognostic', 'term': 'prognostic', 'cui': 'C0220901', 'similarity': 1.0, 'semtypes': {'T170'}, 'preferred': 1}]
[{'start': 24, 'end': 34, 'ngram': 'diagnostic', 'term': 'Diagnostic', 'cui': 'C0011900', 'similarity': 0.6, 'semtypes': {'T033'}, 'preferred': 1}, {'start': 24, 'end': 34, 'ngram': 'diagnostic', 'term': 'Diagnostic', 'cui': 'C0358514', 'similarity': 0.6, 'semtypes': {'T130'}, 'preferred': 1}, {'start': 24, 'end': 34, 'ngram': 'diagnostic', 'term': 'Diagnostic', 'cui': 'C0430022', 'similarity': 0.6, 'semtypes': {'T060'}, 'preferred': 1}, {'start': 24, 'end': 34, 'ngram': 'diagnostic', 'term': 'Diagnostic', 'cui': 'C1547424', 'similarity': 0.6, 'semtypes': {'T170'}, 'preferred': 1}]

If I pass threshold=0.7 explicitly at instantiation, the same code returns the same result. With threshold=0.8, it returns only the first two terms.
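For anyone checking their own output for the same symptom, here is a minimal sketch that lists every match whose reported similarity is below a given threshold. It runs on the results pasted above (trimmed to the fields it needs), so it does not require a QuickUMLS installation. The helper name `below_threshold` is made up for this example and is not part of QuickUMLS.

```python
# Matches copied from the output posted in this issue, trimmed to three fields.
results = [
    [{'ngram': 'regression analysis', 'cui': 'C0034980', 'similarity': 1.0}],
    [{'ngram': 'prognostic', 'cui': 'C0220901', 'similarity': 1.0}],
    [
        {'ngram': 'diagnostic', 'cui': 'C0011900', 'similarity': 0.6},
        {'ngram': 'diagnostic', 'cui': 'C0358514', 'similarity': 0.6},
        {'ngram': 'diagnostic', 'cui': 'C0430022', 'similarity': 0.6},
        {'ngram': 'diagnostic', 'cui': 'C1547424', 'similarity': 0.6},
    ],
]


def below_threshold(results, threshold):
    """Hypothetical helper: flatten the nested result list and keep
    only the matches whose similarity is below the threshold."""
    return [m for group in results for m in group if m['similarity'] < threshold]


offenders = below_threshold(results, threshold=0.7)
for m in offenders:
    print(m['ngram'], m['cui'], m['similarity'])
```

With the output above, this flags the four 'diagnostic' candidates, which is the behaviour being reported.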

@soldni
Member

soldni commented Oct 21, 2017

Hi,
Could you provide me with a bit more information about your setup? (e.g. which platform/python version/UMLS subset/etc.) Thanks!

-Luca

@Gracelek
Author

Hi Luca,
Here is the setup info.
Python 3.6, UMLS 2016AB, and QuickUMLS 1.2 (the latest version), running on macOS Sierra.
Thanks
-G

@soldni
Member

soldni commented Oct 23, 2017

Hey,

Thanks for the info. I found a discrepancy between how similarity was being calculated by QuickUMLS and how it was computed in the underlying string matching library. The issue you were experiencing is now fixed in the latest release (v. 1.2.1).

-Luca
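The thread does not say what the discrepancy actually was, so the following is only an illustration of how two similarity computations can disagree on the same pair of strings. It assumes character trigrams and Jaccard similarity, and compares feature extraction with and without boundary padding. The function names and the `#` pad character are choices made for this sketch, not taken from QuickUMLS or its string matching library.

```python
def char_ngrams(s, n=3, pad=None):
    """Return the set of character n-grams of s.
    If pad is given, n-1 pad marks are added on each side first."""
    if pad is not None:
        s = pad * (n - 1) + s + pad * (n - 1)
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


x, y = 'diagnostic', 'Diagnostic'

# With padding: 12 trigrams per string, 9 shared, 15 in the union.
padded = jaccard(char_ngrams(x, pad='#'), char_ngrams(y, pad='#'))

# Without padding: 8 trigrams per string, 7 shared, 9 in the union.
unpadded = jaccard(char_ngrams(x), char_ngrams(y))

print(round(padded, 2), round(unpadded, 2))  # 0.6 0.78
```

Under these assumptions the same pair scores 0.6 one way and about 0.78 the other, so a candidate can pass a 0.7 threshold in one computation while being reported with a lower score by another.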

@soldni soldni closed this as completed Oct 23, 2017