threshold parameter #9

Gracelek · 2017-10-20T01:18:51Z

After installing QuickUMLS, I have been experimenting with it.

from quickumls import QuickUMLS

# No threshold is passed, so the default (0.7) applies
matcher_thres_default = QuickUMLS('/users/gracelee/documents/quickumls_data')
text = 'the authors studied the diagnostic and prognostic factors by using stepwise logistic regression analysis.'
results = matcher_thres_default.match(text, best_match=True, ignore_syntax=False)
for res in results:
    print(res)

The code above returns three extracted terms, with similarities of 1.0, 1.0, and 0.6, respectively.
Shouldn't the threshold parameter (default 0.7) filter out matches whose similarity falls below it?

[{'start': 85, 'end': 104, 'ngram': 'regression analysis', 'term': 'regression analysis', 'cui': 'C0034980', 'similarity': 1.0, 'semtypes': {'T170'}, 'preferred': 1}]
[{'start': 39, 'end': 49, 'ngram': 'prognostic', 'term': 'prognostic', 'cui': 'C0220901', 'similarity': 1.0, 'semtypes': {'T170'}, 'preferred': 1}]
[{'start': 24, 'end': 34, 'ngram': 'diagnostic', 'term': 'Diagnostic', 'cui': 'C0011900', 'similarity': 0.6, 'semtypes': {'T033'}, 'preferred': 1}, {'start': 24, 'end': 34, 'ngram': 'diagnostic', 'term': 'Diagnostic', 'cui': 'C0358514', 'similarity': 0.6, 'semtypes': {'T130'}, 'preferred': 1}, {'start': 24, 'end': 34, 'ngram': 'diagnostic', 'term': 'Diagnostic', 'cui': 'C0430022', 'similarity': 0.6, 'semtypes': {'T060'}, 'preferred': 1}, {'start': 24, 'end': 34, 'ngram': 'diagnostic', 'term': 'Diagnostic', 'cui': 'C1547424', 'similarity': 0.6, 'semtypes': {'T170'}, 'preferred': 1}]

If I pass threshold=0.7 explicitly at instantiation, the same code returns the same three results. With threshold=0.8, it returns only the first two terms.


soldni · 2017-10-21T14:54:55Z

Hi,
Could you give me a bit more information about your setup (e.g., platform, Python version, UMLS subset)? Thanks!

-Luca

Gracelek · 2017-10-23T11:25:42Z

Hi Luca,
Here is the setup info:
Python 3.6, UMLS 2016AB, and QuickUMLS 1.2 (the latest version), running on macOS Sierra.
Thanks
-G

soldni · 2017-10-23T20:49:35Z

Hey,

Thanks for the info. I found a discrepancy between how similarity was being calculated by QuickUMLS and how it was computed in the underlying string matching library. The issue you were experiencing is now fixed in the latest release (v. 1.2.1).

-Luca

soldni closed this as completed Oct 23, 2017
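
For anyone who cannot yet upgrade to 1.2.1, one possible workaround is to filter the matcher output on the reported similarity after the fact. The following is only a minimal sketch: `filter_by_similarity` is a hypothetical helper, not part of QuickUMLS, and it assumes the output shape shown in the report above (a list of groups, each a list of dicts with a 'similarity' key). The sample data is an abbreviated copy of that reported output.

```python
def filter_by_similarity(results, threshold=0.7):
    """Keep only candidates whose similarity is at least `threshold`.

    Hypothetical helper (not part of QuickUMLS). `results` is assumed to be
    a list of groups, each group being a list of candidate dicts.
    Groups left with no candidates are dropped.
    """
    filtered = []
    for group in results:
        kept = [match for match in group if match['similarity'] >= threshold]
        if kept:
            filtered.append(kept)
    return filtered


# Abbreviated copy of the output reported in this issue
results = [
    [{'ngram': 'regression analysis', 'cui': 'C0034980', 'similarity': 1.0}],
    [{'ngram': 'prognostic', 'cui': 'C0220901', 'similarity': 1.0}],
    [{'ngram': 'diagnostic', 'cui': 'C0011900', 'similarity': 0.6},
     {'ngram': 'diagnostic', 'cui': 'C0358514', 'similarity': 0.6}],
]

for group in filter_by_similarity(results, threshold=0.7):
    print(group[0]['ngram'], group[0]['similarity'])
```

With the sample data, the 'diagnostic' group (similarity 0.6) is dropped and the two exact matches remain. Note that this only hides the symptom on the client side; upgrading to the fixed release is the actual remedy.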
