switch to function-based API
It makes no sense to instantiate a class for each cleaned name: it's overcomplex and unnecessary extra work, especially now that most of the setup code lives outside the class.
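A minimal sketch of what a function-based API could look like: the term data is prepared once at module load instead of per instance. The `terms` set and the exact cleaning rules here are placeholders, not the library's actual data or behavior.

```python
# Hypothetical module-level setup: done once at import, not per cleaned name.
terms = {"ltd", "llc", "gmbh", "oy"}  # placeholder set of legal terms

def clean_name(business_name: str) -> str:
    """Strip a trailing legal term from a company name (illustrative sketch)."""
    parts = business_name.split()
    # Compare only the last whitespace-separated part against the term set.
    if parts and parts[-1].lower().strip(".,") in terms:
        parts = parts[:-1]
    return " ".join(parts)
```

Callers would then just call `clean_name("Acme Oy")` with no object to construct or carry around.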
switch to working on whitespace-separated name parts rather than full strings
In effect, in the suffix case we would check `business_name.split()[-1] == term` rather than `business_name.endswith(' ' + term)`. Of course the splitting would be done just once at the beginning.
At the moment the class already splits and rejoins the name to get rid of extra whitespace.
At the moment the code already looks for a prefix/suffix padded by a single whitespace, so in effect the result is the same.
If we can just handle the fact that some legal terms are "multi-part" (whitespace-separated), this would simplify the code and make it run faster: for a suffix we would only have to look at the last whitespace-separated name part, and for a prefix just the first. There are other cases, too.
We would not have to presort the data, either.
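The approach above could be sketched roughly like this. This is an assumption about the design, not the project's actual code: terms are pre-split into tuples of parts, so a multi-part term of length N is matched against the last (or first) N name parts.

```python
def strip_terms(business_name, suffix_terms, prefix_terms):
    """Remove one matching suffix and one matching prefix term (sketch).

    suffix_terms / prefix_terms are iterables of tuples of lowercase
    parts, e.g. ("oy", "ab") for the multi-part term "oy ab".
    """
    # Split once; extra whitespace disappears as a side effect.
    parts = business_name.split()
    for term in suffix_terms:
        n = len(term)
        # Compare only the last n parts, not the whole string.
        if len(parts) > n and tuple(p.lower() for p in parts[-n:]) == term:
            parts = parts[:-n]
            break
    for term in prefix_terms:
        n = len(term)
        # Likewise, only the first n parts matter for a prefix.
        if len(parts) > n and tuple(p.lower() for p in parts[:n]) == term:
            parts = parts[n:]
            break
    return " ".join(parts)
```

Because matching works on part tuples, the term data does not need any particular sort order, only that longer multi-part terms are tried before their shorter sub-terms if both exist.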
don't use both legal and countrywise suffixes in clean_name
There are a lot of duplicates; it should be enough to use just one of the two (preferably the countrywise data, since that would make it easy to drop individual countries).
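One way the countrywise-only approach could work, sketched with made-up data and names (`terms_by_country` and `prepare_terms` are hypothetical, not the library's API): building the term set from a country mapping deduplicates automatically and makes dropping countries a matter of filtering keys.

```python
# Hypothetical countrywise term data: country code -> legal terms.
terms_by_country = {
    "fi": ["oy", "oyj", "ky"],
    "de": ["gmbh", "ag"],
    "us": ["llc", "inc", "corp"],
}

def prepare_terms(countries=None):
    """Build a deduplicated term set, optionally restricted to some countries."""
    selected = countries if countries is not None else terms_by_country.keys()
    # A set comprehension removes duplicates across countries for free.
    return {term for country in selected for term in terms_by_country[country]}
```

With a single source of terms there is no overlap between "legal" and "countrywise" lists to reconcile.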
term search now works by splitting the names & terms rather than operating directly on the strings; see the optimization2 branch for code comparing the effect of this (a 3x speedup)
the term preparation code now generates unique terms
These are pretty much what this request was asking for, so closing.
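The speedup claim above can be probed with a toy micro-benchmark like the following. This is not the comparison code from the optimization2 branch, just a self-contained illustration of the two styles of suffix check; the sample name and terms are made up.

```python
import timeit

name = "Acme Holdings  llc"
terms = ["llc", "ltd", "gmbh", "oy"]

def by_endswith():
    # String-based check: normalize whitespace, then test each padded term.
    cleaned = " ".join(name.split())
    return any(cleaned.endswith(" " + t) for t in terms)

def by_split():
    # Split-based check: only the last whitespace-separated part matters.
    last = name.split()[-1]
    return any(last == t for t in terms)

# Both checks must agree before timing means anything.
assert by_endswith() == by_split()
print("endswith:", timeit.timeit(by_endswith, number=100_000))
print("split:   ", timeit.timeit(by_split, number=100_000))
```

Actual numbers depend on name length and term count; the branch's measured result was the 3x figure quoted above.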