feat: UPS tracking numbers #228
pywhat/Data/regex.json
Outdated
"Regex": "^(1Z[0-9A-Z]{6}[0-9]{2}[0-9]{8})$",
"plural_name": false,
"Description": null,
"Rarity": 1,
I would say that rarity should be lowered.
Any suggestions? 🙂
Something around 0.3
Why 0.3? I'd say higher, like 0.5 or 0.6, because:
- The string has to start with 1Z
- It then needs 6 chars from 0-9A-Z
- It then needs exactly 2 digits
- It then needs 8 more digits
Also, can we make it:
- ^(1Z[0-9A-Z]{6}[0-9]{2}[0-9]{8})$
+ ^(1Z[0-9A-Z]{6}[0-9]{10})$
?
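As a quick sanity check on the proposed simplification, here is a minimal sketch that runs both patterns over a few strings and confirms they accept and reject the same inputs. The sample strings are made up for illustration and are not taken from the pywhat test suite; the patterns are compiled with plain `re`, which may differ from how pywhat applies them.

```python
import re

# The two patterns under discussion, compiled with no extra flags.
OLD = re.compile(r"^(1Z[0-9A-Z]{6}[0-9]{2}[0-9]{8})$")
NEW = re.compile(r"^(1Z[0-9A-Z]{6}[0-9]{10})$")

# Illustrative inputs only (assumption: not real tracking numbers).
samples = [
    "1Z999AA10123456784",  # 1Z + 6 alphanumerics + 10 digits
    "1Z999AA1012345678",   # one digit short
    "1z999AA10123456784",  # lowercase prefix
    "1Z999AA1012345678X",  # letter inside the digit block
    "XX999AA10123456784",  # wrong prefix
]

# Map each sample to (matches OLD, matches NEW).
results = {s: (bool(OLD.match(s)), bool(NEW.match(s))) for s in samples}

# {2}{8} and {10} over the same character class describe the same strings.
assert all(old == new for old, new in results.values())
```

The merged form is shorter, but it gives up the group boundary after the first two digits, which matters if that pair is ever to be captured separately.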
I think 0.4 or 0.5. And yes, the regex should be changed.
The idea of the 2+8 split is because the first 2 digits in this group represent a service indicator code and perhaps it could be captured and handled in the future.
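To illustrate what keeping the 2+8 split could enable, here is a rough sketch using named groups so the two-digit service indicator can be pulled out on its own. The group names (`head`, `service`, `tail`) and the helper `split_ups` are hypothetical and are not part of pywhat; the sample string is made up.

```python
import re

# Same shape as the regex in the diff, with named groups added.
# Group names are illustrative assumptions, not pywhat conventions.
UPS = re.compile(
    r"^1Z(?P<head>[0-9A-Z]{6})(?P<service>[0-9]{2})(?P<tail>[0-9]{8})$"
)

def split_ups(tracking):
    """Return the named parts of a matching string, or None if it does not match."""
    m = UPS.match(tracking)
    return m.groupdict() if m else None

parts = split_ups("1Z999AA10123456784")
# parts["service"] holds the two digits that follow the 6-character block.
```

With the merged `[0-9]{10}` form, the same information would have to be recovered by slicing the matched string afterwards.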
Aside: I wonder if the "rarity" could be estimated more reliably through some entropy-based measure 🤔
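One way to picture an entropy-based measure, purely as a sketch: count how many bits a pattern removes from a uniformly random string of the same length. Everything here is an assumption for illustration (the 36-character input alphabet, the segment encoding, the function name `specificity_bits`), and nothing maps the result onto pywhat's 0 to 1 rarity scale.

```python
import math

ALPHABET_SIZE = 36  # assumption: inputs are drawn uniformly from 0-9A-Z

# Each segment is (allowed characters per position, number of positions).
# "1Z" -> one allowed character at each of 2 positions,
# [0-9A-Z]{6} -> 36 choices x 6, [0-9]{10} -> 10 choices x 10.
UPS_SEGMENTS = [(1, 2), (36, 6), (10, 10)]

def specificity_bits(segments, alphabet_size=ALPHABET_SIZE):
    """Bits by which the pattern narrows a random string of the same length."""
    length = sum(count for _, count in segments)
    total_bits = length * math.log2(alphabet_size)
    matching_bits = sum(count * math.log2(choices) for choices, count in segments)
    return total_bits - matching_bits

bits = specificity_bits(UPS_SEGMENTS)
```

A higher value means fewer random strings match. Real inputs are not uniformly random, so a figure like this would at best complement the false-positive observations rather than replace them.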
> service indicator code
We have a precedent for this, called sub-categories. See the Mastercard / Phone Numbers regex. I am not sure it'll work on data in the middle of the regex; we may need to change the code for that :)
> Aside: I wonder if the "rarity" could be estimated more reliably through some entropy-based measure 🤔
Probably! Currently I am estimating it based on what I see when people post this:
And also whether we have any false positives.
@P403n1x87 You can use subcategories with regex method for that.
Codecov Report
@@           Coverage Diff           @@
##             main     #228   +/-   ##
=======================================
  Coverage   92.60%   92.60%
=======================================
  Files          15       15
  Lines        1217     1217
=======================================
  Hits         1127     1127
  Misses         90       90
Co-authored-by: piatrashkakanstantinass <74979584+piatrashkakanstantinass@users.noreply.github.com>
⚠ Pull Requests not made with this template will be automatically closed 🔥
Prerequisites
Why do we need this pull request?
What GitHub issues does this fix?
N. A.
Copy / paste of output