🐛 Fix unreachable code in the sorting algorithm of CharsetMatch (#352)
Ousret authored Sep 30, 2023
1 parent 061a71b commit 5aed9a4
Showing 2 changed files with 7 additions and 4 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -16,6 +16,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 - (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
 - Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.7
 
+### Fixed
+- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in \_\_lt\_\_ (#350)
+
 ## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)
 
 ### Changed
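The changelog entry refers to the pre-fix `__lt__` in `charset_normalizer/models.py`, where the multi-byte tie-break sat inside a branch guarded by `coherence_difference > 0.02` while also requiring `self.coherence == other.coherence`. Equal coherence means a difference of exactly `0.0`, so the two guards contradict each other. The sketch below uses a hypothetical helper, `old_tiebreak_reachable`, that reproduces only those two guards (it is not part of the library) and checks a grid of made-up values to confirm the inner branch never fires.

```python
def old_tiebreak_reachable(chaos_a: float, chaos_b: float,
                           coherence_a: float, coherence_b: float) -> bool:
    """Hypothetical helper: True if the pre-fix multi-byte tie-break would run."""
    chaos_difference = abs(chaos_a - chaos_b)
    coherence_difference = abs(coherence_a - coherence_b)
    # Outer guard copied from the pre-fix __lt__
    if chaos_difference < 0.01 and coherence_difference > 0.02:
        # Inner guard: equal coherence forces coherence_difference == 0.0,
        # which the outer guard has already excluded.
        return chaos_difference == 0.0 and coherence_a == coherence_b
    return False


# Exhaustively probe a coarse grid of chaos/coherence values in [0, 1].
values = [i / 100 for i in range(0, 101, 5)]
hits = [
    (ca, cb, ka, kb)
    for ca in values for cb in values
    for ka in values for kb in values
    if old_tiebreak_reachable(ca, cb, ka, kb)
]
print(len(hits))  # 0: the tie-break is unreachable
```

A grid search is not a proof, but here the contradiction holds for any inputs: `abs(x - x)` is `0.0`, which is never greater than `0.02`.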
8 changes: 4 additions & 4 deletions charset_normalizer/models.py
@@ -54,16 +54,16 @@ def __lt__(self, other: object) -> bool:
 
         # Below 1% difference --> Use Coherence
         if chaos_difference < 0.01 and coherence_difference > 0.02:
-            # When having a tough decision, use the result that decoded as many multi-byte as possible.
-            if chaos_difference == 0.0 and self.coherence == other.coherence:
-                return self.multi_byte_usage > other.multi_byte_usage
             return self.coherence > other.coherence
+        elif chaos_difference < 0.01 and coherence_difference <= 0.02:
+            # When having a difficult decision, use the result that decoded as many multi-byte as possible.
+            return self.multi_byte_usage > other.multi_byte_usage
 
         return self.chaos < other.chaos
 
     @property
     def multi_byte_usage(self) -> float:
-        return 1.0 - len(str(self)) / len(self.raw)
+        return 1.0 - (len(str(self)) / len(self.raw))
 
     def __str__(self) -> str:
         # Lazy Str Loading
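To see what the new `elif` branch changes, here is a minimal sketch of the post-fix ordering rule. `FakeMatch` is a hypothetical stand-in, not the library's `CharsetMatch` (whose constructor differs); it carries only the fields `__lt__` and `multi_byte_usage` need, and the chaos/coherence scores are invented for illustration. The same bytes are decoded as UTF-8 (one 2-byte sequence, so `multi_byte_usage` is 0.2) and as Latin-1 (one byte per character, so 0.0).

```python
from dataclasses import dataclass


@dataclass
class FakeMatch:
    """Hypothetical stand-in for CharsetMatch with only the fields __lt__ uses."""
    raw: bytes
    encoding: str
    chaos: float
    coherence: float

    def __str__(self) -> str:
        return self.raw.decode(self.encoding)

    @property
    def multi_byte_usage(self) -> float:
        # Fraction of bytes absorbed by multi-byte sequences (0.0 for single-byte codecs).
        return 1.0 - (len(str(self)) / len(self.raw))

    def __lt__(self, other: "FakeMatch") -> bool:
        chaos_difference = abs(self.chaos - other.chaos)
        coherence_difference = abs(self.coherence - other.coherence)

        # Chaos within 1% and a clear coherence gap: higher coherence wins.
        if chaos_difference < 0.01 and coherence_difference > 0.02:
            return self.coherence > other.coherence
        # Chaos and coherence both close: prefer the decoding with more multi-byte usage.
        elif chaos_difference < 0.01 and coherence_difference <= 0.02:
            return self.multi_byte_usage > other.multi_byte_usage

        return self.chaos < other.chaos


payload = "café".encode("utf_8")  # 5 bytes, 4 characters
latin = FakeMatch(payload, "latin_1", chaos=0.0, coherence=0.50)
utf8 = FakeMatch(payload, "utf_8", chaos=0.0, coherence=0.51)

ranked = sorted([latin, utf8])
print([m.encoding for m in ranked])  # ['utf_8', 'latin_1']
```

Under the pre-fix logic these two candidates would fall through to `self.chaos < other.chaos`, which is `False` in both directions, so `sorted()` (being stable) would simply keep the input order and leave the Latin-1 reading first.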