Releases: jawah/charset_normalizer
Releases · jawah/charset_normalizer
Version 1.3.7
Changes :
- Bugfix: 🐛 The legacy
detect
function should return UTF-8-SIG if sig is present in the payload. #38
Version 1.3.6
Version 1.3.5
Changes :
- Miscellaneous: 🔧 Dependencies refactor, add python 3.9 and 3.10 to the supported interpreters
- Bugfix: 🐛 Fix error while using the package with a python pre-release interpreter #33
Small refresh to keep the project up and running until further dev. (Upcoming version 1.4.0)
Thanks to the many adopters.
Charset Normalizer
Changes :
- Improvement/Bugfix : False positive when searching for successive upper, lower char. (ProbeChaos) (#31)
Charset Normalizer
Changes :
- Improvement : Noticeable better detection for jp #30
Charset Normalizer
Changes :
- Bugfix : Passing zero-length bytes to from_bytes (#29)
Charset Normalizer
Charset Normalizer
Changes :
- Feature : Now support
unicodedata2
backport. To benefit from it install usingpip install charset-normalizer[UnicodeDataBackport]
. Python 3.7 have UnicodeData v11. You could upgrade it to v12. - Feature : Added preemptive behaviour. Looking for a declared encoding. Using positional parameter
preemptive_behaviour
. Default to True. Does not take declared encoding for it, testing it first. - Improvement : Added aliases to CharsetNormalizerMatches class.
CharsetDetector
;EncodingDetector
andCharsetDoctor
.
Charset Normalizer
Changes :
- Feature : Added
has_submatch
,percent_chaos
andpercent_coherence
properties on single match object. - Improvement :
best()
method of CharsetNormalizerMatches has been rewritten for better readability. - Feature : Added
explain
boolean positional parameter to print out what actually happen when searching for a match. - Improvement : Detection has been globally improved.
- Feature : You can exclude some encoding when searching for a match with parameter
cp_exclusion
. List of str. forfrom_bytes
from_path
andfrom_fp
. - Feature : You can limit the search to some encoding when looking for a match with parameter
cp_isolation
. List of str. forfrom_bytes
from_path
andfrom_fp
. - Feature :
import charset_normalizer
is enough to provide additional help when you encounter UnicodeDecodeError exception.
Charset Normalizer
Changes :
- Bugfix :
from_bytes
parameters steps and chunk_size were not adapted to sequence len if provided values were not fitted to content. Therefore could lead to misdetection on small content.