* Project metadata are now stored using `pyproject.toml` instead of
`setup.cfg`, with setuptools as the build backend.
* Enforce delayed loading of annotations for simpler and more consistent
types in the project.
* Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8
* Added pre-commit configuration.
* Added noxfile.
* Removed `build-requirements.txt` in favor of the native build
configuration in `pyproject.toml`.
* Removed `bin/integration.py` and `bin/serve.py` in favor of downstream
integration test (see noxfile).
* Removed `setup.cfg` in favor of `pyproject.toml` metadata configuration.
* Removed unused `utils.range_scan` function.
* Converting content to Unicode bytes may insert `utf_8` instead of
preferred `utf-8`. (#572)
* Deprecation warning "'count' is passed as positional argument" when
converting to Unicode bytes on Python 3.13+
- Drop sed command to remove code coverage flags from pytest
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=49
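The two fixes above (the `utf_8` versus `utf-8` spelling and the 'count' deprecation warning on Python 3.13+) can be illustrated with a minimal standard-library sketch. This is not the library's implementation: the `DECLARED_ENCODING` pattern and the helpers `preferred_name` and `patch_declaration` are hypothetical names introduced here for illustration only.

```python
import codecs
import re

# Hypothetical pattern: matches an XML/HTML-style encoding declaration and
# captures the declared name. Not taken from charset-normalizer.
DECLARED_ENCODING = re.compile(
    r"""(?:encoding|charset)\s*=\s*["']?([A-Za-z0-9_\-]+)""", re.IGNORECASE
)


def preferred_name(encoding: str) -> str:
    """Return the codec name spelled with hyphens, e.g. 'utf_8' -> 'utf-8'."""
    return codecs.lookup(encoding).name.replace("_", "-")


def patch_declaration(text: str, target: str = "utf_8") -> str:
    """Rewrite the first declared encoding found in `text` to `target`."""
    name = preferred_name(target)
    return re.sub(
        DECLARED_ENCODING,
        lambda m: m.group(0).replace(m.group(1), name),
        text,
        count=1,  # keyword form; passing count positionally warns on 3.13+
    )


header = '<?xml version="1.0" encoding="ISO-8859-1"?>'
print(patch_declaration(header))
# <?xml version="1.0" encoding="utf-8"?>
```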
* Argument `--no-preemptive` in the CLI to prevent the detector
from searching for hints.
* Support for Python 3.13
* Relax the TypeError exception thrown when trying to compare a
CharsetMatch with anything other than a CharsetMatch.
* Improved the general reliability of the detector based on
user feedback. (#520) (#509) (#498) (#407)
* Declared charset in content (preemptive detection) not
changed when converting to utf-8 bytes.
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=45
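As a rough illustration of the relaxed comparison mentioned above, the sketch below shows one common way to make `==` tolerate foreign types instead of raising TypeError. The `Match` class is a hypothetical stand-in and does not reproduce the real CharsetMatch; how the library itself handles the comparison is not stated in this changelog.

```python
from __future__ import annotations

import codecs


class Match:
    """Hypothetical stand-in for a detection result (not the library's class)."""

    def __init__(self, encoding: str, payload: bytes) -> None:
        # Normalize the name so 'utf_8' and 'utf-8' compare equal.
        self.encoding = codecs.lookup(encoding).name
        self.payload = payload

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Match):
            # Returning NotImplemented lets Python fall back to its default
            # handling, so `match == 42` is simply False instead of raising.
            return NotImplemented
        return (self.encoding, self.payload) == (other.encoding, other.payload)

    def __hash__(self) -> int:
        return hash((self.encoding, self.payload))
```

With this shape, `Match("utf_8", b"a") == Match("utf-8", b"a")` is true, while comparing against an integer or a string evaluates to false without an exception.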
- update to 3.3.2:
* Unintentional memory usage regression when using a large
payload that matches several encodings (#376)
* Regression on some detection cases showcased in the
documentation (#371)
* Noise (md) probe that identifies malformed Arabic
representation due to the presence of letters in isolated
form
* Optional mypyc compilation upgraded to version 1.6.1 for
Python >= 3.8
* Improved the general detection reliability based on reports
from the community
OBS-URL: https://build.opensuse.org/request/show/1128743
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-charset-normalizer?expand=0&rev=22
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=43
- update to 3.3.0:
* Allow executing the CLI (e.g. normalizer) through `python -m
charset_normalizer.cli` or `python -m charset_normalizer`
* Support for 9 forgotten encodings that are supported by Python
but unlisted in `encodings.aliases` as they have no alias
* Optional mypyc compilation upgraded to version 1.5.1 for
Python >= 3.7
* Unable to properly sort CharsetMatch when both chaos/noise
and coherence were close due to an unreachable condition in
`__lt__` (#350)
- Update to 3.0.1
- Update to 3.0.0
* ASCII misdetection in rare cases (PR #170)
* Wrong logging level applied when setting kwarg `explain` to True
- require lower-case name instead of breaking build
OBS-URL: https://build.opensuse.org/request/show/1114778
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-charset-normalizer?expand=0&rev=21
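The 3.3.0 note about encodings that Python supports but that have no alias can be explored with the standard library alone. The sketch below is an assumption about how such codecs might be enumerated, not the method the project uses; the helper name `codecs_without_alias` is hypothetical, and the result depends on the Python version and platform.

```python
import codecs
import encodings
import pkgutil
from encodings.aliases import aliases


def codecs_without_alias() -> list[str]:
    """List codec modules shipped in `encodings` that no alias points to."""
    aliased = set(aliases.values())
    found = []
    for module in pkgutil.iter_modules(encodings.__path__):
        name = module.name
        if name in aliased:
            continue
        try:
            # Skips helper modules such as `aliases` and codecs that are
            # unavailable on the current platform.
            codecs.lookup(name)
        except LookupError:
            continue
        found.append(name)
    return sorted(found)


print(codecs_without_alias())
```

The printed list also includes non-text codecs and internal helpers, so it is a superset of what a charset detector would care about.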
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=41
- update to 3.2.0:
* Typehint for function `from_path` no longer enforces
`PathLike` as its first argument
* Minor improvement over the global detection reliability
* Introduce function `is_binary` that relies on main
capabilities, and optimized to detect binaries
* Propagate `enable_fallback` argument throughout `from_bytes`,
`from_path`, and `from_fp`, allowing deeper control over
the detection (default True)
* Edge case detection failure where a file would contain a
'very-long' camel-cased word (Issue #289)
OBS-URL: https://build.opensuse.org/request/show/1098807
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-charset-normalizer?expand=0&rev=20
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=39
Forwarded request #1031656 from yarunachalam
- Update to 3.0.0
* Added
Extend the capability of `explain=True`: when `cp_isolation` contains at most two entries (min one), the details of the mess-detector results are logged
Support for alternative language frequency set in `charset_normalizer.assets.FREQUENCIES`
Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
`normalizer --version` now specifies whether the current version provides extra speedup (meaning mypyc compilation whl)
* Changed
Build with static metadata using 'build' frontend
Make the language detection stricter
Optional: Module md.py can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
* Fixed
CLI with opt --normalize fails when using full path for files
TooManyAccentuatedPlugin induces false positives on the mess detection when too few alpha characters have been fed to it
Sphinx warnings when generating the documentation
* Removed
Coherence detector no longer returns 'Simple English'; it returns 'English' instead
Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
Breaking: Method first() and best() from CharsetMatch
UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
Breaking: Top-level function normalize
Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
Support for the backport unicodedata2
OBS-URL: https://build.opensuse.org/request/show/1032182
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-charset-normalizer?expand=0&rev=16
OBS-URL: https://build.opensuse.org/request/show/1031656
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=31
- update to 2.1.0:
* Output the Unicode table version when running the CLI with `--version`
* Re-use decoded buffer for single byte character sets
* Fixing some performance bottlenecks
* Work around a potential bug in CPython with Zero Width No-Break Space,
located in Arabic Presentation Forms-B (Unicode 1.1), not acknowledged as space
* CLI default threshold aligned with the API threshold from
* Support for Python 3.5 (PR #192)
* Use of backport unicodedata from `unicodedata2` as Python is quickly
catching up, scheduled for removal in 3.0
OBS-URL: https://build.opensuse.org/request/show/991152
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-charset-normalizer?expand=0&rev=13
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=25
- update to 2.0.8:
* Improvement over Vietnamese detection
* MD improvement on trailing data and long foreign (non-pure latin)
* Efficiency improvements in cd/alphabet_languages
* call sum() without an intermediary list following PEP 289 recommendations
* Code style as refactored by Sourcery-AI
* Minor adjustment on the MD around european words
* Remove and replace SRTs from assets / tests
* Initialize the library logger with a `NullHandler` by default
* Setting kwarg `explain` to True will add provisionally
* Fix large (misleading) sequence giving UnicodeDecodeError
* Avoid using too insignificant chunk
* Add and expose function `set_logging_handler` to configure a specific
StreamHandler
- require lower-case name instead of breaking build
- Use lower-case name of prettytable package
OBS-URL: https://build.opensuse.org/request/show/934519
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-charset-normalizer?expand=0&rev=9
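The logging changes in 2.0.8 (a `NullHandler` by default, plus a helper to attach a specific StreamHandler) follow a common pattern for libraries. The sketch below shows that pattern with the standard `logging` module only; the logger name `my_library` and the function `attach_stream_handler` are hypothetical and do not reproduce the signature of `set_logging_handler`.

```python
import logging

LOGGER_NAME = "my_library"  # hypothetical logger name for this sketch

# A library stays silent unless the application opts in: the NullHandler
# swallows records and avoids "no handler found" style fallbacks.
logging.getLogger(LOGGER_NAME).addHandler(logging.NullHandler())


def attach_stream_handler(
    name: str = LOGGER_NAME,
    level: int = logging.INFO,
    fmt: str = "%(asctime)s | %(levelname)s | %(message)s",
) -> logging.Logger:
    """Attach a StreamHandler (stderr by default) to the named logger."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)
    return logger


logger = attach_stream_handler()
logger.info("detection trace enabled")
```

Keeping the opt-in in a separate function means importing the library never changes the host application's log output.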
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=21
- Update to version 2.0.7
* Addition: Add support for Kazakh (Cyrillic) language
detection
* Improvement: Further improve inferring the language
from a given code page (single-byte).
* Removed: Remove redundant logging entry about detected
language(s).
* Improvement: Refactoring for potential performance
improvements in loops.
* Improvement: Various detection improvements (MD+CD).
* Bugfix: Fix a minor inconsistency between Python 3.5 and
other versions regarding language detection.
- Update to version 2.0.6
* Bugfix: Unforeseen regression with the loss of the
backward-compatibility with some older minor versions of Python 3.5.x.
* Bugfix: Fix CLI crash when using --minimal output in
certain cases.
* Improvement: Minor improvement to the detection
efficiency (less than 1%).
- Update to version 2.0.5
* Improvement: The BC-support with v1.x was improved,
the old staticmethods are restored.
* Remove: The project no longer raises a warning on tiny
content given for detection; it is simply logged as a warning
instead.
* Improvement: The Unicode detection is slightly
improved, see #93
* Bugfix: In some rare cases, the chunks extractor could cut
in the middle of a multi-byte character and could mislead the
mess detection.
OBS-URL: https://build.opensuse.org/request/show/927599
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-charset-normalizer?expand=0&rev=8
OBS-URL: https://build.opensuse.org/request/show/925848
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=18
- Update to 1.3.0
* Backport unicodedata for v12 impl into python if available
* Add aliases to CharsetNormalizerMatches class
* Add feature preemptive behaviour, looking for encoding declaration
* Add method to determine if specific encoding is multi byte
* Add has_submatch property on a match
* Add percent_chaos and percent_coherence
* Coherence ratio based on mean instead of sum of best results
* Using loguru for trace/debug <3
* from_byte method improved
OBS-URL: https://build.opensuse.org/request/show/734946
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=8
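One way to determine whether an encoding is multi-byte, as the 1.3.0 entry mentions, is sketched below. How version 1.3.0 actually implemented this is not stated here; the function name `is_multi_byte` and the `UNICODE_FAMILY` set are assumptions, and the check relies on CPython's internal `_multibytecodec` module.

```python
import codecs
from _multibytecodec import MultibyteIncrementalDecoder  # CPython-specific

# Assumption: Unicode transformation formats are treated as multi-byte even
# though their decoders are not built on the CJK multibyte machinery.
UNICODE_FAMILY = {
    "utf-7", "utf-8", "utf-8-sig",
    "utf-16", "utf-16-le", "utf-16-be",
    "utf-32", "utf-32-le", "utf-32-be",
}


def is_multi_byte(encoding: str) -> bool:
    """Return True if `encoding` can use more than one byte per character."""
    info = codecs.lookup(encoding)
    if info.name in UNICODE_FAMILY:
        return True
    decoder = info.incrementaldecoder
    return decoder is not None and issubclass(decoder, MultibyteIncrementalDecoder)


print(is_multi_byte("big5"), is_multi_byte("latin_1"))
# True False
```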