0bb2757b8d
- Update to 3.4.4
  * Bound setuptools to a specific constraint setuptools>=68,<=81.
  * Raised upper bound of mypyc for the optional pre-built extension to v1.18.2
  * setuptools-scm as a build dependency.
  * Enforced hashes in dev-requirements.txt and created ci-requirements.txt for security purposes.
  * Additional pre-built wheels for riscv64, s390x, and armv7l architectures.
  * Restore multiple.intoto.jsonl in GitHub releases in addition to individual attestation file per wheel.
Daniel Garcia 2025-11-03 09:19:03 +00:00
edc4e5e7d4
Accepting request 1304688 from devel:languages:python
Ana Guerrero 2025-09-15 17:50:31 +00:00
32a1c29aa6
- update to 3.4.3:
  * mypy(c) is no longer a required dependency at build time if CHARSET_NORMALIZER_USE_MYPYC isn't set to 1. (#595)
  * automatically lower confidence on small bytes samples that are not Unicode in the output of the legacy detect function.
  * Custom build backend to overcome inability to mark mypy as an optional dependency in the build phase.
  * Support for Python 3.14
  * sdist archive contained useless directories.
  * automatically fall back on valid UTF-16 or UTF-32 even if the md says it's noisy.
  * SBOMs are automatically published to the relevant GitHub release to comply with regulatory changes.
  * Each published wheel comes with its SBOM. We choose CycloneDX as the format.
  * Prebuilt optimized wheels are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.
Dirk Mueller 2025-09-14 21:00:37 +00:00
35295358a7
- Update to 3.4.2
  * Addressed the DeprecationWarning in our CLI regarding argparse.FileType by backporting the target class into the package. (#591)
  * Improved the overall reliability of the detector with CJK Ideographs. (#605) (#587)
  * Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8
Markéta Machová 2025-05-05 09:15:39 +00:00
93a74fa152
Accepting request 1238022 from devel:languages:python
Ana Guerrero 2025-01-16 17:31:17 +00:00
b17aad8053
- Use libalternatives instead of update-alternatives, bsc#1235781
Daniel Garcia 2025-01-15 10:38:37 +00:00
06686c6c46
Accepting request 1236172 from devel:languages:python
Ana Guerrero 2025-01-12 10:10:17 +00:00
7e521f133b
- Update to 3.4.1
  * Project metadata are now stored using pyproject.toml instead of setup.cfg using setuptools as the build backend.
  * Enforce annotation delayed loading for simpler and more consistent types in the project.
  * Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8
  * Added pre-commit configuration.
  * Added noxfile.
  * Removed build-requirements.txt as per using pyproject.toml native build configuration.
  * Removed bin/integration.py and bin/serve.py in favor of downstream integration test (see noxfile).
  * Removed setup.cfg in favor of pyproject.toml metadata configuration.
  * Removed unused utils.range_scan function.
  * Converting content to Unicode bytes may insert utf_8 instead of preferred utf-8. (#572)
  * Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+
- Drop sed command to remove code coverage flags from pytest
Markéta Machová 2025-01-09 12:21:13 +00:00
f142f583d0
Accepting request 1221058 from devel:languages:python
Ana Guerrero 2024-11-05 14:39:43 +00:00
e1a6b9e55c
Accepting request 1217078 from devel:languages:python
Ana Guerrero 2024-10-23 19:08:21 +00:00
3bf31d75b8
- update to 3.4.0:
  * Argument --no-preemptive in the CLI to prevent the detector from searching for hints.
  * Support for Python 3.13
  * Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
  * Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407)
  * Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes.
Dirk Mueller 2024-10-22 16:00:24 +00:00
223430a4ca
Accepting request 1128743 from devel:languages:python
Ana Guerrero 2023-11-27 21:42:20 +00:00
9cd0e22679
- update to 3.3.2:
  * Unintentional memory usage regression when using large payloads that match several encodings (#376)
  * Regression on some detection cases showcased in the documentation (#371)
  * Noise (md) probe that identifies malformed Arabic representation due to the presence of letters in isolated form
  * Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
  * Improved the general detection reliability based on reports from the community
Dirk Mueller 2023-11-25 14:12:46 +00:00
5a4b0b3e0c
Accepting request 1114778 from devel:languages:python
Ana Guerrero 2023-11-23 20:38:43 +00:00
3e7e8a34ba
- update to 3.3.0:
  * Allow executing the CLI (e.g. normalizer) through python -m charset_normalizer.cli or python -m charset_normalizer
  * Support for 9 forgotten encodings that are supported by Python but unlisted in encoding.aliases as they have no alias
  * Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.7
  * Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in __lt__ (#350)
- Update to 3.0.1
- Update to 3.0.0
  * ASCII miss-detection on rare cases (PR #170)
  * Wrong logging level applied when setting kwarg explain to True
- require lower-case name instead of breaking build
Dirk Mueller 2023-10-02 09:08:45 +00:00
efcb074653
Accepting request 1098807 from devel:languages:python
Ana Guerrero 2023-07-17 17:22:47 +00:00
c103080fc4
- update to 3.2.0:
  * Typehint for function from_path no longer enforces PathLike as its first argument
  * Minor improvement over the global detection reliability
  * Introduce function is_binary that relies on main capabilities, and optimized to detect binaries
  * Propagate enable_fallback argument throughout from_bytes, from_path, and from_fp that allows a deeper control over the detection (default True)
  * Edge case detection failure where a file would contain 'very-long' camel cased word (Issue #289)
Dirk Mueller 2023-07-11 13:24:00 +00:00
5a2b102103
- update to 3.1.0:
  * Argument should_rename_legacy for legacy function detect and disregard any new arguments without errors (PR #262)
  * Removed support for Python 3.6 (PR #260)
  * Optional speedup provided by mypy/c 1.0.1
Dirk Mueller 2023-03-26 20:04:47 +00:00
12f704616b
- update to 2.1.1:
  * Function normalize scheduled for removal in 3.0
  * Removed useless call to decode in fn is_unprintable (#206)
Dirk Mueller 2022-09-17 15:50:18 +00:00
eac72ae8c5
Accepting request 998013 from home:bnavigator:branches:devel:languages:python
Dirk Mueller 2022-08-19 06:47:38 +00:00
18b088feb8
Accepting request 991152 from devel:languages:python
Richard Brown 2022-07-26 17:42:09 +00:00
8dca1a6616
- update to 2.1.0:
  * Output the Unicode table version when running the CLI with --version
  * Re-use decoded buffer for single byte character sets
  * Fixing some performance bottlenecks
  * Workaround potential bug in cpython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space
  * CLI default threshold aligned with the API threshold from
  * Removed support for Python 3.5 (PR #192)
  * Use of backport unicodedata from unicodedata2 as Python is quickly catching up, scheduled for removal in 3.0
Dirk Mueller 2022-07-19 11:40:33 +00:00
259f5f1afe
- update to 2.0.12:
  * ASCII miss-detection on rare cases (PR #170)
  * Explicit support for Python 3.11 (PR #164)
  * The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels
Dirk Mueller 2022-02-15 08:43:43 +00:00
c739862e1a
- update to 2.0.10:
  * Fallback match entries might lead to UnicodeDecodeError for large bytes sequence
  * Skipping the language-detection (CD) on ASCII
Dirk Mueller 2022-01-10 23:04:22 +00:00
53a1bfb655
- update to 2.0.9:
  * Moderating the logging impact (since 2.0.8) for specific environments
  * Wrong logging level applied when setting kwarg explain to True
Dirk Mueller 2021-12-06 20:09:48 +00:00
4e6d945d9a
- update to 2.0.8:
  * Improvement over Vietnamese detection
  * MD improvement on trailing data and long foreign (non-pure latin)
  * Efficiency improvements in cd/alphabet_languages
  * call sum() without an intermediary list following PEP 289 recommendations
  * Code style as refactored by Sourcery-AI
  * Minor adjustment on the MD around european words
  * Remove and replace SRTs from assets / tests
  * Initialize the library logger with a NullHandler by default
  * Setting kwarg explain to True will add provisionally
  * Fix large (misleading) sequence giving UnicodeDecodeError
  * Avoid using too insignificant chunk
  * Add and expose function set_logging_handler to configure a specific StreamHandler
Dirk Mueller 2021-11-29 11:18:31 +00:00
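The NullHandler and StreamHandler items above follow a common standard-library logging pattern. The sketch below shows that general pattern only; the logger name example_library and the helper enable_stream_logging are hypothetical and are not the package's own code or API.

```python
import logging

# Hypothetical library name, used only for this sketch.
LIBRARY_LOGGER_NAME = "example_library"

# Library side: attach a NullHandler so that importing the library never
# emits log records unless the application opts in.
library_logger = logging.getLogger(LIBRARY_LOGGER_NAME)
library_logger.addHandler(logging.NullHandler())


def enable_stream_logging(level: int = logging.INFO) -> logging.StreamHandler:
    """Attach a StreamHandler to the library logger and return it."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
    )
    library_logger.addHandler(handler)
    library_logger.setLevel(level)
    return handler


# Application side: opt in to seeing the library's records on stderr.
handler = enable_stream_logging(logging.DEBUG)
library_logger.debug("detector started")
```

Returning the handler lets the caller remove it again with library_logger.removeHandler(handler) when verbose output is no longer wanted.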
380896adbc
- require lower-case name instead of breaking build
Dirk Mueller 2021-11-26 11:35:38 +00:00
515e72fd80
- Use lower-case name of prettytable package
Matej Cepl 2021-11-25 22:27:00 +00:00
c385f2c788
- Update to 1.1.1:
  * from_bytes parameters steps and chunk_size were not adapted to sequence len if provided values were not fitted to content
  * Sequence having length below 10 chars was not checked
  * Legacy detect method inspired by chardet was not returning
  * Various more test updates
Tomáš Chvátal 2019-09-26 10:38:40 +00:00
5cb7342274
- Update to 0.3:
  * Improvement on detection
  * Performance loss to expect
  * Added --threshold option to CLI
  * Bugfix on UTF-7 support
  * Legacy detect(byte_str) method
  * BOM support (Unicode mostly)
  * Chaos prober improved on small text
  * Language detection has been reviewed to give better result
  * Bugfix on jp detection, every jp text was considered chaotic
Tomáš Chvátal 2019-09-13 11:07:21 +00:00
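For readers unfamiliar with the "BOM support" item in 0.3, the sketch below shows what byte order mark sniffing generally looks like using only the standard library. The function sniff_bom and the table _BOMS are hypothetical names for illustration; this is not the package's implementation.

```python
import codecs
from typing import Optional, Tuple

# Longer marks come first: the UTF-32-LE BOM starts with the same two bytes
# as the UTF-16-LE BOM, so checking UTF-16 first would misclassify it.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf_32_le"),
    (codecs.BOM_UTF32_BE, "utf_32_be"),
    (codecs.BOM_UTF8, "utf_8"),
    (codecs.BOM_UTF16_LE, "utf_16_le"),
    (codecs.BOM_UTF16_BE, "utf_16_be"),
)


def sniff_bom(payload: bytes) -> Tuple[Optional[str], bytes]:
    """Return (codec name, BOM bytes) if payload starts with a known BOM."""
    for mark, encoding in _BOMS:
        if payload.startswith(mark):
            return encoding, mark
    return None, b""


payload = codecs.BOM_UTF8 + "héllo".encode("utf-8")
encoding, mark = sniff_bom(payload)
if encoding is not None:
    # Strip the mark before decoding with the endian-specific codec.
    text = payload[len(mark):].decode(encoding)
```

A BOM is only a hint: payloads without one still need statistical detection, which is what the rest of the detector is for.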