python-charset-normalizer

Author	SHA256	Message	Date
Dirk Mueller	3e7e8a34ba	- update to 3.3.0: * Allow executing the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer` * Support for 9 forgotten encodings that are supported by Python but unlisted in `encoding.aliases` as they have no alias * Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.7 * Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in `__lt__` (#350) - Update to 3.0.1 - Update to 3.0.0 * ASCII misdetection on rare cases (PR #170) * Wrong logging level applied when setting kwarg `explain` to True - require lower-case name instead of breaking build OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=41	2023-10-02 09:08:45 +00:00
Dirk Mueller	c103080fc4	- update to 3.2.0: * Typehint for function `from_path` no longer enforces `PathLike` as its first argument * Minor improvement over the global detection reliability * Introduce function `is_binary` that relies on main capabilities, and optimized to detect binaries * Propagate `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp` that allows a deeper control over the detection (default True) * Edge case detection failure where a file would contain 'very-long' camel cased word (Issue #289) OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=39	2023-07-11 13:24:00 +00:00
Dirk Mueller	110acf0118	- add sle15_python_module_pythons (jsc#PED-68) OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=37	2023-05-05 06:41:30 +00:00
Dirk Mueller	5a2b102103	- update to 3.1.0: * Argument `should_rename_legacy` for legacy function `detect` and disregard any new arguments without errors (PR #262) * Removed Support for Python 3.6 (PR #260) * Optional speedup provided by mypy/c 1.0.1 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=35	2023-03-26 20:04:47 +00:00
Dirk Mueller	20637d8d7f	Accepting request 1039709 from home:yarunachalam:branches:devel:languages:python - Update to 3.0.1 Fixed Multi-bytes cutter/chunk generator did not always cut correctly (PR #233) Changed Speedup provided by mypy/c 0.990 on Python >= 3.7 OBS-URL: https://build.opensuse.org/request/show/1039709 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=33	2022-12-03 07:29:01 +00:00
Matej Cepl	fa191726b9	Accepting request 1031656 from home:yarunachalam:branches:devel:languages:python - Update to 3.0.0 Added * Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio normalizer --version now specify if current version provide extra speedup (meaning mypyc compilation whl) * Changed Build with static metadata using 'build' frontend Make the language detection stricter Optional: Module md.py can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1 * Fixed CLI with opt --normalize fail when using full path for files TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it Sphinx warnings when generating the documentation * Removed Coherence detector no longer return 'Simple English' instead return 'English' Coherence detector no longer return 'Classical Chinese' instead return 'Chinese' Breaking: Method first() and best() from CharsetMatch UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII) Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches Breaking: Top-level function normalize Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch Support for the backport unicodedata2 OBS-URL: https://build.opensuse.org/request/show/1031656 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=31	2022-10-29 11:47:59 +00:00
Dirk Mueller	12f704616b	- update to 2.1.1: * Function `normalize` scheduled for removal in 3.0 * Removed useless call to decode in fn is_unprintable (#206) OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=29	2022-09-17 15:50:18 +00:00
Dirk Mueller	eac72ae8c5	Accepting request 998013 from home:bnavigator:branches:devel:languages:python - Clean requirements: We don't need anything OBS-URL: https://build.opensuse.org/request/show/998013 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=27	2022-08-19 06:47:38 +00:00
Dirk Mueller	8dca1a6616	- update to 2.1.0: * Output the Unicode table version when running the CLI with `--version` * Re-use decoded buffer for single byte character sets * Fixing some performance bottlenecks * Workaround potential bug in cpython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space * CLI default threshold aligned with the API threshold from * Support for Python 3.5 (PR #192) * Use of backport unicodedata from `unicodedata2` as Python is quickly catching up, scheduled for removal in 3.0 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=25	2022-07-19 11:40:33 +00:00
Dirk Mueller	259f5f1afe	- update to 2.0.12: * ASCII misdetection on rare cases (PR #170) * Explicit support for Python 3.11 (PR #164) * The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=24	2022-02-15 08:43:43 +00:00
Dirk Mueller	c739862e1a	- update to 2.0.10: * Fallback match entries might lead to UnicodeDecodeError for large bytes sequence * Skipping the language-detection (CD) on ASCII OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=23	2022-01-10 23:04:22 +00:00
Dirk Mueller	53a1bfb655	- update to 2.0.9: * Moderating the logging impact (since 2.0.8) for specific environments * Wrong logging level applied when setting kwarg `explain` to True OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=22	2021-12-06 20:09:48 +00:00
Dirk Mueller	4e6d945d9a	- update to 2.0.8: * Improvement over Vietnamese detection * MD improvement on trailing data and long foreign (non-pure latin) * Efficiency improvements in cd/alphabet_languages * call sum() without an intermediary list following PEP 289 recommendations * Code style as refactored by Sourcery-AI * Minor adjustment on the MD around european words * Remove and replace SRTs from assets / tests * Initialize the library logger with a `NullHandler` by default * Setting kwarg `explain` to True will add provisionally * Fix large (misleading) sequence giving UnicodeDecodeError * Avoid using too insignificant chunk * Add and expose function `set_logging_handler` to configure a specific StreamHandler OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=21	2021-11-29 11:18:31 +00:00
Dirk Mueller	380896adbc	- require lower-case name instead of breaking build OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=20	2021-11-26 11:35:38 +00:00
Matej Cepl	515e72fd80	- Use lower-case name of prettytable package OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=19	2021-11-25 22:27:00 +00:00
Dirk Mueller	fd5f5dc1f2	Accepting request 925848 from home:mnhauke - Update to version 2.0.7 * Addition: Add support for Kazakh (Cyrillic) language detection * Improvement: Further improve inferring the language from a given code page (single-byte). * Removed: Remove redundant logging entry about detected language(s). * Improvement: Refactoring for potential performance improvements in loops. * Improvement: Various detection improvement (MD+CD). * Bugfix: Fix a minor inconsistency between Python 3.5 and other versions regarding language detection. - Update to version 2.0.6 * Bugfix: Unforeseen regression with the loss of the backward-compatibility with some older minor of Python 3.5.x. * Bugfix: Fix CLI crash when using --minimal output in certain cases. * Improvement: Minor improvement to the detection efficiency (less than 1%). - Update to version 2.0.5 * Improvement: The BC-support with v1.x was improved, the old staticmethods are restored. * Remove: The project no longer raises a warning on tiny content given for detection, will be simply logged as warning instead. * Improvement: The Unicode detection is slightly improved, see #93 * Bugfix: In some rare cases, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection. OBS-URL: https://build.opensuse.org/request/show/925848 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=18	2021-10-26 20:41:42 +00:00
Markéta Machová	d5d2d1f9e5	Accepting request 894588 from home:pgajdos:python - version update to 1.3.9 * Bugfix: In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload #40 * Bugfix: Empty given payload for detection may cause an exception if trying to access the alphabets property. #39 * Bugfix: The legacy detect function should return UTF-8-SIG if sig is present in the payload. #38 OBS-URL: https://build.opensuse.org/request/show/894588 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=16	2021-05-20 09:54:40 +00:00
Dirk Mueller	4ec9d5c90b	Accepting request 870710 from home:jayvdb:branches:devel:languages:python - Switch to PyPI source - Add Suggests: python-unicodedata2 - Remove executable bit from charset_normalizer/assets/frequencies.json - Update to v1.3.6 OBS-URL: https://build.opensuse.org/request/show/870710 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=14	2021-02-10 08:09:39 +00:00
Tomáš Chvátal	204ac0c668	Accepting request 808744 from home:pgajdos:python submit OBS-URL: https://build.opensuse.org/request/show/808744 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=12	2020-05-25 13:36:05 +00:00
Tomáš Chvátal	611d9d38c7	Accepting request 767602 from home:mcalabkova:branches:devel:languages:python - Update to 1.3.4 * Improvement/Bugfix : False positive when searching for successive upper, lower char. (ProbeChaos) * Improvement : Noticeable better detection for jp * Bugfix : Passing zero-length bytes to from_bytes * Improvement : Expose version in package * Bugfix : Division by zero * Improvement : Prefers unicode (utf-8) when detected * Apparently dropped Python2 silently OBS-URL: https://build.opensuse.org/request/show/767602 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=10	2020-01-28 08:11:06 +00:00
Tomáš Chvátal	631de8d368	Accepting request 734946 from home:mcalabkova:branches:devel:languages:python - Update to 1.3.0 * Backport unicodedata for v12 impl into python if available * Add aliases to CharsetNormalizerMatches class * Add feature preemptive behaviour, looking for encoding declaration * Add method to determine if specific encoding is multi byte * Add has_submatch property on a match * Add percent_chaos and percent_coherence * Coherence ratio based on mean instead of sum of best results * Using loguru for trace/debug <3 * from_byte method improved OBS-URL: https://build.opensuse.org/request/show/734946 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=8	2019-10-04 09:50:53 +00:00
Tomáš Chvátal	c385f2c788	- Update to 1.1.1: * from_bytes parameters steps and chunk_size were not adapted to sequence len if provided values were not fitted to content * Sequence having length below 10 chars was not checked * Legacy detect method inspired by chardet was not returning * Various more test updates OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=6	2019-09-26 10:38:40 +00:00
Tomáš Chvátal	5cb7342274	- Update to 0.3: * Improvement on detection * Performance loss to expect * Added --threshold option to CLI * Bugfix on UTF 7 support * Legacy detect(byte_str) method * BOM support (Unicode mostly) * Chaos prober improved on small text * Language detection has been reviewed to give better result * Bugfix on jp detection, every jp text was considered chaotic OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=5	2019-09-13 11:07:21 +00:00
Dominique Leuenberger	4357e3b4f3	Accepting request 727095 from devel:languages:python OBS-URL: https://build.opensuse.org/request/show/727095 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-charset-normalizer?expand=0&rev=1	2019-09-04 07:09:48 +00:00
Tomáš Chvátal	1b68d0fd4b	- Fix the tarball to really be the one published by upstream OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=3	2019-08-30 00:46:43 +00:00
Tomáš Chvátal	4b06d6e2e5	OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=2	2019-08-30 00:46:24 +00:00
Tomáš Chvátal	2365d5732b	Accepting request 726939 from home:jayvdb:py-new A very new & impressive (and the only) alternative to chardet which has stagnated lately OBS-URL: https://build.opensuse.org/request/show/726939 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=1	2019-08-29 10:43:06 +00:00

27 Commits