python-charset-normalizer

Author	SHA256	Message	Date
Dirk Mueller	259f5f1afe	- update to 2.0.12: * ASCII misdetection in rare cases (PR #170) * Explicit support for Python 3.11 (PR #164) * The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=24	2022-02-15 08:43:43 +00:00
Dirk Mueller	c739862e1a	- update to 2.0.10: * Fallback match entries might lead to UnicodeDecodeError for large bytes sequence * Skipping the language-detection (CD) on ASCII OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=23	2022-01-10 23:04:22 +00:00
Dirk Mueller	53a1bfb655	- update to 2.0.9: * Moderating the logging impact (since 2.0.8) for specific environments * Wrong logging level applied when setting kwarg `explain` to True OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=22	2021-12-06 20:09:48 +00:00
Dirk Mueller	4e6d945d9a	- update to 2.0.8: * Improvement over Vietnamese detection * MD improvement on trailing data and long foreign (non-pure latin) * Efficiency improvements in cd/alphabet_languages * call sum() without an intermediary list following PEP 289 recommendations * Code style as refactored by Sourcery-AI * Minor adjustment on the MD around european words * Remove and replace SRTs from assets / tests * Initialize the library logger with a `NullHandler` by default * Setting kwarg `explain` to True will add provisionally * Fix large (misleading) sequence giving UnicodeDecodeError * Avoid using too insignificant chunk * Add and expose function `set_logging_handler` to configure a specific StreamHandler OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=21	2021-11-29 11:18:31 +00:00
Dirk Mueller	380896adbc	- require lower-case name instead of breaking build OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=20	2021-11-26 11:35:38 +00:00
Matej Cepl	515e72fd80	- Use lower-case name of prettytable package OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=19	2021-11-25 22:27:00 +00:00
Dirk Mueller	fd5f5dc1f2	Accepting request 925848 from home:mnhauke - Update to version 2.0.7 * Addition: Add support for Kazakh (Cyrillic) language detection * Improvement: Further improve inferring the language from a given code page (single-byte). * Removed: Remove redundant logging entry about detected language(s). * Improvement: Refactoring for potential performance improvements in loops. * Improvement: Various detection improvements (MD+CD). * Bugfix: Fix a minor inconsistency between Python 3.5 and other versions regarding language detection. - Update to version 2.0.6 * Bugfix: Unforeseen regression with the loss of the backward-compatibility with some older minor versions of Python 3.5.x. * Bugfix: Fix CLI crash when using --minimal output in certain cases. * Improvement: Minor improvement to the detection efficiency (less than 1%). - Update to version 2.0.5 * Improvement: The BC-support with v1.x was improved, the old staticmethods are restored. * Remove: The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead. * Improvement: The Unicode detection is slightly improved, see #93 * Bugfix: In some rare cases, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection. OBS-URL: https://build.opensuse.org/request/show/925848 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=18	2021-10-26 20:41:42 +00:00
Markéta Machová	d5d2d1f9e5	Accepting request 894588 from home:pgajdos:python - version update to 1.3.9 * Bugfix: In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload #40 * Bugfix: Empty given payload for detection may cause an exception if trying to access the alphabets property. #39 * Bugfix: The legacy detect function should return UTF-8-SIG if sig is present in the payload. #38 OBS-URL: https://build.opensuse.org/request/show/894588 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=16	2021-05-20 09:54:40 +00:00
Dirk Mueller	4ec9d5c90b	Accepting request 870710 from home:jayvdb:branches:devel:languages:python - Switch to PyPI source - Add Suggests: python-unicodedata2 - Remove executable bit from charset_normalizer/assets/frequencies.json - Update to v1.3.6 OBS-URL: https://build.opensuse.org/request/show/870710 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=14	2021-02-10 08:09:39 +00:00
Tomáš Chvátal	204ac0c668	Accepting request 808744 from home:pgajdos:python submit OBS-URL: https://build.opensuse.org/request/show/808744 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=12	2020-05-25 13:36:05 +00:00
Tomáš Chvátal	611d9d38c7	Accepting request 767602 from home:mcalabkova:branches:devel:languages:python - Update to 1.3.4 * Improvement/Bugfix : False positive when searching for successive upper, lower char. (ProbeChaos) * Improvement : Noticeably better detection for jp * Bugfix : Passing zero-length bytes to from_bytes * Improvement : Expose version in package * Bugfix : Division by zero * Improvement : Prefers unicode (utf-8) when detected * Apparently dropped Python2 silently OBS-URL: https://build.opensuse.org/request/show/767602 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=10	2020-01-28 08:11:06 +00:00
Tomáš Chvátal	631de8d368	Accepting request 734946 from home:mcalabkova:branches:devel:languages:python - Update to 1.3.0 * Backport unicodedata for v12 impl into python if available * Add aliases to CharsetNormalizerMatches class * Add feature preemptive behaviour, looking for encoding declaration * Add method to determine if specific encoding is multi byte * Add has_submatch property on a match * Add percent_chaos and percent_coherence * Coherence ratio based on mean instead of sum of best results * Using loguru for trace/debug <3 * from_byte method improved OBS-URL: https://build.opensuse.org/request/show/734946 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=8	2019-10-04 09:50:53 +00:00
Tomáš Chvátal	c385f2c788	- Update to 1.1.1: * from_bytes parameters steps and chunk_size were not adapted to sequence len if provided values were not fitted to content * Sequence having length below 10 chars was not checked * Legacy detect method inspired by chardet was not returning * Various more test updates OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=6	2019-09-26 10:38:40 +00:00
Tomáš Chvátal	5cb7342274	- Update to 0.3: * Improvement on detection * Performance loss to expect * Added --threshold option to CLI * Bugfix on UTF 7 support * Legacy detect(byte_str) method * BOM support (Unicode mostly) * Chaos prober improved on small text * Language detection has been reviewed to give better result * Bugfix on jp detection, every jp text was considered chaotic OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=5	2019-09-13 11:07:21 +00:00
Tomáš Chvátal	1b68d0fd4b	- Fix the tarball to really be the one published by upstream OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=3	2019-08-30 00:46:43 +00:00
Tomáš Chvátal	2365d5732b	Accepting request 726939 from home:jayvdb:py-new A very new & impressive (and the only) alternative to chardet which has stagnated lately OBS-URL: https://build.opensuse.org/request/show/726939 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=1	2019-08-29 10:43:06 +00:00

16 Commits