-------------------------------------------------------------------
Tue Feb 15 08:42:30 UTC 2022 - Dirk Müller <dmueller@suse.com>

- update to 2.0.12:
  * ASCII misdetection in rare cases (PR #170)
  * Explicit support for Python 3.11 (PR #164)
  * The logging behavior has been completely reviewed, now using only TRACE
    and DEBUG levels

-------------------------------------------------------------------
Mon Jan 10 23:01:54 UTC 2022 - Dirk Müller <dmueller@suse.com>

- update to 2.0.10:
  * Fallback match entries might lead to UnicodeDecodeError for large byte
    sequences
  * Skipping the language-detection (CD) on ASCII

-------------------------------------------------------------------
Mon Dec 6 20:08:41 UTC 2021 - Dirk Müller <dmueller@suse.com>

- update to 2.0.9:
  * Moderating the logging impact (since 2.0.8) for specific
    environments
  * Wrong logging level applied when setting kwarg `explain` to True

-------------------------------------------------------------------
Mon Nov 29 11:14:37 UTC 2021 - Dirk Müller <dmueller@suse.com>

- update to 2.0.8:
  * Improvement over Vietnamese detection
  * MD improvement on trailing data and long foreign (non-pure Latin)
  * Efficiency improvements in cd/alphabet_languages
  * Call sum() without an intermediary list following PEP 289 recommendations
  * Code style as refactored by Sourcery-AI
  * Minor adjustment on the MD around European words
  * Remove and replace SRTs from assets / tests
  * Initialize the library logger with a `NullHandler` by default
  * Setting kwarg `explain` to True will add provisionally
  * Fix large (misleading) sequence giving UnicodeDecodeError
  * Avoid using too insignificant chunk
  * Add and expose function `set_logging_handler` to configure a specific
    StreamHandler

-------------------------------------------------------------------
Fri Nov 26 11:35:25 UTC 2021 - Dirk Müller <dmueller@suse.com>

- require lower-case name instead of breaking build

-------------------------------------------------------------------
Thu Nov 25 22:26:52 UTC 2021 - Matej Cepl <mcepl@suse.com>

- Use lower-case name of prettytable package

-------------------------------------------------------------------
Sun Oct 17 14:01:59 UTC 2021 - Martin Hauke <mardnh@gmx.de>

- Update to version 2.0.7
  * Addition: Add support for Kazakh (Cyrillic) language
    detection
  * Improvement: Further improve inferring the language
    from a given code page (single-byte).
  * Removed: Remove redundant logging entry about detected
    language(s).
  * Improvement: Refactoring for potential performance
    improvements in loops.
  * Improvement: Various detection improvements (MD+CD).
  * Bugfix: Fix a minor inconsistency between Python 3.5 and
    other versions regarding language detection.
- Update to version 2.0.6
  * Bugfix: Unforeseen regression with the loss of
    backward compatibility with some older minor releases of
    Python 3.5.x.
  * Bugfix: Fix CLI crash when using --minimal output in
    certain cases.
  * Improvement: Minor improvement to the detection
    efficiency (less than 1%).
- Update to version 2.0.5
  * Improvement: The BC-support with v1.x was improved,
    the old staticmethods are restored.
  * Remove: The project no longer raises a warning on tiny
    content given for detection; it is simply logged as a warning
    instead.
  * Improvement: The Unicode detection is slightly
    improved, see #93
  * Bugfix: In some rare cases, the chunks extractor could cut
    in the middle of a multi-byte character and could mislead the
    mess detection.
  * Bugfix: Some rare 'space' characters could trip up the
    UnprintablePlugin/Mess detection.
  * Improvement: Add syntax sugar __bool__ for results
    CharsetMatches list-container.
- Update to version 2.0.4
  * Improvement: Adjust the MD to lower the sensitivity,
    thus improving the global detection reliability.
  * Improvement: Allow fallback on specified encoding
    if any.
  * Bugfix: The CLI no longer raises an unexpected exception
    when no encoding has been found.
  * Bugfix: Fix accessing the 'alphabets' property when the
    payload contains surrogate characters.
  * Bugfix: The logger could mislead (explain=True) on
    detected languages and the impact of one MBCS match (in #72)
  * Bugfix: Submatch factoring could be wrong in rare edge
    cases (in #72)
  * Bugfix: Multiple files given to the CLI were ignored when
    publishing results to STDOUT. (After the first path) (in #72)
  * Internal: Fix line endings from CRLF to LF for certain
    files.
- Update to version 2.0.3
  * Improvement: Part of the detection mechanism has been
    improved to be less sensitive, resulting in more accurate
    detection results. Especially ASCII. #63 Fix #62
  * Improvement: According to the community wishes, the
    detection will fall back on ASCII or UTF-8 in a last-resort
    case.
- Update to version 2.0.2
  * Bugfix: Empty/Too small JSON payload misdetection fixed.
  * Improvement: Don't inject unicodedata2 into sys.modules
- Update to version 2.0.1
  * Bugfix: Make it work where there isn't a filesystem
    available, dropping assets frequencies.json.
  * Improvement: You may now use aliases in cp_isolation
    and cp_exclusion arguments.
  * Bugfix: Using explain=False permanently disabled the verbose
    output in the current runtime #47
  * Bugfix: One log entry (language target preemptive) was not
    shown in logs when using explain=True #47
  * Bugfix: Fix undesired exception (ValueError) on getitem of
    instance CharsetMatches #52
  * Improvement: Public function normalize default args
    values were not aligned with from_bytes #53
- Update to version 2.0.0
  * Performance: 4 to 5 times faster than the previous 1.4.0
    release.
  * Performance: At least 2x faster than Chardet.
  * Performance: Emphasis has been put on UTF-8 detection, which
    should be nearly instantaneous.
  * Improvement: The backward compatibility with Chardet has
    been greatly improved. The legacy detect function returns an
    identical charset name whenever possible.
  * Improvement: The detection mechanism has been slightly
    improved, now Turkish content is detected correctly (most of
    the time)
  * Code: The program has been rewritten to ease the
    readability and maintainability. (+Using static typing)
  * Tests: New workflows are now in place to
    verify the following aspects: Performance,
    Backward-Compatibility with Chardet, and Detection Coverage
    in addition to current tests. (+CodeQL)
  * Dependency: This package no longer requires
    anything when used with Python 3.5 (Dropped cached_property)
  * Docs: Performance claims, the contributing guide, and the
    issue template have been updated.
  * Improvement: Add --version argument to CLI
  * Bugfix: The CLI output used the relative path of the
    file(s). Should be absolute.
  * Deprecation: Methods coherence_non_latin, w_counter,
    chaos_secondary_pass of the class CharsetMatch are now
    deprecated and scheduled for removal in v3.0
  * Improvement: If no language was detected in content,
    trying to infer it using the encoding name/alphabets used.
  * Removal: Removed support for these languages: Catalan,
    Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk,
    Macedonian, and Serbocroatian.
  * Improvement: utf_7 detection has been reinstated.
  * Removal: The exception hook on UnicodeDecodeError has
    been removed.
- Update to version 1.4.1
  * Improvement: Logger configuration/usage no longer
    conflicts with others #44
- Update to version 1.4.0
  * Dependency: Using standard logging instead
    of using the package loguru.
  * Dependency: Dropping nose test framework in
    favor of the maintained pytest.
  * Dependency: Choose to not use dragonmapper
    package to help with gibberish Chinese/CJK text.
  * Dependency: Require cached_property
    only for Python 3.5 due to constraint. Dropping for every
    other interpreter version.
  * Bugfix: BOM marker in a CharsetNormalizerMatch instance
    could be False in rare cases even if obviously present. Due
    to the sub-match factoring process.
  * Improvement: Return ASCII if given sequences fit.
  * Performance: Huge improvement over the largest payload.
  * Change: Stop support for UTF-7 that does not contain a
    SIG. (Contributions are welcome to improve that point)
  * Feature: CLI now produces JSON consumable output.
  * Dependency: Dropping PrettyTable, replaced with pure JSON
    output.
  * Bugfix: Not searching properly for the BOM when trying
    utf32/16 parent codec.
  * Other: Improving the package final size by compressing
    frequencies.json.

-------------------------------------------------------------------
Thu May 20 09:46:56 UTC 2021 - pgajdos@suse.com

- version update to 1.3.9
  * Bugfix: In some very rare cases, you may end up getting encode/decode
    errors due to a bad bytes payload #40
  * Bugfix: Empty given payload for detection may cause an exception if
    trying to access the alphabets property. #39
  * Bugfix: The legacy detect function should return UTF-8-SIG if sig is
    present in the payload. #38

-------------------------------------------------------------------
Tue Feb 9 00:47:34 UTC 2021 - John Vandenberg <jayvdb@gmail.com>

- Switch to PyPI source
- Add Suggests: python-unicodedata2
- Remove executable bit from charset_normalizer/assets/frequencies.json
- Update to v1.3.6
  * Allow prettytable 2.0
- from v1.3.5
  * Dependencies refactor and add support for py 3.9 and 3.10
  * Fix version parsing

-------------------------------------------------------------------
Mon May 25 10:59:12 UTC 2020 - Petr Gajdos <pgajdos@suse.com>

- %python3_only -> %python_alternative

-------------------------------------------------------------------
Mon Jan 27 09:09:27 UTC 2020 - Marketa Calabkova <mcalabkova@suse.com>

- Update to 1.3.4
  * Improvement/Bugfix: False positive when searching for successive upper,
    lower char. (ProbeChaos)
  * Improvement: Noticeably better detection for jp
  * Bugfix: Passing zero-length bytes to from_bytes
  * Improvement: Expose version in package
  * Bugfix: Division by zero
  * Improvement: Prefers unicode (utf-8) when detected
  * Apparently dropped Python2 silently

-------------------------------------------------------------------
Fri Oct 4 08:52:51 UTC 2019 - Marketa Calabkova <mcalabkova@suse.com>

- Update to 1.3.0
  * Backport unicodedata for v12 impl into python if available
  * Add aliases to CharsetNormalizerMatches class
  * Add feature preemptive behaviour, looking for encoding declaration
  * Add method to determine if specific encoding is multi byte
  * Add has_submatch property on a match
  * Add percent_chaos and percent_coherence
  * Coherence ratio based on mean instead of sum of best results
  * Using loguru for trace/debug <3
  * from_byte method improved

-------------------------------------------------------------------
Thu Sep 26 10:35:51 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>

- Update to 1.1.1:
  * from_bytes parameters steps and chunk_size were not adapted to sequence
    len if provided values were not fitted to content
  * Sequence having length below 10 chars was not checked
  * Legacy detect method inspired by chardet was not returning
  * Various more test updates

-------------------------------------------------------------------
Fri Sep 13 11:05:06 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>

- Update to 0.3:
  * Improvement on detection
  * Performance loss to expect
  * Added --threshold option to CLI
  * Bugfix on UTF 7 support
  * Legacy detect(byte_str) method
  * BOM support (Unicode mostly)
  * Chaos prober improved on small text
  * Language detection has been reviewed to give better result
  * Bugfix on jp detection, every jp text was considered chaotic

-------------------------------------------------------------------
Fri Aug 30 00:46:27 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>

- Fix the tarball to really be the one published by upstream

-------------------------------------------------------------------
Tue Aug 28 06:29:02 PM UTC 2019 - John Vandenberg <jayvdb@gmail.com>

- Initial spec for v0.1.8