Accepting request 925848 from home:mnhauke

- Update to version 2.0.7
  * Addition: Add support for Kazakh (Cyrillic) language
    detection.
  * Improvement: Further improve inferring the language
    from a given code page (single-byte).
  * Removed: Remove redundant logging entry about detected
    language(s).
  * Improvement: Refactoring for potential performance
    improvements in loops.
  * Improvement: Various detection improvements (MD+CD).
  * Bugfix: Fix a minor inconsistency between Python 3.5 and
    other versions regarding language detection.
- Update to version 2.0.6
  * Bugfix: Fix an unforeseen regression that broke backward
    compatibility with some older minor releases of Python 3.5.x.
  * Bugfix: Fix CLI crash when using --minimal output in
    certain cases.
  * Improvement: Minor improvement to the detection
    efficiency (less than 1%).
- Update to version 2.0.5
  * Improvement: The BC-support with v1.x was improved;
    the old staticmethods are restored.
  * Remove: The project no longer raises a warning on tiny
    content given for detection; it is simply logged as a
    warning instead.
  * Improvement: The Unicode detection is slightly
    improved, see #93
  * Bugfix: In some rare cases, the chunks extractor could cut
    in the middle of a multi-byte character and could mislead the
    mess detection.

OBS-URL: https://build.opensuse.org/request/show/925848
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=18
Dirk Mueller 2021-10-26 20:41:42 +00:00 committed by Git OBS Bridge
parent d5d2d1f9e5
commit fd5f5dc1f2
4 changed files with 148 additions and 9 deletions

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:54425d9436c1cff46dfbb6b6598ac0a4c2d7b003d4787ab7daaf64528e458ed8
-size 347681

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6473e80f73f5918254953073798a367f120cc5717e70c917359e155901c0e2d0
+size 369094

@@ -1,3 +1,144 @@
-------------------------------------------------------------------
Sun Oct 17 14:01:59 UTC 2021 - Martin Hauke <mardnh@gmx.de>
- Update to version 2.0.7
* Addition: Add support for Kazakh (Cyrillic) language
detection.
* Improvement: Further improve inferring the language
from a given code page (single-byte).
* Removed: Remove redundant logging entry about detected
language(s).
* Improvement: Refactoring for potential performance
improvements in loops.
* Improvement: Various detection improvements (MD+CD).
* Bugfix: Fix a minor inconsistency between Python 3.5 and
other versions regarding language detection.
- Update to version 2.0.6
* Bugfix: Fix an unforeseen regression that broke backward
compatibility with some older minor releases of Python 3.5.x.
* Bugfix: Fix CLI crash when using --minimal output in
certain cases.
* Improvement: Minor improvement to the detection
efficiency (less than 1%).
- Update to version 2.0.5
* Improvement: The BC-support with v1.x was improved;
the old staticmethods are restored.
* Remove: The project no longer raises a warning on tiny
content given for detection; it is simply logged as a
warning instead.
* Improvement: The Unicode detection is slightly
improved, see #93
* Bugfix: In some rare cases, the chunks extractor could cut
in the middle of a multi-byte character and could mislead the
mess detection.
* Bugfix: Some rare 'space' characters could trip up the
UnprintablePlugin/Mess detection.
* Improvement: Add syntax sugar __bool__ for results
CharsetMatches list-container.
- Update to version 2.0.4
* Improvement: Adjust the MD to lower the sensitivity,
thus improving the global detection reliability.
* Improvement: Allow fallback on specified encoding
if any.
* Bugfix: The CLI no longer raises an unexpected exception
when no encoding has been found.
* Bugfix: Fix accessing the 'alphabets' property when the
payload contains surrogate characters.
* Bugfix: The logger could mislead (explain=True) on
detected languages and the impact of one MBCS match (in #72)
* Bugfix: Submatch factoring could be wrong in rare edge
cases (in #72)
* Bugfix: Multiple files given to the CLI were ignored (after
the first path) when publishing results to STDOUT. (in #72)
* Internal: Fix line endings from CRLF to LF for certain
files.
- Update to version 2.0.3
* Improvement: Part of the detection mechanism has been
improved to be less sensitive, resulting in more accurate
detection results, especially for ASCII. #63 Fix #62
* Improvement: According to the community's wishes, the
detection will fall back on ASCII or UTF-8 in a last-resort
case.
- Update to version 2.0.2
* Bugfix: Empty/too small JSON payload misdetection fixed.
* Improvement: Don't inject unicodedata2 into sys.modules
- Update to version 2.0.1
* Bugfix: Make it work where there isn't a filesystem
available, dropping assets frequencies.json.
* Improvement: You may now use aliases in cp_isolation
and cp_exclusion arguments.
* Bugfix: Using explain=False permanently disabled the verbose
output in the current runtime #47
* Bugfix: One log entry (language target preemptive) was not
shown in logs when using explain=True #47
* Bugfix: Fix undesired exception (ValueError) on getitem of
instance CharsetMatches #52
* Improvement: Public function normalize default args
values were not aligned with from_bytes #53
- Update to version 2.0.0
* Performance: 4x to 5x faster than the previous 1.4.0
release.
* Performance: At least 2x faster than Chardet.
* Performance: Emphasis has been put on UTF-8 detection,
which should be nearly instantaneous.
* Improvement: The backward compatibility with Chardet has
been greatly improved. The legacy detect function returns an
identical charset name whenever possible.
* Improvement: The detection mechanism has been slightly
improved; Turkish content is now detected correctly (most of
the time)
* Code: The program has been rewritten to ease the
readability and maintainability. (+Using static typing)
* Tests: New workflows are now in place to verify the
following aspects: performance, backward compatibility with
Chardet, and detection coverage, in addition to the current
tests. (+CodeQL)
* Dependency: This package no longer requires
anything when used with Python 3.5 (Dropped cached_property)
* Docs: Performance claims have been updated, along with the
guide to contributing and the issue template.
* Improvement: Add --version argument to CLI
* Bugfix: The CLI output used the relative path of the
file(s). Should be absolute.
* Deprecation: Methods coherence_non_latin, w_counter,
chaos_secondary_pass of the class CharsetMatch are now
deprecated and scheduled for removal in v3.0
* Improvement: If no language was detected in content,
try to infer it using the encoding name/alphabets used.
* Removal: Removed support for these languages: Catalan,
Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk,
Macedonian, and Serbocroatian.
* Improvement: utf_7 detection has been reinstated.
* Removal: The exception hook on UnicodeDecodeError has
been removed.
- Update to version 1.4.1
* Improvement: Logger configuration/usage no longer
conflicts with others #44
- Update to version 1.4.0
* Dependency: Using standard logging instead
of using the package loguru.
* Dependency: Dropping nose test framework in
favor of the maintained pytest.
* Dependency: Choose to not use dragonmapper
package to help with gibberish Chinese/CJK text.
* Dependency: Require cached_property
only for Python 3.5 due to constraint. Dropping for every
other interpreter version.
* Bugfix: BOM marker in a CharsetNormalizerMatch instance
could be False in rare cases even if obviously present, due
to the sub-match factoring process.
* Improvement: Return ASCII if given sequences fit.
* Performance: Huge improvement on the largest payloads.
* Change: Stop support for UTF-7 that does not contain a
SIG. (Contributions are welcome to improve that point)
* Feature: CLI now produces JSON consumable output.
* Dependency: Dropping PrettyTable, replaced with pure JSON
output.
* Bugfix: Not searching properly for the BOM when trying
utf32/16 parent codec.
* Other: Improving the package final size by compressing
frequencies.json.
-------------------------------------------------------------------
Thu May 20 09:46:56 UTC 2021 - pgajdos@suse.com

@@ -19,14 +19,13 @@
%{?!python_module:%define python_module() python-%{**} python3-%{**}}
%define skip_python2 1
Name: python-charset-normalizer
-Version: 1.3.9
+Version: 2.0.7
Release: 0
Summary: Python Universal Charset detector
License: MIT
URL: https://github.com/ousret/charset_normalizer
-Source: https://files.pythonhosted.org/packages/source/c/charset_normalizer/charset_normalizer-%{version}.tar.gz
+Source: https://github.com/Ousret/charset_normalizer/archive/refs/tags/%{version}.tar.gz#/charset_normalizer-%{version}.tar.gz
BuildRequires: %{python_module setuptools}
BuildRequires: dos2unix
BuildRequires: fdupes
BuildRequires: python-rpm-macros
Requires: python-PrettyTable
@@ -45,6 +44,7 @@ BuildRequires: %{python_module PrettyTable}
BuildRequires: %{python_module cached-property >= 1.5}
BuildRequires: %{python_module dragonmapper >= 0.2}
BuildRequires: %{python_module loguru >= 0.5}
BuildRequires: %{python_module pytest-cov}
BuildRequires: %{python_module pytest}
BuildRequires: %{python_module zhon}
# /SECTION
@@ -55,8 +55,6 @@ Python Universal Charset detector.
%prep
%setup -q -n charset_normalizer-%{version}
dos2unix README.md
chmod a-x charset_normalizer/assets/frequencies.json
%build
%python_build
@@ -79,6 +77,6 @@ chmod a-x charset_normalizer/assets/frequencies.json
%doc README.md
%license LICENSE
%python_alternative %{_bindir}/normalizer
-%{python_sitelib}/*
+%{python_sitelib}/charset_normalizer*
%changelog