Accepting request 925848 from home:mnhauke

- Update to version 2.0.7
  * Addition: bento Add support for Kazakh (Cyrillic) language detection
  * Improvement: sparkle Further improve inferring the language from a given code page (single-byte).
  * Removed: fire Remove redundant logging entry about detected language(s).
  * Improvement: zap Refactoring for potential performance improvements in loops.
  * Improvement: sparkles Various detection improvements (MD+CD).
  * Bugfix: bug Fix a minor inconsistency between Python 3.5 and other versions regarding language detection.
- Update to version 2.0.6
  * Bugfix: bug Unforeseen regression with the loss of the backward-compatibility with some older minor versions of Python 3.5.x.
  * Bugfix: bug Fix CLI crash when using --minimal output in certain cases.
  * Improvement: sparkles Minor improvement to the detection efficiency (less than 1%).
- Update to version 2.0.5
  * Improvement: sparkles The BC-support with v1.x was improved; the old staticmethods are restored.
  * Remove: fire The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead.
  * Improvement: sparkles The Unicode detection is slightly improved, see #93
  * Bugfix: bug In some rare cases, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection.

OBS-URL: https://build.opensuse.org/request/show/925848
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=18
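The 2.0.7 entry about inferring the language from a single-byte code page rests on a simple observation: the upper half (0x80 to 0xFF) of such a code page mostly encodes one script, which narrows the set of candidate languages. The sketch below illustrates that idea with the standard library only; `dominant_script` is a hypothetical helper, not part of charset_normalizer's API, and the library's own implementation may work differently.

```python
import unicodedata
from collections import Counter
from typing import Optional


def dominant_script(code_page: str) -> Optional[str]:
    """Guess which script a single-byte code page mainly covers (hypothetical helper).

    Decodes the upper half (0x80-0xFF) of the code page, keeps alphabetic
    characters, and counts the first word of each Unicode name
    (e.g. "CYRILLIC" in "CYRILLIC CAPITAL LETTER A").
    """
    # Undefined byte values in the code page are simply skipped.
    upper_half = bytes(range(0x80, 0x100)).decode(code_page, errors="ignore")
    scripts = Counter(
        unicodedata.name(ch).split(" ")[0]
        for ch in upper_half
        if ch.isalpha()
    )
    if not scripts:
        return None  # e.g. a 7-bit codec: nothing decodes above 0x7F
    return scripts.most_common(1)[0][0]
```

For example, `dominant_script("cp1251")` returns `"CYRILLIC"` and `dominant_script("cp1253")` returns `"GREEK"`, while `dominant_script("ascii")` returns `None` because its upper half decodes to nothing.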
2021-10-26 20:41:42 +00:00 · 2021-10-26 20:41:42 +00:00 · fd5f5dc1f2
commit fd5f5dc1f2
parent d5d2d1f9e5
4 changed files with 148 additions and 9 deletions
--- a/charset_normalizer-1.3.9.tar.gz
+++ b/charset_normalizer-1.3.9.tar.gz
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:54425d9436c1cff46dfbb6b6598ac0a4c2d7b003d4787ab7daaf64528e458ed8
-size 347681
--- a/charset_normalizer-2.0.7.tar.gz
+++ b/charset_normalizer-2.0.7.tar.gz
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6473e80f73f5918254953073798a367f120cc5717e70c917359e155901c0e2d0
+size 369094
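The tarballs in this commit are stored as Git LFS pointer files (spec v1): instead of the archive itself, the repository keeps three `key value` lines recording the spec version, the SHA-256 object id, and the size in bytes. As a minimal sketch, assuming the archive has already been downloaded, a pointer can be checked against it as follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of Git LFS or OBS tooling.

```python
import hashlib
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of an LFS pointer into a dict entry."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(pointer_text: str, payload: bytes) -> bool:
    """Return True if payload has the size and SHA-256 recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return (
        len(payload) == int(fields["size"])
        and hashlib.sha256(payload).hexdigest() == expected_digest
    )
```

Applied to the pointer above, a correct download must be exactly 369094 bytes long and hash to the `oid` value shown; checking the size first is cheap and rejects truncated downloads before hashing.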
--- a/python-charset-normalizer.changes
+++ b/python-charset-normalizer.changes
@@ -1,3 +1,144 @@
+-------------------------------------------------------------------
+Sun Oct 17 14:01:59 UTC 2021 - Martin Hauke <mardnh@gmx.de>
+
+- Update to version 2.0.7
+  * Addition: bento Add support for Kazakh (Cyrillic) language
+    detection
+  * Improvement: sparkle Further improve inferring the language
+    from a given code page (single-byte).
+  * Removed: fire Remove redundant logging entry about detected
+    language(s).
+  * Improvement: zap Refactoring for potential performance
+    improvements in loops.
+  * Improvement: sparkles Various detection improvements (MD+CD).
+  * Bugfix: bug Fix a minor inconsistency between Python 3.5 and
+    other versions regarding language detection.
+- Update to version 2.0.6
+  * Bugfix: bug Unforeseen regression with the loss of the
+    backward-compatibility with some older minor versions of Python 3.5.x.
+  * Bugfix: bug Fix CLI crash when using --minimal output in
+    certain cases.
+  * Improvement: sparkles Minor improvement to the detection
+    efficiency (less than 1%).
+- Update to version 2.0.5
+  * Improvement: sparkles The BC-support with v1.x was improved;
+    the old staticmethods are restored.
+  * Remove: fire The project no longer raises a warning on tiny
+    content given for detection; it is simply logged as a warning
+    instead.
+  * Improvement: sparkles The Unicode detection is slightly
+    improved, see #93
+  * Bugfix: bug In some rare cases, the chunks extractor could cut
+    in the middle of a multi-byte character and could mislead the
+    mess detection.
+  * Bugfix: bug Some rare 'space' characters could trip up the
+    UnprintablePlugin/Mess detection.
+  * Improvement: art Add syntax sugar __bool__ for results
+    CharsetMatches list-container.
+- Update to version 2.0.4
+  * Improvement: sparkle Adjust the MD to lower the sensitivity,
+    thus improving the global detection reliability.
+  * Improvement: sparkle Allow fallback on specified encoding
+    if any.
+  * Bugfix: bug The CLI no longer raises an unexpected exception
+    when no encoding has been found.
+  * Bugfix: bug Fix accessing the 'alphabets' property when the
+    payload contains surrogate characters.
+  * Bugfix: bug pencil2 The logger could mislead (explain=True) on
+    detected languages and the impact of one MBCS match (in #72)
+  * Bugfix: bug Submatch factoring could be wrong in rare edge
+    cases (in #72)
+  * Bugfix: bug Multiple files given to the CLI were ignored when
+    publishing results to STDOUT. (After the first path) (in #72)
+  * Internal: art Fix line endings from CRLF to LF for certain
+    files.
+- Update to version 2.0.3
+  * Improvement: sparkles Part of the detection mechanism has been
+    improved to be less sensitive, resulting in more accurate
+    detection results. Especially ASCII. #63 Fix #62
+  * Improvement: sparkles According to the community's wishes, the
+    detection will fall back on ASCII or UTF-8 in a last-resort
+    case.
+- Update to version 2.0.2
+  * Bugfix: bug Empty/too small JSON payload misdetection fixed.
+  * Improvement: sparkler Don't inject unicodedata2 into sys.modules
+- Update to version 2.0.1
+  * Bugfix: bug Make it work where there isn't a filesystem
+    available, dropping assets frequencies.json.
+  * Improvement: sparkles You may now use aliases in cp_isolation
+    and cp_exclusion arguments.
+  * Bugfix: bug Using explain=False permanently disabled the verbose
+    output in the current runtime #47
+  * Bugfix: bug One log entry (language target preemptive) was not
+    shown in logs when using explain=True #47
+  * Bugfix: bug Fix undesired exception (ValueError) on getitem of
+    instance CharsetMatches #52
+  * Improvement: wrench The default argument values of the public
+    function normalize were not aligned with from_bytes #53
+- Update to version 2.0.0
+  * Performance: zap 4x to 5x faster than the previous 1.4.0
+    release.
+  * Performance: zap At least 2x faster than Chardet.
+  * Performance: zap Emphasis has been placed on UTF-8 detection,
+    which should be nearly instantaneous.
+  * Improvement: back The backward compatibility with Chardet has
+    been greatly improved. The legacy detect function returns an
+    identical charset name whenever possible.
+  * Improvement: sparkle The detection mechanism has been slightly
+    improved, now Turkish content is detected correctly (most of
+    the time)
+  * Code: art The program has been rewritten to ease the
+    readability and maintainability. (+Using static typing)
+  * Tests: heavy_check_mark New workflows are now in place to
+    verify the following aspects: Performance,
+    Backward-Compatibility with Chardet, and Detection Coverage in
+    addition to current tests. (+CodeQL)
+  * Dependency: heavy_minus_sign This package no longer requires
+    anything when used with Python 3.5 (Dropped cached_property)
+  * Docs: pencil2 Performance claims, the contributing guide, and
+    the issue template have been updated.
+  * Improvement: sparkle Add --version argument to CLI
+  * Bugfix: bug The CLI output used the relative path of the
+    file(s). Should be absolute.
+  * Deprecation: red_circle Methods coherence_non_latin, w_counter,
+    chaos_secondary_pass of the class CharsetMatch are now
+    deprecated and scheduled for removal in v3.0
+  * Improvement: sparkle If no language was detected in content,
+    try to infer it using the encoding name/alphabets used.
+  * Removal: fire Removed support for these languages: Catalan,
+    Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk,
+    Macedonian, and Serbocroatian.
+  * Improvement: sparkle utf_7 detection has been reinstated.
+  * Removal: fire The exception hook on UnicodeDecodeError has
+    been removed.
+- Update to version 1.4.1
+  * Improvement: art Logger configuration/usage no longer
+    conflicts with others #44
+- Update to version 1.4.0
+  * Dependency: heavy_minus_sign Using standard logging instead
+    of using the package loguru.
+  * Dependency: heavy_minus_sign Dropping nose test framework in
+    favor of the maintained pytest.
+  * Dependency: heavy_minus_sign Choose to not use dragonmapper
+    package to help with gibberish Chinese/CJK text.
+  * Dependency: wrench heavy_minus_sign Require cached_property
+    only for Python 3.5 due to a constraint; drop it for every
+    other interpreter version.
+  * Bugfix: bug BOM marker in a CharsetNormalizerMatch instance
+    could be False in rare cases even if obviously present, due
+    to the sub-match factoring process.
+  * Improvement: sparkler Return ASCII if given sequences fit.
+  * Performance: zap Huge improvement on the largest payloads.
+  * Change: fire Stop supporting UTF-7 that does not contain a
+    SIG. (Contributions are welcome to improve that point)
+  * Feature: sparkler CLI now produces JSON consumable output.
+  * Dependency: Dropping PrettyTable, replaced with pure JSON
+    output.
+  * Bugfix: bug Not searching properly for the BOM when trying
+    utf32/16 parent codec.
+  * Other: zap Reduce the final package size by compressing
+    frequencies.json.
+
 -------------------------------------------------------------------
 Thu May 20 09:46:56 UTC 2021 - pgajdos@suse.com

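The 2.0.5 bugfix about the chunk extractor cutting in the middle of a multi-byte character is easiest to see with UTF-8, where every continuation byte has the bit pattern `10xxxxxx` (0x80 to 0xBF). The following is a hypothetical, UTF-8-only sketch of boundary-aware chunking (`utf8_safe_chunks` is an illustrative name); it does not reproduce charset_normalizer's own extractor, which has to handle many other encodings.

```python
from typing import Iterator


def utf8_safe_chunks(payload: bytes, size: int) -> Iterator[bytes]:
    """Yield slices of at most `size` bytes that never split a UTF-8 character.

    A cut is only safe right before a byte that is NOT a continuation
    byte (continuation bytes match 0b10xxxxxx).
    """
    if size < 4:
        raise ValueError("size must be at least 4 (the longest UTF-8 sequence)")
    start = 0
    while start < len(payload):
        end = min(start + size, len(payload))
        cut = end
        # Walk the cut point back while it would land inside a character.
        while start < cut < len(payload) and (payload[cut] & 0xC0) == 0x80:
            cut -= 1
        if cut == start:
            # Malformed input: no safe boundary in range, fall back to a hard cut.
            cut = end
        yield payload[start:cut]
        start = cut
```

Moving the cut point backwards by at most three bytes keeps each chunk within `size` while guaranteeing that every chunk of valid UTF-8 decodes on its own; the hard-cut fallback only matters for malformed input, where no safe boundary exists.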
--- a/python-charset-normalizer.spec
+++ b/python-charset-normalizer.spec
@@ -19,14 +19,13 @@
 %{?!python_module:%define python_module() python-%{**} python3-%{**}}
 %define skip_python2 1
 Name:           python-charset-normalizer
-Version:        1.3.9
+Version:        2.0.7
 Release:        0
 Summary:        Python Universal Charset detector
 License:        MIT
 URL:            https://github.com/ousret/charset_normalizer
-Source:         https://files.pythonhosted.org/packages/source/c/charset_normalizer/charset_normalizer-%{version}.tar.gz
+Source:         https://github.com/Ousret/charset_normalizer/archive/refs/tags/%{version}.tar.gz#/charset_normalizer-%{version}.tar.gz
 BuildRequires:  %{python_module setuptools}
-BuildRequires:  dos2unix
 BuildRequires:  fdupes
 BuildRequires:  python-rpm-macros
 Requires:       python-PrettyTable
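The new `Source:` line relies on a common RPM packaging convention: the local file name is the text after the last `/`, so appending `#/charset_normalizer-%{version}.tar.gz` renames GitHub's `2.0.7.tar.gz` archive to match the tarball committed above, while the `#` fragment is never sent in the HTTP request. A minimal Python sketch of that split, with `%{version}` already expanded; `split_source_url` is a hypothetical helper, not an rpm API.

```python
from typing import Tuple


def split_source_url(source: str) -> Tuple[str, str]:
    """Return (download_url, local_file_name) for a spec-style Source URL.

    The local name is everything after the last '/', so a trailing
    '#/name.tar.gz' fragment renames the file; the download URL is the
    part before '#', since fragments are not sent to the server.
    """
    download_url = source.split("#", 1)[0]
    local_name = source.rsplit("/", 1)[-1]
    return download_url, local_name
```

Without a fragment, as in the old PyPI `Source:` line, both values come straight from the URL, which is why the previous spec needed no rename.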
@@ -45,6 +44,7 @@ BuildRequires:  %{python_module PrettyTable}
 BuildRequires:  %{python_module cached-property >= 1.5}
 BuildRequires:  %{python_module dragonmapper >= 0.2}
 BuildRequires:  %{python_module loguru >= 0.5}
+BuildRequires:  %{python_module pytest-cov}
 BuildRequires:  %{python_module pytest}
 BuildRequires:  %{python_module zhon}
 # /SECTION
@@ -55,8 +55,6 @@ Python Universal Charset detector.

 %prep
 %setup -q -n charset_normalizer-%{version}
-dos2unix README.md
-chmod a-x charset_normalizer/assets/frequencies.json

 %build
 %python_build
@@ -79,6 +77,6 @@ chmod a-x charset_normalizer/assets/frequencies.json
 %doc README.md
 %license LICENSE
 %python_alternative %{_bindir}/normalizer
-%{python_sitelib}/*
+%{python_sitelib}/charset_normalizer*

 %changelog
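The `%files` change narrows the glob from `%{python_sitelib}/*` to `%{python_sitelib}/charset_normalizer*`, so the package claims only top-level entries whose names start with `charset_normalizer`. The sketch below approximates RPM's shell-style globbing with Python's `fnmatch` on a hypothetical site-packages listing; `claimed` and the directory names are illustrative, not taken from the actual build.

```python
from fnmatch import fnmatchcase
from typing import List


def claimed(entries: List[str], pattern: str) -> List[str]:
    """Return the entries a shell-style %files glob would match (approximation).

    fnmatchcase keeps matching case-sensitive, as file globbing is on Linux.
    """
    return [entry for entry in entries if fnmatchcase(entry, pattern)]


# Hypothetical top-level entries in site-packages after installation.
site_packages = [
    "charset_normalizer",
    "charset_normalizer-2.0.7-py3.9.egg-info",
    "tests",
]

print(claimed(site_packages, "*"))
print(claimed(site_packages, "charset_normalizer*"))
```

With the old pattern a stray top-level entry such as the hypothetical `tests` directory would be packaged as well; the new pattern keeps only the module and its metadata.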