Accepting request 925848 from home:mnhauke

- Update to version 2.0.7
  * Addition: bento Add support for Kazakh (Cyrillic) language
    detection
  * Improvement: sparkle Further improve inferring the language
    from a given code page (single-byte).
  * Removed: fire Remove redundant logging entry about detected
    language(s).
  * Improvement: zap Refactoring for potential performance
    improvements in loops.
  * Improvement: sparkles Various detection improvement (MD+CD).
  * Bugfix: bug Fix a minor inconsistency between Python 3.5 and
    other versions regarding language detection.
- Update to version 2.0.6
  * Bugfix: bug Unforeseen regression with the loss of the
    backward-compatibility with some older minor of Python 3.5.x.
  * Bugfix: bug Fix CLI crash when using --minimal output in
    certain cases.
  * Improvement: sparkles Minor improvement to the detection
    efficiency (less than 1%).
- Update to version 2.0.5
  * Improvement: sparkles The BC-support with v1.x was improved,
    the old staticmethods are restored.
  * Remove: fire The project no longer raise warning on tiny
    content given for detection, will be simply logged as warning
    instead.
  * Improvement: sparkles The Unicode detection is slightly
    improved, see #93
  * Bugfix: bug In some rare case, the chunks extractor could cut
    in the middle of a multi-byte character and could mislead the
    mess detection.

OBS-URL: https://build.opensuse.org/request/show/925848
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=18
parent d5d2d1f9e5
commit fd5f5dc1f2
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:54425d9436c1cff46dfbb6b6598ac0a4c2d7b003d4787ab7daaf64528e458ed8
-size 347681
3
charset_normalizer-2.0.7.tar.gz
Normal file
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6473e80f73f5918254953073798a367f120cc5717e70c917359e155901c0e2d0
+size 369094
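The three added lines form a Git LFS pointer: the pointer spec version, the SHA-256 of the real tarball, and its size in bytes. The following is a minimal sketch of how downloaded bytes could be checked against such a pointer using only the Python standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of Git LFS or the OBS tooling.

```python
import hashlib

# Pointer content copied from the hunk above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6473e80f73f5918254953073798a367f120cc5717e70c917359e155901c0e2d0
size 369094
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer file into a small dict."""
    fields = dict(
        line.split(" ", 1) for line in text.splitlines() if line.strip()
    )
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True when the bytes have the size and hash the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]
```

In practice `data` would be the content of `charset_normalizer-2.0.7.tar.gz` read in binary mode; the size check comes first because it is cheap and rejects truncated downloads before any hashing.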
@@ -1,3 +1,144 @@
+-------------------------------------------------------------------
+Sun Oct 17 14:01:59 UTC 2021 - Martin Hauke <mardnh@gmx.de>
+
+- Update to version 2.0.7
+  * Addition: bento Add support for Kazakh (Cyrillic) language
+    detection
+  * Improvement: sparkle Further improve inferring the language
+    from a given code page (single-byte).
+  * Removed: fire Remove redundant logging entry about detected
+    language(s).
+  * Improvement: zap Refactoring for potential performance
+    improvements in loops.
+  * Improvement: sparkles Various detection improvement (MD+CD).
+  * Bugfix: bug Fix a minor inconsistency between Python 3.5 and
+    other versions regarding language detection.
+- Update to version 2.0.6
+  * Bugfix: bug Unforeseen regression with the loss of the
+    backward-compatibility with some older minor of Python 3.5.x.
+  * Bugfix: bug Fix CLI crash when using --minimal output in
+    certain cases.
+  * Improvement: sparkles Minor improvement to the detection
+    efficiency (less than 1%).
+- Update to version 2.0.5
+  * Improvement: sparkles The BC-support with v1.x was improved,
+    the old staticmethods are restored.
+  * Remove: fire The project no longer raise warning on tiny
+    content given for detection, will be simply logged as warning
+    instead.
+  * Improvement: sparkles The Unicode detection is slightly
+    improved, see #93
+  * Bugfix: bug In some rare case, the chunks extractor could cut
+    in the middle of a multi-byte character and could mislead the
+    mess detection.
+  * Bugfix: bug Some rare 'space' characters could trip up the
+    UnprintablePlugin/Mess detection.
+  * Improvement: art Add syntax sugar __bool__ for results
+    CharsetMatches list-container.
+- Update to version 2.0.4
+  * Improvement: sparkle Adjust the MD to lower the sensitivity,
+    thus improving the global detection reliability.
+  * Improvement: sparkle Allow fallback on specified encoding
+    if any.
+  * Bugfix: bug The CLI no longer raise an unexpected exception
+    when no encoding has been found.
+  * Bugfix: bug Fix accessing the 'alphabets' property when the
+    payload contains surrogate characters.
+  * Bugfix: bug pencil2 The logger could mislead (explain=True) on
+    detected languages and the impact of one MBCS match (in #72)
+  * Bugfix: bug Submatch factoring could be wrong in rare edge
+    cases (in #72)
+  * Bugfix: bug Multiple files given to the CLI were ignored when
+    publishing results to STDOUT. (After the first path) (in #72)
+  * Internal: art Fix line endings from CRLF to LF for certain
+    files.
+- Update to version 2.0.3
+  * Improvement: sparkles Part of the detection mechanism has been
+    improved to be less sensitive, resulting in more accurate
+    detection results. Especially ASCII. #63 Fix #62
+  * Improvement: sparklesAccording to the community wishes, the
+    detection will fall back on ASCII or UTF-8 in a last-resort
+    case.
+- Update to version 2.0.2
+  * Bugfix: bug Empty/Too small JSON payload miss-detection fixed.
+  * Improvement: sparkler Don't inject unicodedata2 into sys.modules
+- Update to version 2.0.1
+  * Bugfix: bug Make it work where there isn't a filesystem
+    available, dropping assets frequencies.json.
+  * Improvement: sparkles You may now use aliases in cp_isolation
+    and cp_exclusion arguments.
+  * Bugfix: bug Using explain=False permanently disable the verbose
+    output in the current runtime #47
+  * Bugfix: bug One log entry (language target preemptive) was not
+    show in logs when using explain=True #47
+  * Bugfix: bug Fix undesired exception (ValueError) on getitem of
+    instance CharsetMatches #52
+  * Improvement: wrench Public function normalize default args
+    values were not aligned with from_bytes #53
+- Update to version 2.0.0
+  * Performance: zap 4x to 5 times faster than the previous 1.4.0
+    release.
+  * Performance: zap At least 2x faster than Chardet.
+  * Performance: zap Accent has been made on UTF-8 detection,
+    should perform rather instantaneous.
+  * Improvement: back The backward compatibility with Chardet has
+    been greatly improved. The legacy detect function returns an
+    identical charset name whenever possible.
+  * Improvement: sparkle The detection mechanism has been slightly
+    improved, now Turkish content is detected correctly (most of
+    the time)
+  * Code: art The program has been rewritten to ease the
+    readability and maintainability. (+Using static typing)
+  * Tests: heavy_check_mark New workflows are now in place to
+    verify the following aspects: Performance, Backward-
+    Compatibility with Chardet, and Detection Coverage in addition#
+    to currents tests. (+CodeQL)
+  * Dependency: heavy_minus_sign This package no longer require
+    anything when used with Python 3.5 (Dropped cached_property)
+  * Docs: pencil2 Performance claims have been updated, the guide
+    to contributing, and the issue template.
+  * Improvement: sparkle Add --version argument to CLI
+  * Bugfix: bug The CLI output used the relative path of the
+    file(s). Should be absolute.
+  * Deprecation: red_circle Methods coherence_non_latin, w_counter,
+    chaos_secondary_pass of the class CharsetMatch are now
+    deprecated and scheduled for removal in v3.0
+  * Improvement: sparkle If no language was detected in content,
+    trying to infer it using the encoding name/alphabets used.
+  * Removal: fire Removed support for these languages: Catalan,
+    Esperanto, Kazakh, Baque, Volapük, Azeri, Galician, Nynorsk,
+    Macedonian, and Serbocroatian.
+  * Improvement: sparkle utf_7 detection has been reinstated.
+  * Removal: fire The exception hook on UnicodeDecodeError has
+    been removed.
+- Update to version 1.4.1
+  * Improvement: art Logger configuration/usage no longer
+    conflict with others #44
+- Update to version 1.4.0
+  * Dependency: heavy_minus_sign Using standard logging instead
+    of using the package loguru.
+  * Dependency: heavy_minus_sign Dropping nose test framework in
+    favor of the maintained pytest.
+  * Dependency: heavy_minus_sign Choose to not use dragonmapper
+    package to help with gibberish Chinese/CJK text.
+  * Dependency: wrench heavy_minus_sign Require cached_property
+    only for Python 3.5 due to constraint. Dropping for every
+    other interpreter version.
+  * Bugfix: bug BOM marker in a CharsetNormalizerMatch instance
+    could be False in rare cases even if obviously present. Due
+    to the sub-match factoring process.
+  * Improvement: sparkler Return ASCII if given sequences fit.
+  * Performance: zap Huge improvement over the larges payload.
+  * Change: fire Stop support for UTF-7 that does not contain a
+    SIG. (Contributions are welcome to improve that point)
+  * Feature: sparkler CLI now produces JSON consumable output.
+  * Dependency: Dropping PrettyTable, replaced with pure JSON
+    output.
+  * Bugfix: bug Not searching properly for the BOM when trying
+    utf32/16 parent codec.
+  * Other: zap Improving the package final size by compressing
+    frequencies.json.
+
 -------------------------------------------------------------------
 Thu May 20 09:46:56 UTC 2021 - pgajdos@suse.com
 
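The 2.0.5 entry above mentions a chunk extractor that could cut in the middle of a multi-byte character. The sketch below only illustrates that class of problem with the Python standard library. It is not charset_normalizer's implementation, and `naive_chunks` and `safe_chunks` are hypothetical names chosen for this example.

```python
import codecs


def naive_chunks(payload, size, encoding="utf-8"):
    """Decode fixed-size byte slices independently.

    A slice boundary can fall inside a multi-byte character, in which
    case both halves decode to U+FFFD replacement characters.
    """
    return [
        payload[i:i + size].decode(encoding, errors="replace")
        for i in range(0, len(payload), size)
    ]


def safe_chunks(payload, size, encoding="utf-8"):
    """Decode the same slices with an incremental decoder.

    The decoder buffers an incomplete trailing sequence and completes
    it once the next slice arrives.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    chunks = [
        decoder.decode(payload[i:i + size])
        for i in range(0, len(payload), size)
    ]
    chunks.append(decoder.decode(b"", final=True))
    return [chunk for chunk in chunks if chunk]


# "héllo wörld": the two accented letters take two bytes each in UTF-8.
payload = "h\u00e9llo w\u00f6rld".encode("utf-8")
broken = "".join(naive_chunks(payload, 2))
intact = "".join(safe_chunks(payload, 2))
```

With a slice size of 2, the first boundary falls between the two bytes of `é`, so `broken` contains replacement characters while `intact` round-trips the original text. A detector that scores "messiness" on the naive chunks would see noise that is not present in the input.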
@@ -19,14 +19,13 @@
 %{?!python_module:%define python_module() python-%{**} python3-%{**}}
 %define skip_python2 1
 Name: python-charset-normalizer
-Version: 1.3.9
+Version: 2.0.7
 Release: 0
 Summary: Python Universal Charset detector
 License: MIT
 URL: https://github.com/ousret/charset_normalizer
-Source: https://files.pythonhosted.org/packages/source/c/charset_normalizer/charset_normalizer-%{version}.tar.gz
+Source: https://github.com/Ousret/charset_normalizer/archive/refs/tags/%{version}.tar.gz#/charset_normalizer-%{version}.tar.gz
 BuildRequires: %{python_module setuptools}
-BuildRequires: dos2unix
 BuildRequires: fdupes
 BuildRequires: python-rpm-macros
 Requires: python-PrettyTable
@@ -45,6 +44,7 @@ BuildRequires: %{python_module PrettyTable}
 BuildRequires: %{python_module cached-property >= 1.5}
 BuildRequires: %{python_module dragonmapper >= 0.2}
 BuildRequires: %{python_module loguru >= 0.5}
+BuildRequires: %{python_module pytest-cov}
 BuildRequires: %{python_module pytest}
 BuildRequires: %{python_module zhon}
 # /SECTION
@@ -55,8 +55,6 @@ Python Universal Charset detector.
 
 %prep
 %setup -q -n charset_normalizer-%{version}
-dos2unix README.md
-chmod a-x charset_normalizer/assets/frequencies.json
 
 %build
 %python_build
@@ -79,6 +77,6 @@ chmod a-x charset_normalizer/assets/frequencies.json
 %doc README.md
 %license LICENSE
 %python_alternative %{_bindir}/normalizer
-%{python_sitelib}/*
+%{python_sitelib}/charset_normalizer*
 
 %changelog
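The new `Source:` line ends in `#/charset_normalizer-%{version}.tar.gz`. RPM takes the local source file name from the text after the last `/` in the Source string, so the fragment gives GitHub's bare `%{version}.tar.gz` archive the same descriptive name as the tarball added in this commit. A rough sketch of that naming rule follows. The helper `local_source_name` is hypothetical, and its one-line macro substitution is only a stand-in for rpm's real macro expansion.

```python
def local_source_name(source, macros):
    """Expand simple %{name} macros, then keep the text after the last '/'."""
    for name, value in macros.items():
        source = source.replace("%{" + name + "}", value)
    return source.rsplit("/", 1)[-1]


# Source line from the spec hunk above.
SOURCE = (
    "https://github.com/Ousret/charset_normalizer/archive/refs/tags/"
    "%{version}.tar.gz#/charset_normalizer-%{version}.tar.gz"
)

print(local_source_name(SOURCE, {"version": "2.0.7"}))
# prints: charset_normalizer-2.0.7.tar.gz
```

Without the `#/...` suffix the same rule would yield `2.0.7.tar.gz`, a name that says nothing about which package the archive belongs to.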