- Update to 3.4.1
* Project metadata are now stored using `pyproject.toml` instead of `setup.cfg` using setuptools as the build backend. * Enforce annotation delayed loading for a simpler and consistent types in the project. * Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8 * Added pre-commit configuration. * Added noxfile. * Removed `build-requirements.txt` as per using `pyproject.toml` native build configuration. * Removed `bin/integration.py` and `bin/serve.py` in favor of downstream integration test (see noxfile). * Removed `setup.cfg` in favor of `pyproject.toml` metadata configuration. * Removed unused `utils.range_scan` function. * Converting content to Unicode bytes may insert `utf_8` instead of preferred `utf-8`. (#572) * Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+ - Drop sed command to remove code coverage flags from pytest OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-charset-normalizer?expand=0&rev=49
This commit is contained in:
commit
7e521f133b
23
.gitattributes
vendored
Normal file
23
.gitattributes
vendored
Normal file
@ -0,0 +1,23 @@
|
||||
## Default LFS
|
||||
*.7z filter=lfs diff=lfs merge=lfs -text
|
||||
*.bsp filter=lfs diff=lfs merge=lfs -text
|
||||
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
||||
*.gem filter=lfs diff=lfs merge=lfs -text
|
||||
*.gz filter=lfs diff=lfs merge=lfs -text
|
||||
*.jar filter=lfs diff=lfs merge=lfs -text
|
||||
*.lz filter=lfs diff=lfs merge=lfs -text
|
||||
*.lzma filter=lfs diff=lfs merge=lfs -text
|
||||
*.obscpio filter=lfs diff=lfs merge=lfs -text
|
||||
*.oxt filter=lfs diff=lfs merge=lfs -text
|
||||
*.pdf filter=lfs diff=lfs merge=lfs -text
|
||||
*.png filter=lfs diff=lfs merge=lfs -text
|
||||
*.rpm filter=lfs diff=lfs merge=lfs -text
|
||||
*.tbz filter=lfs diff=lfs merge=lfs -text
|
||||
*.tbz2 filter=lfs diff=lfs merge=lfs -text
|
||||
*.tgz filter=lfs diff=lfs merge=lfs -text
|
||||
*.ttf filter=lfs diff=lfs merge=lfs -text
|
||||
*.txz filter=lfs diff=lfs merge=lfs -text
|
||||
*.whl filter=lfs diff=lfs merge=lfs -text
|
||||
*.xz filter=lfs diff=lfs merge=lfs -text
|
||||
*.zip filter=lfs diff=lfs merge=lfs -text
|
||||
*.zst filter=lfs diff=lfs merge=lfs -text
|
1
.gitignore
vendored
Normal file
1
.gitignore
vendored
Normal file
@ -0,0 +1 @@
|
||||
.osc
|
3
charset_normalizer-3.3.2.tar.gz
Normal file
3
charset_normalizer-3.3.2.tar.gz
Normal file
@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:9948e5c17831916ef192cf3f26c744d539eb6f4e9e3b02eea649552c52b10d91
|
||||
size 95792
|
BIN
charset_normalizer-3.4.0.tar.gz
(Stored with Git LFS)
Normal file
BIN
charset_normalizer-3.4.0.tar.gz
(Stored with Git LFS)
Normal file
Binary file not shown.
3
charset_normalizer-3.4.1.tar.gz
Normal file
3
charset_normalizer-3.4.1.tar.gz
Normal file
@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:5b91677d4c790eead798f4ed3e5f546ed80d6fe8cf1d8120939960d593371855
|
||||
size 97345
|
440
python-charset-normalizer.changes
Normal file
440
python-charset-normalizer.changes
Normal file
@ -0,0 +1,440 @@
|
||||
-------------------------------------------------------------------
|
||||
Thu Jan 9 09:05:30 UTC 2025 - John Paul Adrian Glaubitz <adrian.glaubitz@suse.com>
|
||||
|
||||
- Update to 3.4.1
|
||||
* Project metadata are now stored using `pyproject.toml` instead of
|
||||
`setup.cfg` using setuptools as the build backend.
|
||||
* Enforce annotation delayed loading for a simpler and consistent
|
||||
types in the project.
|
||||
* Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8
|
||||
* Added pre-commit configuration.
|
||||
* Added noxfile.
|
||||
* Removed `build-requirements.txt` as per using `pyproject.toml`
|
||||
native build configuration.
|
||||
* Removed `bin/integration.py` and `bin/serve.py` in favor of downstream
|
||||
integration test (see noxfile).
|
||||
* Removed `setup.cfg` in favor of `pyproject.toml` metadata configuration.
|
||||
* Removed unused `utils.range_scan` function.
|
||||
* Converting content to Unicode bytes may insert `utf_8` instead of
|
||||
preferred `utf-8`. (#572)
|
||||
* Deprecation warning "'count' is passed as positional argument" when
|
||||
converting to Unicode bytes on Python 3.13+
|
||||
- Drop sed command to remove code coverage flags from pytest
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Mon Oct 28 16:37:52 UTC 2024 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- switch to PEP517 build
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Tue Oct 22 16:00:12 UTC 2024 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 3.4.0:
|
||||
* Argument `--no-preemptive` in the CLI to prevent the detector
|
||||
to search for hints.
|
||||
* Support for Python 3.13
|
||||
* Relax the TypeError exception thrown when trying to compare a
|
||||
CharsetMatch with anything else than a CharsetMatch.
|
||||
* Improved the general reliability of the detector based on
|
||||
user feedbacks. (#520) (#509) (#498) (#407)
|
||||
* Declared charset in content (preemptive detection) not
|
||||
changed when converting to utf-8 bytes.
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Sat Nov 25 14:12:18 UTC 2023 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 3.3.2:
|
||||
* Unintentional memory usage regression when using large
|
||||
payload that match several encoding (#376)
|
||||
* Regression on some detection case showcased in the
|
||||
documentation (#371)
|
||||
* Noise (md) probe that identify malformed arabic
|
||||
representation due to the presence of letters in isolated
|
||||
form
|
||||
* Optional mypyc compilation upgraded to version 1.6.1 for
|
||||
Python >= 3.8
|
||||
* Improved the general detection reliability based on reports
|
||||
from the community
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Mon Oct 2 09:07:47 UTC 2023 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 3.3.0:
|
||||
* Allow to execute the CLI (e.g. normalizer) through `python -m
|
||||
charset_normalizer.cli` or `python -m charset_normalizer`
|
||||
* Support for 9 forgotten encoding that are supported by Python
|
||||
but unlisted in `encoding.aliases` as they have no alias
|
||||
* Optional mypyc compilation upgraded to version 1.5.1 for
|
||||
Python >= 3.7
|
||||
* Unable to properly sort CharsetMatch when both chaos/noise
|
||||
and coherence were close due to an unreachable condition in
|
||||
\_\_lt\_\_ (#350)
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Tue Jul 11 13:22:52 UTC 2023 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 3.2.0:
|
||||
* Typehint for function `from_path` no longer enforce
|
||||
`PathLike` as its first argument
|
||||
* Minor improvement over the global detection reliability
|
||||
* Introduce function `is_binary` that relies on main
|
||||
capabilities, and optimized to detect binaries
|
||||
* Propagate `enable_fallback` argument throughout `from_bytes`,
|
||||
`from_path`, and `from_fp` that allow a deeper control over
|
||||
the detection (default True)
|
||||
* Edge case detection failure where a file would contain 'very-
|
||||
long' camel cased word (Issue #289)
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Fri Apr 21 12:31:23 UTC 2023 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- add sle15_python_module_pythons (jsc#PED-68)
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Sun Mar 26 20:04:17 UTC 2023 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 3.1.0:
|
||||
* Argument `should_rename_legacy` for legacy function `detect`
|
||||
and disregard any new arguments without errors (PR #262)
|
||||
* Removed Support for Python 3.6 (PR #260)
|
||||
* Optional speedup provided by mypy/c 1.0.1
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Sat Dec 3 04:13:46 UTC 2022 - Yogalakshmi Arunachalam <yarunachalam@suse.com>
|
||||
|
||||
- Update to 3.0.1
|
||||
Fixed
|
||||
Multi-bytes cutter/chunk generator did not always cut correctly (PR #233)
|
||||
Changed
|
||||
Speedup provided by mypy/c 0.990 on Python >= 3.7
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Thu Oct 27 22:18:02 UTC 2022 - Yogalakshmi Arunachalam <yarunachalam@suse.com>
|
||||
|
||||
- Update to 3.0.0
|
||||
Added
|
||||
* Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
|
||||
Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
|
||||
Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
|
||||
normalizer --version now specify if current version provide extra speedup (meaning mypyc compilation whl)
|
||||
* Changed
|
||||
Build with static metadata using 'build' frontend
|
||||
Make the language detection stricter
|
||||
Optional: Module md.py can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
|
||||
* Fixed
|
||||
CLI with opt --normalize fail when using full path for files
|
||||
TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
|
||||
Sphinx warnings when generating the documentation
|
||||
* Removed
|
||||
Coherence detector no longer return 'Simple English' instead return 'English'
|
||||
Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
|
||||
Breaking: Method first() and best() from CharsetMatch
|
||||
UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
|
||||
Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
|
||||
Breaking: Top-level function normalize
|
||||
Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
|
||||
Support for the backport unicodedata2
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Sat Sep 17 15:46:10 UTC 2022 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 2.1.1:
|
||||
* Function `normalize` scheduled for removal in 3.0
|
||||
* Removed useless call to decode in fn is_unprintable (#206)
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Thu Aug 18 15:34:14 UTC 2022 - Ben Greiner <code@bnavigator.de>
|
||||
|
||||
- Clean requirements: We don't need anything
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Tue Jul 19 11:38:48 UTC 2022 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 2.1.0:
|
||||
* Output the Unicode table version when running the CLI with `--version`
|
||||
* Re-use decoded buffer for single byte character sets
|
||||
* Fixing some performance bottlenecks
|
||||
* Workaround potential bug in cpython with Zero Width No-Break Space located
|
||||
* in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space
|
||||
* CLI default threshold aligned with the API threshold from
|
||||
* Support for Python 3.5 (PR #192)
|
||||
* Use of backport unicodedata from `unicodedata2` as Python is quickly
|
||||
catching up, scheduled for removal in 3.0
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Tue Feb 15 08:42:30 UTC 2022 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 2.0.12:
|
||||
* ASCII miss-detection on rare cases (PR #170)
|
||||
* Explicit support for Python 3.11 (PR #164)
|
||||
* The logging behavior have been completely reviewed, now using only TRACE
|
||||
and DEBUG levels
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Mon Jan 10 23:01:54 UTC 2022 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 2.0.10:
|
||||
* Fallback match entries might lead to UnicodeDecodeError for large bytes
|
||||
sequence
|
||||
* Skipping the language-detection (CD) on ASCII
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Mon Dec 6 20:08:41 UTC 2021 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 2.0.9:
|
||||
* Moderating the logging impact (since 2.0.8) for specific
|
||||
environments
|
||||
* Wrong logging level applied when setting kwarg `explain` to True
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Mon Nov 29 11:14:37 UTC 2021 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- update to 2.0.8:
|
||||
* Improvement over Vietnamese detection
|
||||
* MD improvement on trailing data and long foreign (non-pure latin)
|
||||
* Efficiency improvements in cd/alphabet_languages
|
||||
* call sum() without an intermediary list following PEP 289 recommendations
|
||||
* Code style as refactored by Sourcery-AI
|
||||
* Minor adjustment on the MD around european words
|
||||
* Remove and replace SRTs from assets / tests
|
||||
* Initialize the library logger with a `NullHandler` by default
|
||||
* Setting kwarg `explain` to True will add provisionally
|
||||
* Fix large (misleading) sequence giving UnicodeDecodeError
|
||||
* Avoid using too insignificant chunk
|
||||
* Add and expose function `set_logging_handler` to configure a specific
|
||||
StreamHandler
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Fri Nov 26 11:35:25 UTC 2021 - Dirk Müller <dmueller@suse.com>
|
||||
|
||||
- require lower-case name instead of breaking build
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Thu Nov 25 22:26:52 UTC 2021 - Matej Cepl <mcepl@suse.com>
|
||||
|
||||
- Use lower-case name of prettytable package
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Sun Oct 17 14:01:59 UTC 2021 - Martin Hauke <mardnh@gmx.de>
|
||||
|
||||
- Update to version 2.0.7
|
||||
* Addition: bento Add support for Kazakh (Cyrillic) language
|
||||
detection
|
||||
* Improvement: sparkle Further improve inferring the language
|
||||
from a given code page (single-byte).
|
||||
* Removed: fire Remove redundant logging entry about detected
|
||||
language(s).
|
||||
* Improvement: zap Refactoring for potential performance
|
||||
improvements in loops.
|
||||
* Improvement: sparkles Various detection improvement (MD+CD).
|
||||
* Bugfix: bug Fix a minor inconsistency between Python 3.5 and
|
||||
other versions regarding language detection.
|
||||
- Update to version 2.0.6
|
||||
* Bugfix: bug Unforeseen regression with the loss of the
|
||||
backward-compatibility with some older minor of Python 3.5.x.
|
||||
* Bugfix: bug Fix CLI crash when using --minimal output in
|
||||
certain cases.
|
||||
* Improvement: sparkles Minor improvement to the detection
|
||||
efficiency (less than 1%).
|
||||
- Update to version 2.0.5
|
||||
* Improvement: sparkles The BC-support with v1.x was improved,
|
||||
the old staticmethods are restored.
|
||||
* Remove: fire The project no longer raise warning on tiny
|
||||
content given for detection, will be simply logged as warning
|
||||
instead.
|
||||
* Improvement: sparkles The Unicode detection is slightly
|
||||
improved, see #93
|
||||
* Bugfix: bug In some rare case, the chunks extractor could cut
|
||||
in the middle of a multi-byte character and could mislead the
|
||||
mess detection.
|
||||
* Bugfix: bug Some rare 'space' characters could trip up the
|
||||
UnprintablePlugin/Mess detection.
|
||||
* Improvement: art Add syntax sugar __bool__ for results
|
||||
CharsetMatches list-container.
|
||||
- Update to version 2.0.4
|
||||
* Improvement: sparkle Adjust the MD to lower the sensitivity,
|
||||
thus improving the global detection reliability.
|
||||
* Improvement: sparkle Allow fallback on specified encoding
|
||||
if any.
|
||||
* Bugfix: bug The CLI no longer raise an unexpected exception
|
||||
when no encoding has been found.
|
||||
* Bugfix: bug Fix accessing the 'alphabets' property when the
|
||||
payload contains surrogate characters.
|
||||
* Bugfix: bug pencil2 The logger could mislead (explain=True) on
|
||||
detected languages and the impact of one MBCS match (in #72)
|
||||
* Bugfix: bug Submatch factoring could be wrong in rare edge
|
||||
cases (in #72)
|
||||
* Bugfix: bug Multiple files given to the CLI were ignored when
|
||||
publishing results to STDOUT. (After the first path) (in #72)
|
||||
* Internal: art Fix line endings from CRLF to LF for certain
|
||||
files.
|
||||
- Update to version 2.0.3
|
||||
* Improvement: sparkles Part of the detection mechanism has been
|
||||
improved to be less sensitive, resulting in more accurate
|
||||
detection results. Especially ASCII. #63 Fix #62
|
||||
* Improvement: sparklesAccording to the community wishes, the
|
||||
detection will fall back on ASCII or UTF-8 in a last-resort
|
||||
case.
|
||||
- Update to version 2.0.2
|
||||
* Bugfix: bug Empty/Too small JSON payload miss-detection fixed.
|
||||
* Improvement: sparkler Don't inject unicodedata2 into sys.modules
|
||||
- Update to version 2.0.1
|
||||
* Bugfix: bug Make it work where there isn't a filesystem
|
||||
available, dropping assets frequencies.json.
|
||||
* Improvement: sparkles You may now use aliases in cp_isolation
|
||||
and cp_exclusion arguments.
|
||||
* Bugfix: bug Using explain=False permanently disable the verbose
|
||||
output in the current runtime #47
|
||||
* Bugfix: bug One log entry (language target preemptive) was not
|
||||
show in logs when using explain=True #47
|
||||
* Bugfix: bug Fix undesired exception (ValueError) on getitem of
|
||||
instance CharsetMatches #52
|
||||
* Improvement: wrench Public function normalize default args
|
||||
values were not aligned with from_bytes #53
|
||||
- Update to version 2.0.0
|
||||
* Performance: zap 4x to 5 times faster than the previous 1.4.0
|
||||
release.
|
||||
* Performance: zap At least 2x faster than Chardet.
|
||||
* Performance: zap Accent has been made on UTF-8 detection,
|
||||
should perform rather instantaneous.
|
||||
* Improvement: back The backward compatibility with Chardet has
|
||||
been greatly improved. The legacy detect function returns an
|
||||
identical charset name whenever possible.
|
||||
* Improvement: sparkle The detection mechanism has been slightly
|
||||
improved, now Turkish content is detected correctly (most of
|
||||
the time)
|
||||
* Code: art The program has been rewritten to ease the
|
||||
readability and maintainability. (+Using static typing)
|
||||
* Tests: heavy_check_mark New workflows are now in place to
|
||||
verify the following aspects: Performance, Backward-
|
||||
Compatibility with Chardet, and Detection Coverage in addition#
|
||||
to currents tests. (+CodeQL)
|
||||
* Dependency: heavy_minus_sign This package no longer require
|
||||
anything when used with Python 3.5 (Dropped cached_property)
|
||||
* Docs: pencil2 Performance claims have been updated, the guide
|
||||
to contributing, and the issue template.
|
||||
* Improvement: sparkle Add --version argument to CLI
|
||||
* Bugfix: bug The CLI output used the relative path of the
|
||||
file(s). Should be absolute.
|
||||
* Deprecation: red_circle Methods coherence_non_latin, w_counter,
|
||||
chaos_secondary_pass of the class CharsetMatch are now
|
||||
deprecated and scheduled for removal in v3.0
|
||||
* Improvement: sparkle If no language was detected in content,
|
||||
trying to infer it using the encoding name/alphabets used.
|
||||
* Removal: fire Removed support for these languages: Catalan,
|
||||
Esperanto, Kazakh, Baque, Volapük, Azeri, Galician, Nynorsk,
|
||||
Macedonian, and Serbocroatian.
|
||||
* Improvement: sparkle utf_7 detection has been reinstated.
|
||||
* Removal: fire The exception hook on UnicodeDecodeError has
|
||||
been removed.
|
||||
- Update to version 1.4.1
|
||||
* Improvement: art Logger configuration/usage no longer
|
||||
conflict with others #44
|
||||
- Update to version 1.4.0
|
||||
* Dependency: heavy_minus_sign Using standard logging instead
|
||||
of using the package loguru.
|
||||
* Dependency: heavy_minus_sign Dropping nose test framework in
|
||||
favor of the maintained pytest.
|
||||
* Dependency: heavy_minus_sign Choose to not use dragonmapper
|
||||
package to help with gibberish Chinese/CJK text.
|
||||
* Dependency: wrench heavy_minus_sign Require cached_property
|
||||
only for Python 3.5 due to constraint. Dropping for every
|
||||
other interpreter version.
|
||||
* Bugfix: bug BOM marker in a CharsetNormalizerMatch instance
|
||||
could be False in rare cases even if obviously present. Due
|
||||
to the sub-match factoring process.
|
||||
* Improvement: sparkler Return ASCII if given sequences fit.
|
||||
* Performance: zap Huge improvement over the larges payload.
|
||||
* Change: fire Stop support for UTF-7 that does not contain a
|
||||
SIG. (Contributions are welcome to improve that point)
|
||||
* Feature: sparkler CLI now produces JSON consumable output.
|
||||
* Dependency: Dropping PrettyTable, replaced with pure JSON
|
||||
output.
|
||||
* Bugfix: bug Not searching properly for the BOM when trying
|
||||
utf32/16 parent codec.
|
||||
* Other: zap Improving the package final size by compressing
|
||||
frequencies.json.
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Thu May 20 09:46:56 UTC 2021 - pgajdos@suse.com
|
||||
|
||||
- version update to 1.3.9
|
||||
* Bugfix: bug In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload #40
|
||||
* Bugfix: bug Empty given payload for detection may cause an exception if trying to access the alphabets property. #39
|
||||
* Bugfix: bug The legacy detect function should return UTF-8-SIG if sig is present in the payload. #38
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Tue Feb 9 00:47:34 UTC 2021 - John Vandenberg <jayvdb@gmail.com>
|
||||
|
||||
- Switch to PyPI source
|
||||
- Add Suggests: python-unicodedata2
|
||||
- Remove executable bit from charset_normalizer/assets/frequencies.json
|
||||
- Update to v1.3.6
|
||||
* Allow prettytable 2.0
|
||||
- from v1.3.5
|
||||
* Dependencies refactor and add support for py 3.9 and 3.10
|
||||
* Fix version parsing
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Mon May 25 10:59:12 UTC 2020 - Petr Gajdos <pgajdos@suse.com>
|
||||
|
||||
- %python3_only -> %python_alternative
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Mon Jan 27 09:09:27 UTC 2020 - Marketa Calabkova <mcalabkova@suse.com>
|
||||
|
||||
- Update to 1.3.4
|
||||
* Improvement/Bugfix : False positive when searching for successive upper, lower char. (ProbeChaos)
|
||||
* Improvement : Noticeable better detection for jp
|
||||
* Bugfix : Passing zero-length bytes to from_bytes
|
||||
* Improvement : Expose version in package
|
||||
* Bugfix : Division by zero
|
||||
* Improvement : Prefers unicode (utf-8) when detected
|
||||
* Apparently dropped Python2 silently
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Fri Oct 4 08:52:51 UTC 2019 - Marketa Calabkova <mcalabkova@suse.com>
|
||||
|
||||
- Update to 1.3.0
|
||||
* Backport unicodedata for v12 impl into python if available
|
||||
* Add aliases to CharsetNormalizerMatches class
|
||||
* Add feature preemptive behaviour, looking for encoding declaration
|
||||
* Add method to determine if specific encoding is multi byte
|
||||
* Add has_submatch property on a match
|
||||
* Add percent_chaos and percent_coherence
|
||||
* Coherence ratio based on mean instead of sum of best results
|
||||
* Using loguru for trace/debug <3
|
||||
* from_byte method improved
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Thu Sep 26 10:35:51 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>
|
||||
|
||||
- Update to 1.1.1:
|
||||
* from_bytes parameters steps and chunk_size were not adapted to sequence len if provided values were not fitted to content
|
||||
* Sequence having lenght bellow 10 chars was not checked
|
||||
* Legacy detect method inspired by chardet was not returning
|
||||
* Various more test updates
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Fri Sep 13 11:05:06 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>
|
||||
|
||||
- Update to 0.3:
|
||||
* Improvement on detection
|
||||
* Performance loss to expect
|
||||
* Added --threshold option to CLI
|
||||
* Bugfix on UTF 7 support
|
||||
* Legacy detect(byte_str) method
|
||||
* BOM support (Unicode mostly)
|
||||
* Chaos prober improved on small text
|
||||
* Language detection has been reviewed to give better result
|
||||
* Bugfix on jp detection, every jp text was considered chaotic
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Fri Aug 30 00:46:27 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>
|
||||
|
||||
- Fix the tarball to really be the one published by upstream
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Tue Aug 28 06:29:02 PM UTC 2019 - John Vandenberg <jayvdb@gmail.com>
|
||||
|
||||
- Initial spec for v0.1.8
|
72
python-charset-normalizer.spec
Normal file
72
python-charset-normalizer.spec
Normal file
@ -0,0 +1,72 @@
|
||||
#
|
||||
# spec file for package python-charset-normalizer
|
||||
#
|
||||
# Copyright (c) 2025 SUSE LLC
|
||||
#
|
||||
# All modifications and additions to the file contributed by third parties
|
||||
# remain the property of their copyright owners, unless otherwise agreed
|
||||
# upon. The license for this file, and modifications and additions to the
|
||||
# file, is the same license as for the pristine package itself (unless the
|
||||
# license for the pristine package is not an Open Source License, in which
|
||||
# case the license is the MIT License). An "Open Source License" is a
|
||||
# license that conforms to the Open Source Definition (Version 1.9)
|
||||
# published by the Open Source Initiative.
|
||||
|
||||
# Please submit bugfixes or comments via https://bugs.opensuse.org/
|
||||
#
|
||||
|
||||
|
||||
%{?sle15_python_module_pythons}
|
||||
Name: python-charset-normalizer
|
||||
Version: 3.4.1
|
||||
Release: 0
|
||||
Summary: Python Universal Charset detector
|
||||
License: MIT
|
||||
URL: https://github.com/ousret/charset_normalizer
|
||||
Source: https://github.com/Ousret/charset_normalizer/archive/refs/tags/%{version}.tar.gz#/charset_normalizer-%{version}.tar.gz
|
||||
BuildRequires: %{python_module base >= 3.7}
|
||||
BuildRequires: %{python_module pip}
|
||||
BuildRequires: %{python_module setuptools}
|
||||
BuildRequires: %{python_module wheel}
|
||||
BuildRequires: fdupes
|
||||
BuildRequires: python-rpm-macros
|
||||
Requires(post): update-alternatives
|
||||
Requires(postun): update-alternatives
|
||||
Suggests: python-unicodedata2
|
||||
BuildArch: noarch
|
||||
# SECTION test requirements
|
||||
BuildRequires: %{python_module pytest}
|
||||
# /SECTION
|
||||
%python_subpackages
|
||||
|
||||
%description
|
||||
Python Universal Charset detector.
|
||||
|
||||
%prep
|
||||
%setup -q -n charset_normalizer-%{version}
|
||||
|
||||
%build
|
||||
%pyproject_wheel
|
||||
|
||||
%install
|
||||
%pyproject_install
|
||||
%python_clone -a %{buildroot}%{_bindir}/normalizer
|
||||
%python_expand %fdupes %{buildroot}%{$python_sitelib}
|
||||
|
||||
%check
|
||||
%pytest
|
||||
|
||||
%post
|
||||
%python_install_alternative normalizer
|
||||
|
||||
%postun
|
||||
%python_uninstall_alternative normalizer
|
||||
|
||||
%files %{python_files}
|
||||
%doc README.md
|
||||
%license LICENSE
|
||||
%python_alternative %{_bindir}/normalizer
|
||||
%{python_sitelib}/charset_normalizer
|
||||
%{python_sitelib}/charset_normalizer-%{version}*-info
|
||||
|
||||
%changelog
|
Loading…
Reference in New Issue
Block a user