- Update to version 5.6
* The unescape_html function now supports all the HTML5 entities
that appear in html.entities.html5, including those with long
names such as ˝.
* Unescaping of numeric HTML entities now uses the standard library's
html.unescape, making edge cases consistent.
* On top of Python's support for HTML5 entities, ftfy will also
convert HTML escapes of common Latin capital letters that are
(nonstandardly) written in all caps, such as Ñ for Ñ.
OBS-URL: https://build.opensuse.org/request/show/722677
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-ftfy?expand=0&rev=9
232 lines
12 KiB
Plaintext
232 lines
12 KiB
Plaintext
-------------------------------------------------------------------
|
||
Mon Aug 12 12:31:18 UTC 2019 - Marketa Calabkova <mcalabkova@suse.com>
|
||
|
||
- Update to version 5.6
|
||
* The unescape_html function now supports all the HTML5 entities
|
||
that appear in html.entities.html5, including those with long
|
||
names such as ˝.
|
||
* Unescaping of numeric HTML entities now uses the standard library's
|
||
html.unescape, making edge cases consistent.
|
||
* On top of Python's support for HTML5 entities, ftfy will also
|
||
convert HTML escapes of common Latin capital letters that are
|
||
(nonstandardly) written in all caps, such as &NTILDE; for Ñ.
|
||
|
||
-------------------------------------------------------------------
|
||
Thu Oct 18 09:57:30 UTC 2018 - Tomáš Chvátal <tchvatal@suse.com>
|
||
|
||
- Update to version 5.5.1:
|
||
* Fixes build on python3.7
|
||
* Use Unicode 11
|
||
|
||
-------------------------------------------------------------------
|
||
Sun Jul 29 11:07:27 UTC 2018 - jengelh@inai.de
|
||
|
||
- Use noun phrase in summary. Trim filler wording from description.
|
||
|
||
-------------------------------------------------------------------
|
||
Wed May 16 16:10:48 UTC 2018 - toddrme2178@gmail.com
|
||
|
||
- Update to Version 5.3 (January 25, 2018)
|
||
* A heuristic has been too conservative since version 4.2, causing a regression
|
||
compared to previous versions: ftfy would fail to fix mojibake of common
|
||
characters such as `á` when seen in isolation. A new heuristic now makes it
|
||
possible to fix more of these common cases with less evidence.
|
||
- Update to Version 5.2 (November 27, 2017)
|
||
* The command-line tool will not accept the same filename as its input
|
||
and output. (Previously, this would write a zero-length file.)
|
||
* The `uncurl_quotes` fixer, which replaces curly quotes with straight quotes,
|
||
now also replaces MODIFIER LETTER APOSTROPHE.
|
||
* Codepoints that contain two Latin characters crammed together for legacy
|
||
encoding reasons are replaced by those two separate characters, even in NFC
|
||
mode. We formerly did this just with ligatures such as `fi` and `IJ`, but now
|
||
this includes the Afrikaans digraph `ʼn` and Serbian/Croatian digraphs such as
|
||
`dž`.
|
||
- Update to Version 5.1.1 and 4.4.3 (May 15, 2017)
|
||
- These releases fix two unrelated problems with the tests, one in each version.
|
||
* v5.1.1: fixed the CLI tests (which are new in v5) so that they pass
|
||
on Windows, as long as the Python output encoding is UTF-8.
|
||
* v4.4.3: added the `# coding: utf-8` declaration to two files that were
|
||
missing it, so that tests can run on Python 2.
|
||
- Update to Version 5.1 (April 7, 2017)
|
||
* Removed the dependency on `html5lib` by dropping support for Python 3.2.
|
||
We previously used the dictionary `html5lib.constants.entities` to decode
|
||
HTML entities. In Python 3.3 and later, that exact dictionary is now in the
|
||
standard library as `html.entities.html5`.
|
||
* Moved many test cases about how particular text should be fixed into
|
||
`test_cases.json`, which may ease porting to other languages.
|
||
- Update to Version 5.0.2 and 4.4.2 (March 21, 2017)
|
||
* Added a `MANIFEST.in` that puts files such as the license file and this
|
||
changelog inside the source distribution.
|
||
- Update to Version 5.0.1 and 4.4.1 (March 10, 2017)
|
||
- Bug fix:
|
||
* The `unescape_html` fixer will decode entities between `€` and `Ÿ`
|
||
as what they would be in Windows-1252, even without the help of
|
||
`fix_encoding`.
|
||
This better matches what Web browsers do, and fixes a regression that version
|
||
4.4 introduced in an example that uses `…` as an ellipsis.
|
||
- Update to Version 5.0 (February 17, 2017)
|
||
- Breaking changes:
|
||
* Dropped support for Python 2. If you need Python 2 support, you should get
|
||
version 4.4, which has the same features as this version.
|
||
* The top-level functions require their arguments to be given as keyword
|
||
arguments.
|
||
- Update to Version 4.4.0 (February 17, 2017)
|
||
- Heuristic changes:
|
||
* ftfy can now fix mojibake involving the Windows-1250 or ISO-8859-2 encodings.
|
||
* The `fix_entities` fixer is now applied after `fix_encoding`. This makes
|
||
more situations resolvable when both fixes are needed.
|
||
* With a few exceptions for commonly-used characters such as `^`, it is now
|
||
considered "weird" whenever a diacritic appears in non-combining form,
|
||
such as the diaeresis character `¨`.
|
||
* It is also now weird when IPA phonetic letters, besides `ə`, appear next to
|
||
capital letters.
|
||
* These changes to the heuristics, and others we've made in recent versions,
|
||
let us lower the "cost" for fixing mojibake in some encodings, causing them
|
||
to be fixed in more cases.
|
||
- Update to Version 4.3.1 (January 12, 2017)
|
||
- Bug fix:
|
||
* `remove_control_chars` was removing U+0D ('\r') prematurely. That's the
|
||
job of `fix_line_breaks`.
|
||
- Update to Version 4.3.0 (December 29, 2016)
|
||
* This version now depends on the `html5lib` and `wcwidth` libraries.
|
||
- Feature changes:
|
||
* The `remove_control_chars` fixer will now remove some non-ASCII control
|
||
characters as well, such as deprecated Arabic control characters and
|
||
byte-order marks. Bidirectional controls are still left as is.
|
||
This should have no impact on well-formed text, while cleaning up many
|
||
characters that the Unicode Consortium deems "not suitable for markup"
|
||
(see Unicode Technical Report #20).
|
||
* The `unescape_html` fixer uses a more thorough list of HTML entities,
|
||
which it imports from `html5lib`.
|
||
* `ftfy.formatting` now uses `wcwidth` to compute the width that a string
|
||
will occupy in a text console.
|
||
- Heuristic changes:
|
||
* Updated the data file of Unicode character categories to Unicode 9, as used
|
||
in Python 3.6.0. (No matter what version of Python you're on, ftfy uses the
|
||
same data.)
|
||
- Pending deprecations:
|
||
* The `remove_bom` option will become deprecated in 5.0, because it has been
|
||
superseded by `remove_control_chars`.
|
||
* ftfy 5.0 will remove the previously deprecated name `fix_text_encoding`. It
|
||
was renamed to `fix_encoding` in 4.0.
|
||
* ftfy 5.0 will require Python 3.2 or later, as planned. Python 2 users, please
|
||
specify `ftfy < 5` in your dependencies if you haven't already.
|
||
- Update to Version 4.2.0 (September 28, 2016)
|
||
- Heuristic changes:
|
||
* Math symbols next to currency symbols are no longer considered 'weird' by the
|
||
heuristic. This fixes a false positive where text that involved the
|
||
multiplication sign and British pounds or euros (as in '5×£35') could turn
|
||
into Hebrew letters.
|
||
* A heuristic that used to be a bonus for certain punctuation now also gives a
|
||
bonus to successfully decoding other common codepoints, such as the
|
||
non-breaking space, the degree sign, and the byte order mark.
|
||
* In version 4.0, we tried to "future-proof" the categorization of emoji (as a
|
||
kind of symbol) to include codepoints that would likely be assigned to emoji
|
||
later. The future happened, and there are even more emoji than we expected.
|
||
We have expanded the range to include those emoji, too.
|
||
ftfy is still mostly based on information from Unicode 8 (as Python 3.5 is),
|
||
but this expanded range should include the emoji from Unicode 9 and 10.
|
||
* Emoji are increasingly being modified by variation selectors and skin-tone
|
||
modifiers. Those codepoints are now grouped with 'symbols' in ftfy, so they
|
||
fit right in with emoji, instead of being considered 'marks' as their Unicode
|
||
category would suggest.
|
||
This enables fixing mojibake that involves iOS's new diverse emoji.
|
||
* An old heuristic that wasn't necessary anymore considered Latin text with
|
||
high-numbered codepoints to be 'weird', but this is normal in languages such
|
||
as Vietnamese and Azerbaijani. This does not seem to have caused any false
|
||
positives, but it caused ftfy to be too reluctant to fix some cases of broken
|
||
text in those languages.
|
||
The heuristic has been changed, and all languages that use Latin letters
|
||
should be on even footing now.
|
||
- Update to Version 4.1.1 (April 13, 2016)
|
||
* Bug fix: in the command-line interface, the `-e` option had no effect on
|
||
Python 3 when using standard input. Now, it correctly lets you specify
|
||
a different encoding for standard input.
|
||
- Update to Version 4.1.0 (February 25, 2016)
|
||
- Heuristic changes:
|
||
* ftfy can now deal with "lossy" mojibake. If your text has been run through
|
||
a strict Windows-1252 decoder, such as the one in Python, it may contain
|
||
the replacement character <20> (U+FFFD) where there were bytes that are
|
||
unassigned in Windows-1252.
|
||
Although ftfy won't recover the lost information, it can now detect this
|
||
situation, replace the entire lossy character with <20>, and decode the rest of
|
||
the characters. Previous versions would be unable to fix any string that
|
||
contained U+FFFD.
|
||
As an example, text in curly quotes that gets corrupted `“ like this â€<C3A2>`
|
||
now gets fixed to be `“ like this <20>`.
|
||
* Updated the data file of Unicode character categories to Unicode 8.0, as used
|
||
in Python 3.5.0. (No matter what version of Python you're on, ftfy uses the
|
||
same data.)
|
||
* Heuristics now count characters such as `~` and `^` as punctuation instead
|
||
of wacky math symbols, improving the detection of mojibake in some edge cases.
|
||
- New features:
|
||
* A new module, `ftfy.formatting`, can be used to justify Unicode text in a
|
||
monospaced terminal. It takes into account that each character can take up
|
||
anywhere from 0 to 2 character cells.
|
||
* Internally, the `utf-8-variants` codec was simplified and optimized.
|
||
- Update to Version 4.0.0 (April 10, 2015)
|
||
- Breaking changes:
|
||
* The default normalization form is now NFC, not NFKC. NFKC replaces a large
|
||
number of characters with 'equivalent' characters, and some of these
|
||
replacements are useful, but some are not desirable to do by default.
|
||
* The `fix_text` function has some new options that perform more targeted
|
||
operations that are part of NFKC normalization, such as
|
||
`fix_character_width`, without requiring hitting all your text with the huge
|
||
mallet that is NFKC.
|
||
* The `remove_unsafe_private_use` parameter has been removed entirely, after
|
||
two versions of deprecation. The function name `fix_bad_encoding` is also
|
||
gone.
|
||
- New features:
|
||
* Fixers for strange new forms of mojibake, including particularly clear cases
|
||
of mixed UTF-8 and Windows-1252.
|
||
* New heuristics, so that ftfy can fix more stuff, while maintaining
|
||
approximately zero false positives.
|
||
* The command-line tool trusts you to know what encoding your *input* is in,
|
||
and assumes UTF-8 by default. You can still tell it to guess with the `-g`
|
||
option.
|
||
* The command-line tool can be configured with options, and can be used as a
|
||
pipe.
|
||
* Recognizes characters that are new in Unicode 7.0, as well as emoji from
|
||
Unicode 8.0+ that may already be in use on iOS.
|
||
- Deprecations:
|
||
* `fix_text_encoding` is being renamed again, for conciseness and consistency.
|
||
It's now simply called `fix_encoding`. The name `fix_text_encoding` is
|
||
available but emits a warning.
|
||
- Pending deprecations:
|
||
* Python 2.6 support is largely coincidental.
|
||
* Python 2.7 support is on notice. If you use Python 2, be sure to pin a
|
||
version of ftfy less than 5.0 in your requirements.
|
||
|
||
- Implement single-spec version
|
||
|
||
-------------------------------------------------------------------
|
||
Mon Jul 13 13:12:38 UTC 2015 - toddrme2178@gmail.com
|
||
|
||
- Fix building on SLES 11
|
||
|
||
-------------------------------------------------------------------
|
||
Thu May 7 07:07:50 UTC 2015 - jweberhofer@weberhofer.at
|
||
|
||
- Use the tar-ball from pypi.python.org
|
||
|
||
-------------------------------------------------------------------
|
||
Mon May 4 15:04:36 UTC 2015 - jweberhofer@weberhofer.at
|
||
|
||
- Updated to version 3.4.0
|
||
|
||
* ftfy.fixes.fix_surrogates will fix all 16-bit surrogate codepoints, which
|
||
would otherwise break various encoding and output functions.
|
||
|
||
* remove_unsafe_private_use emits a warning, and will disappear in the next
|
||
minor or major version.
|
||
|
||
- Updated to version 3.3.1
|
||
|
||
* restores compatibility with Python 2.6.
|
||
|
||
-------------------------------------------------------------------
|
||
Mon Aug 18 12:59:42 UTC 2014 - jweberhofer@weberhofer.at
|
||
|
||
- Initial RPM package for version 3.3.0
|
||
|