2019-08-12 14:16:06 +00:00
-------------------------------------------------------------------
Mon Aug 12 12:31:18 UTC 2019 - Marketa Calabkova <mcalabkova@suse.com>

- Update to version 5.6
  * The unescape_html function now supports all the HTML5 entities
    that appear in html.entities.html5, including those with long
    names such as &DiacriticalDoubleAcute;.
  * Unescaping of numeric HTML entities now uses the standard library's
    html.unescape, making edge cases consistent.
  * On top of Python's support for HTML5 entities, ftfy will also
    convert HTML escapes of common Latin capital letters that are
    (nonstandardly) written in all caps, such as &NTILDE; for Ñ.
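The entity handling described for 5.6 can be approximated with the standard library alone. This is an illustrative sketch under stated assumptions, not ftfy's code: `html.unescape` resolves HTML5 names and numeric references, and the hypothetical helper `unescape_with_caps` adds a fallback that retries an unknown all-caps name such as `NTILDE` with its conventional spelling `Ntilde`.

```python
import html
import re
from html.entities import html5

# Hypothetical helper, not ftfy's API. Matches one named or numeric entity
# that ends in a semicolon, e.g. "&NTILDE;", "&#133;" or "&#x85;".
ENTITY_RE = re.compile(r"&(#?[0-9A-Za-z]{1,32});")


def _replace_entity(match):
    name = match.group(1)
    # Numeric references and names HTML5 already defines (including all-caps
    # ones such as "&GT;") are handed straight to the standard library.
    if name.startswith("#") or name + ";" in html5:
        return html.unescape(match.group(0))
    # Nonstandard all-caps spelling: retry "NTILDE;" as "Ntilde;".
    if name.isalpha() and name.isupper():
        candidate = name[0] + name[1:].lower() + ";"
        if candidate in html5:
            return html5[candidate]
    # Anything else is left exactly as written.
    return match.group(0)


def unescape_with_caps(text):
    """Unescape HTML entities, also accepting all-caps names like &NTILDE;."""
    return ENTITY_RE.sub(_replace_entity, text)
```

For example, `unescape_with_caps("&NTILDE; &GT;")` returns `"Ñ >"`. The exact-name check runs first because `&GT;` is itself a defined HTML5 name, while the retried spelling `&Gt;` means something else (≫).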
2018-10-18 09:58:47 +00:00
-------------------------------------------------------------------
Thu Oct 18 09:57:30 UTC 2018 - Tomáš Chvátal <tchvatal@suse.com>

- Update to version 5.5.1:
  * Fixes build on python3.7
  * Use Unicode 11
2018-07-30 07:33:53 +00:00
-------------------------------------------------------------------
Sun Jul 29 11:07:27 UTC 2018 - jengelh@inai.de

- Use noun phrase in summary. Trim filler wording from description.
2018-05-31 15:18:39 +00:00
-------------------------------------------------------------------
Wed May 16 16:10:48 UTC 2018 - toddrme2178@gmail.com

- Update to Version 5.3 (January 25, 2018)
  * A heuristic has been too conservative since version 4.2, causing a regression
    compared to previous versions: ftfy would fail to fix mojibake of common
    characters such as `á` when seen in isolation. A new heuristic now makes it
    possible to fix more of these common cases with less evidence.
- Update to Version 5.2 (November 27, 2017)
  * The command-line tool will not accept the same filename as its input
    and output. (Previously, this would write a zero-length file.)
  * The `uncurl_quotes` fixer, which replaces curly quotes with straight quotes,
    now also replaces MODIFIER LETTER APOSTROPHE.
  * Codepoints that contain two Latin characters crammed together for legacy
    encoding reasons are replaced by those two separate characters, even in NFC
    mode. We formerly did this just with ligatures such as `ﬁ` and `Ĳ`, but now
    this includes the Afrikaans digraph `ŉ` and Serbian/Croatian digraphs such as
    `ǆ`.
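The two character-level changes in 5.2 can be sketched with `str.translate` and `unicodedata`. The function names echo the changelog, but the code, and in particular the list of legacy digraph codepoints, are assumptions for illustration rather than ftfy's implementation. Applying NFKC to one character at a time splits the digraphs while every other character is left untouched.

```python
import unicodedata

# Hypothetical sketches of two 5.2 behaviours; not ftfy's own code.

# Single curly quotes U+2018..U+201B plus MODIFIER LETTER APOSTROPHE (U+02BC)
# become "'", and double curly quotes U+201C..U+201F become '"'.
_QUOTE_TABLE = {cp: "'" for cp in range(0x2018, 0x201C)}
_QUOTE_TABLE[0x02BC] = "'"
_QUOTE_TABLE.update({cp: '"' for cp in range(0x201C, 0x2020)})


def uncurl_quotes(text):
    return text.translate(_QUOTE_TABLE)


# Assumed list of "two letters in one codepoint" characters: Latin ligatures
# U+FB00..U+FB06, IJ/ij, the Afrikaans U+0149, and the Serbian/Croatian
# digraphs U+01C4..U+01CC and U+01F1..U+01F3.
_LEGACY_DIGRAPHS = (
    set(range(0xFB00, 0xFB07))
    | {0x0132, 0x0133, 0x0149}
    | set(range(0x01C4, 0x01CD))
    | set(range(0x01F1, 0x01F4))
)


def split_legacy_digraphs(text):
    # NFKC is applied only to the listed codepoints; everything else passes
    # through unchanged, so text that was in NFC stays in NFC elsewhere.
    return "".join(
        unicodedata.normalize("NFKC", ch) if ord(ch) in _LEGACY_DIGRAPHS else ch
        for ch in text
    )
```

NFKC turns `ŉ` into MODIFIER LETTER APOSTROPHE followed by `n`, which is one reason the two fixes fit together: `uncurl_quotes` would then straighten that apostrophe.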
- Update to Version 5.1.1 and 4.4.3 (May 15, 2017)
  - These releases fix two unrelated problems with the tests, one in each version.
    * v5.1.1: fixed the CLI tests (which are new in v5) so that they pass
      on Windows, as long as the Python output encoding is UTF-8.
    * v4.4.3: added the `# coding: utf-8` declaration to two files that were
      missing it, so that tests can run on Python 2.
- Update to Version 5.1 (April 7, 2017)
  * Removed the dependency on `html5lib` by dropping support for Python 3.2.
    We previously used the dictionary `html5lib.constants.entities` to decode
    HTML entities. In Python 3.3 and later, that exact dictionary is now in the
    standard library as `html.entities.html5`.
  * Moved many test cases about how particular text should be fixed into
    `test_cases.json`, which may ease porting to other languages.
- Update to Version 5.0.2 and 4.4.2 (March 21, 2017)
  * Added a `MANIFEST.in` that puts files such as the license file and this
    changelog inside the source distribution.
- Update to Version 5.0.1 and 4.4.1 (March 10, 2017)
  - Bug fix:
    * The `unescape_html` fixer will decode entities between `&#128;` and `&#159;`
      as what they would be in Windows-1252, even without the help of
      `fix_encoding`.
      This better matches what Web browsers do, and fixes a regression that version
      4.4 introduced in an example that uses `&#133;` as an ellipsis.
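The 5.0.1 rule amounts to decoding the single byte `n` as Windows-1252. The helpers below are hypothetical, not ftfy's API. Python's `html.unescape` follows the HTML5 parsing rules that browsers use, and `agrees_with_html_unescape` compares the two for a given value.

```python
import html


def decode_c1_entity(n):
    """Decode the numeric entity &#n; (128 <= n <= 159) as Windows-1252 would.

    Hypothetical helper for illustration.
    """
    if not 0x80 <= n <= 0x9F:
        raise ValueError("only codepoints 128..159 are handled here")
    try:
        return bytes([n]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
        # unassigned; keep the C1 control codepoint itself for those.
        return chr(n)


def agrees_with_html_unescape(n):
    """True when html.unescape returns the same character for &#n;."""
    return decode_c1_entity(n) == html.unescape("&#%d;" % n)
```

With this mapping, `&#133;` becomes `…` (U+2026) and `&#128;` becomes `€`, rather than the invisible C1 control characters U+0085 and U+0080.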
- Update to Version 5.0 (February 17, 2017)
  - Breaking changes:
    * Dropped support for Python 2. If you need Python 2 support, you should get
      version 4.4, which has the same features as this version.
    * The top-level functions require their arguments to be given as keyword
      arguments.
- Update to Version 4.4.0 (February 17, 2017)
  - Heuristic changes:
    * ftfy can now fix mojibake involving the Windows-1250 or ISO-8859-2 encodings.
    * The `fix_entities` fixer is now applied after `fix_encoding`. This makes
      more situations resolvable when both fixes are needed.
    * With a few exceptions for commonly-used characters such as `^`, it is now
      considered "weird" whenever a diacritic appears in non-combining form,
      such as the diaeresis character `¨`.
    * It is also now weird when IPA phonetic letters, besides `ə`, appear next to
      capital letters.
    * These changes to the heuristics, and others we've made in recent versions,
      let us lower the "cost" for fixing mojibake in some encodings, causing them
      to be fixed in more cases.
- Update to Version 4.3.1 (January 12, 2017)
  - Bug fix:
    * `remove_control_chars` was removing U+0D ('\r') prematurely. That's the
      job of `fix_line_breaks`.
- Update to Version 4.3.0 (December 29, 2016)
  * This version now depends on the `html5lib` and `wcwidth` libraries.
  - Feature changes:
    * The `remove_control_chars` fixer will now remove some non-ASCII control
      characters as well, such as deprecated Arabic control characters and
      byte-order marks. Bidirectional controls are still left as is.
      This should have no impact on well-formed text, while cleaning up many
      characters that the Unicode Consortium deems "not suitable for markup"
      (see Unicode Technical Report #20).
    * The `unescape_html` fixer uses a more thorough list of HTML entities,
      which it imports from `html5lib`.
    * `ftfy.formatting` now uses `wcwidth` to compute the width that a string
      will occupy in a text console.
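A minimal sketch of the 4.3.0 control-character behaviour using `str.translate`. The set of removed codepoints is an assumed subset chosen for illustration, not ftfy's actual table: most C0 controls and DEL, the deprecated format controls U+206A..U+206F (which include the Arabic form-shaping controls), and the byte-order mark U+FEFF. Carriage return is kept, in line with the 4.3.1 fix, and bidirectional controls are not listed, so they pass through. `drop_control_chars` is a hypothetical name.

```python
# Assumed subset for illustration; ftfy's real table may differ.
_REMOVED_CONTROLS = {
    cp: None
    for cp in list(range(0x00, 0x20)) + [0x7F]
    # Keep tab, line feed and form feed, and keep carriage return so that a
    # separate line-break fixer can deal with it.
    if cp not in (0x09, 0x0A, 0x0C, 0x0D)
}
# Deprecated format controls U+206A..U+206F, including the Arabic
# form-shaping controls.
_REMOVED_CONTROLS.update({cp: None for cp in range(0x206A, 0x2070)})
# Byte-order mark / zero width no-break space.
_REMOVED_CONTROLS[0xFEFF] = None
# Bidirectional controls (U+200E, U+200F, U+202A..U+202E, ...) are
# deliberately absent, so they are left as is.


def drop_control_chars(text):
    """Delete every character listed in _REMOVED_CONTROLS."""
    return text.translate(_REMOVED_CONTROLS)
```

Mapping a codepoint to `None` in a `str.translate` table deletes it, so the whole pass is a single call over the string.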
|
|
|
|
|
|
- Heuristic changes:
|
|
|
|
|
|
* Updated the data file of Unicode character categories to Unicode 9, as used
|
|
|
|
|
|
in Python 3.6.0. (No matter what version of Python you're on, ftfy uses the
|
|
|
|
|
|
same data.)
|
|
|
|
|
|
- Pending deprecations:
|
|
|
|
|
|
* The `remove_bom` option will become deprecated in 5.0, because it has been
|
|
|
|
|
|
superseded by `remove_control_chars`.
|
|
|
|
|
|
* ftfy 5.0 will remove the previously deprecated name `fix_text_encoding`. It
|
|
|
|
|
|
was renamed to `fix_encoding` in 4.0.
|
|
|
|
|
|
* ftfy 5.0 will require Python 3.2 or later, as planned. Python 2 users, please
|
|
|
|
|
|
specify `ftfy < 5` in your dependencies if you haven't already.
|
|
|
|
|
|
- Update to Version 4.2.0 (September 28, 2016)
|
|
|
|
|
|
- Heuristic changes:
|
|
|
|
|
|
* Math symbols next to currency symbols are no longer considered 'weird' by the
|
|
|
|
|
|
heuristic. This fixes a false positive where text that involved the
|
|
|
|
|
|
multiplication sign and British pounds or euros (as in '5×£35') could turn
|
|
|
|
|
|
into Hebrew letters.
|
|
|
|
|
|
* A heuristic that used to be a bonus for certain punctuation now also gives a
|
|
|
|
|
|
bonus to successfully decoding other common codepoints, such as the
|
|
|
|
|
|
non-breaking space, the degree sign, and the byte order mark.
|
|
|
|
|
|
* In version 4.0, we tried to "future-proof" the categorization of emoji (as a
|
|
|
|
|
|
kind of symbol) to include codepoints that would likely be assigned to emoji
|
|
|
|
|
|
later. The future happened, and there are even more emoji than we expected.
|
|
|
|
|
|
We have expanded the range to include those emoji, too.
|
|
|
|
|
|
ftfy is still mostly based on information from Unicode 8 (as Python 3.5 is),
|
|
|
|
|
|
but this expanded range should include the emoji from Unicode 9 and 10.
|
|
|
|
|
|
* Emoji are increasingly being modified by variation selectors and skin-tone
|
|
|
|
|
|
modifiers. Those codepoints are now grouped with 'symbols' in ftfy, so they
|
|
|
|
|
|
fit right in with emoji, instead of being considered 'marks' as their Unicode
|
|
|
|
|
|
category would suggest.
|
|
|
|
|
|
This enables fixing mojibake that involves iOS's new diverse emoji.
|
|
|
|
|
|
* An old heuristic that wasn't necessary anymore considered Latin text with
|
|
|
|
|
|
high-numbered codepoints to be 'weird', but this is normal in languages such
|
|
|
|
|
|
as Vietnamese and Azerbaijani. This does not seem to have caused any false
|
|
|
|
|
|
positives, but it caused ftfy to be too reluctant to fix some cases of broken
|
|
|
|
|
|
text in those languages.
|
|
|
|
|
|
The heuristic has been changed, and all languages that use Latin letters
|
|
|
|
|
|
should be on even footing now.
|
|
|
|
|
|
- Update to Version 4.1.1 (April 13, 2016)
|
|
|
|
|
|
* Bug fix: in the command-line interface, the `-e` option had no effect on
|
|
|
|
|
|
Python 3 when using standard input. Now, it correctly lets you specify
|
|
|
|
|
|
a different encoding for standard input.
|
|
|
|
|
|
- Update to Version 4.1.0 (February 25, 2016)
|
|
|
|
|
|
- Heuristic changes:
|
|
|
|
|
|
* ftfy can now deal with "lossy" mojibake. If your text has been run through
|
|
|
|
|
|
a strict Windows-1252 decoder, such as the one in Python, it may contain
|
|
|
|
|
|
the replacement character <20> (U+FFFD) where there were bytes that are
|
|
|
|
|
|
unassigned in Windows-1252.
|
|
|
|
|
|
Although ftfy won't recover the lost information, it can now detect this
|
|
|
|
|
|
situation, replace the entire lossy character with <20>, and decode the rest of
|
|
|
|
|
|
the characters. Previous versions would be unable to fix any string that
|
|
|
|
|
|
contained U+FFFD.
|
|
|
|
|
|
As an example, text in curly quotes that gets corrupted `“ like this â€<C3A2>`
|
|
|
|
|
|
now gets fixed to be `“ like this <20>`.
|
|
|
|
|
|
* Updated the data file of Unicode character categories to Unicode 8.0, as used
|
|
|
|
|
|
in Python 3.5.0. (No matter what version of Python you're on, ftfy uses the
|
|
|
|
|
|
same data.)
|
|
|
|
|
|
* Heuristics now count characters such as `~` and `^` as punctuation instead
|
|
|
|
|
|
of wacky math symbols, improving the detection of mojibake in some edge cases.
  - New features:
    * A new module, `ftfy.formatting`, can be used to justify Unicode text in a
      monospaced terminal. It takes into account that each character can take up
      anywhere from 0 to 2 character cells.
    * Internally, the `utf-8-variants` codec was simplified and optimized.
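The idea behind `ftfy.formatting` (0, 1 or 2 cells per character) can be approximated with the standard library. The helpers `cell_width` and `monospaced_ljust` are hypothetical and are not the `wcwidth` algorithm that 4.3.0 switched to; they treat nonspacing, enclosing and format characters as zero-width and East Asian Wide or Fullwidth characters as two cells.

```python
import unicodedata


def cell_width(text):
    """Approximate how many monospaced terminal cells `text` occupies.

    Stdlib-only approximation, not wcwidth's algorithm.
    """
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            # Combining marks and format characters take no cell of their own.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide or fullwidth characters take two cells
        else:
            width += 1
    return width


def monospaced_ljust(text, width, fillchar=" "):
    """Pad `text` on the right so that it fills `width` terminal cells."""
    return text + fillchar * max(0, width - cell_width(text))
```

Plain `str.ljust` counts codepoints, so it would under-pad `日本` (two codepoints, four cells) and over-pad `é` written as `e` plus a combining accent. `monospaced_ljust` assumes `fillchar` itself occupies one cell.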
- Update to Version 4.0.0 (April 10, 2015)
  - Breaking changes:
    * The default normalization form is now NFC, not NFKC. NFKC replaces a large
      number of characters with 'equivalent' characters, and some of these
      replacements are useful, but some are not desirable to do by default.
    * The `fix_text` function has some new options that perform more targeted
      operations that are part of NFKC normalization, such as
      `fix_character_width`, without having to hit all your text with the huge
      mallet that is NFKC.
    * The `remove_unsafe_private_use` parameter has been removed entirely, after
      two versions of deprecation. The function name `fix_bad_encoding` is also
      gone.
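To see why NFKC is described as a mallet, compare it with NFC on `x²`: NFKC rewrites it to `x2`, while NFC leaves it alone. A targeted width fix can apply NFKC to the fullwidth forms only. `fix_fullwidth` below is a hypothetical helper in the spirit of `fix_character_width`, limited by assumption to fullwidth ASCII (U+FF01..U+FF5E) and the ideographic space U+3000; ftfy's own option may cover more.

```python
import unicodedata


def fix_fullwidth(text):
    """Replace fullwidth ASCII variants and the ideographic space with their
    ordinary forms, using NFKC on those characters only.

    Hypothetical helper for illustration.
    """
    out = []
    for ch in text:
        if 0xFF01 <= ord(ch) <= 0xFF5E or ch == "\u3000":
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            # Superscripts, ligatures and everything else are left alone.
            out.append(ch)
    return "".join(out)
```

`fix_fullwidth("ＡＢＣ　１２")` returns `"ABC 12"`, while `x²` passes through unchanged, which is the kind of targeted result the 4.0.0 entry describes.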
  - New features:
    * Fixers for strange new forms of mojibake, including particularly clear cases
      of mixed UTF-8 and Windows-1252.
    * New heuristics, so that ftfy can fix more stuff, while maintaining
      approximately zero false positives.
    * The command-line tool trusts you to know what encoding your *input* is in,
      and assumes UTF-8 by default. You can still tell it to guess with the `-g`
      option.
    * The command-line tool can be configured with options, and can be used as a
      pipe.
    * Recognizes characters that are new in Unicode 7.0, as well as emoji from
      Unicode 8.0+ that may already be in use on iOS.
  - Deprecations:
    * `fix_text_encoding` is being renamed again, for conciseness and consistency.
      It's now simply called `fix_encoding`. The name `fix_text_encoding` is
      available but emits a warning.
  - Pending deprecations:
    * Python 2.6 support is largely coincidental.
    * Python 2.7 support is on notice. If you use Python 2, be sure to pin a
      version of ftfy less than 5.0 in your requirements.
- Implement single-spec version
-------------------------------------------------------------------
Mon Jul 13 13:12:38 UTC 2015 - toddrme2178@gmail.com

- Fix building on SLES 11
-------------------------------------------------------------------
Thu May 7 07:07:50 UTC 2015 - jweberhofer@weberhofer.at

- Use the tar-ball from pypi.python.org
-------------------------------------------------------------------
Mon May 4 15:04:36 UTC 2015 - jweberhofer@weberhofer.at

- Updated to version 3.4.0
  * ftfy.fixes.fix_surrogates will fix all 16-bit surrogate codepoints, which
    would otherwise break various encoding and output functions.
  * remove_unsafe_private_use emits a warning, and will disappear in the next
    minor or major version.
- Updated to version 3.3.1
  * Restores compatibility with Python 2.6.
-------------------------------------------------------------------
Mon Aug 18 12:59:42 UTC 2014 - jweberhofer@weberhofer.at

- Initial RPM package for version 3.3.0