forked from pool/python-gTTS

Accepting request 670774 from home:alarrosa:branches:devel:languages:python

- Update to 2.0.3:
  * Added new tokenizer case for ':' preventing cut in the middle of
    a time notation
- Update to 2.0.2:
  Features
  * Added Python 3.7 support, modernization of packaging, testing and CI
  Bugfixes
  * Fixed language retrieval/validation broken from new Google Translate page
- Update to 2.0.1:
  Bugfixes
  * Fixed a UnicodeDecodeError when installing gTTS if the system locale was
    not utf-8
  Improved Documentation
  * Added Pre-processing and tokenizing > Minimizing section about the API's
    100 characters limit and how larger tokens are handled
- Update to 2.0.0:
  Features
  * The gtts module
    + New logger ("gtts") replaces all occurrences of print()
    + Languages list is now obtained automatically (gtts.lang)
    + Added a curated list of language sub-tags that have been observed to
      provide different dialects or accents (e.g. "en-gb", "fr-ca")
    + New gTTS() parameter lang_check to disable language checking.
    + gTTS() now delegates the text tokenizing to the API request methods (i.e.
      write_to_fp(), save()), allowing gTTS instances to be modified/reused
    + Rewrote tokenizing and added pre-processing (see below)
    + New gTTS() parameters pre_processor_funcs and tokenizer_func to configure
      pre-processing and tokenizing (or use a 3rd party tokenizer)
    + Error handling:
      - Added new exception gTTSError raised on API request errors. It attempts
        to guess what went wrong based on known information and observed
        behaviour

OBS-URL: https://build.opensuse.org/request/show/670774
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-gTTS?expand=0&rev=3
Author: Tomáš Chvátal, 2019-02-03 18:33:45 +00:00 (committed by Git OBS Bridge)
parent ff507f499e
commit 18440ed39b
5 changed files with 139 additions and 9 deletions

@@ -1,7 +1,121 @@
-------------------------------------------------------------------
Sat Feb 2 21:52:59 UTC 2019 - Antonio Larrosa <alarrosa@suse.com>

- Update to 2.0.3:
  * Added new tokenizer case for ':' preventing cut in the middle of
    a time notation
- Update to 2.0.2:
  Features
  * Added Python 3.7 support, modernization of packaging, testing and CI
  Bugfixes
  * Fixed language retrieval/validation broken from new Google Translate page
- Update to 2.0.1:
  Bugfixes
  * Fixed a UnicodeDecodeError when installing gTTS if the system locale was
    not utf-8
  Improved Documentation
  * Added Pre-processing and tokenizing > Minimizing section about the API's
    100 characters limit and how larger tokens are handled
- Update to 2.0.0:
  Features
  * The gtts module
    + New logger ("gtts") replaces all occurrences of print()
    + Languages list is now obtained automatically (gtts.lang)
    + Added a curated list of language sub-tags that have been observed to
      provide different dialects or accents (e.g. "en-gb", "fr-ca")
    + New gTTS() parameter lang_check to disable language checking
    + gTTS() now delegates the text tokenizing to the API request methods (i.e.
      write_to_fp(), save()), allowing gTTS instances to be modified/reused
    + Rewrote tokenizing and added pre-processing (see below)
    + New gTTS() parameters pre_processor_funcs and tokenizer_func to configure
      pre-processing and tokenizing (or use a 3rd party tokenizer)
    + Error handling:
      - Added new exception gTTSError raised on API request errors. It attempts
        to guess what went wrong based on known information and observed
        behaviour
      - gTTS.write_to_fp() and gTTS.save() also raise gTTSError on gtts_token
        error
      - gTTS.write_to_fp() raises TypeError when fp is not a file-like object
        or one that doesn't take bytes
      - gTTS() raises ValueError on unsupported languages (and lang_check is
        True)
      - More fine-grained error handling throughout (e.g. request failed vs.
        request successful with a bad response)
  * Tokenizer (and new pre-processors):
    + Rewrote and greatly expanded tokenizer (gtts.tokenizer)
    + Smarter token 'cleaning' that will remove tokens that only contain
      characters that can't be spoken (i.e. punctuation and whitespace)
    + Decoupled token minimizing from tokenizing, making the latter usable
      in other contexts
    + New flexible speech-centric text pre-processing
    + New flexible full-featured regex-based tokenizer
      (gtts.tokenizer.core.Tokenizer)
    + New RegexBuilder, PreProcessorRegex and PreProcessorSub classes to make
      writing regex-powered text pre-processors and tokenizer cases easier
    + Pre-processors:
      - Re-form words cut by end-of-line hyphens
      - Remove periods after a (customizable) list of known abbreviations (e.g.
        "jr", "sr", "dr") that can be spoken the same without a period
      - Perform speech corrections by doing word-for-word replacements from a
        (customizable) list of tuples
    + Tokenizing:
      - Keep punctuation that modifies the inflection of speech (e.g. "?", "!")
      - Don't split in the middle of numbers (e.g. "10.5", "20,000,000")
      - Don't split on "dotted" abbreviations and acronyms (e.g. "U.S.A")
      - Added the Chinese comma ("，") and ellipsis ("…") to the punctuation
        list to tokenize on
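The pre-processing and tokenizing rules listed above can be sketched with the standard library alone. This is a simplified illustration of the behaviour the changelog describes, not the actual gtts.tokenizer code; the function names and the exact regexes are assumptions:

```python
import re

def end_of_line(text):
    # Re-form words cut by end-of-line hyphens ("inter-\nnet" -> "internet")
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

def abbreviations(text, abbrevs=("jr", "sr", "dr")):
    # Drop the period after known abbreviations so it is not read as a
    # sentence break; the word is spoken the same without it
    pattern = r"\b(" + "|".join(abbrevs) + r")\."
    return re.sub(pattern, r"\1", text, flags=re.IGNORECASE)

def tokenize(text):
    # Split only after ".", "?" or "!" when followed by whitespace, so
    # numbers like "10.5" and dotted acronyms like "U.S.A" stay whole,
    # and the inflection punctuation ("?", "!") stays on its token.
    # Tokens containing nothing speakable (punctuation/whitespace only)
    # are cleaned out, mirroring the "smarter token cleaning" above.
    return [t for t in re.split(r"(?<=[.?!])\s+", text) if re.search(r"\w", t)]
```

For example, `tokenize("Hello! How are you? I am fine.")` yields three tokens while `"It costs 10.5 dollars"` is never cut inside the number.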
  * The gtts-cli command-line tool
    + Rewrote cli as a first-class citizen module (gtts.cli), powered by Click
    + Windows support using setuptools' entry_points
    + Better support for Unicode I/O in Python 2
    + All arguments are now pre-validated
    + New --nocheck flag to skip language pre-checking
    + New --all flag to list all available languages
    + Either the --file option or the <text> argument can be set to "-" to
      read from stdin
    + The --debug flag uses logging and doesn't pollute stdout anymore
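The design change above of delegating tokenization from the constructor to the request methods (so an instance can be modified and reused between calls) can be sketched as follows. This is a stdlib-only illustration of the pattern that mimics the gTTS()/write_to_fp() interface; it is not the real class, and the toy tokenizer is an assumption:

```python
import io

class TTSRequest:
    """Stores text; tokenizing happens lazily at request time, so the
    instance can be modified and reused between calls."""

    def __init__(self, text, tokenizer_func=None):
        self.text = text
        # Pluggable tokenizer, mirroring the tokenizer_func parameter
        self.tokenizer_func = tokenizer_func or (lambda t: t.split(". "))

    def write_to_fp(self, fp):
        # Tokenize at call time, not in __init__
        for token in self.tokenizer_func(self.text):
            # Stand-in for one API request per token
            fp.write(token.encode("utf-8") + b"\n")

req = TTSRequest("Hello. World")
buf = io.BytesIO()
req.write_to_fp(buf)

req.text = "Changed"      # the instance can be reused with new text
buf2 = io.BytesIO()
req.write_to_fp(buf2)
```

Because nothing is tokenized in `__init__`, changing `req.text` between calls is safe, which is the reuse property the changelog entry describes.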
  Bugfixes
  * _minimize(): Fixed an infinite recursion loop that would occur when a
    token started with the minimizing delimiter (i.e. a space)
  * _minimize(): Handle the case where a token of more than 100 characters
    did not contain a space (e.g. in Chinese)
  * Fixed an issue that fused multiline text together if the total number of
    characters was less than 100
  * Fixed gtts-cli Unicode errors in Python 2.7
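The two _minimize() fixes above (a token starting with the delimiter recursing forever, and a >100-character token with no space at all, e.g. Chinese text) can be illustrated with a short stdlib sketch. This is a simplified stand-in, not the actual gtts code:

```python
def minimize(token, delim=" ", limit=100):
    """Split a token into chunks of at most `limit` characters,
    preferring to cut at the last `delim` before the limit."""
    # Fix 1: strip a leading delimiter first; otherwise the recursion
    # below would be called with an input that never shrinks.
    if token.startswith(delim):
        token = token[len(delim):]
    if len(token) <= limit:
        return [token]
    # Fix 2: if there is no delimiter within the first `limit`
    # characters (e.g. Chinese), fall back to a hard cut at the limit.
    idx = token.rfind(delim, 0, limit)
    if idx < 0:
        idx = limit
    return [token[:idx]] + minimize(token[idx:], delim, limit)
```

With the leading-delimiter strip removed, a token such as `" aaa…"` longer than the limit would recurse on itself indefinitely, which is the infinite loop the changelog entry refers to.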
  Deprecations and Removals
  * Dropped Python 3.3 support
  * Removed the debug parameter of gTTS() (in favour of logger)
  * gtts-cli: Changed the long option name of -o to --output instead of
    --destination
  * gTTS() will raise a ValueError rather than an AssertionError on an
    unsupported language
  Improved Documentation
  * Rewrote all documentation files as reStructuredText
  * Comprehensive documentation written for Sphinx, published to
    http://gtts.readthedocs.io
  * Changelog built with towncrier
  Misc
  * Major test re-work
  * Language tests can read a TEST_LANGS environment variable so not all
    language tests are run every time
  * Added AppVeyor CI for Windows
  * PEP 8 compliance
- Add remove-pip-requirement.patch to remove the dependency on pip to build
  the package.
-------------------------------------------------------------------
Thu May 3 15:38:01 UTC 2018 - alarrosa@suse.com

- Run spec-cleaner
-------------------------------------------------------------------
Thu May 3 09:36:21 UTC 2018 - alarrosa@suse.com

- Use %license for the LICENSE file
-------------------------------------------------------------------
Sun Mar 4 13:08:06 UTC 2018 - jengelh@inai.de