- Update to 3.7

- Improve and update the NLTK team page on nltk.org (#2855,
    #2941)
  - Drop support for Python 3.6, support Python 3.10 (#2920)
- Update to 3.6.7
  - Resolve IndexError in `sent_tokenize` and `word_tokenize`
    (#2922)
- Update to 3.6.6
  - Refactor `gensim.doctest` to work for gensim 4.0.0 and up
    (#2914)
  - Add Precision, Recall, F-measure, Confusion Matrix to Taggers
    (#2862)
  - Added warnings if .zip files exist without any corresponding
    .csv files. (#2908)
  - Fix `FileNotFoundError` when the `download_dir` is
    a non-existing nested folder (#2910)
  - Rename omw to omw-1.4 (#2907)
  - Resolve ReDoS opportunity by fixing incorrectly specified
    regex (#2906, bsc#1191030, CVE-2021-3828).
  - Support OMW 1.4 (#2899)
  - Deprecate Tree get and set node methods (#2900)
  - Fix broken inaugural test case (#2903)
  - Use Multilingual Wordnet Data from OMW with newer Wordnet
    versions (#2889)
  - Keep NLTKs "tokenize" module working with pathlib (#2896)
  - Make prettyprinter to be more readable (#2893)
  - Update links to the nltk book (#2895)
  - Add `CITATION.cff` to nltk (#2880)
  - Resolve serious ReDoS in PunktSentenceTokenizer (#2869)
  - Delete old CI config files (#2881)

OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-nltk?expand=0&rev=36
This commit is contained in:
Matej Cepl 2022-03-22 07:58:51 +00:00 committed by Git OBS Bridge
parent dc693937c4
commit 31f5bb280c
4 changed files with 125 additions and 13 deletions

View File

@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:845365449cd8c5f9731f7cb9f8bd6fd0767553b9d53af9eb1b3abf7700936b35
size 1433531

3
nltk-3.7.zip Normal file
View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d6507d6460cec76d70afea4242a226a7542f85c669177b9c7f562b7cf1b05502
size 3293449

View File

@ -1,3 +1,115 @@
-------------------------------------------------------------------
Tue Mar 22 07:48:14 UTC 2022 - Matej Cepl <mcepl@suse.com>
- Update to 3.7
- Improve and update the NLTK team page on nltk.org (#2855,
#2941)
- Drop support for Python 3.6, support Python 3.10 (#2920)
- Update to 3.6.7
- Resolve IndexError in `sent_tokenize` and `word_tokenize`
(#2922)
- Update to 3.6.6
- Refactor `gensim.doctest` to work for gensim 4.0.0 and up
(#2914)
- Add Precision, Recall, F-measure, Confusion Matrix to Taggers
(#2862)
- Added warnings if .zip files exist without any corresponding
.csv files. (#2908)
- Fix `FileNotFoundError` when the `download_dir` is
a non-existing nested folder (#2910)
- Rename omw to omw-1.4 (#2907)
- Resolve ReDoS opportunity by fixing incorrectly specified
regex (#2906, bsc#1191030, CVE-2021-3828).
- Support OMW 1.4 (#2899)
- Deprecate Tree get and set node methods (#2900)
- Fix broken inaugural test case (#2903)
- Use Multilingual Wordnet Data from OMW with newer Wordnet
versions (#2889)
- Keep NLTKs "tokenize" module working with pathlib (#2896)
- Make prettyprinter to be more readable (#2893)
- Update links to the nltk book (#2895)
- Add `CITATION.cff` to nltk (#2880)
- Resolve serious ReDoS in PunktSentenceTokenizer (#2869)
- Delete old CI config files (#2881)
- Improve Tokenize documentation + add TokenizerI as superclass
for TweetTokenizer (#2878)
- Fix expected value for BLEU score doctest after changes from
#2572
- Add multi Bleu functionality and tests (#2793)
- Deprecate 'return_str' parameter in NLTKWordTokenizer and
TreebankWordTokenizer (#2883)
- Allow empty string in CFG's + more (#2888)
- Partition `tree.py` module into `tree` package + pickle fix
(#2863)
- Fix several TreebankWordTokenizer and NLTKWordTokenizer bugs
(#2877)
- Rewind Wordnet data file after each lookup (#2868)
- Correct __init__ call for SyntaxCorpusReader subclasses
(#2872)
- Documentation fixes (#2873)
- Fix levenstein distance for duplicated letters (#2849)
- Support alternative Wordnet versions (#2860)
- Remove hundreds of formatting warnings for nltk.org (#2859)
- Modernize `nltk.org/howto` pages (#2856)
- Fix Bleu Score smoothing function from taking log(0) (#2839)
- Update third party tools to newer versions and removing
MaltParser fixed version (#2832)
- Fix TypeError: _pretty() takes 1 positional argument but 2
were given in sem/drt.py (#2854)
- Replace `http` with `https` in most URLs (#2852)
- Update to 3.6.5
- modernised nltk.org website
- addressed LGTM.com issues
- support ZWJ sequences emoji and skin tone modifer emoji in
TweetTokenizer
- METEOR evaluation now requires pre-tokenized input
- Code linting and type hinting
- implement get_refs function for DrtLambdaExpression
- Enable automated CoreNLP, Senna, Prover9/Mace4, Megam,
MaltParser CI tests
- specify minimum regex version that supports regex.Pattern
- avoid re.Pattern and regex.Pattern which fail for Python 3.6,
3.7
- Update to 3.6.4
- deprecate `nltk.usage(obj)` in favor of `help(obj)`
- resolve ReDoS vulnerability in Corpus Reader
- solidify performance tests
- improve phone number recognition in tweet tokenizer
- refactored CISTEM stemmer for German
- identify NLTK Team as the author
- replace travis badge with github actions badge
- add SECURITY.md
- Update to 3.6.3
- Dropped support for Python 3.5
- Run CI tests on Windows, too
- Moved from Travis CI to GitHub Actions
- Code and comment cleanups
- Visualize WordNet relation graphs using Graphviz
- Fixed large error in METEOR score
- Apply isort, pyupgrade, black, added as pre-commit hooks
- Prevent debug_decisions in Punkt from throwing IndexError
- Resolved ZeroDivisionError in RIBES with dissimilar sentences
- Initialize WordNet IC total counts with smoothing value
- Fixed AttributeError for Arabic ARLSTem2 stemmer
- Many fixes and improvements to lm language model package
- Fix bug in nltk.metrics.aline, C_skip = -10
- Improvements to TweetTokenizer
- Optional show arg for FreqDist.plot, ConditionalFreqDist.plot
- edit_distance now computes Damerau-Levenshtein edit-distance
- Update to 3.6.2
- move test code to nltk/test
- fix bug in NgramAssocMeasures (order preserving fix)
- Update to 3.6
- add support for Python 3.9
- add Tree.fromlist
- compute Minimum Spanning Tree of unweighted graph using BFS
- fix bug with infinite loop in Wordnet closure and tree
- fix bug in calculating BLEU using smoothing method 4
- Wordnet synset similarities work for all pos
- new Arabic light stemmer (ARLSTem2)
- new syllable tokenizer (LegalitySyllableTokenizer)
- remove nose in favor of pytest
-------------------------------------------------------------------
Thu Apr 23 13:54:08 UTC 2020 - John Vandenberg <jayvdb@gmail.com>

View File

@ -1,7 +1,7 @@
#
# spec file for package python-nltk
#
# Copyright (c) 2020 SUSE LINUX GmbH, Nuernberg, Germany.
# Copyright (c) 2022 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@ -20,7 +20,7 @@
%define pyname nltk
%define skip_python2 1
Name: python-nltk
Version: 3.5
Version: 3.7
Release: 0
Summary: Natural Language Toolkit
License: Apache-2.0
@ -55,13 +55,13 @@ Python modules, data sets and tutorials supporting research and
development in Natural Language Processing.
%prep
%setup -q -n %{pyname}-%{version}
%autosetup -p1 -n %{pyname}-%{version}
sed -i "1,4{/\/usr\/bin\/env/d}" nltk/corpus/reader/knbc.py
sed -i "1,4{/\/usr\/bin\/env/d}" nltk/test/runtests.py
sed -i "1,4{/\/usr\/bin\/env/d}" nltk/test/unit/test_tgrep.py
sed -i "1,4{/\/usr\/bin\/env/d}" nltk/tgrep.py
sed -i "1,4{/\/usr\/bin\/env/d}" nltk/tokenize/stanford_segmenter.py
# sed -i "1,4{/\/usr\/bin\/env/d}" nltk/corpus/reader/knbc.py
# sed -i "1,4{/\/usr\/bin\/env/d}" nltk/test/runtests.py
# sed -i "1,4{/\/usr\/bin\/env/d}" nltk/test/unit/test_tgrep.py
# sed -i "1,4{/\/usr\/bin\/env/d}" nltk/tgrep.py
# sed -i "1,4{/\/usr\/bin\/env/d}" nltk/tokenize/stanford_segmenter.py
%build
%python_build