python-tokenizers.changes

-------------------------------------------------------------------
Tue Aug 20 07:27:42 UTC 2024 - Simon Lees <sflees@suse.de>

- Fix testsuite on 15.6

-------------------------------------------------------------------
Sun Aug 18 16:49:56 UTC 2024 - Soc Virnyl Estela <obs@uncomfyhalomacro.pl>

- Replace vendor tarball with zstd-compressed vendor tarball
- Force gcc version on Leap. Thanks @marv7000 for your zed.spec
- Use `CARGO_*` environment variables to force generation of
  full debuginfo and avoid stripping.
- Enable cargo test in %check.
- Update to version 0.20.0:
  * remove enforcement of non special when adding tokens
  * [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder
  * Make USED_PARALLELISM atomic
  * Fixing for clippy 1.78
  * feat(ci): add trufflehog secrets detection
  * Switch from cached_download to hf_hub_download in tests
  * Fix "dictionnary" typo
  * make sure we don't warn on empty tokens
  * Enable dropout = 0.0 as an equivalent to none in BPE
  * Revert "[BREAKING CHANGE] Ignore added_tokens (both special and not) …
  * Add bytelevel normalizer to fix decode when adding tokens to BPE
  * Fix clippy + feature test management.
  * Bump spm_precompiled to 0.1.3
  * Add benchmark vs tiktoken
  * Fixing the benchmark.
  * Tiny improvement
  * Enable fancy regex
  * Fixing release CI strict (taken from safetensors).
  * Adding some serialization testing around the wrapper.
  * Add-legacy-tests
  * Adding a few tests for decoder deserialization.
  * Better serialization error
  * Add test normalizers
  * Improve decoder deserialization
  * Using serde (serde_pyo3) to get str and repr easily.
  * Merges cannot handle tokens containing spaces.
  * Fix doc about split
  * Support None to reset pre_tokenizers and normalizers, and index sequences
  * Fix strip python type
  * Tests + Deserialization improvement for normalizers.
  * add deserialize for pre tokenizers
  * Perf improvement 16% by removing offsets.

-------------------------------------------------------------------
Wed Jul  3 14:55:36 UTC 2024 - Christian Goll <cgoll@suse.com>

- Initial commit of Rust-based python-tokenizers