-------------------------------------------------------------------
Wed Jan 3 01:32:19 UTC 2024 - Stefan Brüns

- Update to 1.6.0:
  * Changes
    + Replaced the rANS codec SIMD gathers with simulated gathers
      via scalar memory fetches. This helps AMD Zen4, but
      importantly it also fixes a disastrous performance
      regression caused by Intel's DownFall microcode fix.
    + There is an impact on pre-DownFall speeds, but we should
      focus on patched CPUs as a priority.
    + A small speed up to the rans_F_to_s3 function used by
      order-0 rans decode.
    + Small speed up to SIMD rans32x16 order-1 encoder by reducing
      cache misses. Also sped up the rans4x8 order-1 encoder,
      particularly on AMD Zen4.
    + Now supports building with "zig cc"
  * Bug fixes
    + Improve robustness of name tokeniser when given non 7-bit
      ASCII and on machines where "char" defaults to unsigned.
    + Also fixed a 1 byte buffer read-overrun in name tokeniser.
    + Fix name tokeniser encoder failure with some duplicated
      streams.
    + Fixed rans_set_cpu to work multiple times, as well as
      reinstating the ability to change decode and encode side
      independently (accidentally lost in commit 958032c). No
      effect on usage, but it improves the test coverage.
    + Added a round-trip fuzz tester to test the ability to
      encode. The old fuzz testing was decode streams only.
    + Fixed bounds checking in rans_uncompress_O0_32x16_avx2,
      fixing buffer read overruns.
    + Removed undefined behaviour in transpose_and_copy(), fixing
      zig cc builds.

-------------------------------------------------------------------
Thu May 4 14:47:16 UTC 2023 - Andrea Manzini

- Update to 1.5.0:
  * Significant speed ups to the fqzcomp codec via code
    restructuring and use of memory prefetch instructions.
    Encode is 30-40% faster and decode 5-8% faster.
  * Remove unused ax_with_libdeflate.m4 file from build system
- removed patch fix_ix86_build.patch already merged in upstream
- Update to 1.4.0:
  * This is almost entirely minor bug fixing with a few small
    updates.
  * Optimise compression / speed of the name tokeniser
  * Improvements for Intel -m32 builds, including better AVX2
    validation
  * Detect Neon capability at runtime via operating system APIs.
  * Update hts_pack to operate in line with CRAMcodecs spec,
    where the number of symbols > 16.
  * Fixed too-stringent buffer overflow checking in O1 rans
    decoder.

-------------------------------------------------------------------
Thu Sep 8 21:52:47 UTC 2022 - Stefan Brüns

- Update to 1.3.0:
  * The primary change in this release is a new SIMD enabled
    rANS codec. There is a 32-way unrolled rANS implementation.
    This is accessed using the existing rans 4x16 API with the
    RANS_ORDER_X32 bit set.
  * Improved memory allocation via a new htscodecs_tls_alloc
    function.
  * Some external functions have been renamed, with the old ones
    still existing in a deprecated fashion.
  * Improved test framework with an "entropy" tool that iterates
    over all entropy encoders.
  * Reworked fuzzing infrastructure.
  * Small speed improvements to various rANS encoders and
    decoders.
  * Substantial memory reduction to the name tokeniser (tok3).
  * Fixed undefined behaviour in our use of __builtin_clz().
  * Fixed a few redundant #includes.
  * Work around strict aliasing bugs, uncovered with gcc -O2.
  * Fixed an issue with encoding data blocks close to 2GB in
    size.
  * Fix encode error with large blocks using RANS_ORDER_STRIPE.
- Add fix_ix86_build.patch

-------------------------------------------------------------------
Wed Apr 20 19:45:48 UTC 2022 - Ferdinand Thiessen

- Created initial package for libhts