Accepting request 1009574 from home:dspinella:branches:devel:libraries:c_c++
- Add Power8 optimizations: * zlib-1.2.12-add-optimized-slide_hash-for-power.patch * zlib-1.2.12-add-vectorized-longest_match-for-power.patch * zlib-1.2.12-adler32-vector-optimizations-for-power.patch * zlib-1.2.12-fix-invalid-memory-access-on-ppc-and-ppc64.patch - Update zlib-1.2.12-IBM-Z-hw-accelerated-deflate-s390x.patch OBS-URL: https://build.opensuse.org/request/show/1009574 OBS-URL: https://build.opensuse.org/package/show/devel:libraries:c_c++/zlib?expand=0&rev=82
This commit is contained in:
parent
91abbb8bfa
commit
f6c5a01beb
@ -1,8 +1,99 @@
|
||||
From e1d7e6dc9698968a8536b00ebd9e9b4e429b4306 Mon Sep 17 00:00:00 2001
|
||||
From 171d0ff3c9ed40da0ac14085ab16b766b1162069 Mon Sep 17 00:00:00 2001
|
||||
From: Ilya Leoshkevich <iii@linux.ibm.com>
|
||||
Date: Wed, 27 Apr 2022 14:37:24 +0200
|
||||
Subject: [PATCH] zlib-1.2.12-IBM-Z-hw-accelrated-deflate-s390x.patch
|
||||
Date: Wed, 18 Jul 2018 13:14:07 +0200
|
||||
Subject: [PATCH] Add support for IBM Z hardware-accelerated deflate
|
||||
|
||||
IBM Z mainframes starting from version z15 provide DFLTCC instruction,
|
||||
which implements deflate algorithm in hardware with estimated
|
||||
compression and decompression performance orders of magnitude faster
|
||||
than the current zlib and ratio comparable with that of level 1.
|
||||
|
||||
This patch adds DFLTCC support to zlib. In order to enable it, the
|
||||
following build commands should be used:
|
||||
|
||||
$ ./configure --dfltcc
|
||||
$ make
|
||||
|
||||
When built like this, zlib would compress in hardware on level 1, and in
|
||||
software on all other levels. Decompression will always happen in
|
||||
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e. to
|
||||
make it used by default) one could either configure with
|
||||
--dfltcc-level-mask=0x7e or set the environment variable
|
||||
DFLTCC_LEVEL_MASK to 0x7e at run time.
|
||||
|
||||
Two DFLTCC compression calls produce the same results only when they
|
||||
both are made on machines of the same generation, and when the
|
||||
respective buffers have the same offset relative to the start of the
|
||||
page. Therefore care should be taken when using hardware compression
|
||||
when reproducible results are desired. One such use case - reproducible
|
||||
software builds - is handled explicitly: when SOURCE_DATE_EPOCH
|
||||
environment variable is set, the hardware compression is disabled.
|
||||
|
||||
DFLTCC does not support every single zlib feature, in particular:
|
||||
|
||||
* inflate(Z_BLOCK) and inflate(Z_TREES)
|
||||
* inflateMark()
|
||||
* inflatePrime()
|
||||
* inflateSyncPoint()
|
||||
|
||||
When used, these functions will either switch to software, or, in case
|
||||
this is not possible, gracefully fail.
|
||||
|
||||
This patch tries to add DFLTCC support in the least intrusive way.
|
||||
All SystemZ-specific code is placed into a separate file, but
|
||||
unfortunately there is still a noticeable amount of changes in the
|
||||
main zlib code. Below is the summary of these changes.
|
||||
|
||||
DFLTCC takes as arguments a parameter block, an input buffer, an output
|
||||
buffer and a window. Since DFLTCC requires parameter block to be
|
||||
doubleword-aligned, and it's reasonable to allocate it alongside
|
||||
deflate and inflate states, ZALLOC_STATE, ZFREE_STATE and ZCOPY_STATE
|
||||
macros were introduced in order to encapsulate the allocation details.
|
||||
The same is true for window, for which ZALLOC_WINDOW and
|
||||
TRY_FREE_WINDOW macros were introduced.
|
||||
|
||||
While for inflate software and hardware window formats match, this is
|
||||
not the case for deflate. Therefore, deflateSetDictionary and
|
||||
deflateGetDictionary need special handling, which is triggered using the
|
||||
new DEFLATE_SET_DICTIONARY_HOOK and DEFLATE_GET_DICTIONARY_HOOK macros.
|
||||
|
||||
deflateResetKeep() and inflateResetKeep() now update the DFLTCC
|
||||
parameter block, which is allocated alongside zlib state, using
|
||||
the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.
|
||||
|
||||
The new DEFLATE_PARAMS_HOOK switches between hardware and software
|
||||
deflate implementations when deflateParams() arguments demand this.
|
||||
|
||||
The new INFLATE_PRIME_HOOK, INFLATE_MARK_HOOK and
|
||||
INFLATE_SYNC_POINT_HOOK macros make the respective unsupported calls
|
||||
gracefully fail.
|
||||
|
||||
The algorithm implemented in hardware has different compression ratio
|
||||
than the one implemented in software. In order for deflateBound() to
|
||||
return the correct results for the hardware implementation, the new
|
||||
DEFLATE_BOUND_ADJUST_COMPLEN and DEFLATE_NEED_CONSERVATIVE_BOUND macros
|
||||
were introduced.
|
||||
|
||||
Actual compression and decompression are handled by the new DEFLATE_HOOK
|
||||
and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the
|
||||
window on its own, calling updatewindow() is suppressed using the new
|
||||
INFLATE_NEED_UPDATEWINDOW() macro.
|
||||
|
||||
In addition to compression, DFLTCC computes CRC-32 and Adler-32
|
||||
checksums, therefore, whenever it's used, software checksumming needs to
|
||||
be suppressed using the new DEFLATE_NEED_CHECKSUM and
|
||||
INFLATE_NEED_CHECKSUM macros.
|
||||
|
||||
DFLTCC will refuse to write an End-of-block Symbol if there is no input
|
||||
data, thus in some cases it is necessary to do this manually. In order
|
||||
to achieve this, send_bits, bi_reverse, bi_windup and flush_pending
|
||||
were promoted from local to ZLIB_INTERNAL. Furthermore, since block and
|
||||
stream termination must be handled in software as well, block_state enum
|
||||
was moved to deflate.h.
|
||||
|
||||
Since the first call to dfltcc_inflate already needs the window, and it
|
||||
might be not allocated yet, inflate_ensure_window was factored out of
|
||||
updatewindow and made ZLIB_INTERNAL.
|
||||
---
|
||||
Makefile.in | 8 +
|
||||
compress.c | 14 +-
|
||||
@ -27,10 +118,10 @@ Subject: [PATCH] zlib-1.2.12-IBM-Z-hw-accelrated-deflate-s390x.patch
|
||||
create mode 100644 contrib/s390/dfltcc.h
|
||||
create mode 100644 contrib/s390/dfltcc_deflate.h
|
||||
|
||||
Index: zlib-1.2.12/Makefile.in
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/Makefile.in
|
||||
+++ zlib-1.2.12/Makefile.in
|
||||
diff --git a/Makefile.in b/Makefile.in
|
||||
index fd28bbfbf..66e3a8057 100644
|
||||
--- a/Makefile.in
|
||||
+++ b/Makefile.in
|
||||
@@ -143,6 +143,14 @@ match.lo: match.S
|
||||
mv _match.o match.lo
|
||||
rm -f _match.s
|
||||
@ -46,10 +137,10 @@ Index: zlib-1.2.12/Makefile.in
|
||||
example.o: $(SRCDIR)test/example.c $(SRCDIR)zlib.h zconf.h
|
||||
$(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/example.c
|
||||
|
||||
Index: zlib-1.2.12/compress.c
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/compress.c
|
||||
+++ zlib-1.2.12/compress.c
|
||||
diff --git a/compress.c b/compress.c
|
||||
index e2db404ab..78fc6568f 100644
|
||||
--- a/compress.c
|
||||
+++ b/compress.c
|
||||
@@ -5,9 +5,15 @@
|
||||
|
||||
/* @(#) $Id$ */
|
||||
@ -67,7 +158,7 @@ Index: zlib-1.2.12/compress.c
|
||||
/* ===========================================================================
|
||||
Compresses the source buffer into the destination buffer. The level
|
||||
parameter has the same meaning as in deflateInit. sourceLen is the byte
|
||||
@@ -81,6 +87,12 @@ int ZEXPORT compress (dest, destLen, sou
|
||||
@@ -81,6 +87,12 @@ int ZEXPORT compress (dest, destLen, source, sourceLen)
|
||||
uLong ZEXPORT compressBound (sourceLen)
|
||||
uLong sourceLen;
|
||||
{
|
||||
@ -80,10 +171,10 @@ Index: zlib-1.2.12/compress.c
|
||||
return sourceLen + (sourceLen >> 12) + (sourceLen >> 14) +
|
||||
(sourceLen >> 25) + 13;
|
||||
}
|
||||
Index: zlib-1.2.12/configure
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/configure
|
||||
+++ zlib-1.2.12/configure
|
||||
diff --git a/configure b/configure
|
||||
index fbaf25357..02e325e22 100755
|
||||
--- a/configure
|
||||
+++ b/configure
|
||||
@@ -115,6 +115,7 @@ case "$1" in
|
||||
echo ' configure [--const] [--zprefix] [--prefix=PREFIX] [--eprefix=EXPREFIX]' | tee -a configure.log
|
||||
echo ' [--static] [--64] [--libdir=LIBDIR] [--sharedlibdir=LIBDIR]' | tee -a configure.log
|
||||
@ -109,7 +200,7 @@ Index: zlib-1.2.12/configure
|
||||
*)
|
||||
echo "unknown option: $1" | tee -a configure.log
|
||||
echo "$0 --help for help" | tee -a configure.log
|
||||
@@ -833,6 +844,19 @@ EOF
|
||||
@@ -836,6 +847,19 @@ EOF
|
||||
fi
|
||||
fi
|
||||
|
||||
@ -129,11 +220,11 @@ Index: zlib-1.2.12/configure
|
||||
# show the results in the log
|
||||
echo >> configure.log
|
||||
echo ALL = $ALL >> configure.log
|
||||
Index: zlib-1.2.12/contrib/README.contrib
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/contrib/README.contrib
|
||||
+++ zlib-1.2.12/contrib/README.contrib
|
||||
@@ -46,6 +46,10 @@ puff/ by Mark Adler <madler@alumni
|
||||
diff --git a/contrib/README.contrib b/contrib/README.contrib
|
||||
index 335e43508..130a28bdb 100644
|
||||
--- a/contrib/README.contrib
|
||||
+++ b/contrib/README.contrib
|
||||
@@ -46,6 +46,10 @@ puff/ by Mark Adler <madler@alumni.caltech.edu>
|
||||
Small, low memory usage inflate. Also serves to provide an
|
||||
unambiguous description of the deflate format.
|
||||
|
||||
@ -144,10 +235,11 @@ Index: zlib-1.2.12/contrib/README.contrib
|
||||
testzlib/ by Gilles Vollant <info@winimage.com>
|
||||
Example of the use of zlib
|
||||
|
||||
Index: zlib-1.2.12/contrib/s390/README.txt
|
||||
===================================================================
|
||||
diff --git a/contrib/s390/README.txt b/contrib/s390/README.txt
|
||||
new file mode 100644
|
||||
index 000000000..48be008bd
|
||||
--- /dev/null
|
||||
+++ zlib-1.2.12/contrib/s390/README.txt
|
||||
+++ b/contrib/s390/README.txt
|
||||
@@ -0,0 +1,17 @@
|
||||
+IBM Z mainframes starting from version z15 provide DFLTCC instruction,
|
||||
+which implements deflate algorithm in hardware with estimated
|
||||
@ -166,10 +258,11 @@ Index: zlib-1.2.12/contrib/s390/README.txt
|
||||
+make it used by default) one could either configure with
|
||||
+--dfltcc-level-mask=0x7e or set the environment variable
|
||||
+DFLTCC_LEVEL_MASK to 0x7e at run time.
|
||||
Index: zlib-1.2.12/contrib/s390/dfltcc.c
|
||||
===================================================================
|
||||
diff --git a/contrib/s390/dfltcc.c b/contrib/s390/dfltcc.c
|
||||
new file mode 100644
|
||||
index 000000000..cd959290d
|
||||
--- /dev/null
|
||||
+++ zlib-1.2.12/contrib/s390/dfltcc.c
|
||||
+++ b/contrib/s390/dfltcc.c
|
||||
@@ -0,0 +1,996 @@
|
||||
+/* dfltcc.c - SystemZ DEFLATE CONVERSION CALL support. */
|
||||
+
|
||||
@ -797,7 +890,7 @@ Index: zlib-1.2.12/contrib/s390/dfltcc.c
|
||||
+ state->bits = param->sbb;
|
||||
+ state->whave = param->hl;
|
||||
+ state->wnext = (param->ho + param->hl) & ((1 << HB_BITS) - 1);
|
||||
+ state->check = state->flags ? ZSWAP32(param->cv) : param->cv;
|
||||
+ strm->adler = state->check = state->flags ? ZSWAP32(param->cv) : param->cv;
|
||||
+ if (cc == DFLTCC_CC_OP2_CORRUPT && param->oesc != 0) {
|
||||
+ /* Report an error if stream is corrupted */
|
||||
+ state->mode = BAD;
|
||||
@ -1167,10 +1260,11 @@ Index: zlib-1.2.12/contrib/s390/dfltcc.c
|
||||
+ *dict_length = param->hl;
|
||||
+ return Z_OK;
|
||||
+}
|
||||
Index: zlib-1.2.12/contrib/s390/dfltcc.h
|
||||
===================================================================
|
||||
diff --git a/contrib/s390/dfltcc.h b/contrib/s390/dfltcc.h
|
||||
new file mode 100644
|
||||
index 000000000..da26612ca
|
||||
--- /dev/null
|
||||
+++ zlib-1.2.12/contrib/s390/dfltcc.h
|
||||
+++ b/contrib/s390/dfltcc.h
|
||||
@@ -0,0 +1,81 @@
|
||||
+#ifndef DFLTCC_H
|
||||
+#define DFLTCC_H
|
||||
@ -1253,10 +1347,11 @@ Index: zlib-1.2.12/contrib/s390/dfltcc.h
|
||||
+ } while (0)
|
||||
+
|
||||
+#endif
|
||||
Index: zlib-1.2.12/contrib/s390/dfltcc_deflate.h
|
||||
===================================================================
|
||||
diff --git a/contrib/s390/dfltcc_deflate.h b/contrib/s390/dfltcc_deflate.h
|
||||
new file mode 100644
|
||||
index 000000000..46acfc550
|
||||
--- /dev/null
|
||||
+++ zlib-1.2.12/contrib/s390/dfltcc_deflate.h
|
||||
+++ b/contrib/s390/dfltcc_deflate.h
|
||||
@@ -0,0 +1,55 @@
|
||||
+#ifndef DFLTCC_DEFLATE_H
|
||||
+#define DFLTCC_DEFLATE_H
|
||||
@ -1313,10 +1408,10 @@ Index: zlib-1.2.12/contrib/s390/dfltcc_deflate.h
|
||||
+#define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm)))
|
||||
+
|
||||
+#endif
|
||||
Index: zlib-1.2.12/deflate.c
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/deflate.c
|
||||
+++ zlib-1.2.12/deflate.c
|
||||
diff --git a/deflate.c b/deflate.c
|
||||
index 7f421e4da..a56c1783c 100644
|
||||
--- a/deflate.c
|
||||
+++ b/deflate.c
|
||||
@@ -61,15 +61,30 @@ const char deflate_copyright[] =
|
||||
*/
|
||||
|
||||
@ -1355,7 +1450,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
typedef block_state (*compress_func) OF((deflate_state *s, int flush));
|
||||
/* Compression function. Returns the block state after the call. */
|
||||
|
||||
@@ -85,7 +100,6 @@ local block_state deflate_rle OF((def
|
||||
@@ -85,7 +100,6 @@ local block_state deflate_rle OF((deflate_state *s, int flush));
|
||||
local block_state deflate_huff OF((deflate_state *s, int flush));
|
||||
local void lm_init OF((deflate_state *s));
|
||||
local void putShortMSB OF((deflate_state *s, uInt b));
|
||||
@ -1363,7 +1458,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
local unsigned read_buf OF((z_streamp strm, Bytef *buf, unsigned size));
|
||||
#ifdef ASMV
|
||||
# pragma message("Assembler code may have bugs -- use at your own risk")
|
||||
@@ -294,7 +308,7 @@ int ZEXPORT deflateInit2_(strm, level, m
|
||||
@@ -299,7 +313,7 @@ int ZEXPORT deflateInit2_(strm, level, method, windowBits, memLevel, strategy,
|
||||
return Z_STREAM_ERROR;
|
||||
}
|
||||
if (windowBits == 8) windowBits = 9; /* until 256-byte window bug fixed */
|
||||
@ -1372,7 +1467,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
if (s == Z_NULL) return Z_MEM_ERROR;
|
||||
strm->state = (struct internal_state FAR *)s;
|
||||
s->strm = strm;
|
||||
@@ -311,7 +325,7 @@ int ZEXPORT deflateInit2_(strm, level, m
|
||||
@@ -316,7 +330,7 @@ int ZEXPORT deflateInit2_(strm, level, method, windowBits, memLevel, strategy,
|
||||
s->hash_mask = s->hash_size - 1;
|
||||
s->hash_shift = ((s->hash_bits+MIN_MATCH-1)/MIN_MATCH);
|
||||
|
||||
@ -1381,7 +1476,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
s->prev = (Posf *) ZALLOC(strm, s->w_size, sizeof(Pos));
|
||||
s->head = (Posf *) ZALLOC(strm, s->hash_size, sizeof(Pos));
|
||||
|
||||
@@ -429,6 +443,7 @@ int ZEXPORT deflateSetDictionary (strm,
|
||||
@@ -434,6 +448,7 @@ int ZEXPORT deflateSetDictionary (strm, dictionary, dictLength)
|
||||
/* when using zlib wrappers, compute Adler-32 for provided dictionary */
|
||||
if (wrap == 1)
|
||||
strm->adler = adler32(strm->adler, dictionary, dictLength);
|
||||
@ -1389,7 +1484,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
s->wrap = 0; /* avoid computing Adler-32 in read_buf */
|
||||
|
||||
/* if dictionary would fill window, just replace the history */
|
||||
@@ -487,6 +502,7 @@ int ZEXPORT deflateGetDictionary (strm,
|
||||
@@ -492,6 +507,7 @@ int ZEXPORT deflateGetDictionary (strm, dictionary, dictLength)
|
||||
|
||||
if (deflateStateCheck(strm))
|
||||
return Z_STREAM_ERROR;
|
||||
@ -1397,7 +1492,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
s = strm->state;
|
||||
len = s->strstart + s->lookahead;
|
||||
if (len > s->w_size)
|
||||
@@ -533,6 +549,8 @@ int ZEXPORT deflateResetKeep (strm)
|
||||
@@ -538,6 +554,8 @@ int ZEXPORT deflateResetKeep (strm)
|
||||
|
||||
_tr_init(s);
|
||||
|
||||
@ -1406,7 +1501,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
return Z_OK;
|
||||
}
|
||||
|
||||
@@ -608,6 +626,7 @@ int ZEXPORT deflateParams(strm, level, s
|
||||
@@ -613,6 +631,7 @@ int ZEXPORT deflateParams(strm, level, strategy)
|
||||
{
|
||||
deflate_state *s;
|
||||
compress_func func;
|
||||
@ -1414,7 +1509,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
|
||||
if (deflateStateCheck(strm)) return Z_STREAM_ERROR;
|
||||
s = strm->state;
|
||||
@@ -620,15 +639,18 @@ int ZEXPORT deflateParams(strm, level, s
|
||||
@@ -625,15 +644,18 @@ int ZEXPORT deflateParams(strm, level, strategy)
|
||||
if (level < 0 || level > 9 || strategy < 0 || strategy > Z_FIXED) {
|
||||
return Z_STREAM_ERROR;
|
||||
}
|
||||
@ -1437,7 +1532,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
return Z_BUF_ERROR;
|
||||
}
|
||||
if (s->level != level) {
|
||||
@@ -695,6 +717,7 @@ uLong ZEXPORT deflateBound(strm, sourceL
|
||||
@@ -700,6 +722,7 @@ uLong ZEXPORT deflateBound(strm, sourceLen)
|
||||
/* conservative upper bound for compressed data */
|
||||
complen = sourceLen +
|
||||
((sourceLen + 7) >> 3) + ((sourceLen + 63) >> 6) + 5;
|
||||
@ -1445,7 +1540,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
|
||||
/* if can't get parameters, return conservative bound plus zlib wrapper */
|
||||
if (deflateStateCheck(strm))
|
||||
@@ -736,7 +759,8 @@ uLong ZEXPORT deflateBound(strm, sourceL
|
||||
@@ -741,7 +764,8 @@ uLong ZEXPORT deflateBound(strm, sourceLen)
|
||||
}
|
||||
|
||||
/* if not default parameters, return conservative bound */
|
||||
@ -1455,7 +1550,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
return complen + wraplen;
|
||||
|
||||
/* default settings: return tight bound for that case */
|
||||
@@ -763,7 +787,7 @@ local void putShortMSB (s, b)
|
||||
@@ -768,7 +792,7 @@ local void putShortMSB (s, b)
|
||||
* applications may wish to modify it to avoid allocating a large
|
||||
* strm->next_out buffer and copying into it. (See also read_buf()).
|
||||
*/
|
||||
@ -1464,7 +1559,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
z_streamp strm;
|
||||
{
|
||||
unsigned len;
|
||||
@@ -1035,7 +1059,8 @@ int ZEXPORT deflate (strm, flush)
|
||||
@@ -1040,7 +1064,8 @@ int ZEXPORT deflate (strm, flush)
|
||||
(flush != Z_NO_FLUSH && s->status != FINISH_STATE)) {
|
||||
block_state bstate;
|
||||
|
||||
@ -1474,7 +1569,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
s->strategy == Z_HUFFMAN_ONLY ? deflate_huff(s, flush) :
|
||||
s->strategy == Z_RLE ? deflate_rle(s, flush) :
|
||||
(*(configuration_table[s->level].func))(s, flush);
|
||||
@@ -1082,7 +1107,6 @@ int ZEXPORT deflate (strm, flush)
|
||||
@@ -1087,7 +1112,6 @@ int ZEXPORT deflate (strm, flush)
|
||||
}
|
||||
|
||||
if (flush != Z_FINISH) return Z_OK;
|
||||
@ -1482,7 +1577,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
|
||||
/* Write the trailer */
|
||||
#ifdef GZIP
|
||||
@@ -1098,7 +1122,7 @@ int ZEXPORT deflate (strm, flush)
|
||||
@@ -1103,7 +1127,7 @@ int ZEXPORT deflate (strm, flush)
|
||||
}
|
||||
else
|
||||
#endif
|
||||
@ -1491,7 +1586,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
putShortMSB(s, (uInt)(strm->adler >> 16));
|
||||
putShortMSB(s, (uInt)(strm->adler & 0xffff));
|
||||
}
|
||||
@@ -1107,7 +1131,11 @@ int ZEXPORT deflate (strm, flush)
|
||||
@@ -1112,7 +1136,11 @@ int ZEXPORT deflate (strm, flush)
|
||||
* to flush the rest.
|
||||
*/
|
||||
if (s->wrap > 0) s->wrap = -s->wrap; /* write the trailer only once! */
|
||||
@ -1504,7 +1599,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
}
|
||||
|
||||
/* ========================================================================= */
|
||||
@@ -1124,9 +1152,9 @@ int ZEXPORT deflateEnd (strm)
|
||||
@@ -1129,9 +1157,9 @@ int ZEXPORT deflateEnd (strm)
|
||||
TRY_FREE(strm, strm->state->pending_buf);
|
||||
TRY_FREE(strm, strm->state->head);
|
||||
TRY_FREE(strm, strm->state->prev);
|
||||
@ -1516,7 +1611,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
strm->state = Z_NULL;
|
||||
|
||||
return status == BUSY_STATE ? Z_DATA_ERROR : Z_OK;
|
||||
@@ -1156,13 +1184,13 @@ int ZEXPORT deflateCopy (dest, source)
|
||||
@@ -1161,13 +1189,13 @@ int ZEXPORT deflateCopy (dest, source)
|
||||
|
||||
zmemcpy((voidpf)dest, (voidpf)source, sizeof(z_stream));
|
||||
|
||||
@ -1533,7 +1628,7 @@ Index: zlib-1.2.12/deflate.c
|
||||
ds->prev = (Posf *) ZALLOC(dest, ds->w_size, sizeof(Pos));
|
||||
ds->head = (Posf *) ZALLOC(dest, ds->hash_size, sizeof(Pos));
|
||||
ds->pending_buf = (uchf *) ZALLOC(dest, ds->lit_bufsize, 4);
|
||||
@@ -1209,7 +1237,8 @@ local unsigned read_buf(strm, buf, size)
|
||||
@@ -1214,7 +1242,8 @@ local unsigned read_buf(strm, buf, size)
|
||||
strm->avail_in -= len;
|
||||
|
||||
zmemcpy(buf, strm->next_in, len);
|
||||
@ -1543,11 +1638,11 @@ Index: zlib-1.2.12/deflate.c
|
||||
strm->adler = adler32(strm->adler, buf, len);
|
||||
}
|
||||
#ifdef GZIP
|
||||
Index: zlib-1.2.12/deflate.h
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/deflate.h
|
||||
+++ zlib-1.2.12/deflate.h
|
||||
@@ -299,6 +299,7 @@ void ZLIB_INTERNAL _tr_flush_bits OF((de
|
||||
diff --git a/deflate.h b/deflate.h
|
||||
index 1a06cd5f2..f92750ca6 100644
|
||||
--- a/deflate.h
|
||||
+++ b/deflate.h
|
||||
@@ -299,6 +299,7 @@ void ZLIB_INTERNAL _tr_flush_bits OF((deflate_state *s));
|
||||
void ZLIB_INTERNAL _tr_align OF((deflate_state *s));
|
||||
void ZLIB_INTERNAL _tr_stored_block OF((deflate_state *s, charf *buf,
|
||||
ulg stored_len, int last));
|
||||
@ -1555,7 +1650,7 @@ Index: zlib-1.2.12/deflate.h
|
||||
|
||||
#define d_code(dist) \
|
||||
((dist) < 256 ? _dist_code[dist] : _dist_code[256+((dist)>>7)])
|
||||
@@ -343,4 +344,15 @@ void ZLIB_INTERNAL _tr_stored_block OF((
|
||||
@@ -343,4 +344,15 @@ void ZLIB_INTERNAL _tr_stored_block OF((deflate_state *s, charf *buf,
|
||||
flush = _tr_tally(s, distance, length)
|
||||
#endif
|
||||
|
||||
@ -1571,10 +1666,10 @@ Index: zlib-1.2.12/deflate.h
|
||||
+void ZLIB_INTERNAL flush_pending OF((z_streamp strm));
|
||||
+
|
||||
#endif /* DEFLATE_H */
|
||||
Index: zlib-1.2.12/gzguts.h
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/gzguts.h
|
||||
+++ zlib-1.2.12/gzguts.h
|
||||
diff --git a/gzguts.h b/gzguts.h
|
||||
index 57faf3716..581f2b631 100644
|
||||
--- a/gzguts.h
|
||||
+++ b/gzguts.h
|
||||
@@ -153,7 +153,11 @@
|
||||
|
||||
/* default i/o buffer size -- double this for output when reading (this and
|
||||
@ -1587,10 +1682,10 @@ Index: zlib-1.2.12/gzguts.h
|
||||
|
||||
/* gzip modes, also provide a little integrity check on the passed structure */
|
||||
#define GZ_NONE 0
|
||||
Index: zlib-1.2.12/inflate.c
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/inflate.c
|
||||
+++ zlib-1.2.12/inflate.c
|
||||
diff --git a/inflate.c b/inflate.c
|
||||
index 2a3c4fe98..ca0f8c9a4 100644
|
||||
--- a/inflate.c
|
||||
+++ b/inflate.c
|
||||
@@ -85,6 +85,24 @@
|
||||
#include "inflate.h"
|
||||
#include "inffast.h"
|
||||
@ -1633,7 +1728,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
state->window = Z_NULL;
|
||||
}
|
||||
|
||||
@@ -219,7 +238,7 @@ int stream_size;
|
||||
@@ -222,7 +241,7 @@ int stream_size;
|
||||
strm->zfree = zcfree;
|
||||
#endif
|
||||
state = (struct inflate_state FAR *)
|
||||
@ -1642,7 +1737,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
if (state == Z_NULL) return Z_MEM_ERROR;
|
||||
Tracev((stderr, "inflate: allocated\n"));
|
||||
strm->state = (struct internal_state FAR *)state;
|
||||
@@ -228,7 +247,7 @@ int stream_size;
|
||||
@@ -231,7 +250,7 @@ int stream_size;
|
||||
state->mode = HEAD; /* to pass state test in inflateReset2() */
|
||||
ret = inflateReset2(strm, windowBits);
|
||||
if (ret != Z_OK) {
|
||||
@ -1651,7 +1746,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
strm->state = Z_NULL;
|
||||
}
|
||||
return ret;
|
||||
@@ -250,6 +269,7 @@ int value;
|
||||
@@ -253,6 +272,7 @@ int value;
|
||||
struct inflate_state FAR *state;
|
||||
|
||||
if (inflateStateCheck(strm)) return Z_STREAM_ERROR;
|
||||
@ -1659,7 +1754,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
state = (struct inflate_state FAR *)strm->state;
|
||||
if (bits < 0) {
|
||||
state->hold = 0;
|
||||
@@ -377,6 +397,27 @@ void makefixed()
|
||||
@@ -380,6 +400,27 @@ void makefixed()
|
||||
}
|
||||
#endif /* MAKEFIXED */
|
||||
|
||||
@ -1687,7 +1782,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
/*
|
||||
Update the window with the last wsize (normally 32K) bytes written before
|
||||
returning. If window does not exist yet, create it. This is only called
|
||||
@@ -401,20 +442,7 @@ unsigned copy;
|
||||
@@ -404,20 +445,7 @@ unsigned copy;
|
||||
|
||||
state = (struct inflate_state FAR *)strm->state;
|
||||
|
||||
@ -1709,7 +1804,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
|
||||
/* copy state->wsize or less output bytes into the circular window */
|
||||
if (copy >= state->wsize) {
|
||||
@@ -857,6 +885,7 @@ int flush;
|
||||
@@ -861,6 +889,7 @@ int flush;
|
||||
if (flush == Z_BLOCK || flush == Z_TREES) goto inf_leave;
|
||||
/* fallthrough */
|
||||
case TYPEDO:
|
||||
@ -1717,7 +1812,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
if (state->last) {
|
||||
BYTEBITS();
|
||||
state->mode = CHECK;
|
||||
@@ -1218,7 +1247,7 @@ int flush;
|
||||
@@ -1222,7 +1251,7 @@ int flush;
|
||||
out -= left;
|
||||
strm->total_out += out;
|
||||
state->total += out;
|
||||
@ -1726,7 +1821,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
strm->adler = state->check =
|
||||
UPDATE_CHECK(state->check, put - out, out);
|
||||
out = left;
|
||||
@@ -1273,8 +1302,9 @@ int flush;
|
||||
@@ -1277,8 +1306,9 @@ int flush;
|
||||
*/
|
||||
inf_leave:
|
||||
RESTORE();
|
||||
@ -1738,7 +1833,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
if (updatewindow(strm, strm->next_out, out - strm->avail_out)) {
|
||||
state->mode = MEM;
|
||||
return Z_MEM_ERROR;
|
||||
@@ -1284,7 +1314,7 @@ int flush;
|
||||
@@ -1288,7 +1318,7 @@ int flush;
|
||||
strm->total_in += in;
|
||||
strm->total_out += out;
|
||||
state->total += out;
|
||||
@ -1747,7 +1842,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
strm->adler = state->check =
|
||||
UPDATE_CHECK(state->check, strm->next_out - out, out);
|
||||
strm->data_type = (int)state->bits + (state->last ? 64 : 0) +
|
||||
@@ -1302,8 +1332,8 @@ z_streamp strm;
|
||||
@@ -1306,8 +1336,8 @@ z_streamp strm;
|
||||
if (inflateStateCheck(strm))
|
||||
return Z_STREAM_ERROR;
|
||||
state = (struct inflate_state FAR *)strm->state;
|
||||
@ -1758,7 +1853,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
strm->state = Z_NULL;
|
||||
Tracev((stderr, "inflate: end\n"));
|
||||
return Z_OK;
|
||||
@@ -1482,6 +1512,7 @@ z_streamp strm;
|
||||
@@ -1486,6 +1516,7 @@ z_streamp strm;
|
||||
struct inflate_state FAR *state;
|
||||
|
||||
if (inflateStateCheck(strm)) return Z_STREAM_ERROR;
|
||||
@ -1766,7 +1861,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
state = (struct inflate_state FAR *)strm->state;
|
||||
return state->mode == STORED && state->bits == 0;
|
||||
}
|
||||
@@ -1502,21 +1533,22 @@ z_streamp source;
|
||||
@@ -1506,21 +1537,22 @@ z_streamp source;
|
||||
|
||||
/* allocate space */
|
||||
copy = (struct inflate_state FAR *)
|
||||
@ -1793,7 +1888,7 @@ Index: zlib-1.2.12/inflate.c
|
||||
copy->strm = dest;
|
||||
if (state->lencode >= state->codes &&
|
||||
state->lencode <= state->codes + ENOUGH - 1) {
|
||||
@@ -1573,6 +1605,7 @@ z_streamp strm;
|
||||
@@ -1577,6 +1609,7 @@ z_streamp strm;
|
||||
|
||||
if (inflateStateCheck(strm))
|
||||
return -(1L << 16);
|
||||
@ -1801,20 +1896,20 @@ Index: zlib-1.2.12/inflate.c
|
||||
state = (struct inflate_state FAR *)strm->state;
|
||||
return (long)(((unsigned long)((long)state->back)) << 16) +
|
||||
(state->mode == COPY ? state->length :
|
||||
Index: zlib-1.2.12/inflate.h
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/inflate.h
|
||||
+++ zlib-1.2.12/inflate.h
|
||||
diff --git a/inflate.h b/inflate.h
|
||||
index f127b6b1f..519ed3535 100644
|
||||
--- a/inflate.h
|
||||
+++ b/inflate.h
|
||||
@@ -124,3 +124,5 @@ struct inflate_state {
|
||||
int back; /* bits back of last unprocessed length/lit */
|
||||
unsigned was; /* initial length of match */
|
||||
};
|
||||
+
|
||||
+int ZLIB_INTERNAL inflate_ensure_window OF((struct inflate_state *state));
|
||||
Index: zlib-1.2.12/test/infcover.c
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/test/infcover.c
|
||||
+++ zlib-1.2.12/test/infcover.c
|
||||
diff --git a/test/infcover.c b/test/infcover.c
|
||||
index 2be01646c..a208219dc 100644
|
||||
--- a/test/infcover.c
|
||||
+++ b/test/infcover.c
|
||||
@@ -373,7 +373,7 @@ local void cover_support(void)
|
||||
mem_setup(&strm);
|
||||
strm.avail_in = 0;
|
||||
@ -1833,10 +1928,10 @@ Index: zlib-1.2.12/test/infcover.c
|
||||
{
|
||||
static unsigned int next = 0;
|
||||
static unsigned char dat[] = {0x63, 0, 2, 0};
|
||||
Index: zlib-1.2.12/test/minigzip.c
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/test/minigzip.c
|
||||
+++ zlib-1.2.12/test/minigzip.c
|
||||
diff --git a/test/minigzip.c b/test/minigzip.c
|
||||
index e22fb08c0..4b5f4efed 100644
|
||||
--- a/test/minigzip.c
|
||||
+++ b/test/minigzip.c
|
||||
@@ -132,7 +132,11 @@ static void pwinerror (s)
|
||||
#endif
|
||||
#define SUFFIX_LEN (sizeof(GZ_SUFFIX)-1)
|
||||
@ -1849,11 +1944,11 @@ Index: zlib-1.2.12/test/minigzip.c
|
||||
#define MAX_NAME_LEN 1024
|
||||
|
||||
#ifdef MAXSEG_64K
|
||||
Index: zlib-1.2.12/trees.c
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/trees.c
|
||||
+++ zlib-1.2.12/trees.c
|
||||
@@ -149,8 +149,6 @@ local void send_all_trees OF((deflate_st
|
||||
diff --git a/trees.c b/trees.c
|
||||
index 72b521fb0..534f29c98 100644
|
||||
--- a/trees.c
|
||||
+++ b/trees.c
|
||||
@@ -149,8 +149,6 @@ local void send_all_trees OF((deflate_state *s, int lcodes, int dcodes,
|
||||
local void compress_block OF((deflate_state *s, const ct_data *ltree,
|
||||
const ct_data *dtree));
|
||||
local int detect_data_type OF((deflate_state *s));
|
||||
@ -1894,11 +1989,11 @@ Index: zlib-1.2.12/trees.c
|
||||
deflate_state *s;
|
||||
{
|
||||
if (s->bi_valid > 8) {
|
||||
Index: zlib-1.2.12/zutil.h
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/zutil.h
|
||||
+++ zlib-1.2.12/zutil.h
|
||||
@@ -87,6 +87,8 @@ extern z_const char * const z_errmsg[10]
|
||||
diff --git a/zutil.h b/zutil.h
|
||||
index d9a20ae1b..bc83f59d0 100644
|
||||
--- a/zutil.h
|
||||
+++ b/zutil.h
|
||||
@@ -87,6 +87,8 @@ extern z_const char * const z_errmsg[10]; /* indexed by 2-zlib_error */
|
||||
|
||||
#define PRESET_DICT 0x20 /* preset dictionary flag in zlib header */
|
||||
|
||||
|
219
zlib-1.2.12-add-optimized-slide_hash-for-power.patch
Normal file
219
zlib-1.2.12-add-optimized-slide_hash-for-power.patch
Normal file
@ -0,0 +1,219 @@
|
||||
From 4a8d89ae49aa17d1634a2816c8d159f533a07eae Mon Sep 17 00:00:00 2001
|
||||
From: Matheus Castanho <msc@linux.ibm.com>
|
||||
Date: Wed, 27 Nov 2019 10:18:10 -0300
|
||||
Subject: [PATCH] Add optimized slide_hash for Power
|
||||
|
||||
Considerable time is spent on deflate.c:slide_hash() during
|
||||
deflate. This commit introduces a new slide_hash function that
|
||||
uses VSX vector instructions to slide 8 hash elements at a time,
|
||||
instead of just one as the standard code does.
|
||||
|
||||
The choice between the optimized and default versions is made only
|
||||
on the first call to the function, enabling a fallback to standard
|
||||
behavior if the host processor does not support VSX instructions,
|
||||
so the same binary can be used for multiple Power processor
|
||||
versions.
|
||||
|
||||
Author: Matheus Castanho <msc@linux.ibm.com>
|
||||
---
|
||||
CMakeLists.txt | 3 +-
|
||||
Makefile.in | 8 ++++
|
||||
configure | 4 +-
|
||||
contrib/power/power.h | 3 ++
|
||||
contrib/power/slide_hash_power8.c | 63 +++++++++++++++++++++++++++++
|
||||
contrib/power/slide_hash_resolver.c | 15 +++++++
|
||||
deflate.c | 12 ++++++
|
||||
7 files changed, 105 insertions(+), 3 deletions(-)
|
||||
create mode 100644 contrib/power/slide_hash_power8.c
|
||||
create mode 100644 contrib/power/slide_hash_resolver.c
|
||||
|
||||
diff --git a/CMakeLists.txt b/CMakeLists.txt
|
||||
index 44de486f6..8208c626b 100644
|
||||
--- a/CMakeLists.txt
|
||||
+++ b/CMakeLists.txt
|
||||
@@ -186,7 +186,8 @@ if(CMAKE_COMPILER_IS_GNUCC)
|
||||
add_definitions(-DZ_POWER8)
|
||||
set(ZLIB_POWER8
|
||||
contrib/power/adler32_power8.c
|
||||
- contrib/power/crc32_z_power8.c)
|
||||
+ contrib/power/crc32_z_power8.c
|
||||
+ contrib/power/slide_hash_power8.c)
|
||||
|
||||
set_source_files_properties(
|
||||
${ZLIB_POWER8}
|
||||
diff --git a/Makefile.in b/Makefile.in
|
||||
index 9ef9fa9b5..f71c6eae0 100644
|
||||
--- a/Makefile.in
|
||||
+++ b/Makefile.in
|
||||
@@ -183,6 +183,9 @@ crc32_z_power8.o: $(SRCDIR)contrib/power/crc32_z_power8.c
|
||||
deflate.o: $(SRCDIR)deflate.c
|
||||
$(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)deflate.c
|
||||
|
||||
+slide_hash_power8.o: $(SRCDIR)contrib/power/slide_hash_power8.c
|
||||
+ $(CC) $(CFLAGS) -mcpu=power8 $(ZINC) -c -o $@ $(SRCDIR)contrib/power/slide_hash_power8.c
|
||||
+
|
||||
infback.o: $(SRCDIR)infback.c
|
||||
$(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)infback.c
|
||||
|
||||
@@ -245,6 +248,11 @@ deflate.lo: $(SRCDIR)deflate.c
|
||||
$(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/deflate.o $(SRCDIR)deflate.c
|
||||
-@mv objs/deflate.o $@
|
||||
|
||||
+slide_hash_power8.lo: $(SRCDIR)contrib/power/slide_hash_power8.c
|
||||
+ -@mkdir objs 2>/dev/null || test -d objs
|
||||
+ $(CC) $(SFLAGS) -mcpu=power8 $(ZINC) -DPIC -c -o objs/slide_hash_power8.o $(SRCDIR)contrib/power/slide_hash_power8.c
|
||||
+ -@mv objs/slide_hash_power8.o $@
|
||||
+
|
||||
infback.lo: $(SRCDIR)infback.c
|
||||
-@mkdir objs 2>/dev/null || test -d objs
|
||||
$(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/infback.o $(SRCDIR)infback.c
|
||||
diff --git a/configure b/configure
|
||||
index 810a7404d..d0dacf9c2 100755
|
||||
--- a/configure
|
||||
+++ b/configure
|
||||
@@ -879,8 +879,8 @@ if tryboth $CC -c $CFLAGS $test.c; then
|
||||
|
||||
if tryboth $CC -c $CFLAGS -mcpu=power8 $test.c; then
|
||||
POWER8="-DZ_POWER8"
|
||||
- PIC_OBJC="${PIC_OBJC} adler32_power8.lo crc32_z_power8.lo"
|
||||
- OBJC="${OBJC} adler32_power8.o crc32_z_power8.o"
|
||||
+ PIC_OBJC="${PIC_OBJC} adler32_power8.lo crc32_z_power8.lo slide_hash_power8.lo"
|
||||
+ OBJC="${OBJC} adler32_power8.o crc32_z_power8.o slide_hash_power8.o"
|
||||
echo "Checking for -mcpu=power8 support... Yes." | tee -a configure.log
|
||||
else
|
||||
echo "Checking for -mcpu=power8 support... No." | tee -a configure.log
|
||||
diff --git a/contrib/power/power.h b/contrib/power/power.h
|
||||
index f57c76167..28c8f78ca 100644
|
||||
--- a/contrib/power/power.h
|
||||
+++ b/contrib/power/power.h
|
||||
@@ -4,7 +4,10 @@
|
||||
*/
|
||||
#include "../../zconf.h"
|
||||
#include "../../zutil.h"
|
||||
+#include "../../deflate.h"
|
||||
|
||||
uLong _adler32_power8(uLong adler, const Bytef* buf, uInt len);
|
||||
|
||||
unsigned long _crc32_z_power8(unsigned long, const Bytef *, z_size_t);
|
||||
+
|
||||
+void _slide_hash_power8(deflate_state *s);
|
||||
diff --git a/contrib/power/slide_hash_power8.c b/contrib/power/slide_hash_power8.c
|
||||
new file mode 100644
|
||||
index 000000000..c5a0eb5a6
|
||||
--- /dev/null
|
||||
+++ b/contrib/power/slide_hash_power8.c
|
||||
@@ -0,0 +1,63 @@
|
||||
+ /* Copyright (C) 2019 Matheus Castanho <msc@linux.ibm.com>, IBM
|
||||
+ * For conditions of distribution and use, see copyright notice in zlib.h
|
||||
+ */
|
||||
+
|
||||
+#include <altivec.h>
|
||||
+#include "../../deflate.h"
|
||||
+
|
||||
+local inline void slide_hash_power8_loop OF((deflate_state *s,
|
||||
+ unsigned n_elems, Posf *table_end)) __attribute__((always_inline));
|
||||
+
|
||||
+local void slide_hash_power8_loop(
|
||||
+ deflate_state *s,
|
||||
+ unsigned n_elems,
|
||||
+ Posf *table_end)
|
||||
+{
|
||||
+ vector unsigned short vw, vm, *vp;
|
||||
+ unsigned chunks;
|
||||
+
|
||||
+ /* Each vector register (chunk) corresponds to 128 bits == 8 Posf,
|
||||
+ * so instead of processing each of the n_elems in the hash table
|
||||
+ * individually, we can do it in chunks of 8 with vector instructions.
|
||||
+ *
|
||||
+ * This function is only called from slide_hash_power8(), and both calls
|
||||
+ * pass n_elems as a power of 2 higher than 2^7, as defined by
|
||||
+ * deflateInit2_(), so n_elems will always be a multiple of 8. */
|
||||
+ chunks = n_elems >> 3;
|
||||
+ Assert(n_elems % 8 == 0, "Weird hash table size!");
|
||||
+
|
||||
+ /* This type casting is safe since s->w_size is always <= 64KB
|
||||
+ * as defined by deflateInit2_() and Posf == unsigned short */
|
||||
+ vw[0] = (Posf) s->w_size;
|
||||
+ vw = vec_splat(vw,0);
|
||||
+
|
||||
+ vp = (vector unsigned short *) table_end;
|
||||
+
|
||||
+ do {
|
||||
+ /* Processing 8 elements at a time */
|
||||
+ vp--;
|
||||
+ vm = *vp;
|
||||
+
|
||||
+ /* This is equivalent to: m >= w_size ? m - w_size : 0
|
||||
+ * Since we are using a saturated unsigned subtraction, any
|
||||
+ * values that are > w_size will be set to 0, while the others
|
||||
+ * will be subtracted by w_size. */
|
||||
+ *vp = vec_subs(vm,vw);
|
||||
+ } while (--chunks);
|
||||
+};
|
||||
+
|
||||
+void ZLIB_INTERNAL _slide_hash_power8(deflate_state *s)
|
||||
+{
|
||||
+ unsigned n;
|
||||
+ Posf *p;
|
||||
+
|
||||
+ n = s->hash_size;
|
||||
+ p = &s->head[n];
|
||||
+ slide_hash_power8_loop(s,n,p);
|
||||
+
|
||||
+#ifndef FASTEST
|
||||
+ n = s->w_size;
|
||||
+ p = &s->prev[n];
|
||||
+ slide_hash_power8_loop(s,n,p);
|
||||
+#endif
|
||||
+}
|
||||
diff --git a/contrib/power/slide_hash_resolver.c b/contrib/power/slide_hash_resolver.c
|
||||
new file mode 100644
|
||||
index 000000000..54fa1eb21
|
||||
--- /dev/null
|
||||
+++ b/contrib/power/slide_hash_resolver.c
|
||||
@@ -0,0 +1,15 @@
|
||||
+/* Copyright (C) 2019 Matheus Castanho <msc@linux.ibm.com>, IBM
|
||||
+ * For conditions of distribution and use, see copyright notice in zlib.h
|
||||
+ */
|
||||
+
|
||||
+#include "../gcc/zifunc.h"
|
||||
+#include "power.h"
|
||||
+
|
||||
+Z_IFUNC(slide_hash) {
|
||||
+#ifdef Z_POWER8
|
||||
+ if (__builtin_cpu_supports("arch_2_07"))
|
||||
+ return _slide_hash_power8;
|
||||
+#endif
|
||||
+
|
||||
+ return slide_hash_default;
|
||||
+}
|
||||
diff --git a/deflate.c b/deflate.c
|
||||
index 799fb93cc..b2db576dc 100644
|
||||
--- a/deflate.c
|
||||
+++ b/deflate.c
|
||||
@@ -196,6 +196,13 @@ local const config configuration_table[10] = {
|
||||
(unsigned)(s->hash_size-1)*sizeof(*s->head)); \
|
||||
} while (0)
|
||||
|
||||
+#ifdef Z_POWER_OPT
|
||||
+/* Rename function so resolver can use its symbol. The default version will be
|
||||
+ * returned by the resolver if the host has no support for an optimized version.
|
||||
+ */
|
||||
+#define slide_hash slide_hash_default
|
||||
+#endif /* Z_POWER_OPT */
|
||||
+
|
||||
/* ===========================================================================
|
||||
* Slide the hash table when sliding the window down (could be avoided with 32
|
||||
* bit values at the expense of memory usage). We slide even when level == 0 to
|
||||
@@ -227,6 +234,11 @@ local void slide_hash(s)
|
||||
#endif
|
||||
}
|
||||
|
||||
+#ifdef Z_POWER_OPT
|
||||
+#undef slide_hash
|
||||
+#include "contrib/power/slide_hash_resolver.c"
|
||||
+#endif /* Z_POWER_OPT */
|
||||
+
|
||||
/* ========================================================================= */
|
||||
int ZEXPORT deflateInit_(strm, level, version, stream_size)
|
||||
z_streamp strm;
|
338
zlib-1.2.12-add-vectorized-longest_match-for-power.patch
Normal file
338
zlib-1.2.12-add-vectorized-longest_match-for-power.patch
Normal file
@ -0,0 +1,338 @@
|
||||
From aecdff0646c7e188b48f6db285d8d63a74f246c1 Mon Sep 17 00:00:00 2001
|
||||
From: Matheus Castanho <msc@linux.ibm.com>
|
||||
Date: Tue, 29 Oct 2019 18:04:11 -0300
|
||||
Subject: [PATCH] Add vectorized longest_match for Power
|
||||
|
||||
This commit introduces an optimized version of the longest_match
|
||||
function for Power processors. It uses VSX instructions to match
|
||||
16 bytes at a time on each comparison, instead of one by one.
|
||||
|
||||
Author: Matheus Castanho <msc@linux.ibm.com>
|
||||
---
|
||||
CMakeLists.txt | 3 +-
|
||||
Makefile.in | 8 +
|
||||
configure | 4 +-
|
||||
contrib/power/longest_match_power9.c | 194 +++++++++++++++++++++++++
|
||||
contrib/power/longest_match_resolver.c | 15 ++
|
||||
contrib/power/power.h | 2 +
|
||||
deflate.c | 13 ++
|
||||
7 files changed, 236 insertions(+), 3 deletions(-)
|
||||
create mode 100644 contrib/power/longest_match_power9.c
|
||||
create mode 100644 contrib/power/longest_match_resolver.c
|
||||
|
||||
Index: zlib-1.2.12/CMakeLists.txt
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/CMakeLists.txt
|
||||
+++ zlib-1.2.12/CMakeLists.txt
|
||||
@@ -199,7 +199,8 @@ if(CMAKE_COMPILER_IS_GNUCC)
|
||||
|
||||
if(POWER9)
|
||||
add_definitions(-DZ_POWER9)
|
||||
- set(ZLIB_POWER9 )
|
||||
+ set(ZLIB_POWER9
|
||||
+ contrib/power/longest_match_power9.c)
|
||||
|
||||
set_source_files_properties(
|
||||
${ZLIB_POWER9}
|
||||
Index: zlib-1.2.12/Makefile.in
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/Makefile.in
|
||||
+++ zlib-1.2.12/Makefile.in
|
||||
@@ -189,6 +189,9 @@ crc32-vx.o: $(SRCDIR)contrib/s390/crc32-
|
||||
deflate.o: $(SRCDIR)deflate.c
|
||||
$(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)deflate.c
|
||||
|
||||
+longest_match_power9.o: $(SRCDIR)contrib/power/longest_match_power9.c
|
||||
+ $(CC) $(CFLAGS) -mcpu=power9 $(ZINC) -c -o $@ $(SRCDIR)contrib/power/longest_match_power9.c
|
||||
+
|
||||
slide_hash_power8.o: $(SRCDIR)contrib/power/slide_hash_power8.c
|
||||
$(CC) $(CFLAGS) -mcpu=power8 $(ZINC) -c -o $@ $(SRCDIR)contrib/power/slide_hash_power8.c
|
||||
|
||||
@@ -259,6 +262,11 @@ deflate.lo: $(SRCDIR)deflate.c
|
||||
$(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/deflate.o $(SRCDIR)deflate.c
|
||||
-@mv objs/deflate.o $@
|
||||
|
||||
+longest_match_power9.lo: $(SRCDIR)contrib/power/longest_match_power9.c
|
||||
+ -@mkdir objs 2>/dev/null || test -d objs
|
||||
+ $(CC) $(SFLAGS) -mcpu=power9 $(ZINC) -DPIC -c -o objs/longest_match_power9.o $(SRCDIR)contrib/power/longest_match_power9.c
|
||||
+ -@mv objs/longest_match_power9.o $@
|
||||
+
|
||||
slide_hash_power8.lo: $(SRCDIR)contrib/power/slide_hash_power8.c
|
||||
-@mkdir objs 2>/dev/null || test -d objs
|
||||
$(CC) $(SFLAGS) -mcpu=power8 $(ZINC) -DPIC -c -o objs/slide_hash_power8.o $(SRCDIR)contrib/power/slide_hash_power8.c
|
||||
Index: zlib-1.2.12/configure
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/configure
|
||||
+++ zlib-1.2.12/configure
|
||||
@@ -915,8 +915,8 @@ if tryboth $CC -c $CFLAGS $test.c; then
|
||||
|
||||
if tryboth $CC -c $CFLAGS -mcpu=power9 $test.c; then
|
||||
POWER9="-DZ_POWER9"
|
||||
- PIC_OBJC="${PIC_OBJC}"
|
||||
- OBJC="${OBJC}"
|
||||
+ PIC_OBJC="$PIC_OBJC longest_match_power9.lo"
|
||||
+ OBJC="$OBJC longest_match_power9.o"
|
||||
echo "Checking for -mcpu=power9 support... Yes." | tee -a configure.log
|
||||
else
|
||||
echo "Checking for -mcpu=power9 support... No." | tee -a configure.log
|
||||
Index: zlib-1.2.12/contrib/power/longest_match_power9.c
|
||||
===================================================================
|
||||
--- /dev/null
|
||||
+++ zlib-1.2.12/contrib/power/longest_match_power9.c
|
||||
@@ -0,0 +1,194 @@
|
||||
+/* Copyright (C) 2019 Matheus Castanho <msc@linux.ibm.com>, IBM
|
||||
+ * For conditions of distribution and use, see copyright notice in zlib.h
|
||||
+ */
|
||||
+
|
||||
+#include <altivec.h>
|
||||
+#include "../../deflate.h"
|
||||
+
|
||||
+local inline int vec_match OF((Bytef* scan, Bytef* match))
|
||||
+ __attribute__((always_inline));
|
||||
+
|
||||
+local inline int vec_match(Bytef* scan, Bytef* match)
|
||||
+{
|
||||
+ vector unsigned char vscan, vmatch, vc;
|
||||
+ int len;
|
||||
+
|
||||
+ vscan = *((vector unsigned char *) scan);
|
||||
+ vmatch = *((vector unsigned char *) match);
|
||||
+
|
||||
+ /* Compare 16 bytes at a time.
|
||||
+ * Each byte of vc will be either all ones or all zeroes,
|
||||
+ * depending on the result of the comparison
|
||||
+ */
|
||||
+ vc = (vector unsigned char) vec_cmpne(vscan,vmatch);
|
||||
+
|
||||
+ /* Since the index of matching bytes will contain only zeroes
|
||||
+ * on vc (since we used cmpne), counting the number of consecutive
|
||||
+ * bytes where LSB == 0 is the same as counting the length of the match.
|
||||
+ *
|
||||
+ * There was an issue in the way the vec_cnttz_lsbb builtin was implemented
|
||||
+ * that got fixed on GCC 12, but now we have to use different builtins
|
||||
+ * depending on the compiler version. To avoid that, let's use inline asm to
|
||||
+ * generate the exact instruction we need.
|
||||
+ */
|
||||
+ #ifdef __LITTLE_ENDIAN__
|
||||
+ asm volatile("vctzlsbb %0, %1\n\t" : "=r" (len) : "v" (vc));
|
||||
+ #else
|
||||
+ asm volatile("vclzlsbb %0, %1\n\t" : "=r" (len) : "v" (vc));
|
||||
+ #endif
|
||||
+
|
||||
+ return len;
|
||||
+}
|
||||
+
|
||||
+uInt ZLIB_INTERNAL _longest_match_power9(deflate_state *s, IPos cur_match)
|
||||
+{
|
||||
+ unsigned chain_length = s->max_chain_length;/* max hash chain length */
|
||||
+ register Bytef *scan = s->window + s->strstart; /* current string */
|
||||
+ register Bytef *match; /* matched string */
|
||||
+ register int len; /* length of current match */
|
||||
+ int best_len = (int)s->prev_length; /* best match length so far */
|
||||
+ int nice_match = s->nice_match; /* stop if match long enough */
|
||||
+ int mbytes; /* matched bytes inside loop */
|
||||
+ IPos limit = s->strstart > (IPos)MAX_DIST(s) ?
|
||||
+ s->strstart - (IPos)MAX_DIST(s) : 0;
|
||||
+ /* Stop when cur_match becomes <= limit. To simplify the code,
|
||||
+ * we prevent matches with the string of window index 0.
|
||||
+ */
|
||||
+ Posf *prev = s->prev;
|
||||
+ uInt wmask = s->w_mask;
|
||||
+
|
||||
+#if (MAX_MATCH == 258)
|
||||
+ /* Compare the last two bytes at once. */
|
||||
+ register Bytef *strend2 = s->window + s->strstart + MAX_MATCH - 2;
|
||||
+ register ush scan_end = *(ushf*)(scan+best_len-1);
|
||||
+#else
|
||||
+ register Bytef *strend = s->window + s->strstart + MAX_MATCH;
|
||||
+ register Byte scan_end1 = scan[best_len-1];
|
||||
+ register Byte scan_end = scan[best_len];
|
||||
+#endif
|
||||
+
|
||||
+ /* The code is optimized for HASH_BITS >= 8 and MAX_MATCH-2 multiple of 16.
|
||||
+ * It is easy to get rid of this optimization if necessary.
|
||||
+ */
|
||||
+ Assert(s->hash_bits >= 8 && MAX_MATCH == 258, "Code too clever");
|
||||
+
|
||||
+ /* Do not waste too much time if we already have a good match: */
|
||||
+ if (s->prev_length >= s->good_match) {
|
||||
+ chain_length >>= 2;
|
||||
+ }
|
||||
+ /* Do not look for matches beyond the end of the input. This is necessary
|
||||
+ * to make deflate deterministic.
|
||||
+ */
|
||||
+ if ((uInt)nice_match > s->lookahead) nice_match = (int)s->lookahead;
|
||||
+
|
||||
+ Assert((ulg)s->strstart <= s->window_size-MIN_LOOKAHEAD, "need lookahead");
|
||||
+
|
||||
+ do {
|
||||
+ Assert(cur_match < s->strstart, "no future");
|
||||
+ match = s->window + cur_match;
|
||||
+
|
||||
+ /* Skip to next match if the match length cannot increase
|
||||
+ * or if the match length is less than 2. Note that the checks below
|
||||
+ * for insufficient lookahead only occur occasionally for performance
|
||||
+ * reasons. Therefore uninitialized memory will be accessed, and
|
||||
+ * conditional jumps will be made that depend on those values.
|
||||
+ * However the length of the match is limited to the lookahead, so
|
||||
+ * the output of deflate is not affected by the uninitialized values.
|
||||
+ */
|
||||
+
|
||||
+/* MAX_MATCH - 2 should be a multiple of 16 for this optimization to work. */
|
||||
+#if (MAX_MATCH == 258)
|
||||
+
|
||||
+ /* Compare ending (2 bytes) and beginning of potential match.
|
||||
+ *
|
||||
+ * On Power processors, loading a 16-byte vector takes only 1 extra
|
||||
+ * cycle compared to a regular byte load. So instead of comparing the
|
||||
+ * first two bytes and then the rest later if they match, we can compare
|
||||
+ * the first 16 at once, and when we have a match longer than 2, we will
|
||||
+ * already have the result of comparing the first 16 bytes saved in mbytes.
|
||||
+ */
|
||||
+ if (*(ushf*)(match+best_len-1) != scan_end ||
|
||||
+ (mbytes = vec_match(scan,match)) < 3) continue;
|
||||
+
|
||||
+ scan += mbytes;
|
||||
+ match += mbytes;
|
||||
+
|
||||
+ /* In case when we may have a match longer than 16, we perform further
|
||||
+ * comparisons in chunks of 16 and keep going while all bytes match.
|
||||
+ */
|
||||
+ while(mbytes == 16) {
|
||||
+ mbytes = vec_match(scan,match);
|
||||
+ scan += mbytes;
|
||||
+ match += mbytes;
|
||||
+
|
||||
+ /* We also have to limit the maximum match based on MAX_MATCH.
|
||||
+ * Since we are comparing 16 bytes at a time and MAX_MATCH == 258 (to
|
||||
+ * comply with default implementation), we should stop comparing when
|
||||
+ * we have matched 256 bytes, which happens when scan == strend2.
|
||||
+ * In this ("rare") case, we have to check the remaining 2 bytes
|
||||
+ * individually using common load and compare operations.
|
||||
+ */
|
||||
+ if(scan >= strend2) {
|
||||
+ if(*scan == *match) {
|
||||
+ if(*++scan == *++match)
|
||||
+ scan++;
|
||||
+ }
|
||||
+ break;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ Assert(scan <= s->window+(unsigned)(s->window_size-1), "wild scan");
|
||||
+
|
||||
+ len = (MAX_MATCH - 2) - (int)(strend2 - scan);
|
||||
+ scan = strend2 - (MAX_MATCH - 2);
|
||||
+
|
||||
+#else /* MAX_MATCH == 258 */
|
||||
+
|
||||
+ if (match[best_len] != scan_end ||
|
||||
+ match[best_len-1] != scan_end1 ||
|
||||
+ *match != *scan ||
|
||||
+ *++match != scan[1]) continue;
|
||||
+
|
||||
+ /* The check at best_len-1 can be removed because it will be made
|
||||
+ * again later. (This heuristic is not always a win.)
|
||||
+ * It is not necessary to compare scan[2] and match[2] since they
|
||||
+ * are always equal when the other bytes match, given that
|
||||
+ * the hash keys are equal and that HASH_BITS >= 8.
|
||||
+ */
|
||||
+ scan += 2, match++;
|
||||
+ Assert(*scan == *match, "match[2]?");
|
||||
+
|
||||
+ /* We check for insufficient lookahead only every 8th comparison;
|
||||
+ * the 256th check will be made at strstart+258.
|
||||
+ */
|
||||
+ do {
|
||||
+ } while (*++scan == *++match && *++scan == *++match &&
|
||||
+ *++scan == *++match && *++scan == *++match &&
|
||||
+ *++scan == *++match && *++scan == *++match &&
|
||||
+ *++scan == *++match && *++scan == *++match &&
|
||||
+ scan < strend);
|
||||
+
|
||||
+ Assert(scan <= s->window+(unsigned)(s->window_size-1), "wild scan");
|
||||
+
|
||||
+ len = MAX_MATCH - (int)(strend - scan);
|
||||
+ scan = strend - MAX_MATCH;
|
||||
+
|
||||
+#endif /* MAX_MATCH == 258 */
|
||||
+
|
||||
+ if (len > best_len) {
|
||||
+ s->match_start = cur_match;
|
||||
+ best_len = len;
|
||||
+ if (len >= nice_match) break;
|
||||
+#if (MAX_MATCH == 258)
|
||||
+ scan_end = *(ushf*)(scan+best_len-1);
|
||||
+#else
|
||||
+ scan_end1 = scan[best_len-1];
|
||||
+ scan_end = scan[best_len];
|
||||
+#endif
|
||||
+ }
|
||||
+ } while ((cur_match = prev[cur_match & wmask]) > limit
|
||||
+ && --chain_length != 0);
|
||||
+
|
||||
+ if ((uInt)best_len <= s->lookahead) return (uInt)best_len;
|
||||
+ return s->lookahead;
|
||||
+}
|
||||
Index: zlib-1.2.12/contrib/power/longest_match_resolver.c
|
||||
===================================================================
|
||||
--- /dev/null
|
||||
+++ zlib-1.2.12/contrib/power/longest_match_resolver.c
|
||||
@@ -0,0 +1,15 @@
|
||||
+/* Copyright (C) 2019 Matheus Castanho <msc@linux.ibm.com>, IBM
|
||||
+ * For conditions of distribution and use, see copyright notice in zlib.h
|
||||
+ */
|
||||
+
|
||||
+#include "../gcc/zifunc.h"
|
||||
+#include "power.h"
|
||||
+
|
||||
+Z_IFUNC(longest_match) {
|
||||
+#ifdef Z_POWER9
|
||||
+ if (__builtin_cpu_supports("arch_3_00"))
|
||||
+ return _longest_match_power9;
|
||||
+#endif
|
||||
+
|
||||
+ return longest_match_default;
|
||||
+}
|
||||
Index: zlib-1.2.12/contrib/power/power.h
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/contrib/power/power.h
|
||||
+++ zlib-1.2.12/contrib/power/power.h
|
||||
@@ -10,4 +10,6 @@ uLong _adler32_power8(uLong adler, const
|
||||
|
||||
unsigned long _crc32_z_power8(unsigned long, const Bytef *, z_size_t);
|
||||
|
||||
+uInt _longest_match_power9(deflate_state *s, IPos cur_match);
|
||||
+
|
||||
void _slide_hash_power8(deflate_state *s);
|
||||
Index: zlib-1.2.12/deflate.c
|
||||
===================================================================
|
||||
--- zlib-1.2.12.orig/deflate.c
|
||||
+++ zlib-1.2.12/deflate.c
|
||||
@@ -1309,6 +1309,14 @@ local void lm_init (s)
|
||||
/* For 80x86 and 680x0, an optimized version will be provided in match.asm or
|
||||
* match.S. The code will be functionally equivalent.
|
||||
*/
|
||||
+
|
||||
+#ifdef Z_POWER_OPT
|
||||
+/* Rename function so resolver can use its symbol. The default version will be
|
||||
+ * returned by the resolver if the host has no support for an optimized version.
|
||||
+ */
|
||||
+#define longest_match longest_match_default
|
||||
+#endif /* Z_POWER_OPT */
|
||||
+
|
||||
local uInt longest_match(s, pcur_match)
|
||||
deflate_state *s;
|
||||
IPos pcur_match; /* current match */
|
||||
@@ -1454,6 +1462,11 @@ local uInt longest_match(s, pcur_match)
|
||||
}
|
||||
#endif /* ASMV */
|
||||
|
||||
+#ifdef Z_POWER_OPT
|
||||
+#undef longest_match
|
||||
+#include "contrib/power/longest_match_resolver.c"
|
||||
+#endif /* Z_POWER_OPT */
|
||||
+
|
||||
#else /* FASTEST */
|
||||
|
||||
/* ---------------------------------------------------------------------------
|
342
zlib-1.2.12-adler32-vector-optimizations-for-power.patch
Normal file
342
zlib-1.2.12-adler32-vector-optimizations-for-power.patch
Normal file
@ -0,0 +1,342 @@
|
||||
From 772f4bd0f880c4c193ab7da78728f38821572a02 Mon Sep 17 00:00:00 2001
|
||||
From: Rogerio Alves <rcardoso@linux.ibm.com>
|
||||
Date: Mon, 9 Dec 2019 14:40:53 -0300
|
||||
Subject: [PATCH] Adler32 vector optimization for Power.
|
||||
|
||||
This commit implements a Power (POWER8+) vector optimization for Adler32
|
||||
checksum using VSX (vector) instructions. The VSX adler32 checksum is up
|
||||
to 10x fast than the adler32 baseline code.
|
||||
|
||||
Author: Rogerio Alves <rcardoso@linux.ibm.com>
|
||||
---
|
||||
CMakeLists.txt | 1 +
|
||||
Makefile.in | 8 ++
|
||||
adler32.c | 11 ++
|
||||
configure | 4 +-
|
||||
contrib/power/adler32_power8.c | 196 +++++++++++++++++++++++++++++++
|
||||
contrib/power/adler32_resolver.c | 15 +++
|
||||
contrib/power/power.h | 4 +-
|
||||
7 files changed, 236 insertions(+), 3 deletions(-)
|
||||
create mode 100644 contrib/power/adler32_power8.c
|
||||
create mode 100644 contrib/power/adler32_resolver.c
|
||||
|
||||
diff --git a/CMakeLists.txt b/CMakeLists.txt
|
||||
index 581e1fa6d..c6296ee68 100644
|
||||
--- a/CMakeLists.txt
|
||||
+++ b/CMakeLists.txt
|
||||
@@ -185,6 +185,7 @@ if(CMAKE_COMPILER_IS_GNUCC)
|
||||
if(POWER8)
|
||||
add_definitions(-DZ_POWER8)
|
||||
set(ZLIB_POWER8
|
||||
+ contrib/power/adler32_power8.c
|
||||
contrib/power/crc32_z_power8.c)
|
||||
|
||||
set_source_files_properties(
|
||||
diff --git a/Makefile.in b/Makefile.in
|
||||
index 16943044e..a0ffac860 100644
|
||||
--- a/Makefile.in
|
||||
+++ b/Makefile.in
|
||||
@@ -165,6 +165,9 @@ minigzip64.o: $(SRCDIR)test/minigzip.c $(SRCDIR)zlib.h zconf.h
|
||||
adler32.o: $(SRCDIR)adler32.c
|
||||
$(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)adler32.c
|
||||
|
||||
+adler32_power8.o: $(SRCDIR)contrib/power/adler32_power8.c
|
||||
+ $(CC) $(CFLAGS) -mcpu=power8 $(ZINC) -c -o $@ $(SRCDIR)contrib/power/adler32_power8.c
|
||||
+
|
||||
crc32.o: $(SRCDIR)crc32.c
|
||||
$(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)crc32.c
|
||||
|
||||
@@ -216,6 +219,11 @@ adler32.lo: $(SRCDIR)adler32.c
|
||||
$(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/adler32.o $(SRCDIR)adler32.c
|
||||
-@mv objs/adler32.o $@
|
||||
|
||||
+adler32_power8.lo: $(SRCDIR)contrib/power/adler32_power8.c
|
||||
+ -@mkdir objs 2>/dev/null || test -d objs
|
||||
+ $(CC) $(SFLAGS) -mcpu=power8 $(ZINC) -DPIC -c -o objs/adler32_power8.o $(SRCDIR)contrib/power/adler32_power8.c
|
||||
+ -@mv objs/adler32_power8.o $@
|
||||
+
|
||||
crc32.lo: $(SRCDIR)crc32.c
|
||||
-@mkdir objs 2>/dev/null || test -d objs
|
||||
$(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/crc32.o $(SRCDIR)crc32.c
|
||||
diff --git a/adler32.c b/adler32.c
|
||||
index d0be4380a..4bde0fa18 100644
|
||||
--- a/adler32.c
|
||||
+++ b/adler32.c
|
||||
@@ -131,6 +131,12 @@ uLong ZEXPORT adler32_z(adler, buf, len)
|
||||
}
|
||||
|
||||
/* ========================================================================= */
|
||||
+
|
||||
+#ifdef Z_POWER_OPT
|
||||
+/* Rename the default function to avoid naming conflicts */
|
||||
+#define adler32 adler32_default
|
||||
+#endif /* Z_POWER_OPT */
|
||||
+
|
||||
uLong ZEXPORT adler32(adler, buf, len)
|
||||
uLong adler;
|
||||
const Bytef *buf;
|
||||
@@ -139,6 +145,11 @@ uLong ZEXPORT adler32(adler, buf, len)
|
||||
return adler32_z(adler, buf, len);
|
||||
}
|
||||
|
||||
+#ifdef Z_POWER_OPT
|
||||
+#undef adler32
|
||||
+#include "contrib/power/adler32_resolver.c"
|
||||
+#endif /* Z_POWER_OPT */
|
||||
+
|
||||
/* ========================================================================= */
|
||||
local uLong adler32_combine_(adler1, adler2, len2)
|
||||
uLong adler1;
|
||||
diff --git a/configure b/configure
|
||||
index 914d9f4aa..810a7404d 100755
|
||||
--- a/configure
|
||||
+++ b/configure
|
||||
@@ -879,8 +879,8 @@ if tryboth $CC -c $CFLAGS $test.c; then
|
||||
|
||||
if tryboth $CC -c $CFLAGS -mcpu=power8 $test.c; then
|
||||
POWER8="-DZ_POWER8"
|
||||
- PIC_OBJC="${PIC_OBJC} crc32_z_power8.lo"
|
||||
- OBJC="${OBJC} crc32_z_power8.o"
|
||||
+ PIC_OBJC="${PIC_OBJC} adler32_power8.lo crc32_z_power8.lo"
|
||||
+ OBJC="${OBJC} adler32_power8.o crc32_z_power8.o"
|
||||
echo "Checking for -mcpu=power8 support... Yes." | tee -a configure.log
|
||||
else
|
||||
echo "Checking for -mcpu=power8 support... No." | tee -a configure.log
|
||||
diff --git a/contrib/power/adler32_power8.c b/contrib/power/adler32_power8.c
|
||||
new file mode 100644
|
||||
index 000000000..473c39457
|
||||
--- /dev/null
|
||||
+++ b/contrib/power/adler32_power8.c
|
||||
@@ -0,0 +1,196 @@
|
||||
+/*
|
||||
+ * Adler32 for POWER 8+ using VSX instructions.
|
||||
+ *
|
||||
+ * Calculate adler32 checksum for 16 bytes at once using POWER8+ VSX (vector)
|
||||
+ * instructions.
|
||||
+ *
|
||||
+ * If adler32 do 1 byte at time on the first iteration s1 is s1_0 (_n means
|
||||
+ * iteration n) is the initial value of adler - at start _0 is 1 unless
|
||||
+ * adler initial value is different than 1. So s1_1 = s1_0 + c[0] after
|
||||
+ * the first calculation. For the iteration s1_2 = s1_1 + c[1] and so on.
|
||||
+ * Hence, for iteration N, s1_N = s1_(N-1) + c[N] is the value of s1 on
|
||||
+ * after iteration N.
|
||||
+ *
|
||||
+ * Therefore, for s2 and iteration N, s2_N = s2_0 + N*s1_N + N*c[0] +
|
||||
+ * N-1*c[1] + ... + c[N]
|
||||
+ *
|
||||
+ * In a more general way:
|
||||
+ *
|
||||
+ * s1_N = s1_0 + sum(i=1 to N)c[i]
|
||||
+ * s2_N = s2_0 + N*s1 + sum (i=1 to N)(N-i+1)*c[i]
|
||||
+ *
|
||||
+ * Where s1_N, s2_N are the values for s1, s2 after N iterations. So if we
|
||||
+ * can process N-bit at time we can do this at once.
|
||||
+ *
|
||||
+ * Since VSX can support 16-bit vector instructions, we can process
|
||||
+ * 16-bit at time using N = 16 we have:
|
||||
+ *
|
||||
+ * s1 = s1_16 = s1_(16-1) + c[16] = s1_0 + sum(i=1 to 16)c[i]
|
||||
+ * s2 = s2_16 = s2_0 + 16*s1 + sum(i=1 to 16)(16-i+1)*c[i]
|
||||
+ *
|
||||
+ * After the first iteration we calculate the adler32 checksum for 16 bytes.
|
||||
+ *
|
||||
+ * For more background about adler32 please check the RFC:
|
||||
+ * https://www.ietf.org/rfc/rfc1950.txt
|
||||
+ *
|
||||
+ * Copyright (C) 2019 Rogerio Alves <rcardoso@linux.ibm.com>, IBM
|
||||
+ * For conditions of distribution and use, see copyright notice in zlib.h
|
||||
+ *
|
||||
+ */
|
||||
+
|
||||
+#include "../../zutil.h"
|
||||
+#include <altivec.h>
|
||||
+
|
||||
+/* Largest prime smaller than 65536. */
|
||||
+#define BASE 65521U
|
||||
+#define NMAX 5552
|
||||
+/* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1. */
|
||||
+
|
||||
+#define DO1(s1,s2,buf,i) {(s1) += buf[(i)]; (s2) += (s1);}
|
||||
+#define DO2(s1,s2,buf,i) {DO1(s1,s2,buf,i); DO1(s1,s2,buf,i+1);}
|
||||
+#define DO4(s1,s2,buf,i) {DO2(s1,s2,buf,i); DO2(s1,s2,buf,i+2);}
|
||||
+#define DO8(s1,s2,buf,i) {DO4(s1,s2,buf,i); DO4(s1,s2,buf,i+4);}
|
||||
+#define DO16(s1,s2,buf) {DO8(s1,s2,buf,0); DO8(s1,s2,buf,8);}
|
||||
+
|
||||
+/* Vector across sum unsigned int (saturate). */
|
||||
+inline vector unsigned int vec_sumsu (vector unsigned int __a,
|
||||
+ vector unsigned int __b)
|
||||
+{
|
||||
+ __b = vec_sld(__a, __a, 8);
|
||||
+ __b = vec_add(__b, __a);
|
||||
+ __a = vec_sld(__b, __b, 4);
|
||||
+ __a = vec_add(__a, __b);
|
||||
+
|
||||
+ return __a;
|
||||
+}
|
||||
+
|
||||
+uLong ZLIB_INTERNAL _adler32_power8 (uLong adler, const Bytef* buf, uInt len)
|
||||
+{
|
||||
+ /* If buffer is empty or len=0 we need to return adler initial value. */
|
||||
+ if (buf == NULL)
|
||||
+ return 1;
|
||||
+
|
||||
+ unsigned int s1 = adler & 0xffff;
|
||||
+ unsigned int s2 = (adler >> 16) & 0xffff;
|
||||
+
|
||||
+ /* in case user likes doing a byte at a time, keep it fast */
|
||||
+ if (len == 1) {
|
||||
+ s1 += buf[0];
|
||||
+ if (s1 >= BASE)
|
||||
+ s1 -= BASE;
|
||||
+ s2 += s1;
|
||||
+ if (s2 >= BASE)
|
||||
+ s2 -= BASE;
|
||||
+ return (s2 << 16) | s1;
|
||||
+ }
|
||||
+
|
||||
+ /* Keep it fast for short length buffers. */
|
||||
+ if (len < 16) {
|
||||
+ while (len--) {
|
||||
+ s1 += *buf++;
|
||||
+ s2 += s1;
|
||||
+ }
|
||||
+ if (s1 >= BASE)
|
||||
+ s1 -= BASE;
|
||||
+ s2 %= BASE;
|
||||
+ return (s2 << 16) | s1;
|
||||
+ }
|
||||
+
|
||||
+ /* This is faster than VSX code for len < 64. */
|
||||
+ if (len < 64) {
|
||||
+ while (len >= 16) {
|
||||
+ len -= 16;
|
||||
+ DO16(s1,s2,buf);
|
||||
+ buf += 16;
|
||||
+ }
|
||||
+ } else {
|
||||
+ /* Use POWER VSX instructions for len >= 64. */
|
||||
+ const vector unsigned int v_zeros = { 0 };
|
||||
+ const vector unsigned char v_mul = {16, 15, 14, 13, 12, 11, 10, 9, 8, 7,
|
||||
+ 6, 5, 4, 3, 2, 1};
|
||||
+ const vector unsigned char vsh = vec_splat_u8(4);
|
||||
+ const vector unsigned int vmask = {0xffffffff, 0x0, 0x0, 0x0};
|
||||
+ vector unsigned int vs1 = vec_xl(0, &s1);
|
||||
+ vector unsigned int vs2 = vec_xl(0, &s2);
|
||||
+ vector unsigned int vs1_save = { 0 };
|
||||
+ vector unsigned int vsum1, vsum2;
|
||||
+ vector unsigned char vbuf;
|
||||
+ int n;
|
||||
+
|
||||
+ /* Zeros the undefined values of vectors vs1, vs2. */
|
||||
+ vs1 = vec_and(vs1, vmask);
|
||||
+ vs2 = vec_and(vs2, vmask);
|
||||
+
|
||||
+ /* Do length bigger than NMAX in blocks of NMAX size. */
|
||||
+ while (len >= NMAX) {
|
||||
+ len -= NMAX;
|
||||
+ n = NMAX / 16;
|
||||
+ do {
|
||||
+ vbuf = vec_xl(0, (unsigned char *) buf);
|
||||
+ vsum1 = vec_sum4s(vbuf, v_zeros); /* sum(i=1 to 16) buf[i]. */
|
||||
+ /* sum(i=1 to 16) buf[i]*(16-i+1). */
|
||||
+ vsum2 = vec_msum(vbuf, v_mul, v_zeros);
|
||||
+ /* Save vs1. */
|
||||
+ vs1_save = vec_add(vs1_save, vs1);
|
||||
+ /* Accumulate the sums. */
|
||||
+ vs1 = vec_add(vsum1, vs1);
|
||||
+ vs2 = vec_add(vsum2, vs2);
|
||||
+
|
||||
+ buf += 16;
|
||||
+ } while (--n);
|
||||
+ /* Once each block of NMAX size. */
|
||||
+ vs1 = vec_sumsu(vs1, vsum1);
|
||||
+ vs1_save = vec_sll(vs1_save, vsh); /* 16*vs1_save. */
|
||||
+ vs2 = vec_add(vs1_save, vs2);
|
||||
+ vs2 = vec_sumsu(vs2, vsum2);
|
||||
+
|
||||
+ /* vs1[0] = (s1_i + sum(i=1 to 16)buf[i]) mod 65521. */
|
||||
+ vs1[0] = vs1[0] % BASE;
|
||||
+ /* vs2[0] = s2_i + 16*s1_save +
|
||||
+ sum(i=1 to 16)(16-i+1)*buf[i] mod 65521. */
|
||||
+ vs2[0] = vs2[0] % BASE;
|
||||
+
|
||||
+ vs1 = vec_and(vs1, vmask);
|
||||
+ vs2 = vec_and(vs2, vmask);
|
||||
+ vs1_save = v_zeros;
|
||||
+ }
|
||||
+
|
||||
+ /* len is less than NMAX one modulo is needed. */
|
||||
+ if (len >= 16) {
|
||||
+ while (len >= 16) {
|
||||
+ len -= 16;
|
||||
+
|
||||
+ vbuf = vec_xl(0, (unsigned char *) buf);
|
||||
+
|
||||
+ vsum1 = vec_sum4s(vbuf, v_zeros); /* sum(i=1 to 16) buf[i]. */
|
||||
+ /* sum(i=1 to 16) buf[i]*(16-i+1). */
|
||||
+ vsum2 = vec_msum(vbuf, v_mul, v_zeros);
|
||||
+ /* Save vs1. */
|
||||
+ vs1_save = vec_add(vs1_save, vs1);
|
||||
+ /* Accumulate the sums. */
|
||||
+ vs1 = vec_add(vsum1, vs1);
|
||||
+ vs2 = vec_add(vsum2, vs2);
|
||||
+
|
||||
+ buf += 16;
|
||||
+ }
|
||||
+ /* Since the size will be always less than NMAX we do this once. */
|
||||
+ vs1 = vec_sumsu(vs1, vsum1);
|
||||
+ vs1_save = vec_sll(vs1_save, vsh); /* 16*vs1_save. */
|
||||
+ vs2 = vec_add(vs1_save, vs2);
|
||||
+ vs2 = vec_sumsu(vs2, vsum2);
|
||||
+ }
|
||||
+ /* Copy result back to s1, s2 (mod 65521). */
|
||||
+ s1 = vs1[0] % BASE;
|
||||
+ s2 = vs2[0] % BASE;
|
||||
+ }
|
||||
+
|
||||
+ /* Process tail (len < 16). */
|
||||
+ while (len--) {
|
||||
+ s1 += *buf++;
|
||||
+ s2 += s1;
|
||||
+ }
|
||||
+ s1 %= BASE;
|
||||
+ s2 %= BASE;
|
||||
+
|
||||
+ return (s2 << 16) | s1;
|
||||
+}
|
||||
diff --git a/contrib/power/adler32_resolver.c b/contrib/power/adler32_resolver.c
|
||||
new file mode 100644
|
||||
index 000000000..07a1a2cb2
|
||||
--- /dev/null
|
||||
+++ b/contrib/power/adler32_resolver.c
|
||||
@@ -0,0 +1,15 @@
|
||||
+/* Copyright (C) 2019 Rogerio Alves <rcardoso@linux.ibm.com>, IBM
|
||||
+ * For conditions of distribution and use, see copyright notice in zlib.h
|
||||
+ */
|
||||
+
|
||||
+#include "../gcc/zifunc.h"
|
||||
+#include "power.h"
|
||||
+
|
||||
+Z_IFUNC(adler32) {
|
||||
+#ifdef Z_POWER8
|
||||
+ if (__builtin_cpu_supports("arch_2_07"))
|
||||
+ return _adler32_power8;
|
||||
+#endif
|
||||
+
|
||||
+ return adler32_default;
|
||||
+}
|
||||
diff --git a/contrib/power/power.h b/contrib/power/power.h
|
||||
index 79123aa90..f57c76167 100644
|
||||
--- a/contrib/power/power.h
|
||||
+++ b/contrib/power/power.h
|
||||
@@ -2,7 +2,9 @@
|
||||
* 2019 Rogerio Alves <rogerio.alves@ibm.com>, IBM
|
||||
* For conditions of distribution and use, see copyright notice in zlib.h
|
||||
*/
|
||||
-
|
||||
#include "../../zconf.h"
|
||||
+#include "../../zutil.h"
|
||||
+
|
||||
+uLong _adler32_power8(uLong adler, const Bytef* buf, uInt len);
|
||||
|
||||
unsigned long _crc32_z_power8(unsigned long, const Bytef *, z_size_t);
|
34
zlib-1.2.12-fix-invalid-memory-access-on-ppc-and-ppc64.patch
Normal file
34
zlib-1.2.12-fix-invalid-memory-access-on-ppc-and-ppc64.patch
Normal file
@ -0,0 +1,34 @@
|
||||
From 11b722e4ae91b611f605221587ec8e0829c27949 Mon Sep 17 00:00:00 2001
|
||||
From: Matheus Castanho <msc@linux.ibm.com>
|
||||
Date: Tue, 23 Jun 2020 10:26:19 -0300
|
||||
Subject: [PATCH] Fix invalid memory access on ppc and ppc64
|
||||
|
||||
---
|
||||
contrib/power/adler32_power8.c | 9 ++++-----
|
||||
1 file changed, 4 insertions(+), 5 deletions(-)
|
||||
|
||||
diff --git a/contrib/power/adler32_power8.c b/contrib/power/adler32_power8.c
|
||||
index 473c39457..fdd086453 100644
|
||||
--- a/contrib/power/adler32_power8.c
|
||||
+++ b/contrib/power/adler32_power8.c
|
||||
@@ -110,16 +110,15 @@ uLong ZLIB_INTERNAL _adler32_power8 (uLong adler, const Bytef* buf, uInt len)
|
||||
6, 5, 4, 3, 2, 1};
|
||||
const vector unsigned char vsh = vec_splat_u8(4);
|
||||
const vector unsigned int vmask = {0xffffffff, 0x0, 0x0, 0x0};
|
||||
- vector unsigned int vs1 = vec_xl(0, &s1);
|
||||
- vector unsigned int vs2 = vec_xl(0, &s2);
|
||||
+ vector unsigned int vs1 = { 0 };
|
||||
+ vector unsigned int vs2 = { 0 };
|
||||
vector unsigned int vs1_save = { 0 };
|
||||
vector unsigned int vsum1, vsum2;
|
||||
vector unsigned char vbuf;
|
||||
int n;
|
||||
|
||||
- /* Zeros the undefined values of vectors vs1, vs2. */
|
||||
- vs1 = vec_and(vs1, vmask);
|
||||
- vs2 = vec_and(vs2, vmask);
|
||||
+ vs1[0] = s1;
|
||||
+ vs2[0] = s2;
|
||||
|
||||
/* Do length bigger than NMAX in blocks of NMAX size. */
|
||||
while (len >= NMAX) {
|
10
zlib.changes
10
zlib.changes
@ -1,3 +1,13 @@
|
||||
-------------------------------------------------------------------
|
||||
Mon Oct 10 10:08:02 UTC 2022 - Danilo Spinella <danilo.spinella@suse.com>
|
||||
|
||||
- Add Power8 optimizations:
|
||||
* zlib-1.2.12-add-optimized-slide_hash-for-power.patch
|
||||
* zlib-1.2.12-add-vectorized-longest_match-for-power.patch
|
||||
* zlib-1.2.12-adler32-vector-optimizations-for-power.patch
|
||||
* zlib-1.2.12-fix-invalid-memory-access-on-ppc-and-ppc64.patch
|
||||
- Update zlib-1.2.12-IBM-Z-hw-accelerated-deflate-s390x.patch
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Tue Aug 23 16:22:59 UTC 2022 - Danilo Spinella <danilo.spinella@suse.com>
|
||||
|
||||
|
16
zlib.spec
16
zlib.spec
@ -44,12 +44,18 @@ Patch6: minizip-dont-install-crypt-header.patch
|
||||
# The following patches are taken from https://github.com/iii-i/zlib/commits/crc32vx-v3
|
||||
Patch7: zlib-1.2.5-minizip-fixuncrypt.patch
|
||||
Patch8: zlib-1.2.11-optimized-s390.patch
|
||||
# https://github.com/iii-i/zlib/commit/171d0ff3c9ed40da0ac14085ab16b766b1162069
|
||||
Patch9: zlib-1.2.12-IBM-Z-hw-accelerated-deflate-s390x.patch
|
||||
Patch10: zlib-1.2.11-covscan-issues.patch
|
||||
Patch11: zlib-1.2.11-covscan-issues-rhel9.patch
|
||||
Patch12: zlib-1.2.12-optimized-crc32-power8.patch
|
||||
Patch13: zlib-1.2.12-fix-configure.patch
|
||||
Patch14: zlib-1.2.12-s390-vectorize-crc32.patch
|
||||
# The following patches are taken from https://github.com/mscastanho/zlib/commits/power-optimizations-1.2.12
|
||||
Patch15: zlib-1.2.12-adler32-vector-optimizations-for-power.patch
|
||||
Patch16: zlib-1.2.12-fix-invalid-memory-access-on-ppc-and-ppc64.patch
|
||||
Patch17: zlib-1.2.12-add-optimized-slide_hash-for-power.patch
|
||||
Patch18: zlib-1.2.12-add-vectorized-longest_match-for-power.patch
|
||||
BuildRequires: autoconf
|
||||
BuildRequires: automake
|
||||
BuildRequires: libtool
|
||||
@ -148,6 +154,10 @@ It should exit 0
|
||||
%patch12 -p1
|
||||
%patch13 -p1
|
||||
%patch14 -p1
|
||||
%patch15 -p1
|
||||
%patch16 -p1
|
||||
%patch17 -p1
|
||||
%patch18 -p1
|
||||
cp %{SOURCE4} .
|
||||
|
||||
%build
|
||||
@ -167,10 +177,10 @@ CC="cc" ./configure \
|
||||
# Profiling flags breaks tests, as of 1.2.12
|
||||
# In particular, gzseek does not work as intended
|
||||
#%if %{do_profiling}
|
||||
# #make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_generate}"
|
||||
# make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_generate}"
|
||||
# make check %{?_smp_mflags}
|
||||
# #make %{?_smp_mflags} clean
|
||||
# #make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_feedback}"
|
||||
# make %{?_smp_mflags} clean
|
||||
# make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_feedback}"
|
||||
#%else
|
||||
make %{?_smp_mflags}
|
||||
#%endif
|
||||
|
Loading…
Reference in New Issue
Block a user