forked from pool/xxhash

fix usage of DISPATCH=1 #1

Open
bruno_friedmann wants to merge 1 commit from bruno_friedmann/xxhash:fix-dispatch into master
First-time contributor

Hello Jan,

I finally got some time to work on this. I ran some tests with the proposed changes, and here are the results on my computer.

Cpuinfo Version: 9.0.0
Vendor ID Raw: GenuineIntel
Hardware Raw:
Brand Raw: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Hz Advertised Friendly: 3.0000 GHz
Hz Actual Friendly: 1.0076 GHz
Hz Advertised: (3000000000, 0)
Hz Actual: (1007594000, 0)
Arch: X86_64
Bits: 64
Count: 8
Arch String Raw: x86_64
L1 Data Cache Size: 196608
L1 Instruction Cache Size: 131072
L2 Cache Size: 5242880
L2 Cache Line Size: 256
L2 Cache Associativity: 7
L3 Cache Size: 12582912
Stepping: 1
Model: 140
Family: 6
Processor Type:
Flags: 3dnowprefetch, abm, acpi, adx, aes, aperfmperf, apic, arat, arch_capabilities,
 arch_perfmon, art, avx, avx2, avx512_bitalg, avx512_vbmi2, avx512_vnni, avx512_vp2intersect,
 avx512_vpopcntdq, avx512bitalg, avx512bw, avx512cd, avx512dq, avx512f, avx512ifma, avx512vbmi,
 avx512vbmi2, avx512vl, avx512vnni, avx512vpopcntdq, bmi1, bmi2, bts, cat_l2, cdp_l2, clflush,
 clflushopt, clwb, cmov, constant_tsc, cpuid, cpuid_fault, cx16, cx8, de, ds_cpl, dtes64, dtherm,
 dts, epb, ept, ept_ad, erms, est, f16c, flexpriority, flush_l1d, fma, fpu, fsgsbase, fsrm, fxsr,
 gfni, ht, hwp, hwp_act_window, hwp_epp, hwp_notify, hwp_pkg_req, ibpb, ibrs, ibrs_enhanced, ibt,
 ida, intel_pt, invpcid, lahf_lm, lm, mca, mce, md_clear, mmx, monitor, movbe, movdir64b, movdiri,
 msr, mtrr, nonstop_tsc, nopl, nx, ospke, osxsave, pae, pat, pbe, pcid, pclmulqdq, pdcm, pdpe1gb,
 pebs, pge, pku, pln, pni, popcnt, pqe, pse, pse36, pts, rdpid, rdrand, rdrnd, rdseed, rdt_a,
 rdtscp, rep_good, sdbg, sep, sha, sha_ni, smap, smep, smx, split_lock_detect, ss, ssbd, sse,
 sse2, sse4_1, sse4_2, ssse3, stibp, syscall, tm, tm2, tpr_shadow, tsc, tsc_adjust,
 tsc_deadline_timer, tsc_known_freq, tscdeadline, umip, vaes, vme, vmx, vnmi, vpclmulqdq,
 vpid, x2apic, xgetbv1, xsave, xsavec, xsaveopt, xsaves, xtopology, xtpr
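The flag list matters for DISPATCH=1: as far as I understand xxHash's x86 dispatcher, it selects the widest XXH3 code path the CPU supports at runtime (scalar, SSE2, AVX2 or AVX-512), whereas a generic x86-64 build is limited at compile time to the SSE2 baseline. The sketch below is only a simplified model of that choice based on flag names like the ones above; `pick_variant` is a hypothetical name, and the real dispatcher queries CPUID itself rather than parsing cpuinfo text.

```shell
#!/bin/sh
# Simplified model (assumption, not xxHash code): choose the widest XXH3
# code path a runtime dispatcher could use, given cpuinfo-style flag names.
pick_variant() {
    # Turn commas and newlines into spaces, then pad both ends so every
    # flag is surrounded by spaces and can be matched as a whole word.
    flags=" $(printf '%s' "$1" | tr ',\n' '  ') "
    case "$flags" in
        *" avx512f "*) echo avx512 ;;
        *" avx2 "*)    echo avx2 ;;
        *" sse2 "*)    echo sse2 ;;
        *)             echo scalar ;;
    esac
}

# A few of the flags reported for the i7-1185G7 above; prints: avx512
pick_variant "avx, avx2, avx512f, sse2"
```

Matching whole words keeps related flags such as avx512bw from being mistaken for avx512f.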


for s in 1 2 3 4 5;do time ./xh < linux-6.6.$s.tar;done

dispatch on                             native suse
dd8537f7ead0541b95c95d3fc2b4dbeb        dd8537f7ead0541b95c95d3fc2b4dbeb
                                        
real    0m0.191s                        real    0m0.427s
user    0m0.071s                        user    0m0.107s
sys     0m0.121s                        sys     0m0.243s
e3a44a84bc7a878e5095077c5da54dfd        e3a44a84bc7a878e5095077c5da54dfd
                                        
real    0m0.188s                        real    0m0.523s
user    0m0.073s                        user    0m0.104s
sys     0m0.115s                        sys     0m0.354s
276241f64d895b289833eb2fd64b8085        276241f64d895b289833eb2fd64b8085
                                        
real    0m0.206s                        real    0m0.530s
user    0m0.070s                        user    0m0.110s
sys     0m0.136s                        sys     0m0.301s
fcf2f84b194b2781f92e7b1fdb360832        fcf2f84b194b2781f92e7b1fdb360832
                                        
real    0m0.207s                        real    0m0.507s
user    0m0.057s                        user    0m0.117s
sys     0m0.150s                        sys     0m0.269s
09d29b63346e1fff6b2ba74a41e7dc5d        09d29b63346e1fff6b2ba74a41e7dc5d
                                        
real    0m0.202s                        real    0m0.460s
user    0m0.084s                        user    0m0.134s
sys     0m0.117s                        sys     0m0.237s
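A quick way to summarize the five side-by-side runs above is to average the `real` times per column. In this sketch the function name `mean_real` is made up for illustration; it only feeds the reported values to awk and prints the two means and the native/dispatch ratio.

```shell
#!/bin/sh
# Hypothetical helper: read "dispatch native" pairs of real times (seconds),
# one run per line, and print both means and the native/dispatch ratio.
mean_real() {
    awk '{ d += $1; n += $2; runs++ }
         END { printf "%.4f %.4f %.2f\n", d / runs, n / runs, n / d }'
}

# The five "real" values from the table above
mean_real <<'EOF'
0.191 0.427
0.188 0.523
0.206 0.530
0.207 0.507
0.202 0.460
EOF
```

With these numbers it prints `0.1988 0.4894 2.46`, i.e. about 0.20 s versus 0.49 s per tarball.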


native 
make test
cat linux-6.*.tar{,,,,,,,} | time ./xh
5df8e93736b5a2bddd0b324a455b61fc
3.21user 5.34system 0:13.98elapsed 61%CPU (0avgtext+0avgdata 3968maxresident)k
0inputs+0outputs (0major+180minor)pagefaults 0swaps

dispatch
cat linux-6.*.tar{,,,,,,,} | time ./xh
5df8e93736b5a2bddd0b324a455b61fc
2.27user 6.84system 0:14.60elapsed 62%CPU (0avgtext+0avgdata 4028maxresident)k
5776inputs+0outputs (9major+168minor)pagefaults 0swaps
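The two `cat | time ./xh` runs above can be compared as relative changes from native to dispatch. The helper `pct_change` below is hypothetical; it only does the arithmetic on the reported user, system and elapsed times.

```shell
#!/bin/sh
# Hypothetical helper: percentage change from a baseline ($1) to a new value ($2).
# A negative result means the new value is lower.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f%%\n", (b - a) / a * 100 }'
}

# native (baseline) vs dispatch, numbers from the two runs above
echo "user:    $(pct_change 3.21 2.27)"
echo "system:  $(pct_change 5.34 6.84)"
echo "elapsed: $(pct_change 13.98 14.60)"
```

This prints `-29.3%` for user time, `+28.1%` for system time and `+4.4%` for elapsed time. In both runs the system time of `./xh` exceeds its user time, so this measurement is likely dominated by kernel-side work such as reading from the pipe rather than by the hashing itself.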

My changes may be a bit cowboy-style or naive, but there is a performance gain, and the dispatch header is now installed so it can be packaged.

I can adapt this further if needed; if you need more feedback, just comment.

  • Upstream enables the optimizations only with -O3, but %{optflags}
    sets -O2, so we patch optflags to use -O3.
  • DISPATCH=1 seems to be needed on both the make and make install
    command lines to obtain an optimized binary and to get the dispatch
    header installed in include.
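The first bullet can be sketched as a plain string rewrite. This is only an illustration: `o2_to_o3` is a hypothetical helper, the example flags are made up, and the commented spec lines show one plausible way to pass DISPATCH=1 to both make calls, not the actual change in this PR.

```shell
#!/bin/sh
# Hypothetical helper: replace -O2 with -O3 in a compiler flag string,
# mirroring the idea of patching %{optflags} described above.
o2_to_o3() {
    printf '%s\n' "$1" | sed 's/-O2/-O3/g'
}

# Made-up example flags; the real %{optflags} value depends on the distribution.
o2_to_o3 "-O2 -Wall -g"

# One plausible use in the spec file (not verified against this package):
#   %make_build CFLAGS="<rewritten optflags>" DISPATCH=1
#   %make_install DISPATCH=1
```

Because GCC uses the last -O option it sees, appending -O3 after %{optflags} would be an alternative to editing the string.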

Signed-off-by: Bruno Friedmann <bruno@ioda-net.ch>

bruno_friedmann added 1 commit 2024-09-25 15:21:50 +02:00
Owner

Your numbers (the relative gains) don't reproduce here.

Even the absolute numbers are out of whack. I am the one with the weaker CPU (1135G7, compared to your 1185G7), and yet where you have real 0m0.427s, I already get 0m0.258s with the stock xxhash.

This pull request has changes conflicting with the target branch.
  • xxhash.spec


Reference: jengelh/xxhash#1