For the test, we actually care about finding the fastest test run (and
take "min_elapsed"). That is useful, because that is the run with
presumably the least interference from external factors; it is the run
where the CPU solved the problem as fast as it could.
As such, we should not reject the first 5% of runs as additional warm-up.
If the first 5% are slower (and part of the warm-up), they are not
considered anyway. If the fastest run happens to fall in the first 5%,
then we want to take it.
Also note that the calculation of "avg_elapsed" was wrong: it divided by
the full "num_rounds" while only summing 95% of the runs. This is also
fixed by now considering all runs.
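To illustrate the intended aggregation, here is a minimal sketch; the
function and variable names are made up, and only the general shape
reflects the fix:

  /* Sketch only: take the minimum and the average over all rounds,
   * so the sum and the divisor cover the same set of runs. */
  static void
  aggregate_rounds (const double *elapsed, unsigned int num_rounds,
                    double *min_elapsed, double *avg_elapsed)
  {
    double total = 0.0;

    *min_elapsed = elapsed[0];
    for (unsigned int i = 0; i < num_rounds; i++)
      {
        if (elapsed[i] < *min_elapsed)
          *min_elapsed = elapsed[i];
        total += elapsed[i];
      }

    *avg_elapsed = total / num_rounds;
  }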
Fixes: 282d536fd229 ('tests/performance: ensure to always warm up for 2 seconds')
When running the test (without parameters), it estimates a factor for
the run size of each test. That is useful for running a reasonably sized
test on different machines.
However, when comparing two runs, it seems important that both runs
share a common factor. Otherwise, the factor is determined differently
each time, and the results are less comparable. For that, there is the
"--factor" option and the GLIB_PERFORMANCE_FACTOR environment variable.
However, the factor option can only set one factor for all tests at the
same time. Ideally, a single factor would be roughly suitable for all
tests, but it is not, as the factors currently detected on my machine
differ widely:
$ ./build/gobject/tests/performance/performance -v > p
$ cat p | sed -n -e 's/^Running test //p' -e 's/.*correction factor //p' | sed 'N;s/\n/ /'
simple-construction 34.78
simple-construction1 145.45
complex-construction 11.08
complex-construction1 20.46
complex-construction2 23.74
finalization 4.74
type-check 37.74
emit-unhandled 5.63
emit-unhandled-empty 49.69
emit-unhandled-generic 7.17
emit-unhandled-generic-empty 50.63
emit-unhandled-args 5.20
emit-handled 3.86
emit-handled-empty 4.01
emit-handled-generic 3.96
emit-handled-generic-empty 7.04
emit-handled-args 3.78
notify-unhandled 52.63
notify-by-pspec-unhandled 156.86
notify-handled 2.55
notify-by-pspec-handled 2.66
property-set 34.63
property-get 32.92
refcount 0.83
refcount-1 2.30
refcount-toggle 1.33
Adjust the base factors with these measurements.
# Rewrite the hard-coded base factors in performance.c, scaling each one
# by the correction factor reported by the verbose run saved in "p".
PERFORMANCE_FILE="./gobject/tests/performance/performance.c"
IFS=$'\n'
for LINE in $(cat p | sed -n -e 's/^Running test //p' -e 's/.*correction factor //p' | sed 'N;s/\n/ /') ; do
  (
    IFS=' '
    set -- $LINE
    TESTNAME="$1"
    FACTOR="$2"
    # The base factor sits two lines below the test name in the struct initializer.
    LINENUMBER="$(grep -n "^ \"$TESTNAME\",$" "$PERFORMANCE_FILE" | cut -d: -f1)"
    LINENUMBER=$((LINENUMBER + 2))
    OLD_FACTOR="$(sed -n "$LINENUMBER s/^ \([0-9]\+\),$/\1/p" "$PERFORMANCE_FILE")"
    # Scale the old factor and round to the nearest integer.
    NEW_FACTOR="$(awk -v factor="$FACTOR" -v old_factor="$OLD_FACTOR" 'BEGIN {print int(factor * old_factor + 0.5)}')"
    sed -i "$LINENUMBER s/^ \([0-9]\+\),$/ $NEW_FACTOR,/" "$PERFORMANCE_FILE"
  )
done
Afterwards, we get comparable factors:
$ ./build/gobject/tests/performance/performance -v > p2
$ cat p2 | sed -n -e 's/^Running test //p' -e 's/.*correction factor //p' | sed 'N;s/\n/ /'
simple-construction 0.98
simple-construction1 0.75
complex-construction 0.99
complex-construction1 0.96
complex-construction2 1.02
finalization 1.05
type-check 0.98
emit-unhandled 1.01
emit-unhandled-empty 1.10
emit-unhandled-generic 1.03
emit-unhandled-generic-empty 1.07
...
Of course, this measurement was taken on my setup. But I think it
brings the base factors into a comparable range for most users.
Also, the commit message includes an ugly script with which you can
re-generate these numbers for your own setup.
Move the factor inside the PerformanceTest structure, so it can be
accessed programmatically.
More importantly, the number is now expressed directly beside the test
setup (the PerformanceTest structure), all in one place.
Also, each test now gets a separate factor.
This change will be useful in the next commit. So far there is no
notable change in behavior.
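Schematically, the structure now looks something like this (a sketch;
"base_factor" and the shown members are illustrative, the real structure
has more fields):

  #include <glib.h>

  typedef struct _PerformanceTest PerformanceTest;

  struct _PerformanceTest
  {
    const char *name;
    gsize base_factor;   /* per-test factor, right next to the test setup */
    void (*run) (PerformanceTest *test, gpointer data);
  };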
Despite assigning the function to a variable, gcc can still detect that
the function never changes, and most of the test code is optimized out.
Initialize it somewhere where the compiler cannot prove that this
function pointer is always set to the same value.
We could also make the pointer volatile, but this approach seems
preferable to me.
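As a rough sketch of the idea (names are illustrative, this is not the
actual patch): the function pointer is filled in at run time, e.g. from
a setup path, rather than from a static initializer:

  typedef struct
  {
    void (*run_one) (void);   /* set at run time, not statically */
  } TestData;

  static void
  real_run_one (void)
  {
    /* the actual per-iteration work of the test */
  }

  static void
  test_setup (TestData *data)
  {
    /* Assigned here, where the compiler cannot prove that the pointer
     * always ends up holding the same value, so the indirect call in
     * the test loop is not optimized away. */
    data->run_one = real_run_one;
  }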
If we enable `-Wfloat-conversion`, these conversions trigger warnings
about a possible loss of precision due to an implicit conversion from
`double` to some other numeric type.
The warning is correct: there is a possible loss of precision here. In
these instances, we don’t care, as the floating point arithmetic is
being done to do some imprecise scaling or imprecise timing. A loss of
precision is not a problem.
So, add an explicit cast to squash the warning.
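For example, a cast of this kind (illustrative only, not quoted from the
patch):

  static unsigned int
  scale_rounds (unsigned int base_rounds, double factor)
  {
    /* The truncation is intentional: the scaling is imprecise anyway, and
     * the explicit cast documents that and silences -Wfloat-conversion. */
    return (unsigned int) (base_rounds * factor);
  }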
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3405
The tests were using a lot of signed `int`s when actually the values
being handled were always non-negative. Use `unsigned int` consistently
throughout.
Take the opportunity to move declarations of loop iterator variables
into the loops.
This introduces no functional changes.
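The shape of the change, as a small made-up example:

  #include <glib-object.h>

  static void
  unref_all (GObject **objects, unsigned int n_objects)
  {
    /* The count can never be negative, so it is unsigned, and the loop
     * iterator is declared inside the loop. */
    for (unsigned int i = 0; i < n_objects; i++)
      g_object_unref (objects[i]);
  }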
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3405
Bumping the reference count from 1 to 2 (and back) is more expensive,
due to the check for toggle notifications.
We already have a performance test that hits that code path. Avoid it
for the "property-{get,set}" tests, so we avoid the known overhead and
test more relevant parts.
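A sketch of the idea (the type and property name are made up; this shows
the shape, not the actual patch): hold one extra reference for the
duration of the test, so the reference count oscillates between 2 and 3
instead of 1 and 2, staying clear of the toggle-notification check:

  #include <glib-object.h>

  static void
  run_property_set (GObject *obj, unsigned int n_rounds)
  {
    /* Extra reference: the temporary reference taken while setting the
     * property now bumps the count 2 -> 3 instead of 1 -> 2. */
    g_object_ref (obj);

    for (unsigned int i = 0; i < n_rounds; i++)
      g_object_set (obj, "dummy", i, NULL);

    g_object_unref (obj);
  }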
Despite all the efforts, there still seems to be a lot of noise in the
performance measurements. In particular, the first iterations seem to
run faster. Maybe that is because the kernel has not yet determined that
the process is CPU bound and is less likely to schedule it out. Or maybe
it's because burning the cycles heats up the CPU and it gets throttled
after a while. It's unclear why, and it's even unclear whether this
really happens. But from my observations, it seems to.
Hence, more warm up.
- the first time we enter the test, ensure that we keep the CPU busy for
  at least 2 seconds. This additional warm-up (WARM_UP_ALWAYS_SEC) is
  global, and not per test (see the sketch below).
- for each test, ignore the first 5% of the runs. Those seem to tend to
  run faster, thus skewing the results.
- if the user specifies a "--factor", the warm-up operations are the
  same and independent of external factors (such as time
  measurements).
Note that this matters the most when you want to run the executable
twice in a row and compare the results.
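A minimal sketch of the global warm-up, assuming a busy loop driven by
the monotonic clock (WARM_UP_ALWAYS_SEC is the constant mentioned above;
the rest is illustrative):

  #include <glib.h>

  #define WARM_UP_ALWAYS_SEC 2

  /* Keep the CPU busy for WARM_UP_ALWAYS_SEC seconds before the first
   * test runs; this happens once globally, not per test. */
  static void
  global_warm_up (void)
  {
    gint64 start = g_get_monotonic_time ();
    volatile guint spin = 0;

    while (g_get_monotonic_time () - start < WARM_UP_ALWAYS_SEC * G_USEC_PER_SEC)
      spin++;
  }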
By default, the test estimates a run factor for each test. This means
that if you run performance under `perf`, the results are not
comparable, as the run time depends on the estimated factor.
Add an option to set a fixed factor.
Of course, there is only one factor argument for all tests. Quite
possibly, you would want to run each test individually with a factor
appropriate for the test. On the other hand, all tests should be tuned
so that the same factor gives a similar test duration. So this may not
be a concern, or the tests should be adjusted. In any case, the option
is most useful when running only one test explicitly.
You can get a suitable factor by running the test once with "--verbose".
Another use case is running the benchmark under valgrind. Valgrind slows
down the run so much that the estimated factor would be quite off. As a
result, the chosen code paths would differ from those of a real run.
By setting the factor explicitly, the timing measurements don't affect
the executed code.
The default output is annoyingly verbose. You see
Running test simple-construction
simple-construction: Millions of constructed objects per second: 33.498
Running test simple-construction1
simple-construction1: Millions of constructed objects per second: 142.493
Running test complex-construction
complex-construction: Millions of constructed objects per second: 14.304
Running test complex-construction1
...
where the "Running test" lines just clutter the output. In fact so much
so, that my terminal fills up and I don't see the output of all tests in
one page. The "Running test" line is not so useful, because I mostly
care about the test result, and that line already contains the test
name.
Add an option to silence this.
Previously, the result lines were not unique, for example:
Running test simple-construction
Millions of constructed objects per second: 27.629
Running test simple-construction1
Millions of constructed objects per second: 151.879
...
That is undesirable, because we might want to parse the test results
with a script, and that's easier when the line is unique.
Change to:
Running test simple-construction
simple-construction: Millions of constructed objects per second: 27.629
Running test simple-construction1
simple-construction1: Millions of constructed objects per second: 151.879
...
This doesn’t change the tests’ behaviour, but moves them to a slightly
more logical location.
They are still not installed or run by default.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Helps: #1434