Some tweakings of the time spend during warm up. That mostly matters if
you set very short "--seconds", which can make sense for quickly
checking something. Then the warmup should not take more thatn a certain
percentage of the requested runs.
When we have a constant factor, we still want not to run for more than
10% of the overall test time ... except, we still want to run at least
ESTIMATE_ROUND_TIME_N_RUNS (because we skip the estimation step below).
Also, adjust WARM_UP_ALWAYS_SEC to be only 20% of the test time, for
short test runs.
Also, don't print the messages about "Estimating round time" with a
fixed "--factor".
One test round aims to run for 8msec (TARGET_ROUND_TIME).
As the "--seconds" parameter previously took whole integer numbers, that
meant that we would run at least 125 rounds.
For a quick run, we should also support even faster runs, e.g. to select
only 0.5 seconds.
Historically, there was a verbose mode and a non-verbose mode.
In non-verbose mode (the default), we would still print two lines:
Running test property-set
Property set per second: 39329344
Later, this was changed to include the test name in the second line, so
we would print:
Running test property-set
property-set: Property set per second: 39329344
But this first line is really just noise, making parsing and reading the
results harder. Hence a "--quiet" mode was added, that only printed one
line per test while keeping the previous default behavior. And all was
good.
Except, unless you want verbose mode, this "Running test" line is still
not very useful and mainly clutters the output.
Supporess it now also in normal mode. It is now only printed in verbose
mode.
This also makes the "--quiet" option do nothing. The option is still
there, maybe we find a future use and we should not break the command
line API by dropping an argument.
g_object_set() optimizes the case where there are no signals connected.
Add a test that sets the property with signals. Obviously, this one is
much slower, since we will freeze and thaw the notifications.
For the test, we actually care to find the fastest test run (and take
"min_elapsed"). That is useful, because that is the run where we
possibly have the least interference from external factors, it was the
run where the CPU solved the problem as fast as it could.
As such, we should not reject the first 5% as additional warm up. If the
first 5% are slower (and part of "warmup"), then they are anyway not
considered. If there is a the fastest run in the first 5 percent, then
we want to take that.
Also note, that the calculation of "avg_elapsed" was wrong, since it
divided by the full "num_rounds" while only summing 95% of the runs.
This is fixed too by now considering all runs.
Fixes: 282d536fd229 ('tests/performance: ensure to always warm up for 2 seconds')
- fix and improve usage output for "performance-run.sh" script.
- add a sleep after compilation. It seems to me, that the first run
usually performs better, which might be because the temperature of the
CPU raises and the CPU gets throttled. Unclear whether that is really
the case, but add a sleep so that all runs start under similar
conditions where the CPU was idle for a moment.
When running the test (without parameters), it estimates a factor for
the run size for each test. That is useful for running a reasonable size
of the test, on different machines.
However, when comparing two runs, it seems important that both runs
share a common factor. Otherwise, the factor is determined differently,
and the test is less comparable. For that there is the "--factor" option
or the GLIB_PERFORMANCE_FACTOR environment variable.
However, the factor option can only set the factors for all tests at the
same time. Optimally, one factor is roughly suitable for all tests, but
it is not, as currently the detected factors on my machine are widely
different
$ ./build/gobject/tests/performance/performance -v > p
$ cat p | sed -n -e 's/^Running test //p' -e 's/.*correction factor //p' | sed 'N;s/\n/ /'
simple-construction 34.78
simple-construction1 145.45
complex-construction 11.08
complex-construction1 20.46
complex-construction2 23.74
finalization 4.74
type-check 37.74
emit-unhandled 5.63
emit-unhandled-empty 49.69
emit-unhandled-generic 7.17
emit-unhandled-generic-empty 50.63
emit-unhandled-args 5.20
emit-handled 3.86
emit-handled-empty 4.01
emit-handled-generic 3.96
emit-handled-generic-empty 7.04
emit-handled-args 3.78
notify-unhandled 52.63
notify-by-pspec-unhandled 156.86
notify-handled 2.55
notify-by-pspec-handled 2.66
property-set 34.63
property-get 32.92
refcount 0.83
refcount-1 2.30
refcount-toggle 1.33
Adjust the base factors with these measurements.
PERFORMANCE_FILE="./gobject/tests/performance/performance.c"
IFS=$'\n'
for LINE in $(cat p | sed -n -e 's/^Running test //p' -e 's/.*correction factor //p' | sed 'N;s/\n/ /') ; do
(
IFS=' '
set -- $LINE
TESTNAME="$1"
FACTOR="$2"
LINENUMBER="$(grep -n "^ \"$TESTNAME\",$" "$PERFORMANCE_FILE" | cut -d: -f1)"
LINENUMBER=$((LINENUMBER + 2))
OLD_FACTOR="$(sed -n "$LINENUMBER s/^ \([0-9]\+\),$/\1/p" "$PERFORMANCE_FILE")"
NEW_FACTOR="$(awk -v factor="$FACTOR" -v old_factor="$OLD_FACTOR" 'BEGIN {print int(factor * old_factor + 0.5)}')"
sed -i "$LINENUMBER s/^ \([0-9]\+\),$/ $NEW_FACTOR,/" "$PERFORMANCE_FILE"
)
done
Afterwards, we get comparable factors:
$ ./build/gobject/tests/performance/performance -v > p2
$ cat p2 | sed -n -e 's/^Running test //p' -e 's/.*correction factor //p' | sed 'N;s/\n/ /'
simple-construction 0.98
simple-construction1 0.75
complex-construction 0.99
complex-construction1 0.96
complex-construction2 1.02
finalization 1.05
type-check 0.98
emit-unhandled 1.01
emit-unhandled-empty 1.10
emit-unhandled-generic 1.03
emit-unhandled-generic-empty 1.07
...
Of course, this measurement was taken in my setup. But I think it
brings the base factors into a comparable range for most users.
Also, the commit message shows an ugly script how you can re-generate
this for your own purposes.
Move the factor inside the PerformanceTest structure, so it can be
programatically accessed.
More importantly, the number is now expressed directly beside the test
setup (the PerformanceTest structure), all at one place.
Also, each test now gets a separate factor.
This change will be useful in the next commit. So far there is no
notable change in behavior.
Despite assigning the function to a variable, gcc can still detect that
the function never changes and most of the test code is optimized out.
Initialize it somewhere, where the compiler cannot prove that this
function pointer is always set to the same value.
We could also make the pointer volatile, but this approach seems
preferable to me.
If we enable `-Wfloat-conversion`, these warn about a possible loss of
precision due to an implicit conversion from `double` to some other
numeric type.
The warning is correct: there is a possible loss of precision here. In
these instances, we don’t care, as the floating point arithmetic is
being done to do some imprecise scaling or imprecise timing. A loss of
precision is not a problem.
So, add an explicit cast to squash the warning.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3405
The tests were using a lot of signed `int`s when actually the values
being handled were always non-negative. Use `unsigned int` consistently
throughout.
Take the opportunity to move declarations of loop iterator variables
into the loops.
This introduces no functional changes.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3405
Bumping the reference count from 1 to 2 (and back) is more expensive,
due to the check for toggle notifications.
We have a performance test already that hits that code path. Avoid that
for he "property-{get,set}" tests, so we avoid the known overhead and
test more relevant parts.
Despite all the efforts, there still seems to be a lot of noise in the
performance measurement. Especially, the first iterations seem to run
faster. Maybe that is because the kernel didn't yet determine that the
process is CPU bound and is less likely to schedule it out Or maybe it's
because burning the cycles heats up the CPU and it gets throttled after
a while. It's unclear why, and it's even unclear whether this really
happens. But from my observations, it seems to do.
Hence, more warm up.
- the first time we enter the test, ensure that we keep the CPU busy for
at 2 seconds. This additional warm up (WARM_UP_ALWAYS_SEC) is
global, and not per test.
- for each test, ignore the first 5% of the runs. It seems those tend to
run faster, thus skewing the results.
- if the user specifies a "--factor", the warm up operations are the
same and independent from external factors (such as time
measurements).
Note that this matters the most, when you want to run the executable
twice in a row and compare the results.
By default, the test estimates a run factor for each test. This means,
if you run performance under `perf`, the results are not comparable,
as the run time depends on the estimated factor.
Add an option, to set a fixed factor.
Of course, there is only one factor argument for all tests. Quite
possibly, you would want to run each test individually with a factor
appropriate for the test. On the other hand, all tests should be tuned
so that the same factor gives a similar test duration. So this may not
be a concern, or the tests should be adjusted. In any case, the option
is most useful when running only one test explicitly.
You can get a suitable factor by running the test once with "--verbose".
Another use case is if you run the benchmark under valgrind. Valgrind
slows down the run so much, that the estimated factor would be quite
off. As a result, the chosen code paths are different from the real run.
By setting the factor, the timing measurements don't affect the executed
code.
The default output is annoyingly verbose. You see
Running test simple-construction
simple-construction: Millions of constructed objects per second: 33.498
Running test simple-construction1
simple-construction1: Millions of constructed objects per second: 142.493
Running test complex-construction
complex-construction: Millions of constructed objects per second: 14.304
Running test complex-construction1
...
where the "Running test" lines just clutter the output. In fact so much
so, that my terminal fills up and I don't see the output of all tests in
one page. The "Running test" line is not so useful, because I mostly
care about the test result, and that line already contains the test
name.
Add an option to silence this.
Previously, the result lines are not unique, for example
Running test simple-construction
Millions of constructed objects per second: 27.629
Running test simple-construction1
Millions of constructed objects per second: 151.879
...
That is undesirable, because we might want to parse the test results
with a script, and that's easier when the line is unique.
Change to:
Running test simple-construction
simple-construction: Millions of constructed objects per second: 27.629
Running test simple-construction1
simple-construction1: Millions of constructed objects per second: 151.879
...
They take too long and time out, and are not particularly useful to run
under valgrind because they aren’t designed to test code coverage.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
We have tests that are failing in some environments, but it's
difficult to handle them because:
- for some environments we just allow all the tests to fail: DANGEROUS
- when we don't allow failures we have flacky tests: A CI pain
So, to avoid this and ensure that:
- New failing tests are tracked in all platforms
- gitlab integration on tests reports is working
- coverage is reported also for failing tests
Add support for `can_fail` keyword on tests that would mark the test as
part of the `failing` test suite.
Not adding the suite directly when defining the tests as this is
definitely simpler and allows to define conditions more clearly (see next
commits).
Now, add a default test setup that does not run the failing and flaky tests
by default (not to bother distributors with testing well-known issues) and
eventually run all the tests in CI:
- Non-flaky tests cannot fail in all platforms
- Failing and Flaky tests can fail
In both cases we save the test reports so that gitlab integration is
preserved.
ginsttest-runner defaults to timing out each test after 5 minutes,
but gobject/tests/performance/performance.c defaults to running each
of 18 tests for 15 seconds. The result is close enough to 5 minutes
that the setup overhead is enough to make it time out.
We're only running these tests to prove that they still work, not to
get meaningful performance numbers, so cut them down to 1 second per
test-case (the result of which is that performance.c takes about a
minute).
Signed-off-by: Simon McVittie <smcv@collabora.com>
Pass arguments to them so that they take minimal time. This will not
produce useful performance profiling results, but will smoketest that
the tests still run, don’t crash, and therefore probably aren’t
bitrotting too badly.
This is useful because a fair amount of work has gone into these
performance tests, and they’re useful every few years to analyse and
compare GObject performance. We don’t want them to bitrot between uses.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
When running the test with `-s 0` it would previously crash. Fix that,
and make it so that it only does a single test run in that case.
This will be useful in an upcoming commit for smoketesting the test to
avoid bitrot.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
This doesn’t change the tests’ behaviour, but moves them to a slightly
more logical location.
They are still not installed or run by default.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Helps: #1434