Despite all the efforts, there still seems to be a lot of noise in the
performance measurements. In particular, the first iterations seem to run
faster. Maybe that is because the kernel has not yet determined that the
process is CPU bound and is less likely to schedule it out. Or maybe it's
because burning the cycles heats up the CPU and it gets throttled after
a while. It's unclear why, and it's even unclear whether this really
happens. But from my observations, it seems to.
Hence, more warm up.
- the first time we enter the test, ensure that we keep the CPU busy for
at least 2 seconds. This additional warm up (WARM_UP_ALWAYS_SEC) is
global, and not per test.
- for each test, ignore the first 5% of the runs. It seems those tend to
run faster, thus skewing the results.
- if the user specifies a "--factor", the warm up operations are the
same and independent from external factors (such as time
measurements).
Note that this matters most when you want to run the executable
twice in a row and compare the results.
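The two warm-up rules above can be sketched as follows. This is only an
illustration in Python, not the code from performance.c: apart from
WARM_UP_ALWAYS_SEC, every name here (the flag, the helper functions, the
integer percentage) is an assumption made for the example.

```python
import time

WARM_UP_ALWAYS_SEC = 2.0  # name taken from the text; value as described there
SKIP_PERCENT = 5          # "ignore the first 5% of the runs"

_warmed_up = False  # hypothetical flag: the global warm up happens only once


def global_warm_up(seconds=WARM_UP_ALWAYS_SEC, clock=time.monotonic):
    """Keep the CPU busy for `seconds`, but only on the first call.

    Returns True if the warm up ran, False if it had already been done.
    """
    global _warmed_up
    if _warmed_up:
        return False
    deadline = clock() + seconds
    while clock() < deadline:
        pass  # burn cycles on purpose
    _warmed_up = True
    return True


def drop_warm_up_runs(samples, skip_percent=SKIP_PERCENT):
    """Return the samples without the leading warm-up share of the runs."""
    skip = len(samples) * skip_percent // 100
    return samples[skip:]
```

With 100 recorded runs, drop_warm_up_runs() discards the first 5 and keeps
the remaining 95. With very few runs nothing is discarded.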
By default, the test estimates a run factor for each test. This means
that if you run the performance test under `perf`, the results are not
comparable, as the run time depends on the estimated factor.
Add an option to set a fixed factor.
Of course, there is only one factor argument for all tests. Quite
possibly, you would want to run each test individually with a factor
appropriate for the test. On the other hand, all tests should be tuned
so that the same factor gives a similar test duration. So this may not
be a concern, or the tests should be adjusted. In any case, the option
is most useful when running only one test explicitly.
You can get a suitable factor by running the test once with "--verbose".
Another use case is running the benchmark under valgrind. Valgrind
slows down the run so much that the estimated factor would be quite
off. As a result, the chosen code paths are different from the real run.
With a fixed factor, the timing measurements don't affect the executed
code.
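To illustrate why a fixed factor makes runs comparable, here is a minimal
Python sketch. It assumes the factor is estimated by timing one base round
against a target duration. That estimation rule and all names below are
assumptions for the example, not the logic in performance.c.

```python
import time


def estimate_factor(run_base_round, target_seconds, clock=time.monotonic):
    """Hypothetical estimate: how many base rounds fit into target_seconds."""
    start = clock()
    run_base_round()
    elapsed = clock() - start
    return target_seconds / max(elapsed, 1e-9)  # guard against a zero reading


def choose_factor(fixed_factor, run_base_round, target_seconds,
                  clock=time.monotonic):
    """Use the user-supplied factor if given, otherwise estimate one."""
    if fixed_factor is not None:
        # No clock reads at all: the executed work no longer depends on
        # how fast or slow the environment (perf, valgrind, ...) is.
        return fixed_factor
    return estimate_factor(run_base_round, target_seconds, clock)
```

In this sketch, the estimated factor shrinks as the base round gets slower,
so a run under a slow tool would execute a different amount of work than a
native run. The fixed factor avoids that.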
The default output is annoyingly verbose. You see
  Running test simple-construction
  simple-construction: Millions of constructed objects per second: 33.498
  Running test simple-construction1
  simple-construction1: Millions of constructed objects per second: 142.493
  Running test complex-construction
  complex-construction: Millions of constructed objects per second: 14.304
  Running test complex-construction1
  ...
where the "Running test" lines just clutter the output. So much so, in
fact, that my terminal fills up and I can't see the output of all tests
on one page. The "Running test" line is not very useful, because I mostly
care about the test result, and the result line already contains the test
name.
Add an option to silence this.
Previously, the result lines were not unique, for example
  Running test simple-construction
  Millions of constructed objects per second: 27.629
  Running test simple-construction1
  Millions of constructed objects per second: 151.879
  ...
That is undesirable, because we might want to parse the test results
with a script, and that's easier when the line is unique.
Change to:
  Running test simple-construction
  simple-construction: Millions of constructed objects per second: 27.629
  Running test simple-construction1
  simple-construction1: Millions of constructed objects per second: 151.879
  ...
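As an example of the kind of script this enables, the following Python
sketch parses the prefixed result lines. The regular expression is an
assumption based only on the sample output shown here. Other tests may
print result lines with a different shape.

```python
import re

# "<test-name>: <metric description>: <number>", as in the sample output.
RESULT_RE = re.compile(
    r"^(?P<test>[\w-]+): (?P<metric>.+): (?P<value>[0-9]+(?:\.[0-9]+)?)$"
)


def parse_results(output):
    """Map each test name to its (metric, value) pair.

    Lines that do not look like result lines, such as "Running test ...",
    are skipped.
    """
    results = {}
    for line in output.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            results[match.group("test")] = (
                match.group("metric"),
                float(match.group("value")),
            )
    return results
```

Because every result line now starts with the test name, the script does
not have to remember the preceding "Running test" line to know which test
a number belongs to.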
They take too long and time out, and are not particularly useful to run
under valgrind because they aren’t designed to test code coverage.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
We have tests that are failing in some environments, but it's
difficult to handle them because:
- for some environments we just allow all the tests to fail: DANGEROUS
- when we don't allow failures we have flaky tests: a CI pain
So, to avoid this and ensure that:
- new failing tests are tracked on all platforms
- GitLab integration for test reports is working
- coverage is also reported for failing tests
add support for a `can_fail` keyword on tests, which marks the test as
part of the `failing` test suite.
The suite is not added directly when defining the tests, as the keyword
is definitely simpler and allows conditions to be defined more clearly
(see next commits).
Now, add a default test setup that does not run the failing and flaky
tests (so as not to bother distributors with testing well-known issues)
and eventually run all the tests in CI:
- Non-flaky tests cannot fail on any platform
- Failing and flaky tests can fail
In both cases we save the test reports so that GitLab integration is
preserved.
ginsttest-runner defaults to timing out each test after 5 minutes,
but gobject/tests/performance/performance.c defaults to running each
of 18 tests for 15 seconds. The result is close enough to 5 minutes
that the setup overhead is enough to make it time out.
We're only running these tests to prove that they still work, not to
get meaningful performance numbers, so cut them down to 1 second per
test-case (the result of which is that performance.c takes about a
minute).
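The arithmetic behind this can be checked with a short Python sketch. It
only uses the figures quoted in this message. The variable and function
names are made up for the example, and setup or warm-up overhead is not
modelled.

```python
TIMEOUT_SEC = 5 * 60  # per-test timeout of the runner, as stated above
NUM_TESTS = 18        # number of test-cases in performance.c, as stated above


def measured_seconds(seconds_per_test, num_tests=NUM_TESTS):
    """Time spent purely in measurement, ignoring setup and warm up."""
    return seconds_per_test * num_tests


before = measured_seconds(15)           # 18 * 15 = 270 seconds
headroom_before = TIMEOUT_SEC - before  # only 30 seconds left for overhead
after = measured_seconds(1)             # 18 * 1 = 18 seconds
```

With the old default only 30 seconds of headroom remain before the timeout,
which any setup overhead can easily use up. With 1 second per test-case the
measured part drops to 18 seconds, and the rest of the observed run time of
about a minute is overhead outside the measurement window.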
Signed-off-by: Simon McVittie <smcv@collabora.com>
Pass arguments to them so that they take minimal time. This will not
produce useful performance profiling results, but will smoketest that
the tests still run, don’t crash, and therefore probably aren’t
bitrotting too badly.
This is useful because a fair amount of work has gone into these
performance tests, and they’re useful every few years to analyse and
compare GObject performance. We don’t want them to bitrot between uses.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
When running the test with `-s 0` it would previously crash. Fix that,
and make it so that it only does a single test run in that case.
This will be useful in an upcoming commit for smoketesting the test to
avoid bitrot.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
This doesn’t change the tests’ behaviour, but moves them to a slightly
more logical location.
They are still not installed or run by default.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Helps: #1434