Despite all the efforts, there still seems to be a lot of noise in the
performance measurement. Especially, the first iterations seem to run
faster. Maybe that is because the kernel didn't yet determine that the
process is CPU bound and is less likely to schedule it out Or maybe it's
because burning the cycles heats up the CPU and it gets throttled after
a while. It's unclear why, and it's even unclear whether this really
happens. But from my observations, it seems to do.
Hence, more warm up.
- the first time we enter the test, ensure that we keep the CPU busy for
at 2 seconds. This additional warm up (WARM_UP_ALWAYS_SEC) is
global, and not per test.
- for each test, ignore the first 5% of the runs. It seems those tend to
run faster, thus skewing the results.
- if the user specifies a "--factor", the warm up operations are the
same and independent from external factors (such as time
measurements).
Note that this matters the most, when you want to run the executable
twice in a row and compare the results.
By default, the test estimates a run factor for each test. This means,
if you run performance under `perf`, the results are not comparable,
as the run time depends on the estimated factor.
Add an option, to set a fixed factor.
Of course, there is only one factor argument for all tests. Quite
possibly, you would want to run each test individually with a factor
appropriate for the test. On the other hand, all tests should be tuned
so that the same factor gives a similar test duration. So this may not
be a concern, or the tests should be adjusted. In any case, the option
is most useful when running only one test explicitly.
You can get a suitable factor by running the test once with "--verbose".
Another use case is if you run the benchmark under valgrind. Valgrind
slows down the run so much, that the estimated factor would be quite
off. As a result, the chosen code paths are different from the real run.
By setting the factor, the timing measurements don't affect the executed
code.
The default output is annoyingly verbose. You see
Running test simple-construction
simple-construction: Millions of constructed objects per second: 33.498
Running test simple-construction1
simple-construction1: Millions of constructed objects per second: 142.493
Running test complex-construction
complex-construction: Millions of constructed objects per second: 14.304
Running test complex-construction1
...
where the "Running test" lines just clutter the output. In fact so much
so, that my terminal fills up and I don't see the output of all tests in
one page. The "Running test" line is not so useful, because I mostly
care about the test result, and that line already contains the test
name.
Add an option to silence this.
Previously, the result lines are not unique, for example
Running test simple-construction
Millions of constructed objects per second: 27.629
Running test simple-construction1
Millions of constructed objects per second: 151.879
...
That is undesirable, because we might want to parse the test results
with a script, and that's easier when the line is unique.
Change to:
Running test simple-construction
simple-construction: Millions of constructed objects per second: 27.629
Running test simple-construction1
simple-construction1: Millions of constructed objects per second: 151.879
...
On a fast laptop, this test currently takes about 7s to run, which is a
significant portion of the overall test suite time.
On a slower CI machine, especially running the test under valgrind, the
test can time out.
There’s no need to always run so many iterations: we run the tests under
CI so often that it’s likely a failure will eventually be hit (if there
is a bug) even with fewer iterations. We also now run the tests once a
week with `-m slow`, so the original iteration count will also still be
used then.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
It’s more helpful to always register the test, even if it’s normally
skipped, since then the skip is recorded in the test logs so people can
see what’s ‘missing’ from them.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
GSList doesn't seem the best choice here. It's benefits are that it's
relatively convenient to use (albeit not very efficient) and that an
empty list requires only the pointer to the list's head.
But for non-empty list, we need to allocate GSList elements. We can do
better, by writing more code.
I think it's worth optimizing GObject, at the expense of a bit(?) more
complicated code. The complicated code is still entirely self-contained,
so unless you review WeakRefData usage, it doesn't need to bother you.
Note that this can be easily measure to be a bit faster. But I think the
more important part is to safe some allocations. Often objects are
long-lived, and the GWeakRef will be tracked for a long time. It is
interesting, to optimize the memory usage of that.
- if the list only contains one weak reference, it's interned/embedded in
WeakRefData.list.one. Otherwise, an array is allocated and tracked
at WeakRefData.list.many.
- when the buffer grows, we double the size. When the buffer shrinks,
we reallocate to 50% when 75% are empty. When the buffer shrinks to
length 1, we free it (so that "list.one" is always used with a length
of 1).
That means, at worst case we waste 75% of the allocated buffer,
which is a choice in the hope that future weak references will be
registered, and that this is a suitable strategy.
- on architectures like x86_68, does this not increase the size of
WeakRefData.
Also, the number of weak-refs is now limited to 65535, and now an
assertion fails when you try to register more than that. But note that
the internal tracking just uses a linear search, so you really don't
want to register thousands of weak references on an object. If you do
that, the current implementation is not suitable anyway and you must
rethink your approach. Nor does it make sense to optimize the
implementation for such a use case. Instead, the implementation is
optimized for a few (one!) weak reference per object.
The implementation of GWeakRef tracks weak references in a way, that
requires linear search. That is probably best, for an expected low
number of entries (e.g. compared to the overhead of having a hash
table). However, it means, if you create thousands of weak references,
performance start to degrade.
Add a test that creates 64k weak references. Just to see how it goes.
g_object_weak_ref() documentation refers to GWeakRef as thread-safe
replacement. However, it's not clear to me, how GWeakRef is a
replacement for a callback. I think, it means, that you combine
g_object_weak_ref() with GWeakRef, to both hold a (thread-safe) weak
reference and get a notification on destruction.
Add a test, that GWeakRef is already cleared inside the GWeakNotify
callback.
This isn’t normally hit because it’s in a test which is disabled unless
run with `-m thorough`.
The data is owned by `g_test_add_data_func_full()` until the end of the
process.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
This isn’t normally hit because it’s in a test which is disabled unless
run with `-m thorough`.
The `GParamSpec` is initially floating, but its floating ref is sunk by
`g_object_interface_install_property()` (regardless of whether that call
succeeds or aborts). The behaviour of
`g_object_interface_install_property()` in this respect may have changed
more recently than the test was written.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Usually, after g_pointer_bit_lock() we want to read the pointer that we
have. In many cases, when we g_pointer_bit_lock() a pointer, we can
access it afterwards without atomic, as nobody is going to modify the
pointer then.
However, gdataset also supports g_datalist_set_flags(), so the pointer
may change at any time and we must always use atomics to read it. For
that reason, g_datalist_lock_and_get() does an atomic read right after
g_pointer_bit_lock().
g_pointer_bit_lock() can easily access the value that it just set. Add
g_pointer_bit_lock_and_get() which can return the value that gets set
afterwards.
Aside from saving the second atomic-get in certain scenarios, the
returned value is also atomically the one that we just set.
The existing g_pointer_bit_lock() and g_pointer_bit_unlock() API
requires the user to understand/reimplement how bits of the pointer get
mangled. Add helper functions for that.
The useful thing to do with g_pointer_bit_lock() API is to get/set
pointers while having it locked. For example, to set the pointer a user
can do:
g_pointer_bit_lock (&lockptr, lock_bit);
ptr2 = set_bit_pointer_as_if_locked(ptr, lock_bit);
g_atomic_pointer_set (&lockptr, ptr2);
g_pointer_bit_unlock (&lockptr, lock_bit);
That has several problems:
- it requires one extra atomic operations (3 instead of 2, in the
non-contended case).
- the first g_atomic_pointer_set() already wakes blocked threads,
which find themselves still being locked and needs to go back to
sleep.
- the user needs to re-implement how bit-locking mangles the pointer so
that it looks as if it were locked.
- while the user tries to re-implement what glib does to mangle the
pointer for bitlocking, there is no immediate guarantee that they get
it right.
Now we can do instead:
g_pointer_bit_lock(&lockptr, lock_bit);
g_pointer_bit_unlock_and_set(&lockptr, lock_bit, ptr, 0);
This will also emit a critical if @ptr has the locked bit set.
g_pointer_bit_lock() really only works with pointers that have a certain
alignment, and the lowest bits unset. Otherwise, there is no space to
encode both the locking and all pointer values. The new assertion helps
to catch such bugs.
Also, g_pointer_bit_lock_mask_ptr() is here, so we can do:
g_pointer_bit_lock(&lockptr, lock_bit);
/* set a pointer separately, when g_pointer_bit_unlock_and_set() is unsuitable. */
g_atomic_pointer_set(&lockptr, g_pointer_bit_lock_mask_ptr(ptr, lock_bit, TRUE, 0, NULL));
...
g_pointer_bit_unlock(&lockptr, lock_bit);
and:
g_pointer_bit_lock(&lockptr, lock_bit);
/* read the real pointer after getting the lock. */
ptr = g_pointer_bit_lock_mask_ptr(lockptr, lock_bit, FALSE, 0, NULL));
...
g_pointer_bit_unlock(&lockptr, lock_bit);
In general, we must not call out to external, unknown code while holding
a lock. That is prone to dead lock.
g_object_ref() can emit a toggle notification. In g_weak_ref_set(), we
must not do that while holding the lock.
GWeakRef calls g_object_ref() while holding a read lock.
g_object_ref() can emit a toggle notification.
If you then inside the toggle notification setup a GWeakRef
(to the same or another object), the code will try to get
a write lock. Deadlock will happen.
Add a test to "show" that (the breaking part is commented out).
Will be fixed next.
In g_object_unref(), we call _object_unref_clear_weak_locations() before
and after dispose() already. At those places it is necessary to do.
Calling it a third time during g_object_real_dispose() seems not useful,
has unnecessary overhead and actually undesirable.
In particular, because g_object_real_dispose() is the implementation for
the virtual function GObject.dispose(). For subclasses that override dispose(),
it's not well defined at which point they should chain up the parent
implementation (for dispose(), I'd argue that usually they chain up at
the end of their own code). If they chain up at the end, this has no
effect.
This only really matters if you try to register GWeakRef during dipose
and/or resurrect the object.
static void dispose(GObject *object)
{
g_weak_ref_set(&global_weak_ref, object);
global_ref = g_object_ref(object);
G_OBJECT_CLASS (parent_class)->dispose (object);
}
the object was resurrected, but g_object_real_dispose() would clear the
weak ref. That is not desirable, nor does it make sense.
Instead, the virtual function dispose() is called from two places, from
g_object_unref() and g_object_run_dispose(). In both cases, it is
ensured that weak locations are cleared *after* dispatching the virtual
function. Don't do it somewhere in the middle from
g_object_real_dispose().
The previous g_object_unref() was racy. There were three places where we
decremented the ref count, but still accessed the object afterwards
(while assuming that somebody else might still hold a reference). For
example:
if (!g_atomic_int_compare_and_exchange_full ((int *) &object->ref_count,
old_ref, old_ref - 1,
&old_ref))
continue;
TRACE (GOBJECT_OBJECT_UNREF (object, G_TYPE_FROM_INSTANCE (object), old_ref));
/* if we went from 2->1 we need to notify toggle refs if any */
if (old_ref == 2 && OBJECT_HAS_TOGGLE_REF (object))
{
/* The last ref being held in this case is owned by the toggle_ref */
toggle_refs_notify (object, TRUE);
}
After we decrement the reference count (and gave up our reference), we
are only allowed to access object if we know we have the only possible
reference to it. In particular, if old_ref is larger than 1, then
somebody else holds references and races against destroying object.
The object might be a dangling pointer already.
This is slightly complicated due to toggle references and clearing of
weak-locations.
For toggle references, we must take a lock on the mutex. Luckily, that
is only necessary, when the current reference count is exactly 2.
Note that we emit the TRACE() after the ref count was already decreased.
If another thread unrefs the object, inside the TRACE() we might have a
dangling pointer. That would only be fixable, by emitting the TRACE()
before the actual unref (which has its own problems). This problem
already existed previously.
The change to the test is necessary and correct. Before this patch,
g_object_unref() would call dispose() and decrement the reference count
right after.
In the test case at gobject/tests/reference.c:1108, the reference count
after dispose and decrement is 1. Then it thaws the queue notification,
which emits a property changed signal. The test then proceeds to
reference the object again and notifying the toggle reference.
Previously, the toggle reference was notified 3 times.
After this change, the property changed signal is emitted before
decreasing the reference count. Taking a reference then does not cause
an additional toggle on+off, so in total only one toggle happens.
That accounts for the change in the test. The new behavior is
correct.
Aside from checking that we're accessing the global GParamSpecPool
without necessarily initializing GObjectClass, we should also verify
that we're doing so safely without the class init lock.
Ensure that the fix in commit af024b6d7e7d3fbef23c1f7d1f7704fc90ac4fb1
works, by replicating what gobject-introspection does:
- initialise the default GTypeInterface from a GType without also
initialising GObjectClass
- install a property for the interface
- find the GParamSpec using g_object_interface_find_property()
Modify all the similar Python test wrappers to set
`G_DEBUG=fatal-warnings` in the environment of the program being tested,
so we can catch unexpected warnings/criticals.
Adding this because I noticed it was missing, not because I noticed a
warning/critical was being ignored.
Signed-off-by: Philip Withnall <philip@tecnocode.co.uk>
This patch is based upon Garrett Regier's work from 2015 to provide
some reliability and predictability to how disposal handles weak
reference state.
A primary problem is that GWeakRef and GWeakNotify state is shared and
therefore you cannot rely on GWeakRef status due to a GWeakNotify
calling into second-degree code.
It's important to ensure that both weak pointer locations and GWeakRef
will do the proper thing before user callbacks are executed during
disposal. Otherwise, user callbacks cannot rely on the status of their
weak pointers. That would be mostly okay but becomes an issue when
second degree objects are then disposed before any notification of
dependent object disposal.
Consider objects A and B.
`A` contains a reference to `B` and `B` contains a `GWeakRef` to `A`.
When `A` is disposed, `B` may be disposed as a consequence but has not
yet been notified that `A` has been disposed. It's `GWeakRef` may also
cause liveness issues if `GWeakNotify` on `A` result in tertiary code
running which wants to interact with `B`.
This example is analagous to how `GtkTextView` and `GtkTextBuffer` work
in text editing applications.
To provide application and libraries the ability to handle this using
already existing API, `GWeakRef` is separated into it's own GData quark
so that weak locations and `GWeakRef` are cleared before user code is
executed as a consequence of `GData` cleanup.
# Conflicts:
# gobject/tests/signals.c
There’s no reason that anyone can think of that this should be
disallowed. It’s useful for language runtimes like GJS to be able to
find out the allocation size of dynamic GObjects.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Fixes: #623
UAC will terminate this test program from running in 32-bit x86 builds as
it believes that it will alter Windows. In order to make this run, we
create a manifest file for 32-bit Windows builds in order to tell UAC
that this program should not need admin privileges.
This will allow the entire test suite for GLib to run on 32-bit Windows
builds.
For now, the function parse_trigraph() defined in gobject/glib-mkenums
script was not taking double-quotes characters into account:
>>> parse_trigraph('name="eek, a comma"')
{'name': '"eek', 'a': None}
This patch take double-quotes characters into account:
>>> parse_trigraph('name="eek, a comma"')
{'name': 'eek, a comma'}
Closes issue #65
We used to call this function as unlocked, with a node value that
could be invalid at the point of the call, so let's ensure that when
we call such function it's defined, and then reduce the access to the
signal node members when we're unlocked or after a lock/unlock operation
that may have changed it.
As per this, add more tests handling multiple signal hooks cases that we
did not cover before.
While x86_64 has enough precision in long double to do a round trip
from guint64 to long double and back, this is platform-specific, and
is a disservice to users trying to debug failing unit tests on other
architectures where it loses precision for g_assert_cmp{int,uint,hex}.
See also https://bugzilla.gnome.org/show_bug.cgi?id=788385 which
mentions having to add casts to specifically silence the compiler on
platforms where the precision loss occurs.
Meanwhile, g_assert_cmpuint() does an unsigned comparison, but outputs
signed values if the comparison fails, which is confusing.
Fix both issues by introducing a new g_assertion_message_cmpint()
function with a new 'u' numtype. For backwards compatibility, the
macros still call into the older g_assertion_message_cmpnum() when not
targetting 2.78, and that function still works when passed 'i' and 'x'
types even though code compiled for 2.78 and later will never invoke
it with numtype anything other than 'f'. Note that g_assert_cmpmem
can also take advantage of the new code, even though in practice,
comparison between two size_t values representing array lengths that
can actually be compiled is unlikely to have ever hit the precision
loss. The macros in signals.c test code does not have to worry about
versioning, since it is not part of the glib library proper.
Closes#2997
Signed-off-by: Eric Blake <eblake@redhat.com>
Calling g_signal_handlers_block/unblock/disconnect_matched with only G_SIGNAL_MATCH_ID
do not match any handlers and return 0.
Fixes: #2980
Signed-off-by: Przemyslaw Gorszkowski <pgorszkowski@igalia.com>
The use of ‘OR’ in the existing documentation suggests that the matching
is disjunctive, but it’s actually conjunctive. Clarify that in the
documentation and add a test.
Spotted while reviewing
https://gitlab.gnome.org/GNOME/glib/-/merge_requests/3376.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
They just listed built files. Since the move to Meson, these are all
kept in a separate build directory, not the source tree, so don’t need
to be ignored.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
They take too long and time out, and are not particularly useful to run
under valgrind because they aren’t designed to test code coverage.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
In all these cases we don't really care about running the test file,
while building and basic execution it is relevant.
Also they don't support TAP at all.
Meson supports tap protocol results parsing, allowing us to track better
the tests that are running (and the ones that are actually skipped) without
manually parsing the test output.
However this also implies that using the verbose mode for a test doesn't
show its output by default (unless there are failures).