It was replaced by the GData lock and g_datalist_id_update_atomic().
Note that we introduced object_bit_lock() to replace global mutexes.
Now, object_bit_lock() is itself completely replaced, by the GData lock
(and hooking it via g_datalist_id_update_atomic()).
This means that all mutex-like locks now go only through the GData lock
on the GObject's qdata.
The object_bit_lock() API is still here, because it might be useful. For
example, there might be cases where the GData lock is not sufficient.
Also, we introduced GObjectPrivate mainly for object_bit_lock(). On 32
bit architecture, there is an overhead with that, as we cannot fit the
optional_flags into GObject. This is not dropped either. The optional
flags seem rather useful even without the object_bit_lock(). For
example, the OPTIONAL_FLAG_EVER_HAD_WEAK_REF flag is an important
optimization.
Note that OBJECT_HAS_TOGGLE_REF() is no longer used. This is good and
bad.
What we will soon do is drop the OPTIONAL_BIT_LOCK_TOGGLE_REFS lock,
and only call g_datalist_id_update_atomic(). This way, we only need to
take the GData lock.
However, g_datalist_id_update_atomic() also needs to do a linear search
for quark_toggle_refs key. Previously, with OBJECT_HAS_TOGGLE_REF() we
didn't require this.
So the new approach is only faster if searching for the GQuark key is
reasonably fast. GData makes an effort to implement fast lookup, so
this is supposed to be fine.
This allows the caller to take the lock on the GData first, and perform
some operations.
This is useful under the assumption, that the caller can find cases
where calling g_datalist_id_update_atomic() is unnecessary, but it's
still necessary to perform that check while holding a lock.
That will yield better performance, if we can avoid calling
g_datalist_id_update_atomic(). That matters for checking the
toggle-notify in g_object_ref()/g_object_unref().
Note that with "already_locked", g_datalist_id_update_atomic() will
still unlock the GData at the end. That is because the API of
g_datalist_id_update_atomic() requires that it might re-allocate the
buffer, and it can do a more efficient unlock in that case, instead of
leaving it to the caller. The usage and purpose of this parameter is
very special, so this asymmetry is taken because the only callers will
be fine with this behavior, and it results in potentially more efficient
unlocking.
Previously, at two places (in g_object_real_dispose() and shortly before
finalize()), we would call
    g_datalist_id_set_data (&object->qdata, quark_weak_notifies, NULL);
This clears @quark_weak_notifies at once and then invokes all
notifications.
This means that if you were inside a notification and called
g_object_weak_unref() on the object for *another* weak-reference, then
an assertion failed:
    GLib-GObject-FATAL-CRITICAL: g_object_weak_unref_cb: couldn't find weak ref 0x401320(0x16b9fe0)
Granted, maybe inside a GWeakNotify you shouldn't call anything on
where_the_object_was. However, unregistering things (like calling
g_object_weak_unref()) should still reasonably work.
Instead, now remove each weak notification one by one and invoke it.
https://gitlab.gnome.org/GNOME/glib/-/issues/1002
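
The one-by-one removal can be sketched roughly as follows. This is a
minimal, single-threaded model and not the GObject code: WeakRefStack
here is a fixed-size stand-in, and weak_ref_add(), weak_ref_remove(),
weak_refs_notify_all() and the two callbacks are hypothetical names.
The point is only that each entry is popped before it is invoked, so a
callback can still find and unregister the remaining entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real GObject types. */
typedef void (*Notify) (void *data);

typedef struct {
  Notify notify;
  void *data;
} WeakRef;

typedef struct {
  WeakRef refs[8];
  size_t n_refs;
} WeakRefStack;

static WeakRefStack stack;
static int n_invoked;
static int remove_result = -1;

static void
weak_ref_add (Notify notify, void *data)
{
  assert (stack.n_refs < 8);
  stack.refs[stack.n_refs].notify = notify;
  stack.refs[stack.n_refs].data = data;
  stack.n_refs++;
}

/* Returns 1 if the entry was found and removed, 0 otherwise. */
static int
weak_ref_remove (Notify notify, void *data)
{
  for (size_t i = 0; i < stack.n_refs; i++)
    {
      if (stack.refs[i].notify == notify && stack.refs[i].data == data)
        {
          stack.refs[i] = stack.refs[--stack.n_refs];
          return 1;
        }
    }
  return 0;
}

/* Pop one entry at a time and only then invoke it. Entries that were
 * not yet invoked stay registered and can be removed by a callback. */
static void
weak_refs_notify_all (void)
{
  while (stack.n_refs > 0)
    {
      WeakRef ref = stack.refs[--stack.n_refs];

      ref.notify (ref.data);
    }
}

static void
cb_plain (void *data)
{
  (void) data;
  n_invoked++;
}

/* A callback that unregisters *another* weak notification. */
static void
cb_unregister_plain (void *data)
{
  n_invoked++;
  remove_result = weak_ref_remove (cb_plain, data);
}

static int
demo_weak_notify (void)
{
  stack.n_refs = 0;
  n_invoked = 0;
  remove_result = -1;
  weak_ref_add (cb_plain, NULL);
  weak_ref_add (cb_unregister_plain, NULL);
  weak_refs_notify_all ();
  return n_invoked;
}
```

In this model the second callback runs first, successfully removes the
first one, and the loop then terminates because the stack is empty.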
Note that we have two calls
    g_datalist_id_set_data (&object->qdata, quark_weak_notifies, NULL);
that don't take OPTIONAL_BIT_LOCK_WEAK_REFS lock. One is right before
finalize(). At that point, no other thread can hold a reference to the
object to race. One call is from g_object_real_dispose(). At this point,
theoretically the object could have been resurrected and a pointer
passed to another thread, so it can race against g_object_weak_ref()
and g_object_weak_unref().
Fix that, by performing all operations on the WeakRefStack while holding
the GData lock. By using g_datalist_id_update_atomic(), we no longer
need the OPTIONAL_BIT_LOCK_WEAK_REFS lock. On the other hand, we might
now do slightly more work while holding the GData lock.
Also, during g_object_weak_unref() free the WeakRefStack, if there
are no more references.
The class of a GObject cannot change during its lifetime. Nor does the
class itself change (e.g. its virtual methods).
As such, we can cache CLASS_NEEDS_NOTIFY() per object.
This way, our _g_object_has_notify_handler() only needs to compare the
flags, instead of fetching the class and checking the vtable.
You might think, CLASS_NEEDS_NOTIFY() is cheaper to check, because it
doesn't need an atomic operation to fetch the class and compare the
vtable (unlike checking the object's optional-flags). You might also
think a constant flag should not be attached to the object's
optional-flags (which otherwise change). However, in all cases where we
care about CLASS_NEEDS_NOTIFY(), we anyway also check for
OPTIONAL_FLAG_HAS_NOTIFY_HANDLER, which sits right beside it in the
optional-flags. So having the OPTIONAL_FLAG_HAS_NEEDS_NOTIFY flag right
beside has no downside.
We are still very early. Initialize the flags without
an atomic.
Note that we also:
- above initialize object->ref_count without atomic.
- we called atomic operation set_object_in_construction(),
which sets certain flags, but relies on the other
flags being initialized correctly (they are not touched
by g_atomic_int_or()). Those other flags are also initialized
without atomic, relying on being memset() to zero.
When we set a property we usually tend to freeze the queue notification
and thaw it at the end. This always requires a per-object allocation
that is necessary to track the freeze count and frozen properties.
But there are cases where we freeze only a single time and never add a
property to unfreeze. In such cases, we can avoid allocating a new
GObjectNotifyQueue instance.
Optimize for that case by initially adding a global, immutable sentinel
pointer "notify_queue_empty". Only when requiring a per-object queue,
allocate one.
This can be useful before calling dispose(). While there are probably
dispose functions that still try to set properties on the object (which
is the main reason to freeze the notification), most probably don't. In
this case, we can avoid allocating the memory during g_object_unref().
Another such case is object construction. If the object has no construct
properties and the user didn't specify any properties during
g_object_new(), we may well freeze the object but never add properties
to it. In that case too, we can get away without ever allocating the
GObjectNotifyQueue.
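
A rough sketch of the sentinel idea, under simplifying assumptions: the
`slot` pointer stands in for the per-object GData entry, there is no
locking, and NotifyQueue, queue_freeze(), queue_add() and queue_thaw()
are hypothetical names rather than the GObject functions. Only the name
notify_queue_empty is taken from the description above.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified model of the per-object notify queue. */
typedef struct {
  unsigned freeze_count;
  unsigned n_pspecs;
} NotifyQueue;

/* Shared sentinel meaning "frozen exactly once, nothing queued".
 * It is never written to. */
static const NotifyQueue notify_queue_empty = { 1, 0 };

static unsigned n_allocations;

static NotifyQueue *
queue_new (unsigned freeze_count)
{
  NotifyQueue *q = calloc (1, sizeof (*q));

  assert (q != NULL);
  q->freeze_count = freeze_count;
  n_allocations++;
  return q;
}

static void
queue_freeze (NotifyQueue **slot)
{
  if (*slot == NULL)
    *slot = (NotifyQueue *) &notify_queue_empty; /* no allocation */
  else if (*slot == &notify_queue_empty)
    *slot = queue_new (2);                       /* second freeze */
  else
    (*slot)->freeze_count++;
}

/* Queue one property; only now a real queue is required. */
static void
queue_add (NotifyQueue **slot)
{
  assert (*slot != NULL);
  if (*slot == &notify_queue_empty)
    *slot = queue_new (1);
  (*slot)->n_pspecs++;
}

/* Returns the number of queued properties once fully thawed, else 0. */
static unsigned
queue_thaw (NotifyQueue **slot)
{
  NotifyQueue *q = *slot;
  unsigned n;

  assert (q != NULL);
  if (q == &notify_queue_empty)
    {
      *slot = NULL;
      return 0;
    }
  if (--q->freeze_count > 0)
    return 0;
  n = q->n_pspecs;
  free (q);
  *slot = NULL;
  return n;
}
```

A freeze followed directly by a thaw never allocates in this model; the
allocation only happens when a property is queued or the object is
frozen a second time.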
During object initialization, we may want to freeze the notifications,
but only do so once (and once unfreeze at the end).
Rework how that was done. We can avoid an additional GData lookup.
By now, GObjectNotifyQueue gets reallocated. So quite possibly if we
keep the queue, it is a dangling pointer.
That is error prone, but it's also unnecessary. All we need to know is
whether we bumped the freeze count and need to unfreeze. The queue
itself was not useful, because we anyway must take a lock (via
g_datalist_id_update_atomic()) to do anything with it.
Instead, use a nqueue_is_frozen boolean variable.
GSList is almost in all use cases a bad choice. It's bad for locality
and requires a heap allocation per entry.
Instead, use an array, and grow the buffer exponentially via realloc().
Now, that we use g_datalist_id_update_atomic(), it is also easy to
update the pointer. Hence, the GObjectNotifyQueue struct does not point
to an array of pspecs. Instead the entire GObjectNotifyQueue itself gets
reallocated, thus saving one heap allocation for the separate head
structure.
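
As an illustration only, the "header plus array in one allocation"
layout might look like the sketch below. The field names, the initial
capacity of 4 and the helper notify_queue_add() are assumptions, not
the actual GObject definitions. The caller must store the returned
pointer, because realloc() may move the whole queue.

```c
#include <assert.h>
#include <stdlib.h>

/* Header and entries live in a single heap block. */
typedef struct {
  unsigned n_pspecs;
  unsigned alloc;
  const void *pspecs[]; /* flexible array member */
} NotifyQueue;

/* Appends @pspec and returns the (possibly moved) queue. Returns NULL
 * on allocation failure, leaving the old queue untouched. Overflow of
 * the capacity is not handled in this sketch. */
static NotifyQueue *
notify_queue_add (NotifyQueue *queue, const void *pspec)
{
  if (queue == NULL || queue->n_pspecs == queue->alloc)
    {
      unsigned alloc = queue ? queue->alloc * 2 : 4; /* grow exponentially */
      unsigned n = queue ? queue->n_pspecs : 0;
      NotifyQueue *q;

      q = realloc (queue, sizeof (NotifyQueue) + alloc * sizeof (const void *));
      if (q == NULL)
        return NULL;
      queue = q;
      queue->alloc = alloc;
      queue->n_pspecs = n;
    }

  queue->pspecs[queue->n_pspecs++] = pspec;
  return queue;
}
```

Because the pointer to the queue can change on every append, this
layout only works when the owner can update its stored pointer under
the same lock, which is what g_datalist_id_update_atomic() provides.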
Now all accesses to quark_notify_queue are guarded by the GData lock.
Several non-trivial operations are implemented via
g_datalist_id_update_atomic().
The OPTIONAL_BIT_LOCK_NOTIFY lock is thus unnecessary and can be dropped.
Note that with the move to g_datalist_id_update_atomic(), we now
potentially do more work while holding the GData lock (e.g. some code
paths allocate additional memory). But note that
g_datalist_id_set_data() already has code paths where it must allocate
memory to track the GDataElt. Also, most objects are not used in
parallel, so holding the per-object (per-GData) lock longer does not
affect them. Also, many operations also require an object_bit_lock(), so
it seems very unlikely that you really could achieve higher parallelism
by taking more locks (and minimizing the time to hold the GData lock).
On the contrary, taking one lock less and doing all the work there is
beneficial.
A common pattern is to look whether a GData entry exists, and if it
doesn't, add it.
For that, we currently always must take an OPTIONAL_BIT_LOCK_NOTIFY lock.
This can be avoided, because GData already uses an internal mutex. By
using g_datalist_id_update_atomic(), we can perform all relevant
operations while holding that mutex.
Move functionality from g_object_notify_queue_freeze() inside
g_datalist_id_update_atomic().
The goal will be to drop the OPTIONAL_BIT_LOCK_NOTIFY lock in a later
commit.
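
The "look up, and create if missing, under one lock" pattern can be
modelled as below. This is a toy, not GLib: Slot, slot_update_atomic()
and queue_freeze_cb() are hypothetical, the lock is a plain C11 spin
flag, and the real g_datalist_id_update_atomic() has a different
signature. It only shows that the callback runs with the lock held, so
no second lock is needed around the check and the insertion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
  atomic_flag lock; /* models the GData lock */
  void *data;       /* models the value stored under one key */
} Slot;

typedef unsigned (*UpdateFunc) (void **data, void *user_data);

/* Runs @func while holding the slot's lock. The callback may read,
 * replace or clear *data. */
static unsigned
slot_update_atomic (Slot *slot, UpdateFunc func, void *user_data)
{
  unsigned result;

  while (atomic_flag_test_and_set_explicit (&slot->lock, memory_order_acquire))
    ; /* spin; a real implementation would block instead */
  result = func (&slot->data, user_data);
  atomic_flag_clear_explicit (&slot->lock, memory_order_release);
  return result;
}

typedef struct {
  unsigned freeze_count;
} Queue;

/* Get-or-create the queue and bump its freeze count in one step.
 * Returns the new freeze count, or 0 if allocation failed. */
static unsigned
queue_freeze_cb (void **data, void *user_data)
{
  Queue *q = *data;

  (void) user_data;
  if (q == NULL)
    {
      q = calloc (1, sizeof (*q));
      if (q == NULL)
        return 0;
      *data = q;
    }
  return ++q->freeze_count;
}
```

The callback returns a plain value instead of the queue pointer, so the
caller never holds a pointer that is only valid while locked.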
Previously, we would call
    g_datalist_id_set_data (&object->qdata, quark_closure_array, NULL);
which called destroy_closure_array() on the CArray.
At that point, it would iterate over the CArray, and invalidate all
closures. But note that this invokes external callbacks, which in turn
can destroy other closures, which can call object_remove_closure().
But now that closure can no longer be found and an assertion fails.
Instead of removing the entire CArray at once, remove each closure one
by one in a loop.
This problem is similar to issue 1002, except here it's about closure
watches instead of GWeakNotify.
Note that now we destroy closures one-by-one in a loop, and we iterate
the loop as long as we have closures. That makes a difference when a new
closure gets registered while we destroy them all. Previously, newly
registered closures would survive. It would be possible to implement the
previous behavior, but I think the new behavior is better. It is anyway
a very remote use case.
There are two calls to
    g_datalist_id_set_data (&object->qdata, quark_closure_array, NULL);
that don't take an OPTIONAL_BIT_LOCK_CLOSURE_ARRAY lock. These are inside
g_object_real_dispose() and right before finalize(). The one before
finalize() is fine, because we are already in a situation where nobody
else holds a reference on the object.
However not so with g_object_real_dispose(). That is called after we
checked that there is only one strong reference left and we are inside
the call to dispose(). However, at that point (before chaining up
g_object_real_dispose()) the callee can pass the reference to another
thread. That other thread could create a closure and destroy it again.
This calls object_remove_closure() (accessing CArray), which now races
against g_object_real_dispose() (destroying CArray).
Granted, this is very unlikely to happen. But let's try to avoid such
races in principle.
We can avoid this problem with less overhead by doing everything while
holding the GData lock, using g_datalist_id_update_atomic(). This is
probably even faster, as we don't need the additional
OPTIONAL_BIT_LOCK_CLOSURE_ARRAY lock.
Also free the empty closure data during object_remove_closure(). This
frees some unused memory.
Cache the function pointer for g_datalist_id_update_atomic() in a static
variable in "gobject.c" to avoid looking it up repeatedly.
g_datalist_id_update_atomic() is internal API anyway. Just as GData is
not a useful data structure in general, this function is only useful for
something specific inside GObject.
It can be easily seen that _local_g_datalist_id_update_atomic is never
read without having a GObject at hand (because we call it on
`&object->qdata`). Thus initializing the pointer in
g_object_do_class_init() (under lock) is sufficient to ensure
thread-safe initialization. Note that we still set the pointer via
g_atomic_pointer_set(). This is done in an attempt to pacify the thread
sanitizer.
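
The caching scheme could be sketched like this, using C11 atomics in
place of g_atomic_pointer_set(). All names here are hypothetical
stand-ins: lookup_private_func() models the GLIB_PRIVATE_CALL() lookup,
class_init_once() models g_object_do_class_init(), and the cached
function takes an int only to keep the example small.

```c
#include <assert.h>
#include <stdatomic.h>

typedef int (*UpdateAtomicFunc) (int value);

static int
real_update_atomic (int value)
{
  return value + 1;
}

static int n_lookups;

/* Models the indirect lookup through the private vtable. */
static UpdateAtomicFunc
lookup_private_func (void)
{
  n_lookups++;
  return real_update_atomic;
}

/* Cached pointer, set once during class initialization. */
static _Atomic (UpdateAtomicFunc) local_update_atomic;

static void
class_init_once (void)
{
  atomic_store (&local_update_atomic, lookup_private_func ());
}

/* Callers always have an object at hand, so class init already ran
 * and a relaxed load is enough in this model. */
static int
call_update_atomic (int value)
{
  UpdateAtomicFunc func;

  func = atomic_load_explicit (&local_update_atomic, memory_order_relaxed);
  return func (value);
}
```

After initialization, every call goes through the cached pointer and
the lookup function is not invoked again.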
Note that even with LTO enabled, the GLIB_PRIVATE_CALL() call cannot be
inlined. Previously we got:
0000000000011300 <_weak_ref_set>:
...
1131d: e8 ee 03 ff ff call 1710 <glib__private__@plt>
11322: 8b 35 0c b2 05 00 mov 0x5b20c(%rip),%esi # 6c534 <quark_weak_locations.lto_priv.0>
11328: 4c 89 e1 mov %r12,%rcx
1132b: 49 8d 7c 24 10 lea 0x10(%r12),%rdi
11330: 48 8d 15 b9 42 ff ff lea -0xbd47(%rip),%rdx # 55f0 <weak_ref_data_get_or_create_cb.lto_priv.0>
11337: ff 90 80 00 00 00 call *0x80(%rax)
afterwards:
0000000000011300 <_weak_ref_set>:
...
1131d: 48 8d 7e 10 lea 0x10(%rsi),%rdi
11321: 48 89 f1 mov %rsi,%rcx
11324: 48 8d 15 c5 42 ff ff lea -0xbd3b(%rip),%rdx # 55f0 <weak_ref_data_get_or_create_cb.lto_priv.0>
1132b: 8b 35 0b b2 05 00 mov 0x5b20b(%rip),%esi # 6c53c <quark_weak_locations.lto_priv.0>
11331: ff 15 f9 b1 05 00 call *0x5b1f9(%rip) # 6c530 <_local_g_datalist_id_update_atomic.lto_priv.0>
Also note, that the point here is not to optimize _weak_ref_set() (which
is not a hot path). There is work in progress that will use
g_datalist_id_update_atomic() for more purposes (and during more
relevant code paths of GObject).
None of the users actually care about this parameter. And it's unlikely
that they ever will. Also, the passed "key_id" is the argument from
g_datalist_id_update_atomic(). If the caller really cared to know the
"key_id" in the callback, they could pass it as additional user data.
The default values for construct properties always have to be set, even
if those properties are deprecated. The code to do that is in GLib, and
not under the control of the user (unless they completely override the
`constructor` vfunc, which is not recommended). So don’t emit a warning
for that if `G_ENABLE_DIAGNOSTICS` is enabled.
In particular, this fixes deprecation warnings being emitted for
properties of a parent class when chaining up with a custom constructor,
even when none of the child class code mentions the deprecated property.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Fixes: #3254
Avoid scan-build thinking that `new_wrdata` could be `NULL` on this
control path. It can’t be `NULL` if `new_object` is set.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #1767
It may not be obvious, but the moment unlock is called, the locker
instance may be destroyed.
See g_object_unref(), which calls toggle_refs_check_and_ref_or_deref().
It will check for toggle references while dropping the ref count from 2
to 1. It must decrement the ref count while holding the lock, but it
also must still unlock afterwards.
Note that the locker instance is on the object itself. Once we decrement
the ref count we give up our reference and another thread may race
against destroying the object. We thus must not touch object anymore.
How can we then still unlock?
This works correctly because:
- unlock operations must not touch the locker instance after unlocking.
- assume that another thread races g_object_unref() to destroy the
object, while we are about to call object_bit_unlock() in
toggle_refs_check_and_ref_or_deref(). Then that other thread will also
need to acquire the same lock (during g_object_notify_queue_freeze()).
It thus is blocked to destroy the object.
Add code comments about that.
We can only assert for having one toggle reference, after we confirmed
(under lock) that the ref count was in the toggle case.
Otherwise, if another thread refs/unrefs the object, we can hit a wrong
g_critical() assertion about
    if (tstackptr->n_toggle_refs != 1)
      {
        g_critical ("Unexpected number of toggle-refs. g_object_add_toggle_ref() must be paired with g_object_remove_toggle_ref()");
Fixes: 9ae43169cfe0 ('gobject: fix race in toggle ref during g_object_ref()')
GSList doesn't seem the best choice here. Its benefits are that it's
relatively convenient to use (albeit not very efficient) and that an
empty list requires only the pointer to the list's head.
But for a non-empty list, we need to allocate GSList elements. We can do
better, by writing more code.
I think it's worth optimizing GObject, at the expense of a bit(?) more
complicated code. The complicated code is still entirely self-contained,
so unless you review WeakRefData usage, it doesn't need to bother you.
Note that this can easily be measured to be a bit faster. But I think
the more important part is to save some allocations. Often objects are
long-lived, and the GWeakRef will be tracked for a long time. It is
worthwhile to optimize the memory usage of that.
- if the list only contains one weak reference, it's interned/embedded in
WeakRefData.list.one. Otherwise, an array is allocated and tracked
at WeakRefData.list.many.
- when the buffer grows, we double the size. When the buffer shrinks,
we reallocate to 50% when 75% are empty. When the buffer shrinks to
length 1, we free it (so that "list.one" is always used with a length
of 1).
That means that in the worst case we waste 75% of the allocated buffer,
which is a deliberate choice in the hope that future weak references
will be registered, and that this is a suitable strategy.
- on architectures like x86_64, this does not increase the size of
WeakRefData.
Also, the number of weak-refs is now limited to 65535, and now an
assertion fails when you try to register more than that. But note that
the internal tracking just uses a linear search, so you really don't
want to register thousands of weak references on an object. If you do
that, the current implementation is not suitable anyway and you must
rethink your approach. Nor does it make sense to optimize the
implementation for such a use case. Instead, the implementation is
optimized for a few (one!) weak reference per object.
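
The sizing policy described above can be written down as a small pure
function. This is a sketch under assumptions: the initial array
capacity of 4 is made up for illustration, and WeakRefList and
next_capacity() are hypothetical names. A capacity of 0 means "no
array, at most one entry stored inline".

```c
#include <assert.h>

#define INITIAL_ALLOC 4u  /* assumed; not taken from the source */
#define MAX_LEN 65535u    /* the limit mentioned above */

/* Hypothetical layout: one entry is stored inline, more in an array. */
typedef struct {
  unsigned short len;
  unsigned short alloc;
  union {
    void *one;   /* used when len == 1 */
    void **many; /* used when len > 1 */
  } list;
} WeakRefList;

/* Returns the capacity the array should have for @len entries, given
 * the current capacity @alloc. */
static unsigned
next_capacity (unsigned alloc, unsigned len)
{
  assert (len <= MAX_LEN);

  if (len <= 1)
    return 0; /* zero or one entry: free the array, store inline */

  if (alloc == 0)
    alloc = INITIAL_ALLOC;

  while (len > alloc)
    alloc *= 2; /* grow: double the size */

  if (len <= alloc / 4)
    alloc /= 2; /* shrink to 50% once 75% is empty */

  return alloc;
}
```

For example, with these assumptions a buffer of capacity 8 keeps its
size while it holds 3 entries and is halved once it drops to 2.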
We can safely combine this, and use bit 30 of the ref-count for locking.
This still leaves 2^30-1 for the ref-count, which is more than enough,
because these references are only taken for a short time in
g_weak_ref_get() and g_weak_ref_set(). Note that one thread can at most
take one reference at a time, so the ref-count will always be a small
number.
Also note, that obviously we will only take a bit lock while also
holding a reference. That means, when weak_ref_data_unref() decreases
the ref-count to zero, the bit will be unlocked as well.
The reason to do this is to free up some space in WeakRefData. Note that
(on x86_64) this doesn't actually make the struct smaller. It's
probably not reasonably possible to make WeakRefData smaller than it
already is (on x86_64). However, by combining the fields we have some
space for reuse without increasing the struct size. That space will be
used next.
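
A minimal model of sharing one integer between a ref-count and a lock
bit, written with C11 atomics. The real code would use GLib's bit-lock
primitives, which block instead of spinning; the spin loop, the field
name atomic_field and the wrdata_*() helpers here are illustrative
assumptions.

```c
#include <assert.h>
#include <stdatomic.h>

#define LOCK_BIT (1u << 30)
#define REF_MASK (LOCK_BIT - 1u) /* 2^30-1 values for the ref-count */

typedef struct {
  atomic_uint atomic_field; /* low 30 bits: ref-count, bit 30: lock */
} WeakRefData;

static void
wrdata_ref (WeakRefData *d)
{
  atomic_fetch_add (&d->atomic_field, 1u);
}

/* Returns 1 if this call dropped the last reference. */
static int
wrdata_unref (WeakRefData *d)
{
  unsigned old = atomic_fetch_sub (&d->atomic_field, 1u);

  assert ((old & REF_MASK) > 0);
  return (old & REF_MASK) == 1;
}

/* Must only be called while holding a reference. */
static void
wrdata_lock (WeakRefData *d)
{
  /* We own the lock if we are the one who flipped the bit from 0 to 1. */
  while (atomic_fetch_or_explicit (&d->atomic_field, LOCK_BIT,
                                   memory_order_acquire) & LOCK_BIT)
    ; /* spin; a simplification */
}

static void
wrdata_unlock (WeakRefData *d)
{
  atomic_fetch_and_explicit (&d->atomic_field, ~LOCK_BIT,
                             memory_order_release);
}

static unsigned
wrdata_ref_count (WeakRefData *d)
{
  return atomic_load (&d->atomic_field) & REF_MASK;
}
```

Taking or releasing the lock leaves the low bits untouched, and
changing the ref-count leaves bit 30 untouched, so both can be updated
independently with single atomic operations.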
Replace the global RWLock with per-object locking. Note that there are
three places where we needed to take the global lock: g_weak_ref_get(),
g_weak_ref_set() and in _object_unref_clear_weak_locations(), during
g_object_unref(). The calls during g_object_unref() seem the most
relevant here, where we would want to avoid a global lock. Luckily, that
global lock only had to be taken if the object ever had a GWeakRef
registered, so most objects wouldn't care. The global lock only affects
objects, that are ever set via g_weak_ref_set(). Still, try to avoid that
global lock.
Related to GWeakRef, there are various moments when we don't hold a
strong reference to the object. So the per-object lock cannot be on the
object itself, because when we want to unlock we no longer have access
to the object. And we cannot take a strong reference on the GObject
either, because that triggers toggle notifications. And worse, when one
thread holds the last strong reference of an object and decides to
destroy it, then a `g_weak_ref_set(weak_ref, NULL)` on another thread
could acquire a temporary reference, and steal the destruction of the
object from the other thread.
Instead, we already had a "quark_weak_locations" GData and an allocated
structure for tracking the GSList with GWeakRef. Extend that to be
ref-counted and have a separate lifetime from the object. This
WeakRefData now contains the per-object mutex for locking. We can
request the WeakRefData from an object, take a reference to keep it
alive, and use it to hold the lock without having the object alive.
We also need a bitlock on GWeakRef itself. So to set or get a
GWeakRef we must take the per-object lock on the WeakRefData and the
lock on the GWeakRef (in this order). During g_weak_ref_set() there may
be of course two objects (and two WeakRefData) involved, the previous
and the new object.
Note that now once an object gets a WeakRefData allocated, it can no
longer be freed. It must stick until the object gets destroyed. This
allocation happens, once an object is set via g_weak_ref_set(). In
other words, objects involved with GWeakRef will have extra data
allocated.
It may be possible to also release the WeakRefData once it's no longer
needed. However, that would be quite complicated, and require additional
atomic operations, so it's not clear to be worth it. So it's not done.
Instead, the WeakRefData sticks on the object once it's set.
_object_unref_clear_weak_locations() is called twice during
g_object_unref(). In both cases, it is when we expect that the reference
count is 1 and we are either about to call dispose() or finalize().
At this point, we must check for GWeakRef to avoid a race that the ref
count gets increased just at that point.
However, we can do something better than to always take the global lock.
On the object, whenever an object is set to a GWeakRef, set a flag
OPTIONAL_FLAG_EVER_HAD_WEAK_REF. Most objects are not involved with weak
references and won't have this flag set.
If we reach _object_unref_clear_weak_locations() we just (atomically)
checked that the ref count is one. If the object at this point never had
a GWeakRef registered, we know that nobody else could have raced against
obtaining another reference. In this case, we can skip taking the lock
and checking for weak locations.
As most objects don't ever have a GWeakRef registered, this avoids a
significant amount of unnecessary work during
_object_unref_clear_weak_locations().
This even fixes a hard to hit race in the do_unref=FALSE case.
Previously, if do_unref=FALSE there were code paths where we avoided
taking the global lock. We do so, when quark_weak_locations is unset.
However, that is not race free. If we enter
_object_unref_clear_weak_locations() with a ref-count of 1 and one
GWeakRef registered, another thread can take a strong reference and
unset the GWeakRef. Then quark_weak_locations will be unset, and
_object_unref_clear_weak_locations() misses the fact that the ref count
is now bumped to two. That is now fixed, because once
OPTIONAL_FLAG_EVER_HAD_WEAK_REF is set, it will stick.
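
The sticky-flag fast path can be sketched as follows. This is a
simplified model: Object, the n_slow_path counter and the helper names
are hypothetical, and the slow path only stands in for "take the lock,
re-check the ref count and clear the weak locations".

```c
#include <assert.h>
#include <stdatomic.h>

#define FLAG_EVER_HAD_WEAK_REF (1u << 0)

typedef struct {
  atomic_uint ref_count;
  atomic_uint optional_flags;
  unsigned n_slow_path; /* counts how often the slow path was taken */
} Object;

/* Called whenever the object is set on a weak reference. */
static void
object_weak_ref_set (Object *o)
{
  atomic_fetch_or (&o->optional_flags, FLAG_EVER_HAD_WEAK_REF);
}

/* Unsetting the weak reference intentionally does NOT clear the flag:
 * once set, it sticks for the lifetime of the object. */
static void
object_weak_ref_clear (Object *o)
{
  (void) o;
}

/* The caller just observed ref_count == 1. Returns 1 if the object may
 * proceed to dispose/finalize. */
static int
unref_clear_weak_locations (Object *o)
{
  if (!(atomic_load (&o->optional_flags) & FLAG_EVER_HAD_WEAK_REF))
    {
      /* Never had a weak reference: nobody could have obtained a new
       * strong reference behind our back. Skip all locking. */
      return 1;
    }

  /* Slow path: the real code would take the lock here. */
  o->n_slow_path++;
  return atomic_load (&o->ref_count) == 1;
}
```

Because the flag is never cleared, an object that had a weak reference
at any point always takes the slow path, which is what closes the race
described above.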
Previously, there was an optimization to first take a read lock to check
whether there are weak locations to clear. It's not clear that this is
worth it, because we now already have a hint that there might be a weak
location. Unfortunately, GRWLock does not support an upgradable lock, so
we cannot take an (upgradable) read lock, and when necessary upgrade
that to a write lock.
It's not clear what this code comment tries to tell us. Yes, when we
make changes, we must take care that the changes are correct and update
the relevant places.
It seems long obsolete. Drop it.
This partly reverts commit d7dd9aefd840 ('placed a comment about not
changing CArray until we have').
In general, we must not call out to external, unknown code while holding
a lock. That is prone to deadlock.
g_object_ref() can emit a toggle notification. In g_weak_ref_set(), we
must not do that while holding the lock.
The optional flags should be used for bit locks. That means,
we must only use atomic operations when updating the flags.
Having _X variants of the methods that update the flags without locks
means that we must take care not to take bit locks during
construction.
That is hard to get right. There is so much happening during object
construction, that it's unclear when it's really safe to access the
flags without atomic. Don't do this.
Add a GObjectPrivate struct and let GObject have private data.
On architectures where we have an alignment gap in GObject struct (64
bit), we use the gap for "optional_flags". Use the private data for
those optional flags, on architectures where we don't have them.
For now, private data is only added for those optional flags (and not on
architectures, where the flags fit inside GObject). In the future, we
may add additional fields there, and add the private struct always.
The main purpose will be to replace all the global locks with per-object
locks, and make "optional_flags" also available on 32bit.