The pattern here is that we acquire strong references via GWeakRef.
So you might think that g_object_weak_unref() is safe to call.
However, it is not safe if another thread might run
g_object_run_dispose(). That is clear, because GWeakRef only ensures we
hold a strong reference, but g_object_run_dispose() is run on an
object to which a reference is already held anyway.
In weak_unbind(), we obtain strong references, but after that
point, another thread might call g_object_run_dispose(). Then,
inside unbind_internal_locked() we do:
  g_object_weak_unref (source, weak_unbind, context);
  binding_context_unref (context);
Note that here weak_unbind might already have been unregistered, and
g_object_weak_unref() fails an assertion. But worse, the
weak_unbind callback will also be called, so we issue two
binding_context_unref() calls and crash.
This is fixed by using g_object_weak_ref_full() (which handles the case
that the weak notification might have already been emitted) and a separate
GDestroyNotify (that is guaranteed to run exactly once).
This still doesn't make it fully work. Note that we also call
  g_signal_handler_disconnect (source, binding->source_notify);
which has exactly the same problem. A concurrent g_object_run_dispose()
will already have disconnected all signal handlers, and calling disconnect
fails an assertion. I think the solution for that is a new API
g_signal_handler_try_disconnect(), which does not assert. After all, the
gulong signal ID is unique (the gulong is large enough to never wrap and
there is even a g_error() check against that).
The weak notification APIs g_object_weak_ref() and g_object_weak_unref()
are not thread-safe. This patch adds thread-safe alternatives:
g_object_weak_ref_full() and g_object_weak_unref_full().
The problem arises when other threads call g_object_run_dispose() or
g_object_unref(), making g_object_weak_unref() unsafe. The caller cannot
know whether the weak notification was successfully removed or might
still be invoked.
For example, g_object_weak_unref() will assert if no matching
notification is found. This is inherently racy. Beyond that problem,
weak notifications often involve user data that must be freed -- either
by the callback or after g_object_weak_unref(). Since you can't know
which path executed, this can lead to races and double-free errors.
The new g_object_weak_unref_full() returns a boolean to indicate whether
the notification was removed or will still be invoked, allowing safe
cleanup. Acting on this return value is the main solution for
thread-safety.
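The ownership rule behind that return value can be modelled without GLib. The following is a minimal, single-threaded sketch: `ToyWeakNotify`, `toy_weak_emit()`, `toy_weak_unref()` and `toy_count_cb()` are hypothetical names, not the signatures added by this patch, and a real implementation would check and update `registered` under a lock. It only illustrates that exactly one of the two paths ends up responsible for the user data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one registered weak notification. */
typedef struct {
  void (*notify) (void *user_data);
  void *user_data;
  bool registered;   /* true until the entry is removed or invoked */
} ToyWeakNotify;

/* Example callback: counts how often it ran. */
void
toy_count_cb (void *user_data)
{
  int *runs = user_data;
  (*runs)++;
}

/* Models the dispose path: invokes the callback at most once. */
void
toy_weak_emit (ToyWeakNotify *wn)
{
  assert (wn != NULL);
  if (!wn->registered)
    return;
  wn->registered = false;
  wn->notify (wn->user_data);
}

/* Models the removal path. Returns true if the entry was removed here,
 * so the caller must clean up user_data. Returns false if the callback
 * already ran, so the callback was responsible for the cleanup. */
bool
toy_weak_unref (ToyWeakNotify *wn)
{
  assert (wn != NULL);
  if (!wn->registered)
    return false;
  wn->registered = false;
  return true;
}
```

A caller would free its user data only when `toy_weak_unref()` returns true; when it returns false, the callback has already taken care of it.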
Note that g_object_unref() only starts disposing after ensuring there
are no more GWeakRefs and only the single caller's strong reference
remains. So you might think that no other thread could acquire a strong
reference and race by calling g_object_weak_unref(). While this makes
such a race less likely, it is not eliminated. If there are multiple
weak notifications or closures, one can pass a reference to another
thread that calls g_object_weak_unref() and enables the race. Also, with
g_object_run_dispose(), there is nothing preventing another thread from
racing against g_object_weak_unref().
g_object_weak_ref_full() and g_object_weak_unref_full() also support a
`synchronize=TRUE` flag. This ensures the callback runs while holding a
per-callback mutex, allowing g_object_weak_unref_full() to wait until
the callback has either already run or will never run.
Calling user callbacks while holding a lock can risk deadlocks, but the
risk is limited because the lock is specific to that notification.
Finally, GDestroyNotify callbacks are supported. While mostly a
convenience, they are also invoked outside the lock, which enables more
complex cleanup without the risk of deadlock.
Contrary to common wisdom, combining weak notifications with GWeakRef
does not solve this problem. Also, it forces the caller to acquire strong
references, which emits toggle notifications. When carefully using
g_object_weak_ref_full(), the caller of g_object_weak_unref_full()
can safely use a pointer to the object, without ever increasing
the reference count. A unit test shows how that is done.
This improves correctness and safety for weak references in
multithreaded contexts.
It feels ugly to leave the buffer oversized.
We call g_object_weak_release_all() during g_object_real_dispose() and
right before finalize. In most cases, we expect that the loop iterates
until there are no weak notifications left (in which case the entire
WeakRefStack is freed). In that case, there is no need to shrink the
buffer, because it's going to be released soon anyway.
Note that no new weak references can be registered after finalize (as
the ref count already dropped to zero). However, new weak references can
be registered during dispose (either during the last g_object_unref() or
during g_object_run_dispose()).
In that case, I feel it is nice to bring the buffer size right again. We
don't know how long the object will continue to live afterwards, so
let's trim the extra allocation.
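As a rough illustration of the trimming step, here is a sketch on a plain C array. `ToyStack` and `toy_stack_trim()` are hypothetical and much simpler than the real WeakRefStack; the sketch assumes that an empty stack is released entirely and that a failed shrink simply keeps the old allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array, standing in for the real structure. */
typedef struct {
  size_t n_items;    /* entries in use */
  size_t capacity;   /* entries allocated */
  int *items;
} ToyStack;

/* Shrink the allocation to fit n_items and return the new capacity.
 * If realloc() fails, the old, larger buffer stays valid and is kept. */
size_t
toy_stack_trim (ToyStack *stack)
{
  assert (stack != NULL);

  if (stack->n_items == 0)
    {
      /* Nothing left: release the buffer entirely. */
      free (stack->items);
      stack->items = NULL;
      stack->capacity = 0;
      return 0;
    }

  if (stack->capacity > stack->n_items)
    {
      int *shrunk = realloc (stack->items, stack->n_items * sizeof *shrunk);

      if (shrunk != NULL)
        {
          stack->items = shrunk;
          stack->capacity = stack->n_items;
        }
    }

  return stack->capacity;
}
```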
Refactor the function to separate the search and removal logic. Instead
of nesting the removal inside the loop, first search for the matching
entry. If none is found, return early. Otherwise, goto the removal
logic.
This reduces indentation, emphasizes the main path, and improves
readability and maintainability. The change uses the often unfairly
maligned goto for clarity.
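The shape of that refactoring might look like the following sketch. `toy_remove_first()` is a hypothetical function operating on an int array, not the function touched by this commit; it only shows the search loop, the early return, and the removal code reached through a goto.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Remove the first entry equal to `needle` from items[0..*n_items).
 * Returns true if an entry was removed. */
bool
toy_remove_first (int *items, size_t *n_items, int needle)
{
  size_t i;

  assert (items != NULL && n_items != NULL);

  /* Search only; no removal logic nested in the loop. */
  for (i = 0; i < *n_items; i++)
    {
      if (items[i] == needle)
        goto handle_found;
    }

  /* Not found: return early. */
  return false;

handle_found:
  /* Removal: close the gap left by entry i. */
  memmove (&items[i], &items[i + 1], (*n_items - i - 1) * sizeof items[0]);
  (*n_items)--;
  return true;
}
```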
It’s possible for the server communications to finish one main context
iteration before all of the client communications, depending on how the
kernel queues socket connection messages.
Fixes CI failure: https://gitlab.gnome.org/GNOME/glib/-/jobs/5341950
```
GLib-GIO:ERROR:../gio/tests/socket-listener.c:639:test_accept_multi_simultaneously: 'clients[i].result' should not be NULL
not ok /socket-listener/accept/multi-simultaneously - GLib-GIO:ERROR:../gio/tests/socket-listener.c:639:test_accept_multi_simultaneously: 'clients[i].result' should not be NULL
```
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
As the new comments in the code try to explain, this fixes infinite
blocking which could happen when calling
`g_socket_listener_accept_async()` multiple times in parallel, with more
parallel calls than there are pending incoming connections on any of the
`GSocket`s in the `GSocketListener`.
The way `g_socket_listener_accept_async()` works is to create a set of
`GSocketSource`s when it’s called, one for each of the `GSocket`s in the
`GSocketListener`. Those sources are attached to the main context,
polling for `G_IO_IN` (indicating that the socket has a pending incoming
connection to accept).
When one of the socket sources polls ready, `g_socket_accept()` is
called on it, and a new connection is created.
If there are multiple pending `g_socket_listener_accept_async()` calls,
there are correspondingly multiple `GSocketSource` sources for each
`GSocket` in the `GSocketListener`. They will all poll ready in a single
`GMainContext` iteration. The first one to be dispatched will
successfully call `g_socket_accept()`, and subsequent ones to dispatch
will do likewise until there are no more pending incoming connections.
At that point, any remaining socket sources polling ready in that
`GMainContext` iteration will call `g_socket_accept()` on a socket which
is *not* ready to accept, and that will block indefinitely, because
`GSocket` has its own blocking layer on top of `poll()`.
This is not great.
It seems like a better approach would be to disable `GSocket`’s blocking
code, because `GSocketListener` is using `poll()` directly. We only need
one source of poll truth. So, do that.
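The underlying behaviour can be seen with plain POSIX sockets. This is only a sketch under that assumption, not the `GSocketListener` code: `toy_listener_new()` and `toy_try_accept()` are hypothetical helpers. With the listening socket in non-blocking mode, an `accept()` with no pending connection fails with `EAGAIN`/`EWOULDBLOCK` instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a non-blocking TCP listener on an ephemeral loopback port.
 * Returns the fd, or -1 on error. */
int
toy_listener_new (void)
{
  struct sockaddr_in addr;
  int flags;
  int fd = socket (AF_INET, SOCK_STREAM, 0);

  if (fd < 0)
    return -1;

  memset (&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  addr.sin_port = 0;  /* let the kernel pick a port */

  flags = fcntl (fd, F_GETFL, 0);
  if (flags < 0 ||
      fcntl (fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
      bind (fd, (struct sockaddr *) &addr, sizeof addr) < 0 ||
      listen (fd, 8) < 0)
    {
      close (fd);
      return -1;
    }

  return fd;
}

/* Returns the accepted fd, -1 if no connection is pending (the call
 * would have blocked), or -2 on any other error. Never blocks. */
int
toy_try_accept (int listen_fd)
{
  int fd;

  assert (listen_fd >= 0);

  fd = accept (listen_fd, NULL, NULL);
  if (fd >= 0)
    return fd;
  if (errno == EAGAIN || errno == EWOULDBLOCK)
    return -1;
  return -2;
}
```

A dispatch callback that gets -1 here can simply keep polling, which is the situation the surplus socket sources end up in.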
Unfortunately, that’s complicated by the fact that
`g_socket_listener_add_socket()` allows third party code to provide its
own `GSocket`s to listen on. We probably can’t unilaterally change those
to non-blocking mode, so users of that API will get what they ask for.
That might include blocking indefinitely. I’ve adjusted the
documentation to mention that, at least.
The changes are fairly simple; the accompanying unit test is less
simple. Shrug. It tests for the scenario fixed by this commit, plus the
scenario fixed by the previous commit.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Fixes: #3739
The changes in commit 30ccfac9cf were not quite correct. The code is
structured so that a single reference to a `GTask` (and hence its
`AcceptSocketAsyncData` task data) is shared across the multiple
`GSocketSource`s which are created for a pending `accept_async()` call.
Setting `returned_yet` to true to short-circuit the remaining
`accept_ready()` dispatches in a given `GMainContext` iteration would
have worked, were it not for the fact that the code then immediately
dropped the last reference it had to the `GTask`, potentially freeing
the structure which contained `returned_yet`. Because of the async
nature of `GTask`, the exact timing of finalisation could vary.
This also meant that the other `GSocketSource`s were not destroyed until
an unknown time later.
Improve on that by explicitly destroying the other `GSocketSource`s as
soon as the first one returns an accepted socket. This causes
`GMainContext` to skip dispatching them, even within the same
`GMainContext` iteration. It also means the separate `returned_yet`
member is unnecessary.
This should fix the immediate issue seen in #3739. However, while
testing it I found a further issue which will be fixed in a following
commit, before I add a unit test.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3739
As with the previous commit, but for `g_socket_connect()`, which is the
other cancellable use of `g_socket_condition_wait()` in the file.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Currently, if `g_socket_accept()` is called with a cancelled cancellable
and the socket is in non-blocking mode, `G_IO_ERROR_CANCELLED` is not
returned, because the cancellable is only checked in the call to
`g_socket_condition_wait()`, which only happens in blocking mode.
Fix that and add a unit test.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3739
`GMemoryMonitor` is a singleton, which means we can’t use the usual
approach of emitting signals in the thread-default main context from the
time of construction of the object.
The next best thing is to emit them in the global default main context.
For many applications, this will be exactly what they are expecting. For
multi-threaded applications, they will need to implement their own
thread safety in the signal handler, but they would have to do that
anyway.
Currently, the signals are emitted in the GLib worker thread (for the
PSI and poll implementations of `GMemoryMonitor`) — this is the worst
option, because it means that third party signal handlers could block
the worker thread (which is precisely what the worker thread is meant to
avoid).
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
The memory free ratio check can be skipped when running the tests: if
proc_override is TRUE and the proc path is overridden, the check is
skipped.
The override value has to be set before testing the value and returning
from the callback. Otherwise, the callback can't emit a signal to the test
program.
It might be possible for the `low-memory-warning` signal to be emitted
(by the GLib worker thread) before the test has connected to it, which
could cause the tests to loop forever.
Potentially fixes
https://gitlab.gnome.org/pwithnall/glib/-/jobs/5322491, though I have
not been able to reproduce the race locally.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
If the first `goto out` is taken, `file` has not yet been initialised,
but `g_clear_object (&file)` is called on it, and things get unhappy.
Fix that by following standard convention and initialising
‘autoptr’-like variables at declaration time.
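The convention can be sketched in plain C, with `free()` standing in for `g_clear_object()`. `toy_dup_name()` is a hypothetical function, not the code fixed here; the point is that every variable released at the `out` label is initialised at declaration, so any early `goto out` is safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of `name`, or NULL if `name` is NULL or empty.
 * The caller frees the result. */
char *
toy_dup_name (const char *name)
{
  char *copy = NULL;     /* initialised at declaration */
  char *result = NULL;

  if (name == NULL || *name == '\0')
    goto out;            /* early exit: `copy` is still NULL here */

  copy = malloc (strlen (name) + 1);
  if (copy == NULL)
    goto out;
  strcpy (copy, name);

  result = copy;
  copy = NULL;           /* ownership moved to `result` */

out:
  /* At most one of the two pointers owns the allocation. */
  assert (copy == NULL || result == NULL);
  free (copy);           /* free (NULL) is a no-op */
  return result;
}
```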
Spotted by scan-build in
https://gitlab.gnome.org/GNOME/glib/-/jobs/5321883.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
The `mainloop` test suite takes about 775s on my machine under valgrind
with this test enabled, vs 50s without this test enabled.
This causes CI failures like
https://gitlab.gnome.org/GNOME/glib/-/jobs/5321882.
I’m not sure that valgrind will actually successfully reproduce the race
condition because it runs too slowly (but I haven’t verified that by
reverting the fix for the race).
In any case, you can still choose to run the test under valgrind with
`-m thorough`.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
This backend periodically watches the memory free ratio through sysinfo().
It signals applications when the memory free ratio drops to 40%, 30%,
and 20% for the LOW, MEDIUM and CRITICAL statuses, respectively.
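The threshold mapping could be sketched as below. The names `ToyLevel`, `toy_free_ratio()` and `toy_level_for_free_ratio()` are hypothetical, and two details are assumptions rather than facts from the backend: that each threshold is inclusive, and that a zero total is treated as "no pressure". The inputs would come from fields such as `freeram` and `totalram` of `struct sysinfo`.

```c
#include <assert.h>

typedef enum {
  TOY_LEVEL_NONE,
  TOY_LEVEL_LOW,
  TOY_LEVEL_MEDIUM,
  TOY_LEVEL_CRITICAL
} ToyLevel;

/* Fraction of memory that is free, in [0, 1].
 * Assumption: an unknown (zero) total reports no pressure. */
double
toy_free_ratio (unsigned long free_ram, unsigned long total_ram)
{
  if (total_ram == 0)
    return 1.0;
  return (double) free_ram / (double) total_ram;
}

/* Map a free ratio to a warning level using the 40% / 30% / 20%
 * thresholds. Assumption: each threshold is inclusive. */
ToyLevel
toy_level_for_free_ratio (double free_ratio)
{
  assert (free_ratio >= 0.0 && free_ratio <= 1.0);

  if (free_ratio <= 0.20)
    return TOY_LEVEL_CRITICAL;
  if (free_ratio <= 0.30)
    return TOY_LEVEL_MEDIUM;
  if (free_ratio <= 0.40)
    return TOY_LEVEL_LOW;
  return TOY_LEVEL_NONE;
}
```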
The PSI backend is based on the kernel's PSI interface [1]. It monitors
the memory allocation time within a given time window. If the allocation
time is more than the given threshold, the kernel signals the application
to deliver the memory pressure events.
The current thresholds are:
LOW: 70ms out of 2 seconds for partial stall
MEDIUM: 100ms out of 2 seconds for partial stall
CRITICAL: 100ms out of 2 seconds for full stall
[1] https://docs.kernel.org/accounting/psi.html
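For reference, the kernel documentation in [1] describes a trigger as a string of the form `<some|full> <stall amount in us> <time window in us>`, written to a pressure file such as `/proc/pressure/memory`. The sketch below only formats such strings for the three thresholds listed above; `ToyPsiLevel` and `toy_psi_trigger()` are hypothetical names, and how the backend actually registers its triggers is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum {
  TOY_PSI_LOW,
  TOY_PSI_MEDIUM,
  TOY_PSI_CRITICAL
} ToyPsiLevel;

/* Format the PSI trigger string for `level` into buf.
 * Returns the string length, or -1 if the level is invalid or the
 * buffer is too small. */
int
toy_psi_trigger (ToyPsiLevel level, char *buf, size_t buf_len)
{
  /* Thresholds from the list above, converted to microseconds. */
  static const struct {
    const char *kind;
    unsigned int stall_us;
  } table[] = {
    { "some", 70000 },    /* LOW: 70ms partial stall */
    { "some", 100000 },   /* MEDIUM: 100ms partial stall */
    { "full", 100000 },   /* CRITICAL: 100ms full stall */
  };
  const unsigned int window_us = 2000000;  /* 2 seconds */
  int n;

  assert (buf != NULL);

  if (level < TOY_PSI_LOW || level > TOY_PSI_CRITICAL)
    return -1;

  n = snprintf (buf, buf_len, "%s %u %u",
                table[level].kind, table[level].stall_us, window_us);
  if (n < 0 || (size_t) n >= buf_len)
    return -1;
  return n;
}
```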
This class provides the shared functions, such as sending a signal and
string and value conversion. The backend classes should inherit this
class to get the shared functions.
It adds a configure time check for `sysinfo()`, as some systems don’t
have it.
This will catch regressions like
fc030b2b64 if they happen again in future,
by testing that fallback argument parsing code path in
`g_application_run()`.
Heavily based on the PyGObject `test_local_and_remote_command_line` unit
test at
578a55982a/tests/test_gio.py (L289).
Thanks to Arjan Molenaar for investigating the failure and writing it
up in !4703.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Do an extra check for whether the options argument is NULL.
This avoids an unnecessary critical warning.
`g_application_run()` calls the code with options == NULL.
The array buffer is of size BUFSIZE. The if-check correctly avoids
writing too many characters into the buffer, but the ending newline may
still overflow the buffer. Keep space for the EOL character.
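A minimal sketch of the off-by-one, assuming a fixed-size line buffer that is terminated by a newline rather than a NUL byte. `TOY_BUFSIZE` and `toy_fill_line()` are hypothetical; the bound in the if-check is `TOY_BUFSIZE - 1` so that the final slot always remains free for the newline.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUFSIZE 8

/* Copy as much of `text` as fits into buf, always leaving one byte for
 * the trailing newline. Returns the number of bytes written, including
 * the newline. The buffer is not NUL-terminated. */
size_t
toy_fill_line (char buf[TOY_BUFSIZE], const char *text)
{
  size_t len = 0;

  assert (buf != NULL && text != NULL);

  while (*text != '\0')
    {
      if (len < TOY_BUFSIZE - 1)   /* reserve the last slot for '\n' */
        buf[len++] = *text;
      text++;
    }

  buf[len++] = '\n';               /* always fits: len <= TOY_BUFSIZE - 1 */
  return len;
}
```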