Add G_MAIN_CONTEXT_FLAGS_OWNERLESS_POLLING

It fixes a race. g_source_attach() had the following check to ensure a
loop blocked on poll() would wakeup.

  if (do_wakeup && context->owner && context->owner != G_THREAD_SELF)
    g_wakeup_signal (context->wakeup);

However it doesn't contemplate an implementation where poll()ing is a
non-blocking operation that will be scheduled while the thread is
released to perform other tasks. This scenario opens up several
different possibilities where the condition would fail to hold true. I
experienced two of such races.

The first race pertains to a mono-threaded application. Do keep in mind
that integrating GLib to a foreign event loop will make GLib act as a
slave in the new event loop. When you post a new work unit to execute in
the thread managed by the foreign event loop, you don't use
g_main_context_invoke(). In fact the only reason to integrate
GMainContext in a foreign event loop is to make the two of them
communicate. So from time to time, the foreign event loop will execute
callbacks that manipulate the GMainContext loop. An illustration
follows.

  // in this callback we translate an event from the foreign event loop
  // to an event in the GMainContext event loop (that runs in the same
  // thread)
  static void my_event_loop_callback(void* data)
  {
    GMainContext* ctx = /* ... */;
    // ...
    g_source_attach(source, ctx);
  }

  int main()
  {
    // ...
    my_event_loop_invoke(my_event_loop_callback, data);
    // ...

    // this function has all mechanisms in place to run the foreign
    // event loop and the hooks to call
    // g_main_context_{prepare,query,check,dispatch}
    my_event_loop_run();
  }

In this case, you would have the following series of calls:

1. g_main_context_prepare()
2. g_main_context_query()
3. A callback to my_event_loop is registered when any fd on the set is
   ready or the timeout is reached.
4. The thread is released to perform other tasks.
5. One of the tasks executed wishes to communicate with my_event_loop
   and enters my_event_loop_callback.
6. g_source_attach() is called.
7. g_source_attach() detects do_wakeup=TRUE, context->owner != NULL, and
   context->owner == G_THREAD_SELF so g_wakeup_signal() is skipped.
8. None of the fds on the GLib poll() set becomes ready nor the GLib
   timeout expires. The my_event_loop callback that would call
   g_main_context_check() is never executed. Deadlock.

A shallow analysis will fail to detect the race here. The explanation
seems to showcase a scenario that will deterministically fail with a
deadlock every time. However do keep in mind that my_event_loop_callback
could be invoked before or after g_main_context_prepare(). There is an
_event_ race here. Furthermore, some GLib libraries such as GDBus will
initialize objects from extra threads (GAsyncInitable interface) and
invoke the result on the original thread when ready (g_source_attach()
will eventually be called). Now you have scenarios closer to standard
race examples.

The other scenario where a race would manifest happens in a
multi-threaded application that has a concurrency design similar to the
actor model. No actor executes in two threads simultaneously, but it's
not guaranteed that it'll always wake-up in the same thread. It'd
perform steps 1-4 just as in the previous example, but before thread
control is returned to the pool, it'd call g_main_context_release(). Now
g_source_attach() would skip g_wakeup_signal() for a different reason:

7. g_source_attach() detects do_wakeup=TRUE, context->owner == NULL so
   g_wakeup_signal() is skipped.
8. Same as before.

Certainly there are other concurrency designs where this optimization
would cause a deadlock, but all of them have origin in the same place:
the optimization assumes the poll() implementation is a blocking
operation and the thread will never be released to perform other tasks
(possibly involving GLib calls) while result is not ready. They share
not only the same problem, but also the same solution: do not make
assumptions and just call g_wakeup_signal().

This patch implements this solution by introducing the
G_MAIN_CONTEXT_FLAGS_OWNERLESS_POLLING flag. This flag will force a call
to g_wakeup_signal() and fix the race on foreign event loops. The reason
to prevent changing this option after creation is to avoid other races
that would lead to event loss. Construction is the only proper time to
set this option.

The implementation design means we do not change **any** semantics for
current working code. If you don't set the new flag, the code won't
enter in different branches and current behavior won't be affected. The
patch is small and easy to follow too.
This commit is contained in:
Vinícius dos Santos Oliveira 2021-04-12 20:03:32 -03:00 committed by Philip Withnall
parent 7ba86be594
commit e26a8a5981
3 changed files with 72 additions and 2 deletions

View File

@ -177,6 +177,15 @@
* g_main_context_prepare(), g_main_context_query(),
* g_main_context_check() and g_main_context_dispatch().
*
* If the event loop thread releases #GMainContext ownership until the results
* required by g_main_context_check() are ready you must create a context with
* the flag %G_MAIN_CONTEXT_FLAGS_OWNERLESS_POLLING or else you'll lose
* g_source_attach() notifications. This happens for instance when you integrate
* the GLib event loop into implementations that follow the proactor pattern
* (i.e. in these contexts the `poll()` implementation will reclaim the thread for
* other tasks until the results are ready). One example of the proactor pattern
* is the Boost.Asio library.
*
* ## State of a Main Context # {#mainloop-states}
*
* The operation of these functions can best be seen in terms
@ -1267,8 +1276,12 @@ g_source_attach_unlocked (GSource *source,
/* If another thread has acquired the context, wake it up since it
* might be in poll() right now.
*/
if (do_wakeup && context->owner && context->owner != G_THREAD_SELF)
g_wakeup_signal (context->wakeup);
if (do_wakeup &&
(context->flags & G_MAIN_CONTEXT_FLAGS_OWNERLESS_POLLING ||
(context->owner && context->owner != G_THREAD_SELF)))
{
g_wakeup_signal (context->wakeup);
}
g_trace_mark (G_TRACE_CURRENT_TIME, 0,
"GLib", "g_source_attach",

View File

@ -41,6 +41,10 @@ typedef enum /*< flags >*/
/**
* GMainContextFlags:
* @G_MAIN_CONTEXT_FLAGS_NONE: Default behaviour.
* @G_MAIN_CONTEXT_FLAGS_OWNERLESS_POLLING: Assume that polling for events will
* free the thread to process other jobs. That's useful if you're using
* `g_main_context_{prepare,query,check,dispatch}` to integrate GMainContext in
* other event loops.
*
* Flags to pass to g_main_context_new_with_flags() which affect the behaviour
* of a #GMainContext.
@ -51,6 +55,7 @@ GLIB_AVAILABLE_TYPE_IN_2_72
typedef enum /*< flags >*/
{
G_MAIN_CONTEXT_FLAGS_NONE = 0,
G_MAIN_CONTEXT_FLAGS_OWNERLESS_POLLING = 1
} GMainContextFlags;

View File

@ -158,6 +158,56 @@ test_mainloop_basic (void)
g_main_loop_unref (loop);
}
static void
test_ownerless_polling (gconstpointer test_data)
{
gboolean attach_first = GPOINTER_TO_INT (test_data);
GMainContext *ctx = g_main_context_new_with_flags (
G_MAIN_CONTEXT_FLAGS_OWNERLESS_POLLING);
GPollFD fds[20];
gint fds_size;
gint max_priority;
GSource *source = NULL;
g_assert_true (ctx != g_main_context_default ());
g_main_context_push_thread_default (ctx);
/* Drain events */
for (;;)
{
gboolean ready_to_dispatch = g_main_context_prepare (ctx, &max_priority);
gint timeout, nready;
fds_size = g_main_context_query (ctx, max_priority, &timeout, fds, G_N_ELEMENTS (fds));
nready = g_poll (fds, fds_size, /*timeout=*/0);
if (!ready_to_dispatch && nready == 0)
{
if (timeout == -1)
break;
else
g_usleep (timeout * 1000);
}
ready_to_dispatch = g_main_context_check (ctx, max_priority, fds, fds_size);
if (ready_to_dispatch)
g_main_context_dispatch (ctx);
}
if (!attach_first)
g_main_context_pop_thread_default (ctx);
source = g_idle_source_new ();
g_source_attach (source, ctx);
g_source_unref (source);
if (attach_first)
g_main_context_pop_thread_default (ctx);
g_assert_cmpint (g_poll (fds, fds_size, 0), >, 0);
g_main_context_unref (ctx);
}
static gint a;
static gint b;
static gint c;
@ -2145,6 +2195,8 @@ main (int argc, char *argv[])
#endif
g_test_add_func ("/mainloop/nfds", test_nfds);
g_test_add_func ("/mainloop/steal-fd", test_steal_fd);
g_test_add_data_func ("/mainloop/ownerless-polling/attach-first", GINT_TO_POINTER (TRUE), test_ownerless_polling);
g_test_add_data_func ("/mainloop/ownerless-polling/pop-first", GINT_TO_POINTER (FALSE), test_ownerless_polling);
return g_test_run ();
}