mirror of https://gitlab.gnome.org/GNOME/glib.git
gthread: Use C11-style memory consistency to speed up g_once()
The g_once() function exists to call a callback function exactly once, and to block multiple contending threads on its completion, then to return its return value to all of them (so they all see the same value).

The full implementation of g_once() (in g_once_impl()) uses a mutex and condition variable to achieve this, and is needed in the contended case, where multiple threads need to be blocked on completion of the callback.

However, most of the times that g_once() is called, the callback will already have been called, and it just needs to establish that it has been called and to return the stored return value.

Previously, a fast path was used if we knew that memory barriers were not needed on the current architecture to safely access two dependent global variables in the presence of multi-threaded access. This is true of all sequentially consistent architectures.

Checking whether we could use this fast path (if `G_ATOMIC_OP_MEMORY_BARRIER_NEEDED` was *not* defined) was a bit of a pain, though, as it required GLib to know the memory consistency model of every architecture. This kind of knowledge is traditionally a compiler’s domain.

So, simplify the fast path by using the compiler-provided atomic intrinsics, and acquire-release memory consistency semantics, if they are available. If they’re not available, fall back to always locking as before.

We definitely need to use `__ATOMIC_ACQUIRE` in the macro implementation of g_once(). We don’t actually need to make the `__ATOMIC_RELEASE` changes in `gthread.c` though, since locking and unlocking a mutex guarantees to insert a full compiler and hardware memory barrier (enforcing sequential consistency). So the `__ATOMIC_RELEASE` changes are only in there to make it obvious what stores are logically meant to match up with the `__ATOMIC_ACQUIRE` loads in `gthread.h`.

Notably, only the second store (and the first load) has to be atomic. i.e. when storing `once->retval` and `once->status`, the first store is normal and the second is atomic. This is because the writes have a happens-before relationship, and all (atomic or non-atomic) writes which happen-before an atomic store/release are visible in the thread doing an atomic load/acquire on the same atomic variable, once that load is complete.

References:
 * https://preshing.com/20120913/acquire-and-release-semantics/
 * https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/_005f_005fatomic-Builtins.html
 * https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync
 * https://en.cppreference.com/w/cpp/atomic/memory_order#Release-Acquire_ordering

Signed-off-by: Philip Withnall <withnall@endlessm.com>

Fixes: #1323
parent bfd8f8cbaf
commit e52fb6b1d3
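The release/acquire reasoning in the commit message can be seen in isolation in the following sketch. This is not GLib code: `value` and `ready` are hypothetical globals standing in for once->retval and once->status, and it assumes a GCC/Clang toolchain providing the __atomic builtins plus POSIX threads.

#include <pthread.h>
#include <stdio.h>

static int value;   /* plain (non-atomic) payload, standing in for once->retval */
static int ready;   /* flag accessed atomically, standing in for once->status */

static void *
writer (void *data)
{
  (void) data;

  value = 42;                                      /* plain store, happens-before… */
  __atomic_store_n (&ready, 1, __ATOMIC_RELEASE);  /* …the release store of the flag */

  return NULL;
}

static void *
reader (void *data)
{
  (void) data;

  /* Spin until the acquire load observes the flag. Once it does, the earlier
   * plain store to `value' is guaranteed to be visible in this thread too. */
  while (__atomic_load_n (&ready, __ATOMIC_ACQUIRE) != 1)
    ;

  printf ("value = %d\n", value);   /* always prints 42 */

  return NULL;
}

int
main (void)
{
  pthread_t w, r;

  pthread_create (&w, NULL, writer, NULL);
  pthread_create (&r, NULL, reader, NULL);
  pthread_join (w, NULL);
  pthread_join (r, NULL);

  return 0;
}

Because the plain store to `value` happens-before the release store to `ready`, an acquire load that observes `ready == 1` also guarantees `value == 42` is visible, which is exactly why only the second store (and the first load) in g_once() needs to be atomic.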
--- a/glib/gthread.c
+++ b/glib/gthread.c
@@ -630,13 +630,25 @@ g_once_impl (GOnce *once,
 
   if (once->status != G_ONCE_STATUS_READY)
     {
+      gpointer retval;
+
       once->status = G_ONCE_STATUS_PROGRESS;
       g_mutex_unlock (&g_once_mutex);
 
-      once->retval = func (arg);
+      retval = func (arg);
 
       g_mutex_lock (&g_once_mutex);
+      /* We prefer the new C11-style atomic extension of GCC if available. If not,
+       * fall back to always locking. */
+#if defined(G_ATOMIC_LOCK_FREE) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) && defined(__ATOMIC_SEQ_CST)
+      /* Only the second store needs to be atomic, as the two writes are related
+       * by a happens-before relationship here. */
+      once->retval = retval;
+      __atomic_store_n (&once->status, G_ONCE_STATUS_READY, __ATOMIC_RELEASE);
+#else
+      once->retval = retval;
       once->status = G_ONCE_STATUS_READY;
+#endif
       g_cond_broadcast (&g_once_cond);
     }
 
--- a/glib/gthread.h
+++ b/glib/gthread.h
@@ -234,14 +234,23 @@ GLIB_AVAILABLE_IN_ALL
 void    g_once_init_leave    (volatile void *location,
                               gsize          result);
 
-#ifdef G_ATOMIC_OP_MEMORY_BARRIER_NEEDED
-# define g_once(once, func, arg) g_once_impl ((once), (func), (arg))
-#else /* !G_ATOMIC_OP_MEMORY_BARRIER_NEEDED*/
+/* Use C11-style atomic extensions to check the fast path for status=ready. If
+ * they are not available, fall back to using a mutex and condition variable in
+ * g_once_impl().
+ *
+ * On the C11-style codepath, only the load of once->status needs to be atomic,
+ * as the writes to it and once->retval in g_once_impl() are related by a
+ * happens-before relation. Release-acquire semantics are defined such that any
+ * atomic/non-atomic write which happens-before a store/release is guaranteed to
+ * be seen by the load/acquire of the same atomic variable. */
+#if defined(G_ATOMIC_LOCK_FREE) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) && defined(__ATOMIC_SEQ_CST)
 # define g_once(once, func, arg) \
-  (((once)->status == G_ONCE_STATUS_READY) ? \
+  ((__atomic_load_n (&(once)->status, __ATOMIC_ACQUIRE) == G_ONCE_STATUS_READY) ? \
    (once)->retval : \
    g_once_impl ((once), (func), (arg)))
-#endif /* G_ATOMIC_OP_MEMORY_BARRIER_NEEDED */
+#else
+# define g_once(once, func, arg) g_once_impl ((once), (func), (arg))
+#endif
 
 #ifdef __GNUC__
 # define g_once_init_enter(location) \
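For context, typical callers of the g_once() macro look roughly like the sketch below (load_config() and its GHashTable return value are purely illustrative). The first caller runs the callback under g_once_mutex in g_once_impl(); every later caller now takes the lock-free path above, where the __ATOMIC_ACQUIRE load of (once)->status guarantees the stored (once)->retval is visible.

#include <glib.h>

/* Illustrative one-shot initialiser: called at most once; its return value is
 * stored in the GOnce and handed back to every caller of g_once(). */
static gpointer
load_config (gpointer user_data)
{
  GHashTable *config = g_hash_table_new (g_str_hash, g_str_equal);

  g_hash_table_insert (config, "mode", "fast");

  return config;
}

static GHashTable *
get_config (void)
{
  static GOnce config_once = G_ONCE_INIT;

  /* Safe to call from any number of threads; all of them get the same
   * GHashTable pointer back. */
  return g_once (&config_once, load_config, NULL);
}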