Previously, all GVariants would allocate a GBytes for the buffered
contents. This presents a challenge for small GVariants created
during the building process of GVariantBuilder, as that results in an
allocation for the GVariant, the GBytes, and the byte buffer.
Recent changes for GBytes may reduce those 3 allocations to 2, but even
that is quite substantial overhead for a 32-bit integer.
This changeset switches GVariant to use g_new/g_free allocators instead
of g_slice_new/free. When benchmarked alone, this presented no
measurable difference in overhead with the standard glibc allocator.
With that change in place, the GVariant allocation may become variable
in size so that small buffers are stored inline at the end of the
GVariant, reducing things to a single allocation (plus the
GVariantTypeInfo reference).
The size of GVariant is already 1 cacheline @ 64-bytes on x86_64. This
uses that to guarantee our alignment of data maintains the 8-bytes
guarantee of GVariant, and also extends it to match malloc().
On 32-bit systems, we are similarly aligned but reduce the amount we
will inline to 32 bytes so we have a total of 1 cacheline.
This is all asserted at compile-time to enforce the guarantee.
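A minimal sketch of the single-allocation idea, assuming a hypothetical
SketchVariant struct and a made-up SKETCH_MAX_INLINE threshold (neither
is GLib's actual layout or value). Small payloads live in trailing
storage of the same allocation, and a compile-time assertion guards the
alignment:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_INLINE 64 /* assumed threshold, not GLib's actual value */

/* Hypothetical stand-in for GVariant; the fields are illustrative only. */
typedef struct
{
  size_t         size;      /* length of the serialized data */
  unsigned char *data;      /* points at inline_data or at external */
  unsigned char *external;  /* separate buffer, used only for large data */
  /* Trailing storage, aligned like a malloc() result. */
  alignas (max_align_t) unsigned char inline_data[];
} SketchVariant;

/* Enforce the 8-byte alignment guarantee at compile time. */
static_assert (offsetof (SketchVariant, inline_data) % 8 == 0,
               "inline data must keep the 8-byte alignment guarantee");

static SketchVariant *
sketch_variant_new (const void *bytes, size_t size)
{
  int fits = size <= SKETCH_MAX_INLINE;
  SketchVariant *v = malloc (sizeof *v + (fits ? size : 0));

  if (v == NULL)
    return NULL;

  v->size = size;
  v->external = NULL;

  if (fits)
    {
      /* Small payload: header and data share one allocation. */
      v->data = v->inline_data;
    }
  else
    {
      /* Large payload: fall back to a second allocation. */
      v->external = malloc (size);
      if (v->external == NULL)
        {
          free (v);
          return NULL;
        }
      v->data = v->external;
    }

  if (size > 0)
    memcpy (v->data, bytes, size);

  return v;
}

static void
sketch_variant_free (SketchVariant *v)
{
  free (v->external);
  free (v);
}
```

A 32-bit integer then costs one malloc() instead of two or three.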
In the end, this changeset reduces the wallclock time of building many
GVariant in a loop using GVariantBuilder by 10% beyond the 10% already
gained through GBytes doing the same thing.
All uses of g_variant_builder_init() in gio are safe to translate to the
new g_variant_builder_init_static() alternative as the type will outlive
the call to g_variant_builder_end() (or is already static in nature).
Make the generated GDBus-based code use GVariantBuilder with a static
GVariantType to avoid copying the GVariantType on each use.
This is gated behind a GLib 2.83.0 check.
This adds another form of stack building which allows avoiding the rather
expensive GVariantType malloc/memcpy/free. In a tight loop this reduced
wallclock time by about 4-5% for cases where you do not need to further
open using g_variant_builder_open(), which still requires a copy at this
time.
New API is provided instead of modifying g_variant_builder_init() because
previously it was possible (though misguided) to use
g_variant_builder_init() with a dynamically allocated GVariantType which
could be freed before g_variant_builder_end() or
g_variant_builder_clear() was called.
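The ownership difference between the two initializers can be modelled
roughly as below. This is a plain-C sketch with hypothetical
SketchBuilder names, not GLib's implementation: the copying variant
tolerates the caller freeing the type early, while the static variant
borrows the pointer and so requires it to outlive the builder.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a builder that remembers its type string. */
typedef struct
{
  const char *type;        /* type string the builder reads from */
  char       *owned_type;  /* non-NULL only if the builder made a copy */
} SketchBuilder;

/* Copying init: malloc + memcpy now, free on clear. */
static void
sketch_builder_init (SketchBuilder *b, const char *type)
{
  size_t n = strlen (type) + 1;

  b->owned_type = malloc (n);
  if (b->owned_type != NULL)
    memcpy (b->owned_type, type, n);
  b->type = b->owned_type;
}

/* Static init: borrow the pointer; the caller guarantees its lifetime. */
static void
sketch_builder_init_static (SketchBuilder *b, const char *type)
{
  b->owned_type = NULL;
  b->type = type;
}

static void
sketch_builder_clear (SketchBuilder *b)
{
  free (b->owned_type);  /* no-op for the static variant */
  b->owned_type = NULL;
  b->type = NULL;
}
```

Changing the behaviour of the existing initializer to borrow would break
callers that free the type early, which is why a separate entry point is
the safer choice.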
When trying to locate a GVariantTypeInfo from the cache we were copying
the string so that we can use g_str_hash, as that requires a \0 terminated
string.
However, we have hash and equal functions which can be used without the
extra copy. Additionally, these were moved to headers in previous commits
so they can be used without having to re-check GVariantType we already
know to be valid.
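As an illustration of hashing without the copy, the sketch below finds
the end of a type string by bracket matching and hashes only that span,
so no \0 terminator is needed. The sketch_ names and the djb2-style hash
are assumptions for the example, and the input is assumed to be a valid
type string.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the single type starting at `type`, which may be embedded
 * in a longer string. Assumes the type string is already valid. */
static size_t
sketch_type_length (const char *type)
{
  int brackets = 0;
  size_t index = 0;

  do
    {
      /* 'a' (array) and 'm' (maybe) prefix the element type. */
      while (type[index] == 'a' || type[index] == 'm')
        index++;

      if (type[index] == '(' || type[index] == '{')
        brackets++;
      else if (type[index] == ')' || type[index] == '}')
        brackets--;

      index++;
    }
  while (brackets != 0);

  return index;
}

/* Hash exactly the bytes of the type; no temporary copy is made. */
static unsigned
sketch_type_hash (const char *type)
{
  size_t length = sketch_type_length (type);
  unsigned hash = 5381;

  for (size_t i = 0; i < length; i++)
    hash = hash * 33 + (unsigned char) type[i];

  return hash;
}
```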
Use the [provide] section to override the binary name, and track the
main development branch, like GTK does, so we get warnings and a
consistent output.
It's better to warn by default on MSVC (which we were already doing before
we bumped to 1.4.0) than to fail by default on macOS.
Fixes: 51e3e7d9ae ("build: Bump Meson dependency to 1.4.0")
As with the previous commit, this is _always_ defined in `gmacros.h`
and therefore the `#ifndef` will always be 0 even if disabled.
Just use `#if` instead.
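The difference can be seen with a hypothetical macro that, like the one
described, is always defined and carries its state in its value
(SKETCH_FEATURE_ENABLED is made up for this example):

```c
#include <assert.h>

/* Always defined; 0 means "disabled". */
#define SKETCH_FEATURE_ENABLED 0

/* Returns 1 if the "feature disabled" branch was compiled in. */
static int
ifndef_sees_disabled (void)
{
#ifndef SKETCH_FEATURE_ENABLED
  return 1;
#else
  return 0;  /* always taken: the macro is defined, whatever its value */
#endif
}

static int
if_sees_disabled (void)
{
#if SKETCH_FEATURE_ENABLED
  return 0;
#else
  return 1;  /* taken here because the value is 0 */
#endif
}
```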
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Right now we create a bunch of GBytes which then get their reference count
incremented and immediately decremented. This causes quite a bit of
disruption for cacheline re-use.
Instead, this change creates an internal helper to transfer ownership of
GBytes to the new GVariant directly.
Surprisingly, this reduced wallclock time by about 6% for a contrived
benchmark of building an "as" variant with GVariantBuilder.
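A rough model of the two patterns, using hypothetical SketchBytes and
SketchVariant types and a counter that tallies reference count writes
(the real helper and its name are internal to GLib and not shown here):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int ref_count; } SketchBytes;
typedef struct { SketchBytes *bytes; } SketchVariant;

static unsigned sketch_refcount_writes;  /* demo-only instrumentation */

static SketchBytes *
sketch_bytes_new (void)
{
  SketchBytes *b = malloc (sizeof *b);

  if (b != NULL)
    b->ref_count = 1;
  return b;
}

static SketchBytes *
sketch_bytes_ref (SketchBytes *b)
{
  b->ref_count++;
  sketch_refcount_writes++;
  return b;
}

static void
sketch_bytes_unref (SketchBytes *b)
{
  sketch_refcount_writes++;
  if (--b->ref_count == 0)
    free (b);
}

/* Old pattern: the variant adds a reference; the caller drops its own. */
static void
sketch_variant_set_bytes (SketchVariant *v, SketchBytes *b)
{
  v->bytes = sketch_bytes_ref (b);
}

/* New pattern: the caller's reference is handed over untouched. */
static void
sketch_variant_take_bytes (SketchVariant *v, SketchBytes *b)
{
  v->bytes = b;
}
```

Both patterns end with the variant holding the only reference; the
second one gets there without touching the counter's cacheline.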
When dealing with small allocations it can save considerable cycles to
do a single allocation for both the GBytes and the data by tacking it
onto the end of the GBytes.
Care is taken to preserve the glibc expectation of 2*sizeof(void*)
alignment of allocations at the expense of some padding bytes.
The degenerate case here is when you want to steal the bytes afterwards
but that amounts to the same overhead as the status-quo.
Where this can help considerably is in GVariant building such as
g_variant_new_int32() which allocates for the GVariant, the GBytes, and
the int32 within the GBytes.
In a simple benchmark of using GVariantBuilder to create "(ii)" variants
this saved about 10% in wallclock time.
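One way to picture the layout is a header rounded up to
2*sizeof(void*) with the payload placed right after it. The SketchBytes
struct and helper below are hypothetical and do not reflect GBytes
internals:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; the real GBytes fields differ. */
typedef struct
{
  const void *data;
  size_t      size;
  int         ref_count;
} SketchBytes;

#define SKETCH_ALIGN (2 * sizeof (void *))

/* Header size rounded up so the payload starts on an aligned offset,
 * possibly at the cost of a few padding bytes. */
#define SKETCH_HEADER_SIZE \
  ((sizeof (SketchBytes) + SKETCH_ALIGN - 1) / SKETCH_ALIGN * SKETCH_ALIGN)

static SketchBytes *
sketch_bytes_new_inline (const void *data, size_t size)
{
  unsigned char *block = malloc (SKETCH_HEADER_SIZE + size);
  SketchBytes *bytes = (SketchBytes *) block;

  if (block == NULL)
    return NULL;

  /* Copy the payload into the tail of the same allocation. */
  if (size > 0)
    memcpy (block + SKETCH_HEADER_SIZE, data, size);

  bytes->data = block + SKETCH_HEADER_SIZE;
  bytes->size = size;
  bytes->ref_count = 1;
  return bytes;
}
```

Stealing the payload from such an object would need a fresh allocation
and copy, which is the degenerate case mentioned above.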
This follows the usage in the glib codebase and recommendations at
main/docs/toolchain-requirements.md
It also fixes a build warning with ancient gcc 4.8:
../girepository/gitypelib-internal.h:202:1: warning:
no previous prototype for ‘_blob_is_registered_type’ [-Wmissing-prototypes]
In almost all cases, the variant is sunk shortly after creation and
certainly before other threads get a view of it.
This skips the bitlock for STATE_FLOATING, which is also done in
g_variant_take_ref(). We can do this because STATE_FLOATING may only
be set up-front, never after being cleared.
That allows us to do a relaxed read of the value once (which if it is
zero means no additional atomics beyond our ref count increment) as well
as a single atomic if we do in fact steal the floating reference.
Without any other GVariant performance changes, this is in the 2% range
of benchmarking a tight loop using GVariantBuilder. However, after the
rest of them are applied, the percentage is greater due to reduced
runtime overhead and lands in the 4.5% range.
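The fast path can be sketched with C11 atomics as follows. The struct,
flag value, and function name are hypothetical; the sketch only relies
on the property stated above, that the floating flag is set at
construction and never set again once cleared.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_STATE_FLOATING 0x1u

typedef struct
{
  atomic_uint state;
  atomic_int  ref_count;
} SketchVariant;

static SketchVariant *
sketch_variant_ref_sink (SketchVariant *v)
{
  unsigned old;

  /* Relaxed read: once the flag is clear it can never come back, so a
   * zero here means we only need the reference count increment. */
  if ((atomic_load_explicit (&v->state, memory_order_relaxed)
       & SKETCH_STATE_FLOATING) == 0)
    {
      atomic_fetch_add (&v->ref_count, 1);
      return v;
    }

  /* A single atomic clears the flag; whoever observed it set in `old`
   * has stolen the floating reference and needs no increment. */
  old = atomic_fetch_and (&v->state, ~SKETCH_STATE_FLOATING);
  if ((old & SKETCH_STATE_FLOATING) == 0)
    atomic_fetch_add (&v->ref_count, 1);  /* lost the race: take a ref */

  return v;
}
```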
If you have a definite-tuple type such as (iiii) then the number of
children that are allocated will match the offset when a GVariantBuilder
has completed.
That means we can avoid an expensive call into the allocator which is
normally done to shrink memory use by releasing it back to the allocator.
This saves about 5% of wallclock time when building such variants in a
tight loop.
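A sketch of the idea, with a hypothetical SketchBuilder and a counter
that records how often the shrinking realloc() is attempted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct
{
  void  **children;
  size_t  n_children;  /* slots in use (the "offset" in the text) */
  size_t  allocated;   /* slots allocated */
} SketchBuilder;

static unsigned sketch_shrink_calls;  /* demo-only instrumentation */

/* Hands the children array to the caller, shrinking it only when the
 * builder over-allocated. */
static void **
sketch_builder_end (SketchBuilder *b, size_t *n_children)
{
  void **children = b->children;

  if (b->n_children > 0 && b->n_children < b->allocated)
    {
      void **shrunk = realloc (children, b->n_children * sizeof *children);

      sketch_shrink_calls++;
      if (shrunk != NULL)
        children = shrunk;
    }

  *n_children = b->n_children;
  b->children = NULL;
  b->n_children = b->allocated = 0;
  return children;
}
```

For a definite tuple such as (iiii), n_children equals allocated at the
end, so the allocator is never called.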
This adds a fastpath for the extremely common case of checking if a
GVariant type is a subtype of itself _and_ a definite basic type. For
example, checking 'i' against 'i' or 's' against 's'.
In a loop using GVariantBuilder this can cut the cost of this function
alone in half on profiles from 3.3% of samples to 1.7% of samples.
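The shape of such a fastpath might look like the sketch below. The
function names are hypothetical, and the slow path is only a
placeholder (exact match or the "*" wildcard on \0 terminated strings),
not the real subtype algorithm.

```c
#include <assert.h>
#include <string.h>

static unsigned sketch_fast_path_hits;  /* demo-only instrumentation */

/* Definite basic types are single characters; '?' is basic but
 * indefinite, so it is excluded. */
static int
sketch_is_definite_basic (char c)
{
  return c != '\0' && strchr ("bynqiuxthdsog", c) != NULL;
}

/* Placeholder for the general walk over both type strings. */
static int
sketch_is_subtype_of_slow (const char *type, const char *supertype)
{
  return strcmp (supertype, "*") == 0 || strcmp (type, supertype) == 0;
}

static int
sketch_is_subtype_of (const char *type, const char *supertype)
{
  /* Same definite basic type on both sides: a one-character type is
   * complete, so nothing else needs to be inspected. */
  if (type[0] == supertype[0] && sketch_is_definite_basic (type[0]))
    {
      sketch_fast_path_hits++;
      return 1;
    }

  return sketch_is_subtype_of_slow (type, supertype);
}
```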
We can get the length of the string if we provide an out argument to
g_utf8_validate(). This avoids an extra strlen() when creating GVariant
for UTF-8 strings.
This is good for nearly 7% reduction of CPU samples when building
heavily string-based GVariant using GVariantBuilder as a benchmark.
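The pattern is to let the validator report where it stopped and derive
the length from that pointer. The validator below is a simplified
stand-in written for this example (it does not reject overlong
encodings or surrogates), not g_utf8_validate() itself:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified validator: returns 1 if `str` is well-formed up to its
 * \0 terminator. `*end` is set to the terminator on success, or to the
 * first offending byte on failure. */
static int
sketch_utf8_validate (const char *str, const char **end)
{
  const unsigned char *p = (const unsigned char *) str;

  while (*p != '\0')
    {
      int extra, ok = 1;

      if (*p < 0x80)
        extra = 0;
      else if ((*p & 0xE0) == 0xC0)
        extra = 1;
      else if ((*p & 0xF0) == 0xE0)
        extra = 2;
      else if ((*p & 0xF8) == 0xF0)
        extra = 3;
      else
        break;

      /* Each continuation byte must look like 10xxxxxx. */
      for (int i = 1; i <= extra; i++)
        if ((p[i] & 0xC0) != 0x80)
          {
            ok = 0;
            break;
          }

      if (!ok)
        break;

      p += extra + 1;
    }

  *end = (const char *) p;
  return *p == '\0';
}

/* Validation and length in one pass: no separate strlen(). */
static int
sketch_validated_length (const char *str, size_t *length)
{
  const char *end;

  if (!sketch_utf8_validate (str, &end))
    return 0;

  *length = (size_t) (end - str);
  return 1;
}
```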
An extremely common use-case of valid_format_string() is validation
when using GVariantBuilder. The optimal-case there is that there is
no programming error and thus the fast path should match.
This creates a fast path for that case without substantial change to the
GVariant type-checking case by checking for a non-NULL GVariant. It then
proceeds to hoist the actual scan directly without type allocation.
Locally this single change reduces wallclock time in a single-threaded
benchmark using GVariantBuilder by 17%.
Previously, g_variant_type_equal() would walk the strings multiple times.
In debug builds, you initially have the type checks for validity. But
also you walk the string to determine its length only to memcmp it after.
Instead, this does the comparison while walking the string for length.
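A single-pass comparison along these lines could look like the sketch
below. sketch_type_equal() is a hypothetical name, and both arguments
are assumed to point at valid types, which may be embedded in longer
strings.

```c
#include <assert.h>
#include <stddef.h>

/* Compares the single types starting at `a` and `b` while walking `a`
 * to find where it ends. */
static int
sketch_type_equal (const char *a, const char *b)
{
  int brackets = 0;
  size_t i = 0;

  if (a == b)
    return 1;

  do
    {
      /* 'a' (array) and 'm' (maybe) prefix the element type. */
      while (a[i] == 'a' || a[i] == 'm')
        {
          if (a[i] != b[i])
            return 0;
          i++;
        }

      /* Any mismatch ends the walk early, before a length is known. */
      if (a[i] != b[i])
        return 0;

      if (a[i] == '(' || a[i] == '{')
        brackets++;
      else if (a[i] == ')' || a[i] == '}')
        brackets--;

      i++;
    }
  while (brackets != 0);

  return 1;
}
```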
Previously, G_VARIANT_TYPE() would "cast check" the type string unless
G_DISABLE_CHECKS was set. We very much don't want people to ever set
G_DISABLE_CHECKS but it is extremely normal to turn off cast-checks using
G_DISABLE_CAST_CHECKS in stable releases.
This allows disabling the expensive type checking without catastrophically
disabling g_return_if_fail() macros.
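The gating can be pictured with the sketch below, where the SKETCH_
macros and functions are hypothetical stand-ins for G_VARIANT_TYPE() and
its checked helper. The cast check depends only on the cast-check
switch, while the argument check in the accessor is independent of it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchType SketchType;  /* opaque, like GVariantType */

static unsigned sketch_checks_run;  /* demo-only instrumentation */

/* Stand-in for the expensive validation of the type string. */
static const SketchType *
sketch_type_checked (const char *type_string)
{
  sketch_checks_run++;
  assert (type_string != NULL && type_string[0] != '\0');
  return (const SketchType *) type_string;
}

#ifndef SKETCH_DISABLE_CAST_CHECKS
# define SKETCH_TYPE(s) (sketch_type_checked ((s)))
#else
# define SKETCH_TYPE(s) ((const SketchType *) (s))
#endif

/* The NULL test plays the role of g_return_val_if_fail() and stays in
 * place whether or not cast checks are compiled out. */
static char
sketch_type_first_char (const SketchType *type)
{
  if (type == NULL)
    return '\0';
  return ((const char *) type)[0];
}
```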