These two data structures are allocated once and live for the lifetime
of the process, and are leaked on exit. That’s fine, and intentional.
Add `g_ignore_leak()` to them to make that a bit clearer, and
communicate the intent to asan.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3472
The goal of this change is to avoid re-parsing `GVariantTypeInfo` on
every creation or parsing of a `GVariant` byte-buffer. Parsing presents
a non-trivial amount of overhead which can typically be elided.
It was discovered that many applications and tooling are re-generating
this information upon receiving a D-Bus message as they tend to process
messages serially, thus dropping the last reference count.
Previously, when the last reference count for a `GVariantTypeInfo` was
dropped we would finalize the parsed type information.
This change keeps `GVariantTypeInfo` alive in a Garbage Collected array.
The array is collected upon reaching a 32 entries. The number 32 was
chosen because it is larger than what I've seen active on various D-Bus
based applications-or-daemons.
Take a simple test case of using `GVariantBuilder` in a loop with a
debugoptimized build of GLib. A reduction in wallclock time can be
observed in the 35% to more than 70% based on the complexity of the
GVariant being created.
For cases like ibus-daemon, it was previously parsing `GVariantTypeInfo`
up to dozens of times per key-press/release cycle.
Closes: #3472
This arm of the condition is always true, because 0x00 has been checked
in the previous branch.
This is not going to improve performance, but does mean we now have full
branch coverage of the code via our unit tests, which gives some
assurance that it’s all good.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3481
The move to c-utf8 for validation has exposed a few new branches where
our existing (fairly comprehensive) UTF-8 validation test suite didn’t
check things.
Add unit tests for those branches, so we keep code coverage.
I’ve validated (with an independent UTF-8 decoder) that the test vectors
are correctly marked as valid/invalid in the test data (so the tests
aren’t just blindly coded to match the behaviour of the new validator
code).
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3481
It turns out it’s not actually been explicitly tested before, even
though it has full code coverage through being called by other code
which is tested.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
This moves g_str_is_ascii() from gstrfuncs.c to gutf8.c so that we can
reuse the same SIMD code for ASCII validation.
On Apple Silicon:
Before: 3297 MB/s
After: 26146 MB/s
This is based on the https://github.com/c-util/c-utf8 project and has
been adapted for portability and integration into GLib. c-utf8 is dual
licensed Apache-2.0 and LGPLv2.1+, the latter matching GLib.
Notably, `case 0x01 ... 0x7F:` style switch/case labels have been
converted to if/else which is more portable to non-GCC/Clang platforms
while generating the same assembly, at least on x86_64 with GCC.
Additionally, `__attribute__((aligned(n)))` is used in favor of
`__builtin_assume_aligned(n)` because it is more portable to MSVC's
`__declspec(align(n))` and also generates the same assembly as GCC's
`__builtin_assume_aligned(n)`.
For GCC x86_64 Linux on a Xeon 4214 this improved the throughput of
g_utf8_validate() for ASCII from 750MB/s to around 10,000MB/s (13x).
On GCC aarch64 Linux with an Apple Silicon M2 Pro we go from about
2,200 MB/s to 26,700 MB/s (12x).
Closes: #3481
It wasn’t previously clear (to me) whether `end` was returned pointing
to the nul terminator, or to the byte immediately preceding it. Try and
clarify that.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Nobody’s going to keep it up to date if it’s floating about in an
unrelated place, not next to the code.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Nobody’s going to keep it up to date if it’s floating about in an
unrelated place, not next to the code.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Just like we have `g_steal_pointer()` and `g_clear_pointer()`, it would
be useful to have a ‘steal’ version of `g_clear_handle_id()`.
Particularly in situations where a clear function can’t be represented
as a `GClearHandleFunc`, such as
`g_dbus_connection_signal_unsubscribe()` (there’s no way of passing the
`GDBusConnection` to it) — or in situations where a handle ID isn’t
being released, but is being passed from one struct to another or from a
local scope to a struct or vice-versa.
Includes unit tests.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
And to not assume that every main context iteration will provide
progress on the sources that the test is interested in. It’s possible
that other sources may be attached to the `GMainContext` which get
dispatched instead of the pipe sources on an iteration.
I don’t know if this will fix#3483, but it will certainly make this
test a little tidier. It doesn’t affect test run time.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Fixes: #3483
Otherwise `scan-build` thinks it’s possible for the `GBytes` to be
double-freed, which would indeed happen if `try_steal_and_unref()` were
to return `NULL` on this branch.
It’s not actually possible for it to return `NULL` here though, as if
`bytes->data` were `NULL`, the function would have already returned
higher up.
Fixes this `scan-build` failure:
https://gitlab.gnome.org/GNOME/glib/-/jobs/4359929
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
This fixes commit 9eb9df2396, which I really should have noticed at the
time would cause a load of unused variables to be declared if compiled
with `G_DISABLE_ASSERT`. That’s what the original `#ifdef`s were to
protect against.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
We want to keep the suffix aligned to 8 bytes on 32-bit too. This makes
sure we do that in a way that is portable across our supported compilers.
Fixes: #3486
Current implementation replaces find in string with replace
one-at-a-time and replacement is done in O(n^2) time. The new
implementation is O(n). Memory allocation is done once instead of
incrementally.
In the old implementation, every replacement will move the whole
rest of the string from the current position to make space for
the replace string and then insert the replace string.
In the new implementation, when the replace string is shorter or equal
to the find string, it will move the characters in-place and requires no
additional memory. When the replace string is longer, find the required
size needed for the new string and preallocate a new string to copy the
original string to with its replacements. When the replace string is an
empty string use the old implementation to handle the special case.
Previously, all GVariants would allocate a GBytes for the buffered
contents. This presents a challenge for small GVariant type created
during the building process of GVariantBuilder as that results in an
allocation for the GVariant, GBytes, and the byte buffer.
Recent changes for GBytes may reduce those 3 allocations to 2, but even
that is quite substantial overhead for a 32-bit integer.
This changeset switches GVariant to use g_new/g_free allocators instead
of g_slice_new/free. When benchmarked alone, this presented no
measurable difference in overhead with the standard glibc allocator.
With that change in place, allocations may then become variable in size
to contain small allocations at the end of the GVariant reducing things
to a single allocation (plus the GVariantTypeInfo reference).
The size of GVariant is already 1 cacheline @ 64-bytes on x86_64. This
uses that to guarantee our alignment of data maintains the 8-bytes
guarantee of GVariant, and also extends it to match malloc().
On 32-bit systems, we are similarly aligned but reduce the amount we
will inline to 32 bytes so we have a total of 1 cacheline.
This is all asserted at compile-time to enforce the guarantee.
In the end, this changeset reduces the wallclock time of building many
GVariant in a loop using GVariantBuilder by 10% beyond the 10% already
gained through GBytes doing the same thing.
This adds another form of stack building which allows avoiding the rather
expensive GVariantType malloc/memcpy/free. In a tight loop this reduced
wallclock time by about 4-5% for cases where you do not need to further
open using g_variant_builder_open() which still require a copy at this
time.
New API is provided instead of modifying g_variant_type_init() because
previously it was possible (though misguided) to use g_variant_type_init()
which a dynamically allocated GVariantType which could be freed before
g_variant_builder_end() or g_variant_builder_clear() was called.
# Conflicts:
# glib/gvariant.c
When trying to locate a GVariantTypeInfo from the cache we were copying
the string so that we can use g_str_hash, as that requires a \0 terminated
string.
However, we have hash and equal functions which can be used without the
extra copy. Additionally, these were moved to headers in previous commits
so they can be used without having to re-check GVariantType we already
know to be valid.
Right now we create a bunch of GBytes which then get their reference count
incremented and immediately decremented. This causes quite a bit of
disruption for cacheline re-use.
Instead, this change creates an internal helper to transfer ownership of
GBytes to the new GVariant directly.
Surprisingly, this reduced wallclock time by about 6% for a contrived
benchmark of building "as" variant with GVariantBuilder.
When dealing with small allocations it can save considerable cycles to
do a single allocation for both the GBytes and the data by tacking it
onto the end of the GBytes.
Care is taken to preserve the glibc expectation of 2*sizeof(void*)
alignment of allocations at the expense of some padding bytes.
The degenerate case here is when you want to steal the bytes afterwards
but that amounts to the same overhead as the status-quo.
Where this can help considerably is in GVariant building such as
g_variant_new_int32() which allocates for the GVariant, the GBytes, and
the int32 within the GBytes.
In a simple benchmark of using GVariantBuilder to create "(ii)" variants
this saved about 10% in wallclock time.