The goal of this change is to avoid re-parsing `GVariantTypeInfo` on
every creation or parsing of a `GVariant` byte-buffer. Parsing presents
a non-trivial amount of overhead which can typically be elided.
It was discovered that many applications and tools regenerate this
information for every D-Bus message they receive: they tend to process
messages serially, so the last reference is dropped between messages.
Previously, when the last reference to a `GVariantTypeInfo` was
dropped, we would finalize the parsed type information.
This change keeps `GVariantTypeInfo` alive in a garbage-collected array.
The array is collected upon reaching 32 entries. The number 32 was
chosen because it is larger than what I've seen active in various
D-Bus-based applications or daemons.
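The idea of parking unreferenced entries in a small array and collecting
them in bulk can be sketched as follows. This is a minimal illustration
and not the GLib implementation: `TypeInfo`, `type_info_get()`,
`type_info_unref()`, `gc_array` and `parse_count` are hypothetical names,
the sketch is not thread-safe, and it does not share entries between
multiple live users.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GC_THRESHOLD 32 /* collection threshold named in the text */

/* Hypothetical stand-in for parsed type information. */
typedef struct
{
  char *type_string;
  unsigned ref_count;
} TypeInfo;

static TypeInfo *gc_array[GC_THRESHOLD]; /* entries with no live users */
static size_t gc_len = 0;
static unsigned parse_count = 0; /* how often the "parse" really ran */

static TypeInfo *
type_info_get (const char *type_string)
{
  assert (type_string != NULL);

  /* Revive a parked entry instead of parsing again. */
  for (size_t i = 0; i < gc_len; i++)
    {
      if (strcmp (gc_array[i]->type_string, type_string) == 0)
        {
          TypeInfo *info = gc_array[i];
          gc_array[i] = gc_array[--gc_len];
          info->ref_count = 1;
          return info;
        }
    }

  /* Cache miss: this stands in for the expensive parse. */
  size_t size = strlen (type_string) + 1;
  TypeInfo *info = malloc (sizeof *info);
  char *copy = malloc (size);
  if (info == NULL || copy == NULL)
    abort ();
  memcpy (copy, type_string, size);
  info->type_string = copy;
  info->ref_count = 1;
  parse_count++;
  return info;
}

static void
type_info_unref (TypeInfo *info)
{
  assert (info != NULL && info->ref_count > 0);

  if (--info->ref_count > 0)
    return;

  /* Last reference dropped: park the entry rather than freeing it. */
  gc_array[gc_len++] = info;

  /* Collect everything once the array is full. */
  if (gc_len == GC_THRESHOLD)
    {
      for (size_t i = 0; i < gc_len; i++)
        {
          free (gc_array[i]->type_string);
          free (gc_array[i]);
        }
      gc_len = 0;
    }
}
```

With this sketch, getting `"a{sv}"`, dropping it, and getting it again
runs the parse step only once, which is the serial message-processing
pattern described above.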
Take a simple test case of using `GVariantBuilder` in a loop with a
debugoptimized build of GLib. A reduction in wallclock time of 35% to
more than 70% can be observed, depending on the complexity of the
`GVariant` being created.
For cases like ibus-daemon, it was previously parsing `GVariantTypeInfo`
up to dozens of times per key-press/release cycle.
Closes: #3472
While reading a single byte or uint16 from an input stream is fairly
simple and uncontroversial, the code to read a line or read up to any of
a set of stop characters is not so trivial. People may be using
`GDataInputStream` to parse untrusted input like this, so we should
probably test that it’s robust against a variety of input conditions.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
This attempts to use GCC `__attribute__((ifunc("resolver_func")))` to check
for Valgrind early in process startup, so that the proper function is
selected once instead of doing runtime checks within the function.
This should make #3493 less annoying when run under Valgrind.
...in the main tests that we expect to pass.
Due to an upstream issue in PCRE2 10.44, disable running the PCRE2 tests for
now. The issue has already been resolved in upstream PCRE2, but the fix has
not yet made it into the PCRE2 release that we use for our subprojects, so
the tests should stay disabled until the next release (or so) of PCRE2.
This introduces no functional changes, but documents the intent a bit
better in the code where these signal IDs are stored in a struct.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
This makes no functional changes, but does tidy the code up a bit and
means `g_steal_handle_id()` gets a bit more testing.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
`ip_mreqn.imr_ifindex` is not used correctly by the XNU kernel, and
causes us to bind to the default interface; so fall back to `ip_mreq`
and set the iface source address (not SSM).
Fixes https://gitlab.gnome.org/GNOME/glib/-/issues/3489
...and add the ability to select the target platform when calling
vcvarsall.bat, so that we can accommodate 32-bit builds and possibly ARM64
builds in the CI if we need to.
Limit the input size. With a short `find`, and a long `init` and `replace`
it’s quite possible to hit OOM. We’re not interested in testing that — it’s
up to the caller of `g_string_replace()` to handle that. 1KB on each of the
inputs should be plenty to find any string parsing or pointer arithmetic
bugs in `g_string_replace()`.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
oss-fuzz#371233785
This patch fixes a build error when compiling with the GCC cross compiler for Windows on ARM64.
See issue #3490 for details.
Signed-off-by: Carlo Bramini <carlo_bramini@users.sourceforge.net>
Fixes: #3490
Closes: #3490
This affects the new `g_string_replace()` code which landed on `main` a
few days ago. It does not affect the old implementation of
`g_string_replace()`.
The code for the `f_len == 0` (needle is an empty string) case was
modifying `string` in the loop, without updating any of the string
pointers into it. If the replacement was long enough (or inserted enough
times), this would trigger a realloc of `string->str` and cause all the
string pointers to be dangling.
Fix this by pulling the `f_len == 0` code out into a separate branch and
loop, rather than trying to integrate it into the main loop. This
simplifies the main loop significantly, and makes both easier to verify.
An alternative approach, which doesn’t involve splitting the
`f_len == 0` case out, might have been to track the positions using
indexes rather than string pointers. I think the approach in this commit
is better, though, as it removes the possibility of `f_len == 0`
entirely from the loop, which makes it much easier to verify termination
of the loop.
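To illustrate why a dedicated loop for the empty-needle case is easy to
verify, here is a minimal standalone sketch. It is not the
`g_string_replace()` code: `insert_at_every_position()` is a hypothetical
helper that works on plain C strings and builds a new buffer of a
precomputed size, so nothing is reallocated while the loop runs. It
assumes a `limit` of 0 means "no limit" and ignores size overflow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Insert `replace` at the start, between each character, and at the
 * end of `str`, at most `limit` times (0 means no limit).
 * Returns a newly allocated string, or NULL on allocation failure. */
static char *
insert_at_every_position (const char *str, const char *replace, size_t limit)
{
  assert (str != NULL && replace != NULL);

  size_t len = strlen (str);
  size_t r_len = strlen (replace);
  size_t positions = len + 1; /* before each char, plus the end */
  size_t n = (limit == 0 || limit > positions) ? positions : limit;

  /* The final size is known up front, so no pointer can dangle. */
  char *out = malloc (len + n * r_len + 1);
  if (out == NULL)
    return NULL;

  size_t o = 0;
  for (size_t i = 0; i <= len; i++)
    {
      if (i < n)
        {
          memcpy (out + o, replace, r_len);
          o += r_len;
        }
      if (i < len)
        out[o++] = str[i];
    }
  out[o] = '\0';

  return out;
}
```

The loop runs exactly `len + 1` times regardless of the replacement, so
termination is obvious. For example, `("abc", "x", 0)` gives `"xaxbxcx"`
and `("abc", "x", 2)` gives `"xaxbc"`.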
Add more tests to validate this, including the test from oss-fuzz which
triggered the realloc and found the heap buffer overflow.
The new tests have also been run against the _old_ implementation of
`g_string_replace()` to ensure its behaviour (particularly around `f_len
== 0 && limit > 0`) has not changed.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
oss-fuzz#371043019
This arm of the condition is always true, because 0x00 has been checked
in the previous branch.
This is not going to improve performance, but does mean we now have full
branch coverage of the code via our unit tests, which gives some
assurance that it’s all good.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3481
The move to c-utf8 for validation has exposed a few new branches where
our existing (fairly comprehensive) UTF-8 validation test suite didn’t
check things.
Add unit tests for those branches, so we keep code coverage.
I’ve validated (with an independent UTF-8 decoder) that the test vectors
are correctly marked as valid/invalid in the test data (so the tests
aren’t just blindly coded to match the behaviour of the new validator
code).
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3481
It turns out it’s not actually been explicitly tested before, even
though it has full code coverage through being called by other code
which is tested.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Scanning for stop chars can require looking through a considerable amount
of input data. When there is only a single stop character, use `memchr()`,
which can be optimized by the compiler to look at word-sized or larger
amounts of data at a time.
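A minimal sketch of that fast path, assuming a plain buffer rather than
the stream's internal state; `find_stop_char()` is a hypothetical helper
and not the function changed by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first byte in `buf` that matches any of the
 * `n_stop` bytes in `stop_chars`, or -1 if there is none. */
static ptrdiff_t
find_stop_char (const char *buf, size_t len,
                const char *stop_chars, size_t n_stop)
{
  assert (buf != NULL && stop_chars != NULL);

  /* Fast path: a single stop character is a plain memchr(). */
  if (n_stop == 1)
    {
      const char *p = memchr (buf, stop_chars[0], len);
      return p != NULL ? p - buf : -1;
    }

  /* General path: test each input byte against the stop set. */
  for (size_t i = 0; i < len; i++)
    {
      if (memchr (stop_chars, buf[i], n_stop) != NULL)
        return (ptrdiff_t) i;
    }

  return -1;
}
```

Passing explicit lengths for both arguments lets the stop set contain a
NUL byte, which a `strchr()`-based scan could not handle.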
This moves g_str_is_ascii() from gstrfuncs.c to gutf8.c so that we can
reuse the same SIMD code for ASCII validation.
On Apple Silicon:
Before: 3297 MB/s
After: 26146 MB/s
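For readers unfamiliar with the technique, the following sketch shows a
portable word-at-a-time ASCII check. It is a simplified illustration and
not the code shared with the validator: `buffer_is_ascii()` is a
hypothetical name, and it tests eight bytes per step rather than using
wider vector operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return true if every byte in `buf` has its top bit clear. */
static bool
buffer_is_ascii (const char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  size_t i = 0;

  /* Check eight bytes at a time; memcpy() avoids unaligned loads. */
  for (; i + sizeof (uint64_t) <= len; i += sizeof (uint64_t))
    {
      uint64_t word;
      memcpy (&word, buf + i, sizeof word);
      if (word & UINT64_C (0x8080808080808080))
        return false;
    }

  /* Check the remaining tail byte by byte. */
  for (; i < len; i++)
    {
      if ((unsigned char) buf[i] & 0x80)
        return false;
    }

  return true;
}
```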
This is based on the https://github.com/c-util/c-utf8 project and has
been adapted for portability and integration into GLib. c-utf8 is dual
licensed Apache-2.0 and LGPLv2.1+, the latter matching GLib.
Notably, `case 0x01 ... 0x7F:` style switch/case labels have been
converted to if/else which is more portable to non-GCC/Clang platforms
while generating the same assembly, at least on x86_64 with GCC.
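As an illustration of that conversion, a range label such as
`case 0x01 ... 0x7F:` (a GNU extension) can be written as a chain of
standard comparisons. The function below is a hypothetical example that
classifies a UTF-8 lead byte; it is not taken from c-utf8 or GLib.

```c
#include <assert.h>
#include <stddef.h>

/* Return the sequence length implied by the lead byte at `p`,
 * or 0 if that byte cannot start a valid UTF-8 sequence (this
 * includes 0x00, which the sketch treats as a terminator). */
static size_t
utf8_expected_length (const unsigned char *p)
{
  assert (p != NULL);

  unsigned char c = *p;

  if (c >= 0x01 && c <= 0x7F)      /* was: case 0x01 ... 0x7F: */
    return 1;
  else if (c >= 0xC2 && c <= 0xDF) /* was: case 0xC2 ... 0xDF: */
    return 2;
  else if (c >= 0xE0 && c <= 0xEF) /* was: case 0xE0 ... 0xEF: */
    return 3;
  else if (c >= 0xF0 && c <= 0xF4) /* was: case 0xF0 ... 0xF4: */
    return 4;
  else
    return 0;
}
```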
Additionally, `__attribute__((aligned(n)))` is used in favor of
`__builtin_assume_aligned(n)` because it is more portable to MSVC's
`__declspec(align(n))` and also generates the same assembly as GCC's
`__builtin_assume_aligned(n)`.
For GCC x86_64 Linux on a Xeon 4214 this improved the throughput of
`g_utf8_validate()` for ASCII from 750 MB/s to around 10,000 MB/s (13x).
On GCC aarch64 Linux with an Apple Silicon M2 Pro we go from about
2,200 MB/s to 26,700 MB/s (12x).
Closes: #3481
And improve formatting in a few places while I’m there:
* Add quotes around ‘maybe’ types to make it clearer that ‘maybe’ is
being used as a proper noun
* Add linebreaks so that all doc comments start with a single-sentence
summary of the method
* Improve formatting of constants
* Add a few links to external specifications
See https://developer.gnome.org/documentation/guidelines/devel-docs.html
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3250