It is unused when compiling with `G_DISABLE_ASSERT`. That’s fine, but we
definitely want the `g_hash_table_remove()` call to still be made.
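A minimal sketch of the pattern, using plain `assert()` and `NDEBUG` as a stand-in for `g_assert()` and `G_DISABLE_ASSERT`. The toy table and the names `toy_remove()`, `drop_entry()` and `entries_left()` are hypothetical, not GLib API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a hash table that holds a single entry. */
static int n_entries = 1;

/* Removes the entry if there is one; returns whether anything was removed. */
bool toy_remove(void)
{
  if (n_entries == 0)
    return false;
  n_entries--;
  return true;
}

void drop_entry(void)
{
  /* The call with the side effect lives outside the assertion, so it is
   * still made when assertions are compiled out. Writing
   * assert(toy_remove()) instead would drop the call under NDEBUG. */
  bool removed = toy_remove();

  assert(removed);
  (void) removed; /* avoids an unused-variable warning when assert() is a no-op */
}

int entries_left(void)
{
  return n_entries;
}
```

With or without `NDEBUG`, calling `drop_entry()` once leaves `entries_left()` at 0.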
Fixes this CI failure: https://gitlab.gnome.org/GNOME/glib/-/jobs/4483098
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Testing this in a normal testcase is a bit tricky, since
triggering a non-fatal assertion has the side-effect of
marking the test as failed.
So just don't run any testcases here, but check the side-effect
manually. Since we don't produce TAP output when not using
g_test_run(), tell meson that we're using the exitcode protocol.
There is a race between releasing and re-acquiring an interned
GRefString if this happens on two threads at the same time. This can
result in already-freed memory being returned from
g_ref_string_new_intern().
| Thread 1 | Thread 2 |
| ------------------------------ | ----------------------------- |
| g_ref_string_release() | g_ref_string_new_intern() |
| g_atomic_rc_box_release_full() | g_mutex_lock() |
| | g_hash_table_lookup() |
| remove_if_interned() | g_ref_string_acquire() |
| g_mutex_lock() | g_mutex_unlock() |
| g_hash_table_remove() | |
| g_mutex_unlock() | |
| g_free() | |
| | return res; // this is freed |
This use-after-free usually also gives a critical warning because
g_atomic_ref_count_inc() checks for the refcount having been 0
before incrementing.
It is not possible to safely implement weak references via garcbox.
To avoid this race do not implement weak references via garcbox but
instead implement the allocation of the string manually with a manually
managed reference count. This makes it possible to safely resurrect the
interned string if the above race happens, and also avoids other races.
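One way a manually managed reference count can close this kind of race is sketched below. It is an illustration under stated assumptions, not the GLib implementation: the toy fixed-size table, the spinlock, and the names `InternedString`, `intern_new()` and `intern_release()` are all hypothetical. The idea shown is that the last reference is only ever dropped while holding the table lock, so a lookup under the same lock can never see an entry whose count has already reached zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a reference count followed by the characters. */
typedef struct {
  atomic_int ref_count;
  char str[];
} InternedString;

#define TABLE_SIZE 16
static InternedString *table[TABLE_SIZE];          /* guarded by table_lock */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;  /* toy spinlock */

static void lock_table(void)
{
  while (atomic_flag_test_and_set_explicit(&table_lock, memory_order_acquire))
    ;
}

static void unlock_table(void)
{
  atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Returns the interned copy of s, or NULL if the toy table is full. */
const char *intern_new(const char *s)
{
  InternedString *found = NULL;
  int free_slot = -1;

  lock_table();
  for (int i = 0; i < TABLE_SIZE; i++) {
    if (table[i] == NULL) {
      if (free_slot < 0)
        free_slot = i;
    } else if (strcmp(table[i]->str, s) == 0) {
      found = table[i];
      break;
    }
  }
  if (found != NULL) {
    /* Entries in the table always have a count of at least 1 here,
     * because the final decrement and the removal happen under the lock. */
    atomic_fetch_add(&found->ref_count, 1);
  } else if (free_slot >= 0) {
    size_t len = strlen(s) + 1;
    found = malloc(sizeof *found + len);
    if (found != NULL) {
      atomic_init(&found->ref_count, 1);
      memcpy(found->str, s, len);
      table[free_slot] = found;
    }
  }
  unlock_table();
  return found != NULL ? found->str : NULL;
}

void intern_release(const char *s)
{
  InternedString *impl =
      (InternedString *) ((char *) s - offsetof(InternedString, str));
  int old = atomic_load(&impl->ref_count);

  /* Fast path: other references exist, so decrement without the lock. */
  while (old > 1) {
    if (atomic_compare_exchange_weak(&impl->ref_count, &old, old - 1))
      return;
  }

  /* Possibly the last reference: decide under the lock. */
  lock_table();
  if (atomic_fetch_sub(&impl->ref_count, 1) == 1) {
    for (int i = 0; i < TABLE_SIZE; i++)
      if (table[i] == impl)
        table[i] = NULL;
    free(impl);
  }
  unlock_table();
}
```

If another thread re-acquires the string between the fast-path check and the lock, the decrement under the lock simply leaves a non-zero count and nothing is freed.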
As a side-effect this also
* reduces the allocation size in addition to the actual string length
from 32 bytes to 16 bytes on 64 bit platforms and keeps it at 16 bytes
on 32 bit platforms,
* doesn't lock a mutex when freeing non-interned GRefStrings.
We have a mechanism for turning on optional features of the GLib
test harness by passing options to g_test_init(). Use it for the
non-fatal assertions as well.
The documentation for glibc's pthread_setname_np states:
The thread name is a meaningful C language string,
whose length is restricted to 16 characters,
including the terminating null byte ('\0').
The documentation for Solaris' pthread_setname_np states:
The thread name is a string of length 31 bytes or less,
UTF-8 encoded.
Failing to respect this length limitation may lead to no name being
set, which is confusing, since the thread then shows up under the
binary name in gdb. This was happening for the pango worker thread
with the name "[pango] fontconfig".
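As an illustration of respecting such a limit, the sketch below truncates a name to fit a fixed-size buffer before it would be handed to the platform. The helper name `truncate_thread_name()` is hypothetical, and backing up to a UTF-8 character boundary is an assumption made here because the Solaris documentation mentions UTF-8:

```c
#include <stddef.h>
#include <string.h>

/* Copies name into buf, truncated so that it fits in buf_size bytes
 * including the terminating NUL. Returns the number of bytes copied,
 * not counting the NUL. */
size_t truncate_thread_name(const char *name, char *buf, size_t buf_size)
{
  size_t len;

  if (buf_size == 0)
    return 0;

  len = strlen(name);
  if (len > buf_size - 1) {
    len = buf_size - 1;
    /* Do not cut a multi-byte UTF-8 sequence in half: continuation
     * bytes have the form 10xxxxxx, so back up past them. */
    while (len > 0 && ((unsigned char) name[len] & 0xC0) == 0x80)
      len--;
  }

  memcpy(buf, name, len);
  buf[len] = '\0';
  return len;
}
```

With a 16-byte buffer, "[pango] fontconfig" (18 characters) becomes "[pango] fontcon".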
For g_auto(GVariantBuilder) one needs to initialize it before the
function returns, so it's best to do it when the variable is declared.
G_VARIANT_BUILDER_INIT exists, but it requires specifying a GVariantType in
the declaration. That moves the type away from where the builder is used,
which often results in less readable code. The G_VARIANT_BUILDER_INIT
documentation also mentions that it's possible to explicitly zero the
variable, but this is hard to find, and writing
`g_auto(GVariantBuilder) builder = {0,};` is kind of ugly.
This introduces G_VARIANT_BUILDER_INIT_UNSET, which zero-initializes the
variable being declared. This gives us documentation and hides the
explicit zeroing detail:
g_auto(GVariantBuilder) builder = G_VARIANT_BUILDER_INIT_UNSET ();
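The reason the variable has to be initialized at declaration can be shown without GLib, using the GCC/Clang `cleanup` attribute directly. Everything in this sketch (`Builder`, `BUILDER_INIT_UNSET()`, `builder_clear()`, `auto_builder`, `build_sum()`) is a hypothetical analogue, not the GVariantBuilder API:

```c
#include <stdlib.h>

/* Hypothetical builder type; an all-zero value means "nothing to clean up". */
typedef struct {
  int *items;
  size_t n_items;
} Builder;

#define BUILDER_INIT_UNSET() { 0 }

int n_frees = 0; /* counts how often the cleanup actually freed something */

void builder_clear(Builder *b)
{
  if (b->items != NULL) {
    free(b->items);
    n_frees++;
  }
  b->items = NULL;
  b->n_items = 0;
}

/* Runs builder_clear() automatically when the variable goes out of scope. */
#define auto_builder __attribute__((cleanup(builder_clear))) Builder

int build_sum(int want)
{
  auto_builder b = BUILDER_INIT_UNSET();
  int sum = 0;

  /* Early return: the cleanup still runs, and only behaves well because
   * the builder was zeroed where it was declared. */
  if (want <= 0)
    return -1;

  b.items = calloc((size_t) want, sizeof *b.items);
  if (b.items == NULL)
    return -1;
  b.n_items = (size_t) want;

  for (size_t i = 0; i < b.n_items; i++) {
    b.items[i] = (int) i;
    sum += b.items[i];
  }
  return sum;
}
```

Without the initializer, the early return would run the cleanup on an indeterminate pointer.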
Every usage in GLib ensures this but theoretically external code might
pass something else. As this is only meant to be used internally from
GLib, don't support the other case but at least avoid potential
out-of-bounds reads.
The length might be passed explicitly in the field instead, and the
string might not have a NUL-terminator as happens for example when
passed from the Rust bindings.
This might lead to out of bounds reads.
Thanks to Sebastian Wiesner for noticing this.
Just like how commit ad572e77802c3c383619fe63a4832b5c75dbea82 added an
ifunc resolver for `g_utf8_validate()`, we also need to add one for
`g_str_is_ascii()`, as it also calls into the c-utf8 SIMD validation
code which causes false-positive buffer read overflow warnings from
valgrind and asan.
I thought about just adding the `strlen()` call into `g_str_is_ascii()`
unconditionally, as a simpler fix, but from a quick
codesearch.debian.net, it appears `g_str_is_ascii()` is used quite
widely, so this would have an unacceptable performance impact.
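The general shape of such a dispatch can be sketched portably with a function pointer that is resolved once. This is only an analogue: a real ifunc is resolved by the dynamic loader, and the environment variable `RUNNING_UNDER_CHECKER` and all function names here are hypothetical. Both implementations below are deliberately safe; the point is the selection mechanism, not the SIMD code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef bool (*IsAsciiFunc)(const char *str);

/* Variant that scans until it meets the terminating NUL. */
static bool is_ascii_bytewise(const char *str)
{
  for (; *str != '\0'; str++)
    if ((unsigned char) *str & 0x80)
      return false;
  return true;
}

/* Variant that pays for strlen() first, so every read is bounded by a
 * length that is known up front. */
static bool is_ascii_with_length(const char *str)
{
  size_t len = strlen(str);

  for (size_t i = 0; i < len; i++)
    if ((unsigned char) str[i] & 0x80)
      return false;
  return true;
}

/* Hypothetical resolver: picks the bounded variant when a checker is in use. */
static IsAsciiFunc resolve_is_ascii(void)
{
  return getenv("RUNNING_UNDER_CHECKER") != NULL ? is_ascii_with_length
                                                 : is_ascii_bytewise;
}

bool str_is_ascii(const char *str)
{
  /* Resolved on first use. Unlike a loader-resolved ifunc, this lazy
   * initialization is not thread-safe as written. */
  static IsAsciiFunc impl = NULL;

  if (impl == NULL)
    impl = resolve_is_ascii();
  return impl(str);
}
```

Callers only ever see `str_is_ascii()`, so the extra `strlen()` cost is paid only when the bounded variant is selected.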
This should fix the valgrind failures on the `search-utils` test seen
here: https://gitlab.gnome.org/GNOME/glib/-/jobs/4423753.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
As suggested by Michael Catanzaro, this should make the return type of
the resolve function a bit easier for people to parse.
This introduces no functional changes.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
It looks like these might get more complex in future, as compilers claim
to support the attribute (`__has_attribute(ifunc)` is true) but then
raise errors at compile time if the target architecture doesn’t support
ifuncs.
For example, see #3511.
This doesn’t fix #3511 (I don’t have time to test on musl right now), but
it should make it easier to update the platform preprocessor conditions
in future.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3511
This adds various additional tests to cover branches of `gunidecomp.c`
which are not already covered, bringing our branch coverage of that file
up to 100% (if you ignore `g_utf8_normalize()`, which is tested by
`unicode-normalize.c` and I’m counting it separately).
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3470
This introduces no functional changes, but allows the test to be easily
extended, in the following commit, to test restricted `result_len`
sizes.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
`g_assert_false (g_unichar_compose (…) && ch == 0)` will succeed if
`g_unichar_compose()` succeeds and returns a non-zero character (which
it will if it succeeds), so this isn’t really testing what we want it to
test. This regressed in commit ae4eea7a39.
Refactor out the repetitive calls to `g_unichar_compose()` and fix the
boolean checks.
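The difference between the two checks can be seen with a toy stand-in for `g_unichar_compose()`. The function `toy_compose()` and both check helpers are hypothetical; the stub is assumed to set the output to zero on failure:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy composer: only knows that U+0041 + U+0300 composes to U+00C0.
 * On failure it sets *ch to 0 and returns false. */
bool toy_compose(uint32_t a, uint32_t b, uint32_t *ch)
{
  if (a == 0x0041 && b == 0x0300) {
    *ch = 0x00C0;
    return true;
  }
  *ch = 0;
  return false;
}

/* The weak form: "not (composed and ch == 0)". It holds when composition
 * fails, and also when it succeeds with a non-zero result. */
bool weak_check_passes(uint32_t a, uint32_t b)
{
  uint32_t ch;

  return !(toy_compose(a, b, &ch) && ch == 0);
}

/* A stricter "must not compose" check: composition failed and the
 * output was left at zero. */
bool strict_check_passes(uint32_t a, uint32_t b)
{
  uint32_t ch;
  bool composed = toy_compose(a, b, &ch);

  return !composed && ch == 0;
}
```

For the composable pair U+0041 U+0300, the weak check still passes, while the strict one fails as it should.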
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Just to make it a bit easier for people to understand the logic in the
implementation in future, because it took me a while.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
See the big comment in the code for details. Essentially, this adds a
new compose table specifically for the transitive closure of ‘either’
codepoints — codepoints which appear as the first codepoint in a
composition pair and as the second point in a composition pair
(potentially, but not necessarily, the same pair); or which appear in a
composition pair with an ‘either’ codepoint.
This new compose table has to be symmetrically indexed, as the
`COMPOSE_INDEX` macro doesn’t differentiate based on codepoint position
(first or second). It’s not possible to achieve that with the main
`compose_array` without making it absolutely huge (it’s currently about
150×40 in size and would have to become at least 150×150 in size). In
contrast, the new `compose_either_array` is currently 15×15.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3470
We’re essentially trying to build a minimal perfect hash function, and
`vals` is the map which represents that function. If we redefine a
member of `vals`, the map is no longer a partial function — one input
value (a Unicode codepoint) has two output values (compose table
indices).
So it’s bad if a member of `vals` gets redefined, and we want to be
notified if that happens.
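A redefinition check of this kind can be sketched as a set-once map. The array layout and the names `vals_init()` and `vals_define()` are hypothetical and unrelated to the actual table generator:

```c
#include <stdbool.h>
#include <stdint.h>

#define VALS_UNSET (-1)
#define N_CODEPOINTS 0x16200u /* large enough for the toy example below */

static int vals[N_CODEPOINTS];

void vals_init(void)
{
  for (uint32_t i = 0; i < N_CODEPOINTS; i++)
    vals[i] = VALS_UNSET;
}

/* Maps a codepoint to a table index. Returns false, leaving the map
 * unchanged, if the codepoint is out of range or already maps to a
 * different index. */
bool vals_define(uint32_t codepoint, int index)
{
  if (codepoint >= N_CODEPOINTS)
    return false;
  if (vals[codepoint] != VALS_UNSET && vals[codepoint] != index)
    return false;
  vals[codepoint] = index;
  return true;
}
```

Defining U+1611E once succeeds; defining it again with a different index is reported instead of silently overwriting the first value.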
As it happens, some of the new codepoints in Unicode 16.0 cause these
checks to fail. For example, U+16121 Gurung Khema Vowel Sign U
decomposes to U+1611E U+1611E. This causes `vals{U+1611E}` to be defined
to an index from the `first` map, and then redefined to an index from
the `second` map.
The following few commits will fix this, but let’s get the checks in
first.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Because how these big tables of numbers work is perhaps a bit hard to
figure out, and it would be useful to document the design decisions
involved in it.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
All changes mechanically generated with:
```
./tools/update-unicode-data.sh ~/Downloads/UCD 16.0.0
```
using the data from https://www.unicode.org/Public/16.0.0/ucd/UCD.zip.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Fixes: #3470
Manually added from the data in
https://www.unicode.org/Public/16.0.0/ucd/UCD.zip.
The following commit will mechanically update the Unicode tables to use
them.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3470
The time has finally come when Unicode has specified a codepoint above
U+FFFF which has a decomposition: U+16125 GURUNG KHEMA VOWEL SIGN AI, in
Unicode 16, which the following commits will add support for.
So far, we’ve managed to store the reverse-lookup from decomposed pairs
to their composed form using a 16-bit integer. Now we have to switch to
storing the composed form in a 32-bit `gunichar` as U+16125 won’t fit
otherwise.
This introduces no functional changes, but does roughly double the
in-memory size of the `compose_array` table, from 9176 bytes to 19932 bytes.
The code which uses this lookup table, in `gunidecomp.c`, was already
implicitly converting the loaded value to a `gunichar`, so needs no
changes.
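The width problem is easy to demonstrate in isolation. The helper names below are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stores a codepoint in 16 bits, as the old table element type would. */
uint16_t store_in_16_bits(uint32_t codepoint)
{
  return (uint16_t) codepoint; /* silently drops the high bits */
}

bool fits_in_16_bits(uint32_t codepoint)
{
  return codepoint <= UINT16_MAX;
}
```

Storing U+16125 in 16 bits yields 0x6125, a different codepoint, which is why the element type has to widen to 32 bits.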
When we update to Unicode 16, the new `NormalizationTest.txt` file
contains a test which will check that composed codepoints > U+FFFF work.
Specifically, U+11391 TULU-TIGALARI LETTER AU is tested.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3470
On AIX, the system poll.h redefines the names of struct members,
for example `#define events reqevents`. This means that accesses
to GPollFD will fail to compile if poll.h was included after
glib/gpoll.h.
We can't simply add `#include <poll.h>` in glib/gpoll.h, because
that wouldn't work on platforms where poll.h doesn't exist, and
GLib supports some platforms in that category.
Resolves: https://gitlab.gnome.org/GNOME/glib/-/issues/3500
On recent versions of Debian, PST8PDT is part of the tzdata-legacy
package, which is not always installed and might disappear in future.
Successfully tested with and without tzdata-legacy on Debian unstable.
Signed-off-by: Simon McVittie <smcv@debian.org>
Instead of using timestamp 0 as a magic number (in this case interpreted
as 1970-01-01T00:00:00-08:00), calculate a timestamp from a recent
year/month/day in winter, in this case 2024-01-01T00:00:00-08:00.
Similarly, instead of using a timestamp 15 million seconds later
(1970-06-23T15:40:00-07:00), calculate a timestamp from a recent
year/month/day in summer, in this case 2024-07-01T00:00:00-07:00.
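For reference, the Unix timestamps corresponding to those two local midnights can be computed with plain calendar arithmetic. This sketch uses the well-known days-from-civil algorithm rather than GLib's date and time API, and the function names are hypothetical:

```c
#include <stdint.h>

/* Days between 1970-01-01 and the given proleptic Gregorian date. */
int64_t days_from_civil(int64_t y, unsigned m, unsigned d)
{
  y -= m <= 2; /* treat January and February as part of the previous year */
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = (unsigned) (y - era * 400);
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;

  return era * 146097 + (int64_t) doe - 719468;
}

/* Unix timestamp of local midnight on the given date, for a zone whose
 * UTC offset (in seconds, negative west of Greenwich) is known. */
int64_t local_midnight_to_unix(int64_t y, unsigned m, unsigned d,
                               int64_t utc_offset_seconds)
{
  return days_from_civil(y, m, d) * 86400 - utc_offset_seconds;
}
```

With an offset of -08:00, 2024-01-01 gives 1704096000; with -07:00, 2024-07-01 gives 1719817200.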
Signed-off-by: Simon McVittie <smcv@debian.org>
In newer tzdata, it is an alias for America/Los_Angeles, which has a
slightly different meaning: DST did not exist there before 1883. As a
result, we can no longer hard-code the knowledge that interval 0 is
standard time and interval 1 is summer time, and instead we need to look
up the correct intervals from known timestamps.
Resolves: https://gitlab.gnome.org/GNOME/glib/-/issues/3502
Bug-Debian: https://bugs.debian.org/1084190
[smcv: expand commit message, fix whitespace]
Signed-off-by: Simon McVittie <smcv@debian.org>
Why they’re necessary, why we _think_ the optimised implementation of
`g_utf8_validate()` is OK despite what valgrind and asan are telling us,
and how they work.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3493