`g_assert_false (g_unichar_compose (…) && ch == 0)` will succeed if
`g_unichar_compose()` succeeds and returns a non-zero character (which
it will if it succeeds), so this isn’t really testing what we want it to
test. This regressed in commit ae4eea7a39.
Refactor out the repetitive calls to `g_unichar_compose()` and fix the
boolean checks.
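To make the logic error concrete, here is a minimal sketch in plain C.
`fake_compose()` is a hypothetical stand-in for `g_unichar_compose()`
(it is assumed to leave `*ch` as 0 on failure), and the two check
helpers are illustrative names rather than code from the test suite.
The stricter form is one possible fix, not necessarily the one used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for g_unichar_compose(): composes only the pair
 * (U+0041, U+0300) into U+00C0, and leaves *ch as 0 on failure. */
static bool
fake_compose (uint32_t a, uint32_t b, uint32_t *ch)
{
  assert (ch != NULL);
  *ch = 0;
  if (a == 0x0041 && b == 0x0300)
    {
      *ch = 0x00C0;
      return true;
    }
  return false;
}

/* The broken pattern: assert that NOT (succeeded AND ch == 0).
 * Returns true if such an assertion would pass. */
static bool
broken_check_passes (uint32_t a, uint32_t b)
{
  uint32_t ch;
  return !(fake_compose (a, b, &ch) && ch == 0);
}

/* A stricter pattern for "this pair must not compose": the call must
 * fail AND leave ch as 0. */
static bool
strict_check_passes (uint32_t a, uint32_t b)
{
  uint32_t ch;
  return !fake_compose (a, b, &ch) && ch == 0;
}
```

With these helpers, `broken_check_passes (0x0041, 0x0300)` is true even
though the pair composes, whereas `strict_check_passes()` rejects it.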
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Just to make it a bit easier for people to understand the logic in the
implementation in future, because it took me a while.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
See the big comment in the code for details. Essentially, this adds a
new compose table specifically for the transitive closure of ‘either’
codepoints — codepoints which appear as the first codepoint in a
composition pair and as the second codepoint in a composition pair
(potentially, but not necessarily, the same pair); or which appear in a
composition pair with an ‘either’ codepoint.
This new compose table has to be symmetrically indexed, as the
`COMPOSE_INDEX` macro doesn’t differentiate based on codepoint position
(first or second). It’s not possible to achieve that with the main
`compose_array` without making it absolutely huge (it’s currently about
150×40 in size and would have to become at least 150×150 in size). In
contrast, the new `compose_either_array` is currently 15×15.
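As a rough illustration of what symmetric indexing means here, the
sketch below uses one index function for both positions of a pair and a
small square table. All names are hypothetical, the table is not the
real `compose_either_array`, and treating U+16121 as an 'either'
codepoint is an assumption made only to give the table a second row.

```c
#include <assert.h>
#include <stdint.h>

#define N_EITHER 2

/* One index function serves both the first and the second position. */
static int
either_index (uint32_t cp)
{
  switch (cp)
    {
    case 0x1611E: return 0;
    case 0x16121: return 1; /* assumed member, for illustration only */
    default: return -1;
    }
}

/* Square table: rows are the first codepoint, columns the second.
 * 0 means "no composition". */
static const uint32_t either_table[N_EITHER][N_EITHER] = {
  /* U+1611E */ { 0x16121, 0 },
  /* U+16121 */ { 0,       0 },
};

static uint32_t
compose_either (uint32_t first, uint32_t second)
{
  int i = either_index (first);
  int j = either_index (second);

  if (i < 0 || j < 0)
    return 0;
  assert (i < N_EITHER && j < N_EITHER);
  return either_table[i][j];
}
```

Because the same index is valid in either position, the table has to be
N×N, which is why keeping the set of 'either' codepoints small matters.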
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3470
We’re essentially trying to build a minimal perfect hash function, and
`vals` is the map which represents that function. If we redefine a
member of `vals`, the map is no longer a partial function — one input
value (a Unicode codepoint) has two output values (compose table
indices).
So it’s bad if a member of `vals` gets redefined, and we want to be
notified if that happens.
As it happens, some of the new codepoints in Unicode 16.0 cause these
checks to fail. For example, U+16121 Gurung Khema Vowel Sign U
decomposes to U+1611E U+1611E. This causes `vals{U+1611E}` to be defined
to an index from the `first` map, and then redefined to an index from
the `second` map.
The following few commits will fix this, but let’s get the checks in
first.
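The kind of check described above can be sketched in C as a map which
refuses to redefine an existing key. This is only an illustration: the
type and function names are hypothetical, and the index values in the
usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VALS 16

typedef struct
{
  uint32_t codepoint;
  unsigned index;
} Val;

typedef struct
{
  Val entries[MAX_VALS];
  size_t n;
} ValMap;

/* Define codepoint -> index. Returns false, leaving the map unchanged,
 * if the codepoint already has an index: that would make the map
 * ambiguous, which is what the new checks are meant to catch. */
static bool
val_map_define (ValMap *map, uint32_t codepoint, unsigned index)
{
  assert (map != NULL);

  for (size_t i = 0; i < map->n; i++)
    if (map->entries[i].codepoint == codepoint)
      return false;

  assert (map->n < MAX_VALS);
  map->entries[map->n].codepoint = codepoint;
  map->entries[map->n].index = index;
  map->n++;
  return true;
}
```

Defining U+1611E once (say, with an index from the `first` map)
succeeds; a second definition (with an index from the `second` map) is
reported as a failure instead of silently overwriting the first.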
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Because how these big tables of numbers work is perhaps a bit hard to
figure out, and it would be useful to document the design decisions
involved in it.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
All changes mechanically generated with:
```
./tools/update-unicode-data.sh ~/Downloads/UCD 16.0.0
```
using the data from https://www.unicode.org/Public/16.0.0/ucd/UCD.zip.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Fixes: #3470
Manually added from the data in
https://www.unicode.org/Public/16.0.0/ucd/UCD.zip.
The following commit will mechanically update the Unicode tables to use
them.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3470
Only one other previous author, and my contribution just now is so
simple as to not be copyrightable.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #1415
The time has finally come when Unicode has specified a codepoint above
U+FFFF which has a decomposition: U+16125 GURUNG KHEMA VOWEL SIGN AI,
added in Unicode 16, which the following commits will add support for.
So far, we’ve managed to store the reverse-lookup from decomposed pairs
to their composed form using a 16-bit integer. Now we have to switch to
storing the composed form in a 32-bit `gunichar` as U+16125 won’t fit
otherwise.
This introduces no functional changes, but does double the in-memory
size of the `compose_array` table from 9176 bytes to 18352 bytes.
The code which uses this lookup table, in `gunidecomp.c`, was already
implicitly converting the loaded value to a `gunichar`, so needs no
changes.
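A small sketch of why the wider type is needed and what it costs. The
entry count is a hypothetical placeholder, not the real table length,
and the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ENTRIES 100 /* hypothetical table length */

/* True if the codepoint can be stored losslessly in 16 bits. */
static bool
fits_in_16_bits (uint32_t cp)
{
  return cp <= 0xFFFF;
}

/* What a 16-bit slot would actually hold: the high bits are lost. */
static uint16_t
store_in_16_bits (uint32_t cp)
{
  return (uint16_t) cp;
}

typedef uint16_t DemoTable16[DEMO_ENTRIES];
typedef uint32_t DemoTable32[DEMO_ENTRIES];
```

U+16125 does not fit: `store_in_16_bits (0x16125)` yields 0x6125. The
price of fixing that is `sizeof (DemoTable32)`, which is exactly twice
`sizeof (DemoTable16)`.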
When we update to Unicode 16, the new `NormalizationTest.txt` file
contains a test which will check that composed codepoints > U+FFFF work.
Specifically, U+11391 TULU-TIGALARI LETTER AU is tested.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3470
The version of ninja-build in Debian 12 isn't built with large file
support, and therefore is not compatible with filesystems with large
inode numbers. Unfortunately, that includes the overlay mounts used by
Docker.
I've suggested a stable update for this as part of the next Debian 12
point release. Until/unless that happens, we can build our own.
Signed-off-by: Simon McVittie <smcv@debian.org>
This is enough to detect most ILP32-specific issues. We previously
relied on 32-bit Windows to catch those, but the toolchains we're using
have increasingly minimal support for 32-bit Windows.
Between them, fedora-x86_64 and debian-stable-i386 should cover nearly
everything that debian-stable-x86_64 does, so demote
debian-stable-x86_64 to be run on a schedule or manually.
Helps: https://gitlab.gnome.org/GNOME/glib/-/issues/3477
Signed-off-by: Simon McVittie <smcv@debian.org>
This is identical to the debian-stable image, except that it uses
packages from the i386 dpkg architecture (i686-linux-gnu) instead of
amd64 (x86_64-linux-gnu). x86_64 Docker hosts with x86_64 kernels can
run i386 Docker images, so we can use our existing CI workers.
Instead of duplicating the content of the Dockerfile, add a layer of
architecture-switching so we can build essentially the same image
from a different base.
Signed-off-by: Simon McVittie <smcv@debian.org>
On AIX, the system poll.h redefines the names of struct members,
for example `#define events reqevents`. This means that accesses
to GPollFD will fail to compile if poll.h was included after
glib/gpoll.h.
We can't simply add `#include <poll.h>` in glib/gpoll.h, because
that wouldn't work on platforms where poll.h doesn't exist, and
GLib supports some platforms in that category.
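The ordering problem can be reproduced without AIX. In the sketch below
the `#define` stands in for what the system poll.h is described as
doing, and `struct demo_pollfd` is a hypothetical stand-in for GPollFD.
It shows the ordering that works, where the macro is visible to both
the struct definition and the member access.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the AIX poll.h behaviour described above. */
#define events reqevents

/* Hypothetical struct, not the real GPollFD. The macro is already
 * active, so this member is really declared as `reqevents`. */
struct demo_pollfd
{
  int fd;
  unsigned short events;
  unsigned short revents;
};

/* Expands to p->reqevents, which matches the declaration above. */
static unsigned short
demo_get_events (const struct demo_pollfd *p)
{
  assert (p != NULL);
  return p->events;
}
```

If the `#define` were moved to sit between the struct definition and
`demo_get_events()`, the struct would have a member called `events`
while the accessor asked for `reqevents`, and compilation would fail.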
Resolves: https://gitlab.gnome.org/GNOME/glib/-/issues/3500
On recent versions of Debian, PST8PDT is part of the tzdata-legacy
package, which is not always installed and might disappear in future.
Successfully tested with and without tzdata-legacy on Debian unstable.
Signed-off-by: Simon McVittie <smcv@debian.org>
Instead of using timestamp 0 as a magic number (in this case interpreted
as 1970-01-01T00:00:00-08:00), calculate a timestamp from a recent
year/month/day in winter, in this case 2024-01-01T00:00:00-08:00.
Similarly, instead of using a timestamp 15 million seconds later
(1970-06-23T15:40:00-07:00), calculate a timestamp from a recent
year/month/day in summer, in this case 2024-07-01T00:00:00-07:00.
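For illustration, this is how such timestamps can be derived from a
calendar date and a UTC offset in plain C. It is only a sketch: the
function names are hypothetical and the test itself may well use GLib's
own date and time API instead. The day-count arithmetic is the
well-known "days from civil" algorithm for the proleptic Gregorian
calendar.

```c
#include <assert.h>
#include <stdint.h>

/* Days between 1970-01-01 and the given Gregorian date. */
static int64_t
days_from_civil (int64_t y, unsigned m, unsigned d)
{
  assert (m >= 1 && m <= 12 && d >= 1 && d <= 31);

  y -= (m <= 2);
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = (unsigned) (y - era * 400);
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;

  return era * 146097 + (int64_t) doe - 719468;
}

/* Unix timestamp of local midnight on the given date, where the local
 * time is utc_offset seconds ahead of UTC (negative for -08:00). */
static int64_t
unix_time_for_local_midnight (int64_t y, unsigned m, unsigned d,
                              int64_t utc_offset)
{
  return days_from_civil (y, m, d) * 86400 - utc_offset;
}
```

With this, 2024-01-01T00:00:00-08:00 is
`unix_time_for_local_midnight (2024, 1, 1, -8 * 3600)`, and the summer
timestamp uses `(2024, 7, 1, -7 * 3600)`.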
Signed-off-by: Simon McVittie <smcv@debian.org>
In newer tzdata, it is an alias for America/Los_Angeles, which has a
slightly different meaning: DST did not exist there before 1883. As a
result, we can no longer hard-code the knowledge that interval 0 is
standard time and interval 1 is summer time, and instead we need to look
up the correct intervals from known timestamps.
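A sketch of the idea of looking intervals up by timestamp instead of
assuming their indices. The interval table is illustrative only: it is
not read from tzdata, everything before late 2023 is lumped into a
single placeholder interval, and all names are hypothetical rather than
GLib API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
  int64_t start; /* inclusive, Unix time */
  int64_t end;   /* exclusive, Unix time */
  bool is_dst;
} DemoInterval;

/* Illustrative table for US Pacific time around 2024. */
static const DemoInterval demo_intervals[] = {
  { INT64_MIN,  1699174800, false }, /* placeholder for older history */
  { 1699174800, 1710064800, false }, /* winter 2023 to 2024 */
  { 1710064800, 1730624400, true  }, /* summer 2024 */
};

#define DEMO_N_INTERVALS (sizeof demo_intervals / sizeof demo_intervals[0])

/* Returns the index of the interval containing t, or -1 if none. */
static int
demo_find_interval (const DemoInterval *intervals, size_t n, int64_t t)
{
  assert (intervals != NULL);

  for (size_t i = 0; i < n; i++)
    if (t >= intervals[i].start && t < intervals[i].end)
      return (int) i;
  return -1;
}
```

Looking up a known winter timestamp and a known summer timestamp gives
the standard and summer intervals whatever their indices happen to be;
in this table they are 1 and 2 rather than 0 and 1.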
Resolves: https://gitlab.gnome.org/GNOME/glib/-/issues/3502
Bug-Debian: https://bugs.debian.org/1084190
[smcv: expand commit message, fix whitespace]
Signed-off-by: Simon McVittie <smcv@debian.org>
We need to write a Meson cross-compilation file on the fly here, hence the
additions in test-msvc.bat to set up the paths.
Like the 32-bit VS2019 CI job this is only run manually or weekly.
Why they’re necessary, why we _think_ the optimised implementation of
`g_utf8_validate()` is OK despite what valgrind and asan are telling us,
and how they work.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3493
Commit 760a6f647 rearranged how the lengths are calculated for the test
data and added `escape_data_string()` so they could be printed safely.
Unfortunately there was a miscount in the length of the first test
vector in `test_read_upto()`: there are 31 bytes in the string literal,
plus one nul terminator which is added by the compiler. The quoted
string length was 32 bytes. This should be fine (explicitly including
the nul delimiter), but then `escape_data_string()` adds another byte to
the length because it assumes the nul delimiter has *not* been included
in the count.
Changing the string length from 32 to 31 breaks the tests, as the final
component of the data is then the wrong length, so add an additional
explicit nul byte to the string literal so that it matches the length.
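The counting involved is easy to get wrong, so here is a small sketch
with short, made-up strings (not the real test vector) showing how
`sizeof` treats the implicit and explicit nul bytes. `LITERAL_LEN` is a
hypothetical helper, not something from the test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes written in the literal, excluding the compiler-added nul. */
#define LITERAL_LEN(s) (sizeof (s) - 1)

/* 3 written bytes + 1 implicit nul = 4 bytes of storage. */
static const char without_explicit_nul[] = "abc";

/* 3 written bytes + 1 explicit nul + 1 implicit nul = 5 bytes. */
static const char with_explicit_nul[] = "abc\0";

/* strlen() stops at the first nul, so it cannot tell the two apart. */
static size_t
visible_len (const char *s)
{
  assert (s != NULL);
  return strlen (s);
}
```

`LITERAL_LEN (with_explicit_nul)` is 4, so a caller which assumes the
terminator is not included in the count still sees one nul byte as part
of the data.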
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
This is a follow up to commit e7e5ddd2a. oss-fuzz found a case where
performance was pathologically bad with a long `stop_chars` string.
Since our inner loop in that case was iterating over `stop_chars` and
comparing each of them to `buffer[i]`, we can use `memchr()` the
opposite way round to in commit e7e5ddd2a to speed that up, using
`buffer[i]` as the needle in a `stop_chars` haystack.
From some brief testing, this doesn’t impact on the performance of a
more normal use case of having a short (<10 bytes long) `stop_chars`. I
was slightly concerned that the function call overhead of calling out to
`memchr()` would have an impact there, but apparently not.
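The approach can be sketched as follows. This is an illustration under
assumptions, not the actual implementation: the function name and
signature are hypothetical, and the real code has more to deal with
(buffer refills, error handling and so on).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first byte of buffer which appears in
 * stop_chars, or buffer_len if there is none. stop_chars is treated as
 * raw bytes, so it may contain nul. */
static size_t
find_first_stop_char (const char *buffer,
                      size_t      buffer_len,
                      const char *stop_chars,
                      size_t      stop_chars_len)
{
  assert (buffer != NULL || buffer_len == 0);
  assert (stop_chars != NULL || stop_chars_len == 0);

  if (stop_chars_len == 0)
    return buffer_len;

  for (size_t i = 0; i < buffer_len; i++)
    {
      /* buffer[i] is the needle; stop_chars is the haystack. */
      if (memchr (stop_chars, (unsigned char) buffer[i],
                  stop_chars_len) != NULL)
        return i;
    }

  return buffer_len;
}
```

The inner scan over `stop_chars` is delegated to `memchr()`, which
libcs typically implement much faster than a byte-by-byte loop.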
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
oss-fuzz#372994443
I wish people would stop moving their documentation around without
adding redirects. This is not how the internet is supposed to work.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
The existing g-io-module-default-singleton-calloc suppression
seems applicable to definite _g_io_module_get_default_type leaks
seen with Valgrind 3.23.0.
This check is necessary for Solaris & illumos, where 32-bit libelf
is incompatible with large-file mode, which meson forces to be enabled,
but 64-bit libelf works fine.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
For historical reasons, pid_t & mode_t are defined as long instead
of int for 32-bit processes in the Solaris headers, and even though
they are the same size, gcc issues -Wformat warnings if you try to
print them with "%d" and "%u" instead of "%ld" & "%lu".
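One portable way to avoid this class of warning is to cast to a fixed
type and use the matching format, as sketched below. The helper names
are hypothetical and this is not necessarily the approach taken in the
patch itself.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Format a pid_t without assuming whether it is int or long. */
static int
format_pid (char *buf, size_t buf_len, pid_t pid)
{
  assert (buf != NULL);
  return snprintf (buf, buf_len, "%ld", (long) pid);
}

/* Same idea for mode_t, which is unsigned. */
static int
format_mode (char *buf, size_t buf_len, mode_t mode)
{
  assert (buf != NULL);
  return snprintf (buf, buf_len, "%lu", (unsigned long) mode);
}
```

The cast makes the argument type match the format string on every
platform, regardless of how the system headers define the type.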
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Previously the build was requesting interfaces matching SUSv1/Unix95,
as implemented in Solaris 2.6 and later. This changes it to try the
most recent version supported, limited to the versions supported by
the OS versions that meson supports. These are the following
_XOPEN_SOURCE versions:
800 (2024): supported by illumos starting in July 2024
700 (2008): supported by Solaris 11.4 & illumos from 2014-2024
600 (2001): supported by Solaris 10-11.3 & illumos prior to 2014
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Like _GNU_SOURCE on glibc, this tells the header to define functions
not included in the requested standards versions. This is needed to
build glib/tests/utils-c-89 with -std=c89 and utils-c-99 with -std=c99
and still be able to call functions like isnan() and realpath().
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>