gen-unicode-tables.pl: Add more error checking

We’re essentially trying to build a minimal perfect hash function, and
`vals` is the map which represents that function. If a member of `vals`
is redefined, the map no longer represents a function: one input value
(a Unicode codepoint) would need two output values (compose table
indices), and the later assignment silently overwrites the earlier one.

So it’s bad if a member of `vals` gets redefined, and we want to be
notified if that happens.

As it happens, some of the new codepoints in Unicode 16.0 cause these
checks to fail. For example, U+16121 Gurung Khema Vowel Sign U
decomposes to U+1611E U+1611E. This causes `vals{U+1611E}` to be set to
an index from the `first` map, and then overwritten with an index from
the `second` map.
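
The failure mode can be reproduced in isolation. The following is a minimal sketch, not the script's actual code: the hash names mirror `first`, `second` and `vals` from above, while the index values and the `assign_index` helper are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the script's tables; the index values are made up.
my %first  = (0x1611E => 0);   # U+1611E is the first codepoint of the pair
my %second = (0x1611E => 0);   # ...and also the second codepoint of the same pair
my %vals;

# Hypothetical helper: assign an index, dying if the codepoint already has one.
sub assign_index {
    my ($code, $index, $kind) = @_;
    die sprintf("redefining U+%04X as %s\n", $code, $kind)
        if defined $vals{$code};
    $vals{$code} = $index;
}

my $total = 0;
assign_index($_, $first{$_} + $total, "first") for keys %first;
$total += keys %first;

# The second assignment for U+1611E trips the check; eval captures the error.
my $ok = eval {
    assign_index($_, $second{$_} + $total, "second") for keys %second;
    1;
};
my $error = $ok ? "" : $@;
print $error;   # prints: redefining U+1611E as second
```

Without the `defined` check, the second assignment would replace index 0 with index 1 and the script would carry on with no indication that anything was lost.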

The following few commits will fix this, but let’s get the checks in
first.

Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Philip Withnall 2024-10-21 16:57:46 +01:00
parent ebd26727a8
commit dc2491d224
GPG Key ID: C5C42CFB268637CA


@@ -1354,12 +1354,18 @@ sub output_composition_table
     printf OUT "#define COMPOSE_FIRST_SINGLE_START %d\n", $total;
     for $record (@first_singletons) {
         my $code = $record->[0];
+        if (defined $vals{$code}) {
+            die "redefining $code as first-singleton";
+        }
         $vals{$code} = $i++ + $total;
         $last = $code if $code > $last;
     }
     $total += @first_singletons;
     printf OUT "#define COMPOSE_SECOND_START %d\n", $total;
     for $code (keys %second) {
+        if (defined $vals{$code}) {
+            die "redefining $code as second";
+        }
         $vals{$code} = $second{$code} + $total;
         $last = $code if $code > $last;
     }
@@ -1368,6 +1374,9 @@ sub output_composition_table
     printf OUT "#define COMPOSE_SECOND_SINGLE_START %d\n\n", $total;
     for $record (@second_singletons) {
         my $code = $record->[0];
+        if (defined $vals{$code}) {
+            die "redefining $code as second-singleton";
+        }
         $vals{$code} = $i++ + $total;
         $last = $code if $code > $last;
     }