This was mostly machine generated with the following command:
```
codespell \
--builtin clear,rare,usage \
--skip './po/*' --skip './.git/*' --skip './NEWS*' \
--write-changes .
```
using the latest git version of `codespell` as per [these
instructions](https://github.com/codespell-project/codespell#user-content-updating).
Then I manually checked each change using `git add -p`, made a few
manual fixups and dropped a load of incorrect changes.
There are still some outdated or loaded terms used in GLib, mostly to do
with git branch terminology. They will need to be changed later as part
of a wider migration of git terminology.
If I’ve missed anything, please file an issue!
Signed-off-by: Philip Withnall <withnall@endlessm.com>
In glib/gutf8.c there was an UB in function g_utf8_find_prev_char when
p == str. In this case we substract one from p and now p points to a
location outside of the boundary of str. It's a UB by the standard.
Since this function are meant to be fast, we don't check the boundary
conditions.
Fix glib/tests/utf8-pointer test. It failed due to the UB described
above and aggressive optimisation when -O2 and LTO are enabled. Some
compilers (e.g. GCC with major version >= 8) create an optimised version
of g_utf8_find_prev_char with the first argument fixed and stored
somewhere else (with a different pointer). It can be solved with either
marking str as volatile or creating a copy of str in memory. We choose
the second approach since it's more explicit solution.
Add additional checks to glib/tests/utf8-pointer test.
Closes#1917
g_utf8_get_char_validated() was not exactly matching its
documentation. The function was not checking if the sequence of
unicode characters was free of null bytes before performing a more
in-depth validation.
Fix issue #1052
You may expect funny effects from passing invalid UTF-8, but not
that funny. The assert will probably be a better and more immediate
confirmation of an error than invalid writes under the address of the
string copy.
https://gitlab.gnome.org/GNOME/glib/issues/1863
This is a variant of g_utf8_validate() which requires the length to be
specified, thereby allowing string lengths up to G_MAXSIZE rather than
just G_MAXSSIZE.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Make it clearer that it will only return NULL if @end is non-NULL. Add a
test for this too.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
https://bugzilla.gnome.org/show_bug.cgi?id=773842
If g_utf8_get_char_validated() encounters a nul byte in the middle of a
string of given longer length, it returns -2, indicating a partial
gunichar. That is not the obvious behaviour, but since
g_utf8_get_char_validated() has been API for a long time, the behaviour
cannot be changed.
Document it, and add some unit tests (for this behaviour and the other
behaviour of g_utf8_get_char_validated()).
Signed-off-by: Philip Withnall <withnall@endlessm.com>
https://bugzilla.gnome.org/show_bug.cgi?id=780095
All glib/*.{c,h} files have been processed, as well as gtester-report.
12 of those files are not licensed under LGPL:
gbsearcharray.h
gconstructor.h
glibintl.h
gmirroringtable.h
gscripttable.h
gtranslit-data.h
gunibreak.h
gunichartables.h
gunicomp.h
gunidecomp.h
valgrind.h
win_iconv.c
Some of them are generated files, some are licensed under a BSD-style
license and win_iconv.c is in the public domain.
Sub-directories inside glib/:
deprecated/: processed in a previous commit
glib-mirroring-tab/: already LGPLv2.1+
gnulib/: not modified, the code is copied from gnulib
libcharset/: a copy
pcre/: a copy
tests/: processed in a previous commit
https://bugzilla.gnome.org/show_bug.cgi?id=776504
It’s hard to remember what the difference is between -1 and -2, so give
them names.
This introduces no functional changes.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
https://bugzilla.gnome.org/show_bug.cgi?id=780095
g_utf8_make_valid was turned into a public API this cycle. However
now that it is public we should make the API more generic, allowing
the caller to specify the length. This is especially useful if
the function is called with a string that has \0 in the middle
or for chunks of a strings that are not nul terminated.
This is also consistent with most of the other utf8 utils.
Callers inside glib are updated to the new signature.
https://bugzilla.gnome.org/show_bug.cgi?id=779456
If we have an input parameter (or return value) we need to use (nullable).
However, if it is an (inout) or (out) parameter, (optional) is sufficient.
It looks like (nullable) could be used for everything according to the
Annotation documentation, but (optional) is more specific.
Add various (nullable) and (optional) annotations which were missing
from a variety of functions. Also port a couple of existing (allow-none)
annotations in the same files to use (nullable) and (optional) as
appropriate instead.
Secondly, add various (not nullable) annotations as needed by the new
default in gobject-introspection of marking gpointers as (nullable). See
https://bugzilla.gnome.org/show_bug.cgi?id=729660.
This includes adding some stub documentation comments for the
assertion macro error functions, which weren’t previously documented.
The new comments are purely to allow for annotations, and hence are
marked as (skip) to prevent the symbols appearing in the GIR file.
https://bugzilla.gnome.org/show_bug.cgi?id=719966
A recent change permitted some characters from range 0x80-0xbf as
would-be valid sequence starters for length 2, as long as
continuation characters were OK.
https://bugzilla.gnome.org/show_bug.cgi?id=738504
Unrolling the branches and expressions for all expected cases
of UTF-8 sequences facilitates the work of both an optimizing compiler
and the branch prediction logic in the CPU. This speeds up decoding
noticeably on text composed primarily of longer sequences.
https://bugzilla.gnome.org/show_bug.cgi?id=738504
The number of branches and logical operations can be reduced by
never producing a resulting wide character value to check its range.
Instead, individual bytes in the sequence are validated
depending on the branch taken on the basis of preceding bytes.
The syntax given in RFC 3629 is made use of.
https://bugzilla.gnome.org/show_bug.cgi?id=738504
Make some of the conversion functions a bit more friendly to allocation
failure.
Even though the glib policy is to abort() on allocation failure by
default, it can be quite helpful to return an allocation error for
functions already providing a GError.
I needed a safer g_utf16_to_utf8() to solve crash on big clipboard
operations with win32, related to rhbz#1017250 (and coming gdk handling
bug).
https://bugzilla.gnome.org/show_bug.cgi?id=711546
The "end" argument is unusual in g_utf8_validate(): it's not a classic out
argument which gets allocated by the called function, but merely points into
one of its input arguments. Thus it is "transfer none".
https://bugzilla.gnome.org/show_bug.cgi?id=672889
In order for this function to have any point, it has to be possible to
pass non-UTF-8 data to it, so annotate @str as being array-of-guint8
instead of utf8.
https://bugzilla.gnome.org/show_bug.cgi?id=672548
Remove the explicit thread initialisation functions for g_get_charset(),
g_get_filename_charsets() and g_get_language_names().
Add a lock around one remaining case of access to libcharset (the other
2 cases already have the lock).
Do a proper g_once_init_enter() style initialisation for the GLib
gettext functions.
https://bugzilla.gnome.org/show_bug.cgi?id=658683