132 Commits

Author SHA1 Message Date
Philip Withnall
de2f692846 Merge branch 'main' into 'main'
gutf8: add string length check when ending character offset is -1

See merge request GNOME/glib!2328
2021-11-22 12:22:54 +00:00
Chen Guanqiao
9adbdd45d7 gutf8: add string length check when ending character offset is -1
Some functions, such as atk_text_get_text, use -1 to indicate the end of the
string, and a crash occurs when that -1 is passed to g_utf8_substring.

Call Trace:
  0  __memmove_avx_unaligned_erms
  1  memcpy
  2  g_utf8_substring
  3  impl_GetText
  4  handle_other
  5  handle_message
  6  _dbus_object_tree_dispatch_and_unlock
  7  dbus_connection_dispatch
  8  dbus_connection_dispatch
  9  ()
  10 g_main_dispatch
  11 g_main_context_dispatch
  12 g_main_context_iterate
  13 g_main_context_iteration
  14 g_application_run
  15 main

Signed-off-by: Chen Guanqiao <chen.chenchacha@foxmail.com>
2021-11-19 00:52:07 +08:00
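The convention described in this commit (an end offset of -1 meaning "to the end of the string") can be illustrated with a minimal sketch. This is not the GLib patch itself: `utf8_substring_sketch` and `skip_chars` are hypothetical names, the input is assumed to be valid NUL-terminated UTF-8, and out-of-range offsets are clamped here purely to keep the example safe.

```c
#include <stdlib.h>
#include <string.h>

/* Advance past up to `n` UTF-8 characters, stopping early at the
 * terminating NUL. Assumes `s` is valid, NUL-terminated UTF-8. */
static const char *
skip_chars (const char *s, long n)
{
  while (n > 0 && *s != '\0')
    {
      s++;
      /* Skip the continuation bytes (10xxxxxx) of the current character. */
      while (((unsigned char) *s & 0xC0) == 0x80)
        s++;
      n--;
    }
  return s;
}

/* Hypothetical helper: copy characters [start_pos, end_pos) of `str`.
 * An end_pos of -1 means "up to the end of the string", so the byte
 * length is taken from strlen() instead of from a character count. */
char *
utf8_substring_sketch (const char *str, long start_pos, long end_pos)
{
  const char *start = skip_chars (str, start_pos);
  const char *end;
  size_t n_bytes;
  char *out;

  if (end_pos == -1)
    end = start + strlen (start);
  else if (end_pos < start_pos)
    return NULL;
  else
    end = skip_chars (start, end_pos - start_pos);

  n_bytes = (size_t) (end - start);
  out = malloc (n_bytes + 1);
  if (out == NULL)
    return NULL;
  memcpy (out, start, n_bytes);
  out[n_bytes] = '\0';
  return out;
}
```

The caller owns the returned buffer and releases it with `free()`.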
Emmanuel Fleury
8c35109a21 Fix signedness warnings in glib/gutf8.c
glib/gutf8.c: In function 'g_utf8_get_char_extended':
glib/gutf8.c:626:39: error: comparison of integer expressions of different signedness: 'guint' {aka 'unsigned int'} and 'gssize' {aka 'int'}
  626 |   if (G_UNLIKELY (max_len >= 0 && len > max_len))
      |                                       ^
glib/gmacros.h:1091:27: note: in definition of macro 'G_UNLIKELY'
 1091 | #define G_UNLIKELY(expr) (expr)
      |                           ^~~~
glib/gutf8.c:628:21: error: comparison of integer expressions of different signedness: 'guint' {aka 'unsigned int'} and 'gssize' {aka 'int'}
  628 |       for (i = 1; i < max_len; i++)
      |                     ^
2021-11-17 14:40:38 +01:00
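The warning above comes from comparing an unsigned length with a signed limit. One common way to silence it, shown here as a sketch rather than as the change actually committed, is to cast the signed value only after checking that it is non-negative. The function name `exceeds_limit_sketch` is hypothetical, and `ptrdiff_t` stands in for `gssize`.

```c
#include <stddef.h>

/* Returns 1 if `len` exceeds `max_len`, where a negative max_len
 * means "no limit". */
int
exceeds_limit_sketch (unsigned int len, ptrdiff_t max_len)
{
  /* The cast is only evaluated when max_len >= 0, so the conversion
   * to an unsigned type cannot change the value. */
  return max_len >= 0 && len > (size_t) max_len;
}
```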
Philip Withnall
90b2ad80ee gutf8: Document that out args from g_utf16_to_utf8() are non-negative
Despite their type, the values returned will always be ≥ 0. It’s
unfortunate they weren’t declared with an unsigned type, but we can’t
change that now without breaking API.

Spotted in !2294.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
2021-10-14 12:45:30 +01:00
Philip Chimento
748103d75a introspection: Remove 'caller-allocates' from POD types
The (out caller-allocates) and (out callee-allocates) annotations are
meant for structured or pointer types. Plain old data types are just
regular out parameters and don't need the annotation about who
allocates them.

See: https://gitlab.gnome.org/GNOME/gjs/-/issues/386
2021-03-20 11:14:15 -07:00
Philip Withnall
9d859f001d gutf8: Fix a typo in the docs for g_utf16_to_utf8()
Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-09-10 14:14:32 +01:00
Philip Withnall
00bfb3ab44 tree: Fix various typos and outdated terminology
This was mostly machine generated with the following command:
```
codespell \
    --builtin clear,rare,usage \
    --skip './po/*' --skip './.git/*' --skip './NEWS*' \
    --write-changes .
```
using the latest git version of `codespell` as per [these
instructions](https://github.com/codespell-project/codespell#user-content-updating).

Then I manually checked each change using `git add -p`, made a few
manual fixups and dropped a load of incorrect changes.

There are still some outdated or loaded terms used in GLib, mostly to do
with git branch terminology. They will need to be changed later as part
of a wider migration of git terminology.

If I’ve missed anything, please file an issue!

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-06-12 15:01:08 +01:00
nightuser
b555119ca3 gunicode: Fix UB in gutf8.c and utf8-pointer test
In glib/gutf8.c there was UB in the function g_utf8_find_prev_char when
p == str. In this case we subtract one from p, so p points to a
location outside the bounds of str, which is UB according to the standard.
Since this function is meant to be fast, we don't check the boundary
conditions.

Fix glib/tests/utf8-pointer test. It failed due to the UB described
above and aggressive optimisation when -O2 and LTO are enabled. Some
compilers (e.g. GCC with major version >= 8) create an optimised version
of g_utf8_find_prev_char with the first argument fixed and stored
somewhere else (with a different pointer). It can be solved by either
marking str as volatile or creating a copy of str in memory. We choose
the second approach since it's the more explicit solution.

Add additional checks to glib/tests/utf8-pointer test.

Closes #1917
2019-11-14 18:38:03 +00:00
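The undefined behaviour described above arises from decrementing a pointer that already equals the start of the buffer. A minimal sketch of a backwards scan that avoids it is to test `p > str` before each decrement. The name `find_prev_char_sketch` is hypothetical and this is not presented as the committed code.

```c
#include <stddef.h>

/* Returns a pointer to the start of the character before `p`, or NULL
 * if there is none. `p` is never moved below `str`. */
const char *
find_prev_char_sketch (const char *str, const char *p)
{
  while (p > str)
    {
      --p;
      /* Anything that is not a continuation byte (10xxxxxx) starts
       * a character. */
      if (((unsigned char) *p & 0xC0) != 0x80)
        return p;
    }
  return NULL;
}
```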
Emmanuel Fleury
568720006c Add a missing check to g_utf8_get_char_validated()
g_utf8_get_char_validated() was not exactly matching its
documentation. The function was not checking if the sequence of
unicode characters was free of null bytes before performing a more
in-depth validation.

Fix issue #1052
2019-09-14 18:01:22 +02:00
Дилян Палаузов
512655aa12 minor typos in the documentation (a/an)
2019-08-24 19:14:05 +00:00
Carlos Garnacho
154f6cafa9 gutf8: Assert that written memory stays in bounds
You may expect funny effects from passing invalid UTF-8, but not
that funny. The assert will probably be a better and more immediate
confirmation of an error than invalid writes under the address of the
string copy.

https://gitlab.gnome.org/GNOME/glib/issues/1863
2019-08-07 23:33:46 +02:00
Philip Withnall
8f6e5f1b01 gutf8: Add various missing (transfer) annotations
Signed-off-by: Philip Withnall <withnall@endlessm.com>

Fixes: #872
2019-06-21 11:05:11 +01:00
Philip Withnall
7a4025cac1 gutf8: Add a g_utf8_validate_len() function
This is a variant of g_utf8_validate() which requires the length to be
specified, thereby allowing string lengths up to G_MAXSIZE rather than
just G_MAXSSIZE.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2018-10-23 17:01:51 +13:00
Patrick Griffis
1c0bed93a3 docs: Clarify dest requirements of g_utf8_strncpy()
(Minor wording tweak by Philip Withnall.)

https://bugzilla.gnome.org/show_bug.cgi?id=520116
2018-02-03 12:12:28 +01:00
Ole André Vadla Ravnås
b829b762fd gutf8: Fix length handling in g_utf8_make_valid()
We cannot blindly append the remainder when a length was provided
because the string isn't nul-terminated.

https://bugzilla.gnome.org/show_bug.cgi?id=789444
2017-10-25 10:33:48 +01:00
Philip Withnall
1366ce7ee0 gutf8: Clarify return value docs for g_utf8_find_next_char()
Make it clearer that it will only return NULL if @end is non-NULL. Add a
test for this too.

Signed-off-by: Philip Withnall <withnall@endlessm.com>

https://bugzilla.gnome.org/show_bug.cgi?id=773842
2017-06-21 11:39:52 +01:00
Philip Withnall
3e89b19c44 gutf8: Fix documentation for g_utf8_get_char_validated() length limits
If g_utf8_get_char_validated() encounters a nul byte in the middle of a
string of given longer length, it returns -2, indicating a partial
gunichar. That is not the obvious behaviour, but since
g_utf8_get_char_validated() has been API for a long time, the behaviour
cannot be changed.

Document it, and add some unit tests (for this behaviour and the other
behaviour of g_utf8_get_char_validated()).

Signed-off-by: Philip Withnall <withnall@endlessm.com>

https://bugzilla.gnome.org/show_bug.cgi?id=780095
2017-06-21 11:38:46 +01:00
Sébastien Wilmet
f9faac7661 glib/: LGPLv2+ -> LGPLv2.1+
All glib/*.{c,h} files have been processed, as well as gtester-report.

12 of those files are not licensed under LGPL:

	gbsearcharray.h
	gconstructor.h
	glibintl.h
	gmirroringtable.h
	gscripttable.h
	gtranslit-data.h
	gunibreak.h
	gunichartables.h
	gunicomp.h
	gunidecomp.h
	valgrind.h
	win_iconv.c

Some of them are generated files, some are licensed under a BSD-style
license and win_iconv.c is in the public domain.

Sub-directories inside glib/:

	deprecated/: processed in a previous commit
	glib-mirroring-tab/: already LGPLv2.1+
	gnulib/: not modified, the code is copied from gnulib
	libcharset/: a copy
	pcre/: a copy
	tests/: processed in a previous commit

https://bugzilla.gnome.org/show_bug.cgi?id=776504
2017-05-24 11:58:19 +02:00
Philip Withnall
69b4c72fe5 gutf8: Clarify documentation for g_utf8_get_char_validated()
There is no such thing as ‘no maximum’ when reading a string. It’s got
to end somewhere.

Signed-off-by: Philip Withnall <withnall@endlessm.com>

https://bugzilla.gnome.org/show_bug.cgi?id=780095
2017-03-20 11:07:48 +00:00
Philip Withnall
1c56a87c08 gutf8: Clarify return values from g_utf8_get_char_extended()
It’s hard to remember what the difference is between -1 and -2, so give
them names.

This introduces no functional changes.

Signed-off-by: Philip Withnall <withnall@endlessm.com>

https://bugzilla.gnome.org/show_bug.cgi?id=780095
2017-03-20 11:07:48 +00:00
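Naming the two sentinel values makes the decoding logic easier to follow. The sketch below is an assumption-laden illustration, not the GLib source: `SKETCH_MALFORMED`, `SKETCH_PARTIAL` and `decode_char_sketch` are hypothetical names, and the decoder deliberately omits the overlong and surrogate checks a full validator would need. It does reproduce the behaviour noted in the earlier documentation commit, where a nul byte in the middle of a sequence is reported as a partial character.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MALFORMED ((uint32_t) -1)  /* bytes cannot form a character */
#define SKETCH_PARTIAL   ((uint32_t) -2)  /* input ended mid-sequence */

/* Decode one character from at most `max_len` bytes at `p`. */
uint32_t
decode_char_sketch (const unsigned char *p, size_t max_len)
{
  uint32_t wc;
  size_t len, i;

  if (max_len == 0)
    return SKETCH_PARTIAL;

  if (p[0] < 0x80)
    return p[0];
  else if ((p[0] & 0xE0) == 0xC0)
    { len = 2; wc = p[0] & 0x1F; }
  else if ((p[0] & 0xF0) == 0xE0)
    { len = 3; wc = p[0] & 0x0F; }
  else if ((p[0] & 0xF8) == 0xF0)
    { len = 4; wc = p[0] & 0x07; }
  else
    return SKETCH_MALFORMED;

  for (i = 1; i < len; i++)
    {
      if (i >= max_len)
        return SKETCH_PARTIAL;  /* ran out of input */
      if ((p[i] & 0xC0) != 0x80)
        /* A nul byte is treated as the end of the input. */
        return p[i] == 0 ? SKETCH_PARTIAL : SKETCH_MALFORMED;
      wc = (wc << 6) | (p[i] & 0x3F);
    }
  return wc;
}
```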
Paolo Borelli
f559bc01dc Make g_utf8_make_valid optionally take a length
g_utf8_make_valid was turned into a public API this cycle. However
now that it is public we should make the API more generic, allowing
the caller to specify the length. This is especially useful if
the function is called with a string that has \0 in the middle
or for chunks of strings that are not nul-terminated.
This is also consistent with most of the other utf8 utils.

Callers inside glib are updated to the new signature.

https://bugzilla.gnome.org/show_bug.cgi?id=779456
2017-03-02 10:46:51 +01:00
Christian Hergert
18a33f72db introspection: use (nullable) or (optional) instead of (allow-none)
If we have an input parameter (or return value) we need to use (nullable).
However, if it is an (inout) or (out) parameter, (optional) is sufficient.

It looks like (nullable) could be used for everything according to the
Annotation documentation, but (optional) is more specific.
2016-11-22 14:14:37 -08:00
Simon McVittie
c46dbd4752 Make g_utf8_make_valid public
Based on a patch by Simon van der Linden and rebased onto current GLib,
with improved documentation loosely based on Telepathy's
tp_utf8_make_valid().

Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Bug: https://bugzilla.gnome.org/show_bug.cgi?id=591603
Bug: https://bugzilla.gnome.org/show_bug.cgi?id=610969
Reviewed-by: Colin Walters <walters@verbum.org>
2016-10-13 21:52:42 +01:00
Matthias Clasen
e0e652e403 Fix a corner-case in g_utf8_find_next_char
In the case that *p is '\0', we should return p + 1, not p.
This change allows g_utf8_find_next_char to be simplified a bit.

https://bugzilla.gnome.org/show_bug.cgi?id=547200
2016-07-16 21:34:21 -04:00
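The corner case above follows naturally if the scan always steps forward once before looking for a non-continuation byte. The following is a sketch of that shape under the hypothetical name `find_next_char_sketch`. It assumes the caller guarantees that the bytes after `p` are readable, either up to `end` or up to a terminating byte when `end` is NULL.

```c
#include <stddef.h>

/* Returns a pointer to the next character after `p`. When `end` is
 * non-NULL the scan stops there and NULL is returned if no character
 * starts before `end`. */
const char *
find_next_char_sketch (const char *p, const char *end)
{
  if (end != NULL)
    {
      for (++p; p < end && ((unsigned char) *p & 0xC0) == 0x80; ++p)
        ;
      return (p >= end) ? NULL : p;
    }

  /* Unconditional first step: if *p is '\0' the result is p + 1. */
  for (++p; ((unsigned char) *p & 0xC0) == 0x80; ++p)
    ;
  return p;
}
```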
Philip Withnall
30788dff5b gutf8: Fix typo in GIR annotation for g_utf8_to_ucs4()
2015-12-23 16:48:10 +00:00
Philip Withnall
25a7c817d3 glib: Add missing (nullable) and (optional) annotations
Add various (nullable) and (optional) annotations which were missing
from a variety of functions. Also port a couple of existing (allow-none)
annotations in the same files to use (nullable) and (optional) as
appropriate instead.

Secondly, add various (not nullable) annotations as needed by the new
default in gobject-introspection of marking gpointers as (nullable). See
https://bugzilla.gnome.org/show_bug.cgi?id=729660.

This includes adding some stub documentation comments for the
assertion macro error functions, which weren’t previously documented.
The new comments are purely to allow for annotations, and hence are
marked as (skip) to prevent the symbols appearing in the GIR file.

https://bugzilla.gnome.org/show_bug.cgi?id=719966
2015-11-07 10:48:32 +01:00
Mikhail Zabaluev
d1f4d4a91a g_utf8_validate: fix a regression
A recent change permitted some characters from range 0x80-0xbf as
would-be valid sequence starters for length 2, as long as
continuation characters were OK.

https://bugzilla.gnome.org/show_bug.cgi?id=738504
2015-09-13 13:04:59 -04:00
Mikhail Zabaluev
b963565125 Unrolled implementation of g_utf8_to_ucs4_fast()
Unrolling the branches and expressions for all expected cases
of UTF-8 sequences facilitates the work of both an optimizing compiler
and the branch prediction logic in the CPU. This speeds up decoding
noticeably on text composed primarily of longer sequences.

https://bugzilla.gnome.org/show_bug.cgi?id=738504
2015-09-05 13:12:48 -04:00
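To illustrate what "unrolling" means here, the sketch below decodes a single character with one explicit branch per sequence length instead of a generic loop over continuation bytes. It is only a sketch of the idea: `decode_unrolled_sketch` is a hypothetical name, and like any "fast" decoder it assumes the input has already been validated.

```c
#include <stdint.h>

/* Decode one character from valid UTF-8 and advance *pp past it. */
uint32_t
decode_unrolled_sketch (const unsigned char **pp)
{
  const unsigned char *p = *pp;
  uint32_t wc = *p++;

  if (wc < 0x80)
    {
      /* 1-byte sequence: the lead byte is the code point. */
    }
  else if (wc < 0xE0)
    {
      /* 2-byte sequence */
      wc = ((wc & 0x1F) << 6) | (uint32_t) (p[0] & 0x3F);
      p += 1;
    }
  else if (wc < 0xF0)
    {
      /* 3-byte sequence */
      wc = ((wc & 0x0F) << 12)
           | ((uint32_t) (p[0] & 0x3F) << 6)
           | (uint32_t) (p[1] & 0x3F);
      p += 2;
    }
  else
    {
      /* 4-byte sequence */
      wc = ((wc & 0x07) << 18)
           | ((uint32_t) (p[0] & 0x3F) << 12)
           | ((uint32_t) (p[1] & 0x3F) << 6)
           | (uint32_t) (p[2] & 0x3F);
      p += 3;
    }

  *pp = p;
  return wc;
}
```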
Mikhail Zabaluev
3188b8ee79 Optimized branching in g_utf8_validate()
The number of branches and logical operations can be reduced by
never producing a resulting wide character value to check its range.
Instead, individual bytes in the sequence are validated
depending on the branch taken on the basis of preceding bytes.
This makes use of the syntax given in RFC 3629.

https://bugzilla.gnome.org/show_bug.cgi?id=738504
2015-09-05 13:10:57 -04:00
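The approach described above, validating bytes directly against the RFC 3629 grammar without assembling a code point, can be sketched as follows. This is an illustration under assumptions and not the GLib implementation: `utf8_validate_len_sketch` is a hypothetical name, the length is always explicit, and 0x00 is accepted as an ordinary one-byte character as in the RFC grammar (a NUL-aware API may choose to reject it). The `end` out argument points into the input rather than at newly allocated memory.

```c
#include <stddef.h>

static int
in_range (unsigned char b, unsigned char lo, unsigned char hi)
{
  return b >= lo && b <= hi;
}

/* Returns 1 if the `len` bytes at `s` are well-formed per RFC 3629.
 * If `end` is non-NULL, *end is set to the first invalid byte, or to
 * s + len when everything is valid. */
int
utf8_validate_len_sketch (const unsigned char *s, size_t len,
                          const unsigned char **end)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char b = s[i];
      size_t n;                            /* continuation bytes expected */
      unsigned char lo = 0x80, hi = 0xBF;  /* range of the first one */

      if (b < 0x80) { i++; continue; }
      else if (in_range (b, 0xC2, 0xDF)) n = 1;
      else if (b == 0xE0) { n = 2; lo = 0xA0; }  /* no overlong forms */
      else if (in_range (b, 0xE1, 0xEC) || in_range (b, 0xEE, 0xEF)) n = 2;
      else if (b == 0xED) { n = 2; hi = 0x9F; }  /* no surrogates */
      else if (b == 0xF0) { n = 3; lo = 0x90; }  /* no overlong forms */
      else if (in_range (b, 0xF1, 0xF3)) n = 3;
      else if (b == 0xF4) { n = 3; hi = 0x8F; }  /* at most U+10FFFF */
      else break;  /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */

      if (len - i <= n) break;                   /* truncated sequence */
      if (!in_range (s[i + 1], lo, hi)) break;
      if (n >= 2 && !in_range (s[i + 2], 0x80, 0xBF)) break;
      if (n == 3 && !in_range (s[i + 3], 0x80, 0xBF)) break;
      i += n + 1;
    }

  if (end != NULL)
    *end = s + i;
  return i == len;
}
```

Because lead bytes are matched against explicit ranges, a byte in 0x80-0xBF can never be accepted as the start of a sequence, whatever follows it.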
William Jon McCann
20f4d1820b docs: use "Returns:" consistently
Instead of "Return value:".
2014-02-19 19:41:52 -05:00
Matthias Clasen
4d12e0d66f Docs: Don't use the emphasis tag
Most of the time, the text read just as well without the extra
boldness.
2014-01-31 20:34:33 -05:00
Daniel Mustieles
078dbda148 Updated FSF's address
2014-01-31 14:31:55 +01:00
Matthias Clasen
fc04275a00 Docs: don't use the type tag
Just avoid explicit docbook markup.
2014-01-31 05:58:17 -05:00
Marc-André Lureau
b2bf13ccdd gutf8: use g_try_malloc_n
As recommended by Christian Persch.

https://bugzilla.gnome.org/show_bug.cgi?id=711546
2013-12-03 15:56:47 +01:00
Marc-André Lureau
d6a19d2e76 utf8: report allocation error
Make some of the conversion functions a bit more friendly to allocation
failure.

Even though the glib policy is to abort() on allocation failure by
default, it can be quite helpful to return an allocation error for
functions already providing a GError.

I needed a safer g_utf16_to_utf8() to solve a crash on big clipboard
operations with win32, related to rhbz#1017250 (and coming gdk handling
bug).

https://bugzilla.gnome.org/show_bug.cgi?id=711546
2013-11-25 12:07:57 +01:00
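The idea of reporting an allocation failure instead of aborting can be sketched with an overflow-checked "try" allocator and a caller that turns a NULL result into an error return. Both `try_malloc_n_sketch` and `alloc_ucs4_sketch` are hypothetical names built on plain `malloc()`. They stand in for the GLib facilities mentioned in these commits and do not reproduce them.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate n_blocks * block_size bytes, returning NULL (rather than
 * aborting) if the multiplication would overflow or malloc() fails. */
void *
try_malloc_n_sketch (size_t n_blocks, size_t block_size)
{
  if (block_size != 0 && n_blocks > SIZE_MAX / block_size)
    return NULL;
  return malloc (n_blocks * block_size);
}

/* Allocate room for n_chars code points plus a terminating 0.
 * Returns 0 on success and -1 on failure, leaving the decision about
 * how to react to the caller. */
int
alloc_ucs4_sketch (size_t n_chars, uint32_t **out)
{
  uint32_t *buf;

  if (n_chars == SIZE_MAX)
    return -1;  /* n_chars + 1 would overflow */

  buf = try_malloc_n_sketch (n_chars + 1, sizeof (uint32_t));
  if (buf == NULL)
    return -1;

  buf[n_chars] = 0;
  *out = buf;
  return 0;
}
```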
Christian Persch
f91ef4ef15 unicode: Allow noncharacters
Implement Unicode Corrigendum #9.

https://bugzilla.gnome.org/show_bug.cgi?id=694669
2013-03-05 17:27:53 +01:00
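Corrigendum #9 clarifies that noncharacters may appear in interchanged text, so a validity check only needs to reject surrogates and values beyond U+10FFFF. The sketch below shows the distinction; the function names are hypothetical and the code is an illustration of the rule, not the macro used in GLib.

```c
#include <stdint.h>

/* Noncharacters: U+FDD0..U+FDEF and the last two code points of
 * every plane (U+xxFFFE and U+xxFFFF). */
int
is_noncharacter_sketch (uint32_t c)
{
  return c <= 0x10FFFF
         && ((c >= 0xFDD0 && c <= 0xFDEF) || (c & 0xFFFE) == 0xFFFE);
}

/* Valid for interchange: inside the code space and not a surrogate.
 * Noncharacters are deliberately accepted. */
int
is_valid_unichar_sketch (uint32_t c)
{
  return c < 0x110000 && (c & 0xFFFFF800) != 0xD800;
}
```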
Martin Pitt
b81d788652 Fix g_utf8_validate() out argument transfer mode
The "end" argument is unusual in g_utf8_validate(): it's not a classic out
argument which gets allocated by the called function, but merely points into
one of its input arguments. Thus it is "transfer none".

https://bugzilla.gnome.org/show_bug.cgi?id=672889
2012-06-18 07:39:23 +02:00
Dan Winship
7cadf4f15f g_utf8_validate: @str shouldn't end up annotated as utf8
In order for this function to have any point, it has to be possible to
pass non-UTF-8 data to it, so annotate @str as being array-of-guint8
instead of utf8.

https://bugzilla.gnome.org/show_bug.cgi?id=672548
2012-05-18 12:36:12 -04:00
Robert Ancell
4143842eb4 Add missing allow-none annotations for function parameters.
Found using:
find . -name '*.c' | xargs grep 'or %NULL' | grep ' \* @' | grep -v '@error' | grep -v allow-none
2012-03-31 20:34:28 +11:00
Matthias Clasen
1b919d2e56 Clarify g_utf8_strlen docs a bit
2012-01-04 00:10:11 -05:00
Benjamin Otte
c4fc258424 docs: Clarify non-NUL requirement in g_utf8_validate()
UTF8 validation is not about your character on a dating site, so don't
talk about meeting.

https://bugzilla.gnome.org/show_bug.cgi?id=666803
2011-12-24 14:26:24 +01:00
Matthias Clasen
0589f715e5 Move charset and locale name functions to their own files
They did not really belong in either gutils or gutf8.
2011-10-16 18:40:58 -04:00
Matthias Clasen
d0bb1e0b0a Move g_get_codeset next to g_get_charset
g_get_codeset is a close relative to g_get_charset, and up to now
it lived a shadowy existence without any header presence.
2011-10-15 23:27:28 -04:00
Matthias Clasen
34ce4dd032 Replace static privates by privates
GStaticPrivate is heading for deprecation soon, and GPrivate
can replace these uses now.
2011-10-02 22:11:33 -04:00
Ryan Lortie
f1494c156d Clean up l10n threading stuff
Remove the explicit thread initialisation functions for g_get_charset(),
g_get_filename_charsets() and g_get_language_names().

Add a lock around one remaining case of access to libcharset (the other
2 cases already have the lock).

Do a proper g_once_init_enter() style initialisation for the GLib
gettext functions.

https://bugzilla.gnome.org/show_bug.cgi?id=658683
2011-09-09 19:50:55 -04:00
Matthias Clasen
1b28408b8b Spelling fixes
Spelling fixes in comments and docs, provided by
Kjartan Maraas in bug 657336.
2011-08-29 14:49:32 -04:00
Cosimo Cecchi
82a0733751 utf8: annotate the end pointer in g_utf8_validate as out + allow-none
2011-07-26 16:44:18 +02:00
Matthias Clasen
9eb65dd3ed Unicode: add a g_utf8_substring convenience api
This function is useful in the GTK+ accessibility implementations,
and seems like a nice thing to have around in general.
2011-06-23 21:31:40 -04:00
Christian Persch
f9cec26968 Clarify nul-termination of g_utf8_to_ucs4[_fast] result
The docs for g_utf8_to_ucs4_fast didn't mention that the resulting
string is terminated by a 0 character.

Bug #652897.
2011-06-19 13:14:39 +02:00
Ryan Lortie
8073759f8c Remove all uses of G_CONST_RETURN
Just use 'const'.

https://bugzilla.gnome.org/show_bug.cgi?id=644611
2011-06-09 11:15:40 -04:00