gutf8: Add a comment explaining the ifunc and asan annotation

Why they’re necessary, why we _think_ the optimised implementation of
`g_utf8_validate()` is OK despite what valgrind and asan are telling us,
and how they work.

Signed-off-by: Philip Withnall <pwithnall@gnome.org>

Helps: #3493
Philip Withnall 2024-10-17 18:26:19 +01:00
parent ad572e7780
commit ec7cf334db
GPG Key ID: C5C42CFB268637CA
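
The comment added in this commit describes a word-at-a-time scan for the nul terminator. Below is a minimal sketch of that technique. It finds the nul terminator rather than validating UTF-8, so it is not the real `utf8_verify()` code. `word_strlen()` and `word_has_zero_byte()` are hypothetical names, not GLib API. The sketch assumes 8-byte words and a page size that is a multiple of 8. It checks bytes one at a time until the pointer is word-aligned, then loads whole aligned words and tests each with the classic "has a zero byte" bit trick. An aligned word never straddles a page boundary, so the final load only touches the page that holds the nul. It can still read up to 7 bytes past the end of the array, which is exactly the read that valgrind and asan report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero iff at least one byte of v is 0x00. The expression is exact
 * for "is there any zero byte?", though not for locating which byte it is. */
static inline int
word_has_zero_byte (uint64_t v)
{
  return ((v - UINT64_C (0x0101010101010101)) & ~v
          & UINT64_C (0x8080808080808080)) != 0;
}

/* Hypothetical word-at-a-time strlen(). Assumes page sizes are a multiple of
 * sizeof (uint64_t), so an aligned 8-byte load never crosses a page. */
static size_t
word_strlen (const char *str)
{
  const char *p = str;

  assert (str != NULL);

  /* Byte-by-byte until p is 8-byte aligned. */
  while (((uintptr_t) p % sizeof (uint64_t)) != 0)
    {
      if (*p == '\0')
        return (size_t) (p - str);
      p++;
    }

  /* Whole aligned words. The load that contains the nul may read up to
   * 7 bytes past it: this is the over-read valgrind and asan flag. */
  for (;;)
    {
      uint64_t w;

      memcpy (&w, p, sizeof w);
      if (word_has_zero_byte (w))
        break;
      p += sizeof w;
    }

  /* The nul is somewhere in this word; find its exact position. */
  while (*p != '\0')
    p++;

  return (size_t) (p - str);
}
```

To keep a caller's reads in bounds, place the string in a word-aligned buffer padded to a whole number of words. For example, with `_Alignas (8) char buf[32] = "hello";`, `word_strlen (buf)` returns 5 and `word_strlen (buf + 3)` returns 2.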


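The following sketch shows the `ifunc` and `no_sanitize_address` pattern that the comment describes. Everything in it is hypothetical and not GLib API: `is_ascii()`, its two implementations, `resolve_is_ascii()`, and the `SKETCH_*` macros. Both implementations are trivial placeholders. The sketch assumes an ELF/glibc target where GCC or Clang support the `ifunc` attribute. It also assumes valgrind's `<valgrind/valgrind.h>` header. If that header is installed, the sketch uses its `RUNNING_ON_VALGRIND` macro; otherwise it assumes it never runs under valgrind. The dynamic loader calls the resolver, typically once at load time, and every later call to `is_ascii()` goes to the function the resolver returns. Resolvers run early, so they should generally avoid calling into libc. `RUNNING_ON_VALGRIND` is a special instruction sequence rather than a library call, which makes it usable there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Detect valgrind via its client-request header when it is available
 * (assumed to be installed as <valgrind/valgrind.h>); otherwise assume
 * no valgrind. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define SKETCH_UNDER_VALGRIND() (RUNNING_ON_VALGRIND != 0)
#  endif
#endif
#ifndef SKETCH_UNDER_VALGRIND
#  define SKETCH_UNDER_VALGRIND() false
#endif

/* Feature checks mirroring the #if in the diff: ifunc (not on Windows),
 * plus the no_sanitize_address attribute. */
#if defined(__has_attribute)
#  if __has_attribute(ifunc) && !defined(_WIN32)
#    define SKETCH_HAVE_IFUNC 1
#  endif
#  if __has_attribute(no_sanitize_address)
#    define SKETCH_NO_ASAN __attribute__((no_sanitize_address))
#  endif
#endif
#ifndef SKETCH_NO_ASAN
#  define SKETCH_NO_ASAN
#endif

typedef bool (*is_ascii_fn) (const char *str);

/* Placeholder for an optimised implementation which might over-read.
 * SKETCH_NO_ASAN stops asan instrumenting its loads. */
SKETCH_NO_ASAN static bool
is_ascii_fast (const char *str)
{
  assert (str != NULL);
  for (const unsigned char *p = (const unsigned char *) str; *p != '\0'; p++)
    if (*p & 0x80)
      return false;
  return true;
}

/* Byte-at-a-time fallback which never reads past the nul terminator. */
static bool
is_ascii_careful (const char *str)
{
  assert (str != NULL);
  for (size_t i = 0; str[i] != '\0'; i++)
    if ((unsigned char) str[i] >= 0x80)
      return false;
  return true;
}

#ifdef SKETCH_HAVE_IFUNC
/* Called by the dynamic loader to choose the implementation. */
static is_ascii_fn
resolve_is_ascii (void)
{
  return SKETCH_UNDER_VALGRIND () ? is_ascii_careful : is_ascii_fast;
}

bool is_ascii (const char *str) __attribute__((ifunc ("resolve_is_ascii")));
#else
/* No ifunc support: always use the careful implementation. */
bool
is_ascii (const char *str)
{
  return is_ascii_careful (str);
}
#endif
```

The two tools are handled differently because of when each decision can be made. Valgrind can only be detected at run time, hence the `ifunc`. Asan instrumentation is fixed at compile time, hence the attribute. Either way, `is_ascii()` returns the same result; only the memory-access pattern changes.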
@@ -1842,6 +1842,35 @@ g_utf8_validate_native (const char *str,
}
#if g_macro__has_attribute(ifunc) && !defined(G_OS_WIN32)
/* The fast implementation of UTF-8 validation in `utf8_verify()` technically
* uses undefined behaviour when the string length is not provided (i.e. when
* it’s looking for a trailing nul terminator): when doing word-sized reads of
* the string, it can read up to the word size (minus one byte) beyond the end
* of the string in order to find the nul terminator.
*
* While this is guaranteed to not cause a page fault (at worst, the nul
* terminator could be in the final word of the page, and the code won’t read
* any further than that), it is still technically undefined behaviour in C,
* because we’re reading off the end of an array.
*
* We don’t *think* this can cause any bugs due to compiler optimisations,
* because glibc does exactly the same thing in its string handling code, and
* that code has been extensively tested. For example:
* https://github.com/bminor/glibc/blob/2c1903cbbac0022153a67776f474c221250ad6ed/string/strchrnul.c
*
* However, both valgrind and asan warn about the read beyond the end of the
* array (a heap buffer overflow read). They’re right to do this (they can’t
* know the read is bounded to the word size minus one, and guaranteed to not
* cross a page boundary), but it’s annoying for any application which calls
* `g_utf8_validate()`.
*
* Use an [indirect function (`ifunc`)](https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-ifunc-function-attribute)
* to use a fallback implementation of `g_utf8_validate()` when running under
* valgrind. This is resolved at load time using `resolve_g_utf8_validate()`.
*
* Similarly, mark the real implementation so that it’s not instrumented by asan
* using `no_sanitize_address`.
*/
static gboolean
g_utf8_validate_valgrind (const char *str,
gssize max_len,