docs: Document high-level UTF-8 requirements for GLib

I’ve finally found the right place in the docs to put this stuff.

This doesn’t auto-link this section from every string in the GLib
documentation, but at this point (with gtk-doc in maintenance mode, and
gi-docgen not fully applied to GLib) I don’t think we can do any better.
The perfect is the enemy of the good, and having this stuff documented
somewhere means that someone can link to it from multiple places in
future *somehow*.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>

Fixes: #116
Philip Withnall 2023-04-28 11:17:23 +01:00
parent c86fde7e02
commit 0e941418d1

@@ -30,6 +30,36 @@ to test all the allocation failure code paths.
</para>
</refsect2>
<refsect2>
<title>UTF-8 and String Encoding</title>
<para>
All GLib, GObject and GIO functions accept and return strings in
<ulink url="https://en.wikipedia.org/wiki/UTF-8">UTF-8 encoding</ulink>
unless otherwise specified.
</para>
<para>
Input strings to function calls are <emphasis>not</emphasis> checked to see if
they are valid UTF-8: it is the application developer’s responsibility to
validate input strings at the time of input, either at the program or library
boundary, and to only use valid UTF-8 string constants in their application.
If GLib were to validate every string input to every function as UTF-8, there
would be a significant drop in performance.
</para>
<para>
Similarly, output strings from functions are guaranteed to be in UTF-8,
and this does not need to be validated by the calling function. If a function
returns invalid UTF-8 (and is not documented as doing so), that’s a bug.
</para>
<para>
See <link linkend='g-utf8-validate'><function>g_utf8_validate()</function></link>
and <link linkend='g-utf8-make-valid'><function>g_utf8_make_valid()</function></link>
for validating UTF-8 input.
</para>
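<para>
As a rough sketch of validating at the program boundary, the helper below
checks a nul-terminated string from an untrusted source once, before it is
passed to any other GLib function. The helper name
<function>import_untrusted_string()</function> and its
<parameter>repair</parameter> flag are hypothetical and are not part of GLib;
only <function>g_utf8_validate()</function>,
<function>g_utf8_make_valid()</function> and <function>g_strdup()</function>
are real API. Whether to reject or repair invalid input is an application
policy decision, so treat this as one possible approach rather than a
recommended implementation.
</para>
<informalexample><programlisting language="C"><![CDATA[
#include <glib.h>

/* Hypothetical helper, not part of GLib: returns a newly allocated, valid
 * UTF-8 copy of @raw, or NULL if @raw is invalid and @repair is FALSE.
 * Free the result with g_free(). */
static char *
import_untrusted_string (const char *raw,
                         gboolean    repair)
{
  /* A length of -1 means @raw is nul-terminated. */
  if (g_utf8_validate (raw, -1, NULL))
    return g_strdup (raw);

  /* Invalid input: either repair it (invalid bytes are replaced with
   * U+FFFD, the Unicode replacement character) or reject it. */
  return repair ? g_utf8_make_valid (raw, -1) : NULL;
}
]]></programlisting></informalexample>
<para>
Strings returned by such a helper can then be passed to other GLib functions
without further validation.
</para>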
</refsect2>
<refsect2>
<title>Threads</title>