From 0630b8145d3554ad0ac3a51250612da6c061a12f Mon Sep 17 00:00:00 2001
From: Emmanuele Bassi
Date: Mon, 9 Sep 2024 16:58:38 +0100
Subject: [PATCH] Collation keys are not encoded in UTF-8

The value returned when generating a collation key is an opaque binary
blob that is only meant to be used for byte-wise comparisons; we should
not imply it's a nul-terminated, UTF-8 string.

This is especially true for language bindings that try to convert C
strings returned by GLib into UTF-8 encoded strings.

Ideally, the collation functions should return a byte array, but the
closest thing we have is the OS native encoding type that we use for
paths and environment variables.

See: https://github.com/gtk-rs/gtk-rs-core/issues/1504
---
 glib/gunicollate.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/glib/gunicollate.c b/glib/gunicollate.c
index af4179571..3ec254d65 100644
--- a/glib/gunicollate.c
+++ b/glib/gunicollate.c
@@ -372,11 +372,16 @@ carbon_collate_key_for_filename (const gchar *str,
  * The results of comparing the collation keys of two strings
  * with strcmp() will always be the same as comparing the two
  * original keys with g_utf8_collate().
- * 
+ *
  * Note that this function depends on the [current locale][setlocale].
+ *
+ * Note that the returned string is not guaranteed to be in any
+ * encoding, especially UTF-8. The returned value is meant to be
+ * used only for comparisons.
  *
- * Returns: a newly allocated string. This string should
- * be freed with g_free() when you are done with it.
+ * Returns: (transfer full) (type filename): a newly allocated string.
+ * The contents of the string are only meant to be used when sorting.
+ * This string should be freed with g_free() when you are done with it.
  **/
 gchar *
 g_utf8_collate_key (const gchar *str,
@@ -504,8 +509,13 @@ g_utf8_collate_key (const gchar *str,
  *
  * Note that this function depends on the [current locale][setlocale].
  *
- * Returns: a newly allocated string. This string should
- * be freed with g_free() when you are done with it.
+ * Note that the returned string is not guaranteed to be in any
+ * encoding, especially UTF-8. The returned value is meant to be
+ * used only for comparisons.
+ *
+ * Returns: (transfer full) (type filename): a newly allocated string.
+ * The contents of the string are only meant to be used when sorting.
+ * This string should be freed with g_free() when you are done with it.
  *
  * Since: 2.8
  */
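
Illustration only, not part of the patch: a minimal sketch of the "opaque key, compared byte-wise" contract that the updated comments describe. It deliberately uses the standard C strxfrm() as a stand-in for g_utf8_collate_key(), so it builds without GLib. The helper names make_sort_key() and compare_by_key() are hypothetical and do not exist in GLib. The key is only ever handed to strcmp() and free(); it is never printed, decoded, or treated as text in any encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build an opaque sort key for str using strxfrm(),
 * which depends on the current LC_COLLATE locale. The result is only
 * meaningful as input to strcmp(). The caller frees it with free().
 * Returns NULL if the allocation fails. */
static char *
make_sort_key (const char *str)
{
  size_t len;
  char *key;

  assert (str != NULL);

  /* With a size of 0, strxfrm() only reports the length it needs,
   * not counting the terminating nul byte. */
  len = strxfrm (NULL, str, 0);
  key = malloc (len + 1);
  if (key != NULL)
    strxfrm (key, str, len + 1);

  return key;
}

/* Reduce a comparison result to -1, 0 or 1. */
static int
sign (int v)
{
  return (v > 0) - (v < 0);
}

/* Hypothetical helper: order two strings through their keys.
 * Only the sign of the result is meaningful. For brevity this sketch
 * reports 0 when a key could not be allocated; real code should
 * surface that failure instead. */
static int
compare_by_key (const char *a, const char *b)
{
  char *key_a = make_sort_key (a);
  char *key_b = make_sort_key (b);
  int res = 0;

  if (key_a != NULL && key_b != NULL)
    res = sign (strcmp (key_a, key_b));

  free (key_a);
  free (key_b);

  return res;
}
```

When sorting many strings, a caller would typically compute each key once, keep it next to the original string, and compare the cached keys, since that is the case collation keys are meant for.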