mirror of
https://gitlab.gnome.org/GNOME/glib.git
synced 2024-11-10 11:26:16 +01:00
gconvert: Fix error handling for g_iconv() with unrepresentable chars
The behaviour of upstream iconv() when faced with a character which is valid in the input encoding, but not representable in the output encoding, is implementation defined:

   http://pubs.opengroup.org/onlinepubs/9699919799/

Specifically:

   If iconv() encounters a character in the input buffer that is valid, but for which an identical character does not exist in the target codeset, iconv() shall perform an implementation-defined conversion on this character.

This behaviour was being exposed in our g_iconv() wrapper and also in g_convert_with_iconv(), but users of g_convert_with_iconv() (both the GLib unit tests, and the implementation of g_convert_with_fallback()) were assuming that iconv() would return EILSEQ if faced with an unrepresentable character.

On platforms like NetBSD, this is not the case: NetBSD’s iconv() finishes the conversion successfully, and outputs a string containing replacement characters. It signals those replacements through the return value of iconv(), which is positive (rather than zero) in such a case.

Let’s codify the existing assumed behaviour of g_convert_with_iconv(), documenting that it will return G_CONVERT_ERROR_ILLEGAL_SEQUENCE if faced with an unrepresentable character. As g_iconv() is a thin wrapper around iconv(), leave the behaviour there implementation-defined (but document it as such).

Signed-off-by: Philip Withnall <withnall@endlessm.com>

https://bugzilla.gnome.org/show_bug.cgi?id=790698
parent a19eed4691
commit 8abf3a04e6
@@ -264,6 +264,13 @@ g_iconv_open (const gchar *to_codeset,
  * GLib provides g_convert() and g_locale_to_utf8() which are likely
  * more convenient than the raw iconv wrappers.
  *
+ * Note that the behaviour of iconv() for characters which are valid in the
+ * input character set, but which have no representation in the output character
+ * set, is implementation defined. This function may return success (with a
+ * positive number of non-reversible conversions as replacement characters were
+ * used), or it may return -1 and set an error such as %EILSEQ, in such a
+ * situation.
+ *
  * Returns: count of non-reversible conversions, or -1 on error
  **/
 gsize
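
The two outcomes described in the documentation above can be observed directly with plain POSIX iconv(). The following is a minimal sketch, not part of the patch: the names ProbeResult and probe_unrepresentable() are hypothetical, and it assumes the platform's iconv() supports the codeset names "UTF-8" and "ISO-8859-1". It feeds a euro sign (which ISO-8859-1 cannot represent) to iconv() and reports which of the two behaviours the platform exhibits.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef enum
{
  PROBE_EILSEQ,   /* iconv() returned (size_t) -1 with errno == EILSEQ */
  PROBE_REPLACED, /* iconv() succeeded and reported non-reversible conversions */
  PROBE_OTHER     /* anything else, including iconv_open() failure */
} ProbeResult;

static ProbeResult
probe_unrepresentable (void)
{
  /* U+20AC EURO SIGN encoded as UTF-8; ISO-8859-1 has no euro sign. */
  char input[] = "\xe2\x82\xac";
  char output[16];
  char *inp = input;
  char *outp = output;
  size_t inleft = strlen (input);
  size_t outleft = sizeof output;
  size_t ret;
  int saved_errno;
  iconv_t cd;

  /* Assumption: both codeset names are known to the platform. */
  cd = iconv_open ("ISO-8859-1", "UTF-8");
  if (cd == (iconv_t) -1)
    return PROBE_OTHER;

  errno = 0;
  ret = iconv (cd, &inp, &inleft, &outp, &outleft);
  saved_errno = errno; /* iconv_close() may clobber errno */
  iconv_close (cd);

  if (ret == (size_t) -1)
    return saved_errno == EILSEQ ? PROBE_EILSEQ : PROBE_OTHER;

  /* A positive return value counts the non-reversible conversions made. */
  return ret > 0 ? PROBE_REPLACED : PROBE_OTHER;
}
```

A caller of g_iconv() that needs portable behaviour has to be prepared for both PROBE_EILSEQ-style and PROBE_REPLACED-style results, since the wrapper passes the platform's choice through unchanged.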
@@ -371,6 +378,14 @@ close_converter (GIConv cd)
  * character until it knows that the next character is not a mark that
  * could combine with the base character.)
  *
+ * Characters which are valid in the input character set, but which have no
+ * representation in the output character set will result in a
+ * %G_CONVERT_ERROR_ILLEGAL_SEQUENCE error. This is in contrast to the iconv()
+ * specification, which leaves this behaviour implementation defined. Note that
+ * this is the same error code as is returned for an invalid byte sequence in
+ * the input character set. To get defined behaviour for conversion of
+ * unrepresentable characters, use g_convert_with_fallback().
+ *
  * Returns: If the conversion was successful, a newly allocated
  *               nul-terminated string, which must be freed with
  *               g_free(). Otherwise %NULL and @error will be set.
@@ -449,6 +464,13 @@ g_convert_with_iconv (const gchar *str,
               break;
             }
         }
+      else if (err > 0)
+        {
+          /* @err gives the number of replacement characters used. */
+          g_set_error_literal (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
+                               _("Unrepresentable character in conversion input"));
+          have_error = TRUE;
+        }
       else
         {
           if (!reset)
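
The same idea as the hunk above (treating a positive iconv() return value as a failure) can be sketched without GLib. This is an illustration under stated assumptions, not the GLib implementation: strict_convert() is a hypothetical helper, it uses a fixed caller-supplied output buffer instead of GLib's growing buffer, and it reports unrepresentable characters by setting errno to EILSEQ rather than through a GError.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated string `in` from codeset
 * `from` to codeset `to`, writing a NUL-terminated result into `out`.
 * Returns 0 on success, or -1 with errno set on failure. A positive return
 * value from iconv() (replacement characters were used) is reported as
 * EILSEQ, so callers see one behaviour on every platform. */
static int
strict_convert (const char *to, const char *from,
                const char *in, char *out, size_t outsize)
{
  char *inp = (char *) in; /* iconv() takes char **, but only reads the input */
  char *outp = out;
  size_t inleft = strlen (in);
  size_t outleft;
  size_t ret;
  int saved_errno = 0;
  iconv_t cd;

  if (outsize == 0)
    {
      errno = E2BIG;
      return -1;
    }
  outleft = outsize - 1; /* reserve room for the terminating NUL */

  cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1; /* errno already set by iconv_open() */

  ret = iconv (cd, &inp, &inleft, &outp, &outleft);
  if (ret == (size_t) -1)
    saved_errno = errno;  /* e.g. EILSEQ, EINVAL or E2BIG */
  else if (ret > 0)
    saved_errno = EILSEQ; /* ret counts the replacement characters used */
  else if (iconv (cd, NULL, NULL, &outp, &outleft) == (size_t) -1)
    saved_errno = errno;  /* flushing any pending shift state failed */

  iconv_close (cd);

  if (saved_errno != 0)
    {
      errno = saved_errno;
      return -1;
    }

  *outp = '\0';
  return 0;
}
```

With this shape, converting "hello" from UTF-8 to ISO-8859-1 succeeds, while converting a euro sign fails with EILSEQ regardless of whether the platform's iconv() itself returned -1 or a positive count.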
@@ -37,7 +37,9 @@ G_BEGIN_DECLS
  * GConvertError:
  * @G_CONVERT_ERROR_NO_CONVERSION: Conversion between the requested character
  *     sets is not supported.
- * @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input.
+ * @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input;
+ *    or the character sequence could not be represented in the target
+ *    character set.
  * @G_CONVERT_ERROR_FAILED: Conversion failed for some reason.
  * @G_CONVERT_ERROR_PARTIAL_INPUT: Partial character sequence at end of input.
  * @G_CONVERT_ERROR_BAD_URI: URI is invalid.