Mirror of https://gitlab.gnome.org/GNOME/glib.git (synced 2025-01-27 14:36:16 +01:00)
gconvert: Fix error handling for g_iconv() with unrepresentable chars
The behaviour of upstream iconv() when faced with a character which is valid in the input encoding, but not representable in the output encoding, is implementation defined:

http://pubs.opengroup.org/onlinepubs/9699919799/

Specifically: "If iconv() encounters a character in the input buffer that is valid, but for which an identical character does not exist in the target codeset, iconv() shall perform an implementation-defined conversion on this character."

This behaviour was being exposed in our g_iconv() wrapper and also in g_convert_with_iconv(). However, users of g_convert_with_iconv() (both the GLib unit tests and the implementation of g_convert_with_fallback()) were assuming that iconv() would return EILSEQ if faced with an unrepresentable character.

On platforms like NetBSD, this is not the case: NetBSD’s iconv() finishes the conversion successfully and outputs a string containing replacement characters. It signals those replacements in its return value from iconv(), which is positive (non-zero) in such a case.

Let’s codify the existing assumed behaviour of g_convert_with_iconv(), documenting that it will return G_CONVERT_ERROR_ILLEGAL_SEQUENCE if faced with an unrepresentable character.

As g_iconv() is a thin wrapper around iconv(), leave the behaviour there implementation-defined (but document it as such).

Signed-off-by: Philip Withnall <withnall@endlessm.com>

https://bugzilla.gnome.org/show_bug.cgi?id=790698
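To make the two implementation-defined outcomes concrete, here is a minimal sketch that calls POSIX iconv() directly rather than g_iconv(). It assumes the POSIX prototype, in which the input-buffer argument is `char **` (some older implementations declared it `const char **`); `probe_conversion()` and `ProbeResult` are hypothetical names used only for illustration. It converts a string in a single call and reports whether the implementation failed with EILSEQ or succeeded with a positive count of non-reversible conversions.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical classification of what iconv() did with the input. */
typedef enum {
  PROBE_OK,          /* success, return value 0: exact conversion */
  PROBE_REPLACED,    /* success, return value > 0: replacements were used */
  PROBE_EILSEQ,      /* return value (size_t) -1 with errno == EILSEQ */
  PROBE_OTHER_ERROR  /* iconv_open() failed, or another errno was set */
} ProbeResult;

/* Convert the NUL-terminated string @in from charset @from to charset @to
 * in a single iconv() call, writing at most @out_size bytes into @out
 * (the output is not NUL-terminated), and classify the result. */
static ProbeResult
probe_conversion (const char *to, const char *from, const char *in,
                  char *out, size_t out_size)
{
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return PROBE_OTHER_ERROR;

  char *inbuf = (char *) in;  /* POSIX iconv() takes char **, not const */
  size_t inleft = strlen (in);
  char *outbuf = out;
  size_t outleft = out_size;

  size_t ret = iconv (cd, &inbuf, &inleft, &outbuf, &outleft);
  int saved_errno = errno;  /* capture before iconv_close() can change it */
  iconv_close (cd);

  if (ret == (size_t) -1)
    return saved_errno == EILSEQ ? PROBE_EILSEQ : PROBE_OTHER_ERROR;

  /* A positive return value counts non-reversible conversions. */
  return ret > 0 ? PROBE_REPLACED : PROBE_OK;
}
```

With glibc, converting the Euro sign (U+20AC, bytes E2 82 AC in UTF-8) to ISO-8859-1 normally yields `PROBE_EILSEQ`; an implementation that substitutes replacement characters, as described above for NetBSD, would yield `PROBE_REPLACED` instead. Code that only checks for EILSEQ silently accepts the second case.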
Parent: a19eed4691
Commit: 8abf3a04e6
@@ -264,6 +264,13 @@ g_iconv_open (const gchar *to_codeset,
  * GLib provides g_convert() and g_locale_to_utf8() which are likely
  * more convenient than the raw iconv wrappers.
  *
+ * Note that the behaviour of iconv() for characters which are valid in the
+ * input character set, but which have no representation in the output character
+ * set, is implementation defined. This function may return success (with a
+ * positive number of non-reversible conversions as replacement characters were
+ * used), or it may return -1 and set an error such as %EILSEQ, in such a
+ * situation.
+ *
  * Returns: count of non-reversible conversions, or -1 on error
  **/
 gsize
@@ -371,6 +378,14 @@ close_converter (GIConv cd)
  * character until it knows that the next character is not a mark that
  * could combine with the base character.)
  *
+ * Characters which are valid in the input character set, but which have no
+ * representation in the output character set will result in a
+ * %G_CONVERT_ERROR_ILLEGAL_SEQUENCE error. This is in contrast to the iconv()
+ * specification, which leaves this behaviour implementation defined. Note that
+ * this is the same error code as is returned for an invalid byte sequence in
+ * the input character set. To get defined behaviour for conversion of
+ * unrepresentable characters, use g_convert_with_fallback().
+ *
  * Returns: If the conversion was successful, a newly allocated
  *               nul-terminated string, which must be freed with
  *               g_free(). Otherwise %NULL and @error will be set.
@@ -449,6 +464,13 @@ g_convert_with_iconv (const gchar *str,
               break;
             }
         }
+      else if (err > 0)
+        {
+          /* @err gives the number of replacement characters used. */
+          g_set_error_literal (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
+                               _("Unrepresentable character in conversion input"));
+          have_error = TRUE;
+        }
       else
         {
           if (!reset)
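The new `else if (err > 0)` branch folds the second implementation-defined outcome into the first. That mapping can be sketched as a pure function over iconv()'s return value and errno. This is a simplified, hypothetical model: `StrictStatus` and `strict_status_from_iconv()` are illustrative names, not GLib API, and the real g_convert_with_iconv() handles E2BIG and EINVAL within its conversion loop (for example, by growing its output buffer on E2BIG) rather than surfacing them directly.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical status codes loosely modelled on GConvertError. */
typedef enum {
  STRICT_OK,
  STRICT_ILLEGAL_SEQUENCE, /* invalid input OR unrepresentable character */
  STRICT_PARTIAL_INPUT,    /* EINVAL: incomplete sequence at end of input */
  STRICT_NEED_MORE_SPACE,  /* E2BIG: caller should grow the output buffer */
  STRICT_FAILED            /* any other errno */
} StrictStatus;

/* Map one iconv() result to a strict status. @ret is iconv()'s return
 * value and @saved_errno is errno captured immediately after the call. */
static StrictStatus
strict_status_from_iconv (size_t ret, int saved_errno)
{
  if (ret == (size_t) -1)
    {
      switch (saved_errno)
        {
        case EILSEQ: return STRICT_ILLEGAL_SEQUENCE;
        case EINVAL: return STRICT_PARTIAL_INPUT;
        case E2BIG:  return STRICT_NEED_MORE_SPACE;
        default:     return STRICT_FAILED;
        }
    }
  else if (ret > 0)
    {
      /* Mirrors the patch: a positive @ret counts replacement characters
       * the implementation substituted, so report it exactly like EILSEQ. */
      return STRICT_ILLEGAL_SEQUENCE;
    }

  return STRICT_OK;
}
```

Collapsing both cases into one code matches the GConvertError documentation change: callers check for a single error, at the cost of not being able to tell invalid input from unrepresentable output by the error code alone.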
@@ -37,7 +37,9 @@ G_BEGIN_DECLS
  * GConvertError:
  * @G_CONVERT_ERROR_NO_CONVERSION: Conversion between the requested character
  *     sets is not supported.
- * @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input.
+ * @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input;
+ *    or the character sequence could not be represented in the target
+ *    character set.
  * @G_CONVERT_ERROR_FAILED: Conversion failed for some reason.
  * @G_CONVERT_ERROR_PARTIAL_INPUT: Partial character sequence at end of input.
  * @G_CONVERT_ERROR_BAD_URI: URI is invalid.