gconvert: Fix error handling for g_iconv() with unrepresentable chars

The behaviour of upstream iconv() when faced with a character which is
valid in the input encoding, but not representable in the output
encoding, is implementation defined:

http://pubs.opengroup.org/onlinepubs/9699919799/

Specifically:

   If iconv() encounters a character in the input buffer that is valid,
   but for which an identical character does not exist in the target
   codeset, iconv() shall perform an implementation-defined conversion
   on this character.

This behaviour was being exposed in our g_iconv() wrapper and also in
g_convert_with_iconv() — but users of g_convert_with_iconv() (both the
GLib unit tests, and the implementation of g_convert_with_fallback())
were assuming that iconv() would return EILSEQ if faced with an
unrepresentable character.

On platforms like NetBSD, this is not the case: NetBSD’s iconv()
finishes the conversion successfully, and outputs a string containing
replacement characters. It signals those replacements in its return
value from iconv(), which is positive (specifically, non-zero) in such a
case.
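
To see which of the two behaviours a given platform exhibits, a small probe along the following lines may help. It is only a sketch: the `ProbeOutcome` enum and `probe_conversion()` are hypothetical names introduced here, not part of iconv() or GLib, and the cast on the input pointer assumes the POSIX prototype that takes `char **` (some platforms have declared it as `const char **`).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome labels; not part of iconv() or GLib. */
typedef enum {
  PROBE_CLEAN,       /* converted, no non-reversible conversions reported */
  PROBE_REPLACED,    /* succeeded, but iconv() returned a count > 0 */
  PROBE_EILSEQ,      /* failed with errno == EILSEQ */
  PROBE_OTHER_ERROR  /* any other failure, including iconv_open() */
} ProbeOutcome;

/* Convert @input (assumed short enough to fit the 256-byte buffer) from
 * @from to @to in a single iconv() call and report how it was handled. */
static ProbeOutcome
probe_conversion (const char *to, const char *from, const char *input)
{
  char outbuf[256];
  char *inp = (char *) input;   /* iconv() does not modify the input bytes */
  char *outp = outbuf;
  size_t inleft = strlen (input);
  size_t outleft = sizeof outbuf;
  size_t ret;
  int errsv;
  iconv_t cd = iconv_open (to, from);

  if (cd == (iconv_t) -1)
    return PROBE_OTHER_ERROR;

  ret = iconv (cd, &inp, &inleft, &outp, &outleft);
  errsv = errno;                /* save errno before iconv_close() */
  iconv_close (cd);

  if (ret == (size_t) -1)
    return (errsv == EILSEQ) ? PROBE_EILSEQ : PROBE_OTHER_ERROR;

  return (ret > 0) ? PROBE_REPLACED : PROBE_CLEAN;
}
```

Calling `probe_conversion ("ASCII", "UTF-8", "caf\xc3\xa9")` would be expected to yield `PROBE_EILSEQ` on implementations that reject the character and `PROBE_REPLACED` on implementations that substitute a replacement, as described above for NetBSD.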

Let’s codify the existing assumed behaviour of g_convert_with_iconv(),
documenting that it will return G_CONVERT_ERROR_ILLEGAL_SEQUENCE if
faced with an unrepresentable character. As g_iconv() is a thin wrapper
around iconv(), leave the behaviour there implementation-defined (but
document it as such).
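
The policy being codified can be summarised as a mapping from the iconv() result to an error category. The following is a simplified sketch under stated assumptions, not the GLib implementation: `SketchResult` and `classify_iconv_result()` are hypothetical names, and the real function also handles buffer growth and resetting the converter, which are omitted here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the error categories discussed above. */
typedef enum {
  SKETCH_OK,
  SKETCH_ILLEGAL_SEQUENCE,  /* invalid input OR unrepresentable character */
  SKETCH_PARTIAL_INPUT,     /* input ended in the middle of a character */
  SKETCH_NEED_MORE_OUTPUT,  /* output buffer full; caller should retry */
  SKETCH_FAILED
} SketchResult;

/* @ret is the value returned by iconv(); @errsv is errno saved right
 * after the call. */
static SketchResult
classify_iconv_result (size_t ret, int errsv)
{
  if (ret == (size_t) -1)
    {
      switch (errsv)
        {
        case EILSEQ: return SKETCH_ILLEGAL_SEQUENCE;
        case EINVAL: return SKETCH_PARTIAL_INPUT;
        case E2BIG:  return SKETCH_NEED_MORE_OUTPUT;
        default:     return SKETCH_FAILED;
        }
    }

  /* A positive count means replacement characters were used, so treat it
   * like an implementation that reported EILSEQ. */
  if (ret > 0)
    return SKETCH_ILLEGAL_SEQUENCE;

  return SKETCH_OK;
}
```

The design choice is that both platform behaviours collapse into one error category, so callers that expect an error for unrepresentable characters see the same result everywhere.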

Signed-off-by: Philip Withnall <withnall@endlessm.com>

https://bugzilla.gnome.org/show_bug.cgi?id=790698
Philip Withnall 2018-01-22 12:50:15 +00:00
parent a19eed4691
commit 8abf3a04e6
2 changed files with 25 additions and 1 deletions

@@ -264,6 +264,13 @@ g_iconv_open (const gchar *to_codeset,
  * GLib provides g_convert() and g_locale_to_utf8() which are likely
  * more convenient than the raw iconv wrappers.
  *
+ * Note that the behaviour of iconv() for characters which are valid in the
+ * input character set, but which have no representation in the output character
+ * set, is implementation defined. This function may return success (with a
+ * positive number of non-reversible conversions as replacement characters were
+ * used), or it may return -1 and set an error such as %EILSEQ, in such a
+ * situation.
+ *
  * Returns: count of non-reversible conversions, or -1 on error
  **/
 gsize
@@ -371,6 +378,14 @@ close_converter (GIConv cd)
  * character until it knows that the next character is not a mark that
  * could combine with the base character.)
  *
+ * Characters which are valid in the input character set, but which have no
+ * representation in the output character set will result in a
+ * %G_CONVERT_ERROR_ILLEGAL_SEQUENCE error. This is in contrast to the iconv()
+ * specification, which leaves this behaviour implementation defined. Note that
+ * this is the same error code as is returned for an invalid byte sequence in
+ * the input character set. To get defined behaviour for conversion of
+ * unrepresentable characters, use g_convert_with_fallback().
+ *
  * Returns: If the conversion was successful, a newly allocated
  *          nul-terminated string, which must be freed with
  *          g_free(). Otherwise %NULL and @error will be set.
@@ -449,6 +464,13 @@ g_convert_with_iconv (const gchar *str,
               break;
             }
         }
+      else if (err > 0)
+        {
+          /* @err gives the number of replacement characters used. */
+          g_set_error_literal (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
+                               _("Unrepresentable character in conversion input"));
+          have_error = TRUE;
+        }
       else
         {
           if (!reset)

@@ -37,7 +37,9 @@ G_BEGIN_DECLS
  * GConvertError:
  * @G_CONVERT_ERROR_NO_CONVERSION: Conversion between the requested character
  *  sets is not supported.
- * @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input.
+ * @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input;
+ *  or the character sequence could not be represented in the target
+ *  character set.
  * @G_CONVERT_ERROR_FAILED: Conversion failed for some reason.
  * @G_CONVERT_ERROR_PARTIAL_INPUT: Partial character sequence at end of input.
  * @G_CONVERT_ERROR_BAD_URI: URI is invalid.