- Add a test for parsing FILE scheme from uri
It had taken from GST test_protocol_case
- Add a split uri test with encoded spaces in its path
It had taken from GST test_uri_get_location
- Add tests for g_uri_is_valid
It had taken from GST test_uri_misc
Note that the 4 followings uri failed under gst_uri_is_valid but not
under g_uri_is_valid
B:\\foo.txt
B:/foo.txt
B://foo.txt
B:foo.txt
- Add tests for g_uri_split
It had taken from GST test_url_parsing
- Add tests for test_uri_normalize and test_uri_parsing_relative
The test URI had been taken from GST test_url_normalization
- Add tests for test_uri_iter_params
It had taken from GST test_url_unescape_equals_in_http_query
Closes#2150
Signed-off-by: Frederic Martinsons <frederic.martinsons@sigfox.com>
For URIs produced in string form, the path should be normalized and port
omitted when the default one is used. When querying the path and port of
a GUri (using getters or g_uri_split()) the normalized path and the
default port should be returned when they were omitted in the parsed URI.
Closes#2257
In file included from glib/glib.h:86,
from glib/tests/uri.c:25:
glib/gtestutils.h:134:96: error: comparison of integer expressions of different signedness: ‘gint’ {aka ‘int’} and ‘GConvertError’
134 | if (!err || (err)->domain != dom || (err)->code != c) \
| ^~
glib/tests/uri.c:182:9: note: in expansion of macro ‘g_assert_error’
182 | g_assert_error (error, G_CONVERT_ERROR, file_to_uri_tests[i].expected_error);
| ^~~~~~~~~~~~~~
glib/gtestutils.h:134:96: error: comparison of integer expressions of different signedness: ‘gint’ {aka ‘int’} and ‘GConvertError’
134 | if (!err || (err)->domain != dom || (err)->code != c) \
| ^~
glib/tests/uri.c:220:9: note: in expansion of macro ‘g_assert_error’
220 | g_assert_error (error, G_CONVERT_ERROR, file_from_uri_tests[i].expected_error);
| ^~~~~~~~~~~~~~
glib/tests/uri.c: In function ‘test_uri_parsing_absolute’:
glib/gtestutils.h:134:96: error: comparison of integer expressions of different signedness: ‘gint’ {aka ‘int’} and ‘GUriError’
134 | if (!err || (err)->domain != dom || (err)->code != c) \
| ^~
glib/tests/uri.c:790:11: note: in expansion of macro ‘g_assert_error’
790 | g_assert_error (error, G_URI_ERROR, test->expected_error_code);
| ^~~~~~~~~~~~~~
In file included from glib/glibconfig.h:9,
from glib/gtypes.h:32,
from glib/galloca.h:32,
from glib/glib.h:30,
from glib/tests/uri.c:25:
glib/tests/uri.c: In function ‘test_uri_iter_params’:
glib/tests/uri.c:1495:51: error: comparison of integer expressions of different signedness: ‘gssize’ {aka ‘const long int’} and ‘long unsigned int’
1495 | params_tests[i].expected_n_params <= G_N_ELEMENTS (params_tests[i].expected_param_key_values) / 2);
| ^~
glib/gmacros.h:941:25: note: in definition of macro ‘G_LIKELY’
941 | #define G_LIKELY(expr) (expr)
| ^~~~
glib/tests/uri.c:1494:7: note: in expansion of macro ‘g_assert’
1494 | g_assert (params_tests[i].expected_n_params < 0 ||
| ^~~~~~~~
glib/tests/uri.c: In function ‘test_uri_parse_params’:
glib/tests/uri.c:1562:51: error: comparison of integer expressions of different signedness: ‘gssize’ {aka ‘const long int’} and ‘long unsigned int’
1562 | params_tests[i].expected_n_params <= G_N_ELEMENTS (params_tests[i].expected_param_key_values) / 2);
| ^~
glib/gmacros.h:941:25: note: in definition of macro ‘G_LIKELY’
941 | #define G_LIKELY(expr) (expr)
| ^~~~
glib/tests/uri.c:1561:7: note: in expansion of macro ‘g_assert’
1561 | g_assert (params_tests[i].expected_n_params < 0 ||
| ^~~~~~~~
glib/tests/uri.c: In function ‘run_file_to_uri_tests’:
glib/tests/uri.c:172:17: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’
172 | for (i = 0; i < G_N_ELEMENTS (file_to_uri_tests); i++)
| ^
glib/tests/uri.c: In function ‘run_file_from_uri_tests’:
glib/tests/uri.c:197:17: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’
197 | for (i = 0; i < G_N_ELEMENTS (file_from_uri_tests); i++)
| ^
glib/tests/uri.c: In function ‘run_file_roundtrip_tests’:
glib/tests/uri.c:276:17: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’
276 | for (i = 0; i < G_N_ELEMENTS (file_to_uri_tests); i++)
| ^
glib/tests/uri.c: In function ‘test_uri_parse_params’:
glib/tests/uri.c:1594:25: error: comparison of integer expressions of different signedness: ‘gsize’ {aka ‘long unsigned int’} and ‘gssize’ {aka ‘const long int’}
1594 | for (j = 0; j < params_tests[i].expected_n_params; j += 2)
| ^
This changes it so when a segment is encoded it will be
normalized at parse time which ensures its valid and
it can more easily be compared with other uris.
The return value from `g_utf8_get_char_validated()` is a `gunichar`,
which is unsigned, so comparing it with `> 0` is always going to return
true, even for return values `(gunichar) -1` and `(gunichar) -2`, which
indicate errors.
Handle them more explicitly.
oss-fuzz#26083
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
This bumps the coverage of `parse_host()` and `parse_ip_literal()` up to
100% of lines and branches.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
The previous parsing code could read off the end of a URI if it had an
incorrect %-escaped character in.
Fix that, and more closely implement parsing for the syntax defined in
RFC 6874, which is the amendment to RFC 3986 which specifies zone ID
syntax.
This requires reworking some network-address tests, which were
previously treating zone IDs incorrectly.
oss-fuzz#23816
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Make `G_URI_FLAGS_PARSE_RELAXED` available instead, for the
implementations which need to handle user-provided or incorrect URIs.
The default should nudge people towards being compliant with RFC 3986.
This required also adding a new `G_URI_PARAMS_PARSE_RELAXED` flag, as
previously parsing param strings *always* used relaxed mode and there
was no way to control it. Now it defaults to using strict mode, and the
new flag allows for relaxed mode to be enabled if needed.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Fixes: #2149
According to my reading of
https://tools.ietf.org/html/rfc3986#section-4, the only requirement for
a URI to be ‘absolute’ (actually, not a relative reference) is for the
scheme to be specified. A hostname doesn’t have to be specified: see any
of the options in the `hier-part` production in
https://tools.ietf.org/html/rfc3986#appendix-A which don’t include
`authority`.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Sometimes there are sensitive details in URI query components, so we
should provide the option for hiding them too.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This brings its naming in line with the ‘generic’ error codes in other
error domains.
This is not an API break since `GUriError` hasn’t been in a release yet.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
The test was failing since commit 20ae4b46d ("uri: do not add ipv6
brackets on non-ip host").
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The g_uri_join_internal() function was making a simplification that
userinfo can be encoded with the same restricted character set as the
user field alone, fix this by allowing the correct character set.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Otherwise, the to_string() encoding will not be reversible. Furthermore,
if no distinction is needed in the first place, g_uri_build() with
userinfo should be used instead.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
g_uri_build_with_user() builds a userinfo, but it shouldn't encode it
itself, but let the user flags declare what's there. Otherwise,
to_string() code paths may encode a second time.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Add encoded flags, similar to what was done in commit 7bee36b4 ("uri:
add G_FLAGS_ENCODED_QUERY").
SoupURI has manual handling of encoded path & fragment, but it can rely
on GUri decoding for the rest.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The heuristic is a bit too agressive, as we may have hostname with
%-encoded ':' (as shown in GVfs URI tests).
Add an extra test to check :-decoding as well.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a few ipv6 scope parsing corner test cases.
- checking incorrect scoped IPv6 ending with only %25 isn't decoded.
- checking valid scoped IPv6 is passing g_uri_is_valid()
As discussed in
https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1567#note_860499,
for historical reasons, GUri accepts the % preceding the zone-id in the
unescaped form as well.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
g_uri_is_valid() should check the given URI is valid following RFC-3986,
and reject relative references.
Fixes: https://gitlab.gnome.org/GNOME/glib/-/issues/2169
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The previous implementation of g_uri_unescape_segment() allowed non-utf8
decoded characters. uri_decoder() allows it too with FLAGS_ENCODED (I
think it's abusing a bit the user-facing flags for some internal
decoding behaviour)
However, it didn't allow \0 in the decoded string. Let's have an extra
check for that, outside of uri_decoder().
Fixes: d83d68d64c40021be432416f9912ff9e59a337ce
Reported-by: Matthias Clasen <mclasen@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
It's not clear to me why this argument was excluded in the first place,
and Dan doesn't remember either. At least for consistency with
unescape_string, add it.
See also:
https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1574#note_867283
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The illegal character set used to be applied only to the decoded
characters.
Fixes: https://gitlab.gnome.org/GNOME/glib/-/issues/2160
Fixes: d83d68d64c40 ("guri: new URI parsing and generating functions")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
A query string may have some '=' characters '%'-encoded that could be
split by g_uri_parse_params() incorrectly. Instead, callers should leave
the query part encoded, and let g_uri_parse_params() do the decoding.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
This is a minor convenience, to avoid caller to do further '+' decoding.
According to the W3C HTML specification, space characters are replaced
by '+': https://url.spec.whatwg.org/#urlencoded-parsing
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
This will allow to further enhance the parsing, without breaking API,
and also makes argument on call side a bit clearer than just TRUE/FALSE.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
This should give a bit more flexibility, without drawbacks.
Many URI encoding accept either '&' or ';' as separators.
Change the documentation to reflect that '&' is probably more
common (http query string).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
There is a limited (1 or 2 byte) read off the end of the buffer if its
final or penultimate byte is `%` and it’s not nul-terminated after that.
If the buffer *is* nul-terminated then the first `g_ascii_isxdigit()`
call safely returns `FALSE` and the code moves on.
Fix it by adding an additional check, and some unit tests to catch the
behaviour.
This bug is present in libsoup, which `GUri` is based on, but not
exploitable due to how the external API only exposes nul-terminated
strings. See https://gitlab.gnome.org/GNOME/libsoup/-/merge_requests/126
for the fix there.
oss-fuzz#23815
oss-fuzz#23818
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Modify the existing test function to run each test twice: once
nul-terminated and once with a length specified.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This introduces no functional changes, but will make it easier to add
more tests in future.
It splits the unescaping tests out so the different types of unescaping
(string, bytes, segment) are tested separately, since they have
different limitations.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Modify the existing test function to run each test twice: once
nul-terminated and once with a length specified.
Signed-off-by: Philip Withnall <withnall@endlessm.com>