For URIs produced in string form, the path should be normalized and port
omitted when the default one is used. When querying the path and port of
a GUri (using getters or g_uri_split()) the normalized path and the
default port should be returned when they were omitted in the parsed URI.
Closes#2257
glib/guri.c: In function ‘should_normalize_empty_path’:
glib/guri.c:756:17: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’
756 | for (i = 0; i < G_N_ELEMENTS (schemes); ++i)
| ^
This changes it so when a segment is encoded it will be
normalized at parse time which ensures its valid and
it can more easily be compared with other uris.
The return value from `g_utf8_get_char_validated()` is a `gunichar`,
which is unsigned, so comparing it with `> 0` is always going to return
true, even for return values `(gunichar) -1` and `(gunichar) -2`, which
indicate errors.
Handle them more explicitly.
oss-fuzz#26083
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
The previous parsing code could read off the end of a URI if it had an
incorrect %-escaped character in.
Fix that, and more closely implement parsing for the syntax defined in
RFC 6874, which is the amendment to RFC 3986 which specifies zone ID
syntax.
This requires reworking some network-address tests, which were
previously treating zone IDs incorrectly.
oss-fuzz#23816
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Having the goto labels at the bottom of a function makes things a little
more readable. This introduces no functional changes.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
This introduces no functional changes, but makes the memory ownership a
little clearer and reduces the length of the code.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
This introduces no functional changes, but will make future changes to
the code a little cleaner.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
`uri` is always non-`NULL` by the time the `fail` label is reached, so
drop the `NULL` pointer check. Inline the `fail` code since it’s only
used from two places.
Coverity CID: #1430970
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This might eliminate some false positives being thrown by Coverity to
do with the return value of `uri_decoder()` and whether it’s allocated
anything.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Make `G_URI_FLAGS_PARSE_RELAXED` available instead, for the
implementations which need to handle user-provided or incorrect URIs.
The default should nudge people towards being compliant with RFC 3986.
This required also adding a new `G_URI_PARAMS_PARSE_RELAXED` flag, as
previously parsing param strings *always* used relaxed mode and there
was no way to control it. Now it defaults to using strict mode, and the
new flag allows for relaxed mode to be enabled if needed.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Fixes: #2149
According to my reading of
https://tools.ietf.org/html/rfc3986#section-4, the only requirement for
a URI to be ‘absolute’ (actually, not a relative reference) is for the
scheme to be specified. A hostname doesn’t have to be specified: see any
of the options in the `hier-part` production in
https://tools.ietf.org/html/rfc3986#appendix-A which don’t include
`authority`.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This reduces the possibility for overflow, and makes the code a little
more conventional to read.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
`guint8` is the conventional way in modern GLib APIs to represent ‘a byte
which could contain arbitrary binary’. `guchar` is not advised for that
(even though it’s equivalent) because it could be misread as `gchar`.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This reduces the chance of the caller accidentally double-freeing or
use-after-free-ing something.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Sometimes there are sensitive details in URI query components, so we
should provide the option for hiding them too.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This brings its naming in line with the ‘generic’ error codes in other
error domains.
This is not an API break since `GUriError` hasn’t been in a release yet.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
RFC 3986, section 3 says:
> The scheme and path components are required, though the path may be
> empty (no characters). When authority is present, the path must
> either be empty or begin with a slash ("/") character. When
> authority is not present, the path cannot begin with two slash
> characters ("//"). These restrictions result in five different ABNF
> rules for a path (Section 3.3), only one of which will match any
> given URI reference.
(See https://tools.ietf.org/html/rfc3986#section-3.)
Given that those conditions are almost always going to be true, and that
typically the number and form of arguments passed to g_uri_join() will
be known at compile time, it would be unnecessarily awkward to add a
`GError` argument to g_uri_join() to detect these situations.
Instead, add precondition checks and document the restrictions.
Developers are responsible for ensuring their paths are in the right
format themselves.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
The g_uri_join_internal() function was making a simplification that
userinfo can be encoded with the same restricted character set as the
user field alone, fix this by allowing the correct character set.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Otherwise, the to_string() encoding will not be reversible. Furthermore,
if no distinction is needed in the first place, g_uri_build() with
userinfo should be used instead.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
g_uri_build_with_user() builds a userinfo, but it shouldn't encode it
itself, but let the user flags declare what's there. Otherwise,
to_string() code paths may encode a second time.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Add encoded flags, similar to what was done in commit 7bee36b4 ("uri:
add G_FLAGS_ENCODED_QUERY").
SoupURI has manual handling of encoded path & fragment, but it can rely
on GUri decoding for the rest.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The heuristic is a bit too agressive, as we may have hostname with
%-encoded ':' (as shown in GVfs URI tests).
Add an extra test to check :-decoding as well.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
g_uri_is_valid() should check the given URI is valid following RFC-3986,
and reject relative references.
Fixes: https://gitlab.gnome.org/GNOME/glib/-/issues/2169
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>