64 Commits

Author SHA1 Message Date
Sebastian Wilhelmi
21b45d6ac2 guri: Improve performance of remove_dot_segments() algorithm 2021-11-17 15:20:28 +00:00
Carlos Garcia Campos
7e428aa4e5 guri: always apply the remove dot segments algorithm
And not only when g_uri_parse_relative() is called with a base uri. This
follows the spec and it's compatible with SoupURI.

Fixes #2342
2021-05-05 15:13:16 +02:00
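The "remove dot segments" algorithm referred to above is specified in RFC 3986, section 5.2.4. The following is a minimal in-place sketch of that algorithm in plain C. It is not the GLib implementation; the function name `sketch_remove_dot_segments` is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not GLib code): collapses "." and ".." segments of
 * `path` in place, following the rules of RFC 3986 section 5.2.4. */
static void
sketch_remove_dot_segments (char *path)
{
  char *in = path;   /* read cursor: the RFC's "input buffer" */
  char *out = path;  /* write cursor: end of the RFC's "output buffer" */

  assert (path != NULL);

  while (*in != '\0')
    {
      if (strncmp (in, "../", 3) == 0)        /* rule A: drop leading "../" */
        in += 3;
      else if (strncmp (in, "./", 2) == 0)    /* rule A: drop leading "./" */
        in += 2;
      else if (strncmp (in, "/./", 3) == 0)   /* rule B: "/./" becomes "/" */
        in += 2;
      else if (strcmp (in, "/.") == 0)        /* rule B: final "/." becomes "/" */
        in[1] = '\0';
      else if (strncmp (in, "/../", 4) == 0 || strcmp (in, "/..") == 0)
        {
          /* rule C: replace the prefix with "/" in the input ... */
          if (in[3] == '/')
            in += 3;
          else
            in[1] = '\0';

          /* ... and pop the last output segment with its leading "/" */
          while (out > path)
            {
              out--;
              if (*out == '/')
                break;
            }
        }
      else if (strcmp (in, ".") == 0 || strcmp (in, "..") == 0)
        in += strlen (in);                    /* rule D: lone "." or ".." */
      else
        {
          /* rule E: move one segment, including its leading "/" if any */
          do
            *out++ = *in++;
          while (*in != '\0' && *in != '/');
        }
    }

  *out = '\0';
}
```

Because the write cursor never overtakes the read cursor, the path can be rewritten in its own buffer without a second allocation.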
Carlos Garcia Campos
5221b6a261 guri: Mark g_uri_get_host as nullable
It's currently annotated as not nullable, but it can be NULL.
2021-02-03 09:47:30 +00:00
Philip Withnall
95c19181ae guri: Correctly set an error when parsing an invalid hostname
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
2020-12-04 13:54:27 +00:00
Carlos Garcia Campos
fb838bf3f6 guri: apply scheme normalization flag consistently
For URIs produced in string form, the path should be normalized and port
omitted when the default one is used. When querying the path and port of
a GUri (using getters or g_uri_split()) the normalized path and the
default port should be returned when they were omitted in the parsed URI.

Closes #2257
2020-11-24 14:35:19 +01:00
Emmanuel Fleury
f5b2b8132d Fix signedness warning in glib/guri.c
glib/guri.c: In function ‘should_normalize_empty_path’:
glib/guri.c:756:17: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’
  756 |   for (i = 0; i < G_N_ELEMENTS (schemes); ++i)
      |                 ^
2020-11-17 14:12:53 +01:00
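The warning quoted above arises when a signed `int` loop index is compared with an unsigned, `sizeof`-based element count. A minimal sketch of the usual remedy, an unsigned index, follows. The macro `SKETCH_N_ELEMENTS`, the function name, and the scheme list are all hypothetical stand-ins and do not reproduce the contents of glib/guri.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for G_N_ELEMENTS: yields an unsigned size_t count. */
#define SKETCH_N_ELEMENTS(arr) (sizeof (arr) / sizeof ((arr)[0]))

/* Hypothetical lookup; the scheme list is illustrative only. */
static bool
sketch_is_listed_scheme (const char *scheme)
{
  static const char *const schemes[] = { "http", "https", "ftp" };
  size_t i;  /* unsigned index, so `i < count` compares like with like */

  assert (scheme != NULL);

  for (i = 0; i < SKETCH_N_ELEMENTS (schemes); ++i)
    {
      if (strcmp (scheme, schemes[i]) == 0)
        return true;
    }

  return false;
}
```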
Patrick Griffis
9da213ea34 docs: Add note about uri normalization for equality 2020-11-06 15:32:17 -06:00
Patrick Griffis
64f478dca3 guri: Add G_URI_FLAGS_SCHEME_NORMALIZE
This flag enables optional scheme-defined normalization
during parsing of a URI.
2020-11-06 15:32:17 -06:00
Patrick Griffis
482e10d3bb guri: Normalize uri segments if they are encoded
This changes it so when a segment is encoded it will be
normalized at parse time, which ensures it's valid and
it can more easily be compared with other URIs.
2020-11-04 10:55:04 -06:00
Marc-André Lureau
2306f96fb0 uri: add missing (not)nullable annotations
As suggested by Sebastian Dröge:
https://github.com/gtk-rs/glib/pull/697#pullrequestreview-505797722

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-10-13 09:59:28 +03:00
Philip Withnall
a4cba75581 guri: Fix UTF-8 validation when escaping URI components
The return value from `g_utf8_get_char_validated()` is a `gunichar`,
which is unsigned, so comparing it with `> 0` is always going to return
true, even for return values `(gunichar) -1` and `(gunichar) -2`, which
indicate errors.

Handle them more explicitly.

oss-fuzz#26083

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
2020-10-05 13:53:02 +01:00
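The pitfall described above can be reproduced without GLib. In the sketch below, `sketch_unichar` is a hypothetical stand-in for `gunichar` (an unsigned 32-bit type), and the two sentinel macros mirror the `(gunichar) -1` and `(gunichar) -2` error returns mentioned in the commit message. The "fixed" check is one plausible way to handle them explicitly and is an assumption, not the code from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sketch_unichar;  /* hypothetical stand-in for gunichar */

/* Error sentinels: -1 and -2 converted to the unsigned type. */
#define SKETCH_ERR_INVALID ((sketch_unichar) -1)
#define SKETCH_ERR_PARTIAL ((sketch_unichar) -2)

/* Buggy check: the sentinels are huge positive values once converted to
 * an unsigned type, so `> 0` accepts them. */
static bool
sketch_buggy_is_valid (sketch_unichar c)
{
  return c > 0;
}

/* Explicit check (assumed shape of a fix): compare against each sentinel. */
static bool
sketch_fixed_is_valid (sketch_unichar c)
{
  return c != SKETCH_ERR_INVALID && c != SKETCH_ERR_PARTIAL;
}
```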
Philip Withnall
b43fb9cbfb guri: Fix URI scope parsing
The previous parsing code could read off the end of a URI if it
contained an incorrect %-escaped character.

Fix that, and more closely implement parsing for the syntax defined in
RFC 6874, which is the amendment to RFC 3986 which specifies zone ID
syntax.

This requires reworking some network-address tests, which were
previously treating zone IDs incorrectly.

oss-fuzz#23816

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
2020-09-30 19:39:30 +01:00
Philip Withnall
7e400a886e guri: Refactor error handling in parse_ip_literal()
Having the goto labels at the bottom of a function makes things a little
more readable. This introduces no functional changes.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
2020-09-30 19:39:09 +01:00
Philip Withnall
17a53f2fc7 guri: Simplify memory management in parse_host()
This introduces no functional changes, but makes the memory ownership a
little clearer and reduces the length of the code.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
2020-09-30 17:46:27 +01:00
Philip Withnall
58fce4b92b guri: Move IP-literal parsing out into a separate function
This introduces no functional changes, but will make future changes to
the code a little cleaner.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
2020-09-30 17:45:19 +01:00
Philip Withnall
f19cb44b98 guri: Remove unnecessary NULL pointer check
`uri` is always non-`NULL` by the time the `fail` label is reached, so
drop the `NULL` pointer check. Inline the `fail` code since it’s only
used from two places.

Coverity CID: #1430970
Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-09-08 09:59:24 +01:00
Patrick Griffis
8b319a687b guri: Fix user passed to g_uri_split_with_user() not being NULL'd 2020-09-02 15:43:58 -05:00
Philip Withnall
0caa00b349 guri: Add an assertion to help static analysis
This might eliminate some false positives being thrown by Coverity to
do with the return value of `uri_decoder()` and whether it’s allocated
anything.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-18 09:33:42 +01:00
Philip Withnall
b654eb1846 guri: Make G_URI_FLAGS_PARSE_STRICT the default
Make `G_URI_FLAGS_PARSE_RELAXED` available instead, for the
implementations which need to handle user-provided or incorrect URIs.
The default should nudge people towards being compliant with RFC 3986.

This required also adding a new `G_URI_PARAMS_PARSE_RELAXED` flag, as
previously parsing param strings *always* used relaxed mode and there
was no way to control it. Now it defaults to using strict mode, and the
new flag allows for relaxed mode to be enabled if needed.

Signed-off-by: Philip Withnall <withnall@endlessm.com>

Fixes: #2149
2020-08-07 14:02:18 +01:00
Philip Withnall
943b1e45ab guri: Don’t fail g_uri_is_valid() if URI is missing a hostname
According to my reading of
https://tools.ietf.org/html/rfc3986#section-4, the only requirement for
a URI to be ‘absolute’ (actually, not a relative reference) is for the
scheme to be specified. A hostname doesn’t have to be specified: see any
of the options in the `hier-part` production in
https://tools.ietf.org/html/rfc3986#appendix-A which don’t include
`authority`.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
b5c59cc3fc guri: Use gssize for array/string lengths
This reduces the possibility for overflow, and makes the code a little
more conventional to read.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
e446c3487b guri: Change type of g_uri_escape_bytes() to use guint8
`guint8` is the conventional way in modern GLib APIs to represent ‘a byte
which could contain arbitrary binary’. `guchar` is not advised for that
(even though it’s equivalent) because it could be misread as `gchar`.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
ceda9755de guri: Clear return values on error from g_uri_params_iter_next()
This reduces the chance of the caller accidentally double-freeing or
use-after-free-ing something.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
83597b9e57 guri: Use NONE values of flags rather than 0
This introduces no functional changes, but makes the code a little
easier to read.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
41a21c3566 guri: Add links to RFC 3986 in code comments
This should make the RFC easier to refer to in future.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
f873b88f89 guri: Add G_URI_HIDE_QUERY
Sometimes there are sensitive details in URI query components, so we
should provide the option for hiding them too.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
ae6a0ef8b8 guri: Tweak quotes in error strings
Use nice curly Unicode quotes.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
40873f8452 guri: Use g_steal_pointer() to make ownership transfer clearer
This introduces no functional changes, just makes the code a bit easier
to read.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
de0ebf8a5f guri: Minor code formatting fixes
Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
1f1efbbb05 guri: Rename G_URI_ERROR_MISC to G_URI_ERROR_FAILED
This brings its naming in line with the ‘generic’ error codes in other
error domains.

This is not an API break since `GUriError` hasn’t been in a release yet.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:18 +01:00
Philip Withnall
ff60d2ebf5 guri: Various minor documentation tweaks and improvements
Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 14:02:17 +01:00
Sebastian Dröge
602b7cca33 Merge branch 'uri-path-slashes' into 'master'
guri: Document and check restrictions on path prefixes

See merge request GNOME/glib!1612
2020-08-07 12:57:17 +00:00
Philip Withnall
89cf298b19 guri: Document and check restrictions on path prefixes
RFC 3986, section 3 says:
> The scheme and path components are required, though the path may be
> empty (no characters).  When authority is present, the path must
> either be empty or begin with a slash ("/") character.  When
> authority is not present, the path cannot begin with two slash
> characters ("//").  These restrictions result in five different ABNF
> rules for a path (Section 3.3), only one of which will match any
> given URI reference.

(See https://tools.ietf.org/html/rfc3986#section-3.)

Given that those conditions are almost always going to be true, and that
typically the number and form of arguments passed to g_uri_join() will
be known at compile time, it would be unnecessarily awkward to add a
`GError` argument to g_uri_join() to detect these situations.

Instead, add precondition checks and document the restrictions.
Developers are responsible for ensuring their paths are in the right
format themselves.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 13:24:12 +01:00
Philip Withnall
623cb10f55 guri: Always prepend // to the host when building a URI
This is needed to distinguish the host (‘authority’ in the terms of RFC
3986) from a path if a scheme is not present.

It can be seen from the grammar in RFC 3986
(https://tools.ietf.org/html/rfc3986#appendix-A) that `authority` only
ever appears after `"//"`.

Spotted by Simon McVittie in
https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1606#note_884893.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-08-07 12:51:31 +01:00
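A minimal sketch of the rule described above: when a host is present, `//` is written before it whether or not a scheme is present. The function `sketch_join` and its signature are hypothetical and far simpler than g_uri_join(); it performs no escaping and ignores userinfo, port, query and fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical joiner (not GLib code). `scheme` and `host` may be NULL;
 * `path` must not be. Returns the snprintf() result, i.e. the length the
 * full string needs, excluding the terminating NUL. */
static int
sketch_join (char *buf, size_t len,
             const char *scheme, const char *host, const char *path)
{
  assert (buf != NULL && path != NULL);

  return snprintf (buf, len, "%s%s%s%s%s",
                   scheme != NULL ? scheme : "",
                   scheme != NULL ? ":" : "",
                   host != NULL ? "//" : "",  /* always precedes the host */
                   host != NULL ? host : "",
                   path);
}
```

Without the leading `//`, a scheme-less result such as `example.com/p` would be indistinguishable from a relative path.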
Felix Yan
161168c672 Fix multiple typos in guri.c 2020-08-06 14:18:01 +00:00
Marc-André Lureau
0ba7ebfda9 uri: allow to join a partial URI, without scheme
Fixes: #2166

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-08-05 22:40:08 +04:00
Philip Withnall
bb1df0e515 Merge branch 'uri-params-iter' into 'master'
Add GUriParamsIter

See merge request GNOME/glib!1572
2020-08-05 16:07:42 +00:00
Philip Withnall
df8dc7fc38 Merge branch 'guri-gio' into 'master'
Replace _g_uri_parse_authority() with GUri

Closes #2156

See merge request GNOME/glib!1567
2020-08-05 16:06:02 +00:00
Marc-André Lureau
5767eef895 uri: add GUriParamsIter
See also:
https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1328#note_863735
2020-08-04 20:10:57 +04:00
Sebastian Dröge
50343afb6e Merge branch 'uri-userinfo-enc' into 'master'
uri: do not encode ':' and ';' from userinfo

See merge request GNOME/glib!1600
2020-08-04 13:33:18 +00:00
Marc-André Lureau
ef173e2e75 uri: do not encode ':' and ';' from userinfo
The g_uri_join_internal() function was making a simplification that
userinfo can be encoded with the same restricted character set as the
user field alone, fix this by allowing the correct character set.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-08-04 17:10:06 +04:00
Marc-André Lureau
4c20ea477c uri: always add G_URI_FLAGS_HAS_PASSWORD with build_with_user()
Otherwise, the to_string() encoding will not be reversible. Furthermore,
if no distinction is needed in the first place, g_uri_build() with
userinfo should be used instead.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-08-04 17:10:06 +04:00
Marc-André Lureau
b0f9af0e1d uri: do not encode userinfo fields
g_uri_build_with_user() builds a userinfo, but it shouldn't encode it
itself, but let the user flags declare what's there. Otherwise,
to_string() code paths may encode a second time.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-08-04 17:03:30 +04:00
Marc-André Lureau
c9c349aeaa uri: add ENCODED_PATH & ENCODED_FRAGMENT flags
Add encoded flags, similar to what was done in commit 7bee36b4 ("uri:
add G_FLAGS_ENCODED_QUERY").

SoupURI has manual handling of encoded path & fragment, but it can rely
on GUri decoding for the rest.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-08-04 16:56:19 +04:00
Marc-André Lureau
20ae4b46d4 uri: do not add ipv6 brackets on non-ip host
The heuristic is a bit too aggressive, as we may have a hostname with
%-encoded ':' (as shown in GVfs URI tests).

Add an extra test to check :-decoding as well.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-07-30 21:37:34 +04:00
Marc-André Lureau
82ad7853ba uri: change g_uri_is_valid() to check absolute URI
g_uri_is_valid() should check the given URI is valid following RFC-3986,
and reject relative references.

Fixes: https://gitlab.gnome.org/GNOME/glib/-/issues/2169

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-07-28 15:44:43 +04:00
Marc-André Lureau
44d4640c47 uri: rename absolute & relative uri_string to uri_ref
Let's reserve the term URI for absolute URIs, following rfc3986
terminology.

See:
https://gitlab.gnome.org/GNOME/glib/-/issues/2169

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-07-28 15:07:08 +04:00
Marc-André Lureau
d625a29b28 uri: add a comment about temporary GUri construction
As pointed out in the discussion of:
https://gitlab.gnome.org/GNOME/glib/-/issues/2169

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-07-28 14:54:55 +04:00
Marc-André Lureau
19c0db3185 uri: improve some documentation about absolute URIs
As pointed out in the discussion
https://gitlab.gnome.org/GNOME/glib/-/issues/2169.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-07-28 14:54:55 +04:00
Marc-André Lureau
3521763532 uri: add some note about the API scope
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-07-28 14:54:42 +04:00