Currently we require explicitly specifying the port when configuring a
proxy server, which is seriously weird. I take the fact that nobody
reported a bug until 2022 to indicate that almost nobody is using
proxies. Whatever. Let's assume that if no port is provided, the default
port for the protocol should be used instead.
For example, you can now specify in GNOME settings that your proxy server
is https://example.com and it will work. Previously, you had to write
https://example.com:443. Yuck!
This was originally reported as GProxyResolver bug, but nothing is
actually wrong there. It's actually GProxyAddressEnumerator that gets
tripped up by URLs returned by GProxyResolver without a default port.
This breaks GSocketClient.
Fixing this requires exposing GUri's _default_scheme_port() function to
GIO. I considered copy/pasting it since it's not very much code, but I
figure the private call mechanism is probably not too expensive, and I
don't like code duplication.
Fixes#2832
This is true for socks://, socks4://, socks4a://, and socks5://. I could
list them individually and risk breaking in the future if socks6:// ever
exists, or test for "socks" and risk breaking if a future URL scheme
begins with "socks" but doesn't use port 1080. I picked the latter.
Add SPDX license (but not copyright) headers to all files which follow a
certain pattern in their existing non-machine-readable header comment.
This commit was entirely generated using the command:
```
git ls-files glib/*.[ch] | xargs perl -0777 -pi -e 's/\n \*\n \* This library is free software; you can redistribute it and\/or\n \* modify it under the terms of the GNU Lesser General Public/\n \*\n \* SPDX-License-Identifier: LGPL-2.1-or-later\n \*\n \* This library is free software; you can redistribute it and\/or\n \* modify it under the terms of the GNU Lesser General Public/igs'
```
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Helps: #1415
Rather than reallocating the string buffer a few times as more
components are added, use a default buffer size which should hopefully
accommodate most average URIs.
The buffer size is a guess and can be tweaked in future.
This has the advantage of no longer passing a potentially-`NULL`
`scheme` to `g_string_new()`, which should placate the static analysers,
which think that `g_string_new()` shouldn’t accept `NULL`.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Coverity CID: #1474691
For URIs produced in string form, the path should be normalized and port
omitted when the default one is used. When querying the path and port of
a GUri (using getters or g_uri_split()) the normalized path and the
default port should be returned when they were omitted in the parsed URI.
Closes#2257
glib/guri.c: In function ‘should_normalize_empty_path’:
glib/guri.c:756:17: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’
756 | for (i = 0; i < G_N_ELEMENTS (schemes); ++i)
| ^
This changes it so when a segment is encoded it will be
normalized at parse time which ensures its valid and
it can more easily be compared with other uris.
The return value from `g_utf8_get_char_validated()` is a `gunichar`,
which is unsigned, so comparing it with `> 0` is always going to return
true, even for return values `(gunichar) -1` and `(gunichar) -2`, which
indicate errors.
Handle them more explicitly.
oss-fuzz#26083
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
The previous parsing code could read off the end of a URI if it had an
incorrect %-escaped character in.
Fix that, and more closely implement parsing for the syntax defined in
RFC 6874, which is the amendment to RFC 3986 which specifies zone ID
syntax.
This requires reworking some network-address tests, which were
previously treating zone IDs incorrectly.
oss-fuzz#23816
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Having the goto labels at the bottom of a function makes things a little
more readable. This introduces no functional changes.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
This introduces no functional changes, but makes the memory ownership a
little clearer and reduces the length of the code.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
This introduces no functional changes, but will make future changes to
the code a little cleaner.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
`uri` is always non-`NULL` by the time the `fail` label is reached, so
drop the `NULL` pointer check. Inline the `fail` code since it’s only
used from two places.
Coverity CID: #1430970
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This might eliminate some false positives being thrown by Coverity to
do with the return value of `uri_decoder()` and whether it’s allocated
anything.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Make `G_URI_FLAGS_PARSE_RELAXED` available instead, for the
implementations which need to handle user-provided or incorrect URIs.
The default should nudge people towards being compliant with RFC 3986.
This required also adding a new `G_URI_PARAMS_PARSE_RELAXED` flag, as
previously parsing param strings *always* used relaxed mode and there
was no way to control it. Now it defaults to using strict mode, and the
new flag allows for relaxed mode to be enabled if needed.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Fixes: #2149
According to my reading of
https://tools.ietf.org/html/rfc3986#section-4, the only requirement for
a URI to be ‘absolute’ (actually, not a relative reference) is for the
scheme to be specified. A hostname doesn’t have to be specified: see any
of the options in the `hier-part` production in
https://tools.ietf.org/html/rfc3986#appendix-A which don’t include
`authority`.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This reduces the possibility for overflow, and makes the code a little
more conventional to read.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
`guint8` is the conventional way in modern GLib APIs to represent ‘a byte
which could contain arbitrary binary’. `guchar` is not advised for that
(even though it’s equivalent) because it could be misread as `gchar`.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This reduces the chance of the caller accidentally double-freeing or
use-after-free-ing something.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Sometimes there are sensitive details in URI query components, so we
should provide the option for hiding them too.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This brings its naming in line with the ‘generic’ error codes in other
error domains.
This is not an API break since `GUriError` hasn’t been in a release yet.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
RFC 3986, section 3 says:
> The scheme and path components are required, though the path may be
> empty (no characters). When authority is present, the path must
> either be empty or begin with a slash ("/") character. When
> authority is not present, the path cannot begin with two slash
> characters ("//"). These restrictions result in five different ABNF
> rules for a path (Section 3.3), only one of which will match any
> given URI reference.
(See https://tools.ietf.org/html/rfc3986#section-3.)
Given that those conditions are almost always going to be true, and that
typically the number and form of arguments passed to g_uri_join() will
be known at compile time, it would be unnecessarily awkward to add a
`GError` argument to g_uri_join() to detect these situations.
Instead, add precondition checks and document the restrictions.
Developers are responsible for ensuring their paths are in the right
format themselves.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
The g_uri_join_internal() function was making a simplification that
userinfo can be encoded with the same restricted character set as the
user field alone, fix this by allowing the correct character set.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Otherwise, the to_string() encoding will not be reversible. Furthermore,
if no distinction is needed in the first place, g_uri_build() with
userinfo should be used instead.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>