This might eliminate some false positives being thrown by Coverity to
do with the return value of `uri_decoder()` and whether it’s allocated
anything.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Make `G_URI_FLAGS_PARSE_RELAXED` available instead, for the
implementations which need to handle user-provided or incorrect URIs.
The default should nudge people towards being compliant with RFC 3986.
This required also adding a new `G_URI_PARAMS_PARSE_RELAXED` flag, as
previously parsing param strings *always* used relaxed mode and there
was no way to control it. Now it defaults to using strict mode, and the
new flag allows for relaxed mode to be enabled if needed.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Fixes: #2149
According to my reading of
https://tools.ietf.org/html/rfc3986#section-4, the only requirement for
a URI to be ‘absolute’ (actually, not a relative reference) is for the
scheme to be specified. A hostname doesn’t have to be specified: see any
of the options in the `hier-part` production in
https://tools.ietf.org/html/rfc3986#appendix-A which don’t include
`authority`.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This reduces the possibility for overflow, and makes the code a little
more conventional to read.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
`guint8` is the conventional way in modern GLib APIs to represent ‘a byte
which could contain arbitrary binary’. `guchar` is not advised for that
(even though it’s equivalent) because it could be misread as `gchar`.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This reduces the chance of the caller accidentally double-freeing or
use-after-free-ing something.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Sometimes there are sensitive details in URI query components, so we
should provide the option for hiding them too.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
This brings its naming in line with the ‘generic’ error codes in other
error domains.
This is not an API break since `GUriError` hasn’t been in a release yet.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
RFC 3986, section 3 says:
> The scheme and path components are required, though the path may be
> empty (no characters). When authority is present, the path must
> either be empty or begin with a slash ("/") character. When
> authority is not present, the path cannot begin with two slash
> characters ("//"). These restrictions result in five different ABNF
> rules for a path (Section 3.3), only one of which will match any
> given URI reference.
(See https://tools.ietf.org/html/rfc3986#section-3.)
Given that those conditions are almost always going to be true, and that
typically the number and form of arguments passed to g_uri_join() will
be known at compile time, it would be unnecessarily awkward to add a
`GError` argument to g_uri_join() to detect these situations.
Instead, add precondition checks and document the restrictions.
Developers are responsible for ensuring their paths are in the right
format themselves.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
The test was failing since commit 20ae4b46d ("uri: do not add ipv6
brackets on non-ip host").
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The g_uri_join_internal() function was making a simplification that
userinfo can be encoded with the same restricted character set as the
user field alone, fix this by allowing the correct character set.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Otherwise, the to_string() encoding will not be reversible. Furthermore,
if no distinction is needed in the first place, g_uri_build() with
userinfo should be used instead.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
g_uri_build_with_user() builds a userinfo, but it shouldn't encode it
itself, but let the user flags declare what's there. Otherwise,
to_string() code paths may encode a second time.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Add encoded flags, similar to what was done in commit 7bee36b4 ("uri:
add G_FLAGS_ENCODED_QUERY").
SoupURI has manual handling of encoded path & fragment, but it can rely
on GUri decoding for the rest.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
GTK ci rightly complains about this when ti builds
GLib as a subproject with -Werror=maybe-uninitialized.
subkey_dynamic_w will be freed without being initialized
when the first goto is taken.
The heuristic is a bit too agressive, as we may have hostname with
%-encoded ':' (as shown in GVfs URI tests).
Add an extra test to check :-decoding as well.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a few ipv6 scope parsing corner test cases.
- checking incorrect scoped IPv6 ending with only %25 isn't decoded.
- checking valid scoped IPv6 is passing g_uri_is_valid()
As discussed in
https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1567#note_860499,
for historical reasons, GUri accepts the % preceding the zone-id in the
unescaped form as well.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
g_uri_is_valid() should check the given URI is valid following RFC-3986,
and reject relative references.
Fixes: https://gitlab.gnome.org/GNOME/glib/-/issues/2169
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Just to make it a little more obvious that a thread pool can be
initialised with one thread per logical CPU.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Fixes: #272
Improve formatting, and clarify that the same *type* of test fixture can
be reused, not the same specific instance of a test fixture.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Fixes: #250
It may be defined by the environment (we document that as being allowed)
— if so, individual files should not try to redefine it, as that causes
a preprocessor warning.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Apparently, the function calls added in 11729cdc use Foundation, not
Cocoa. Cocoa is a massive superset of Foundation, and is not available
on iOS.
Patch contributed by Jay Freeman, but without providing an e-mail
address. So the git commit cannot be attributed to him.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Fixes: #1869
Various different BSD systems use a different errno from `E_LOOP` (as
defined by POSIX and used on Linux) to indicate that a file is a symlink
when you try to `open()` it with `O_NOFOLLOW`.
Fix the code which detects this. This is a follow-up to #1302.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
These exercise all the code paths I can manage without adding a load of
machinery to inject faults into `write()`.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Helps: #1302
Where applicable. Where the current use of `g_file_set_contents()` seems
the most appropriate, leave that in place.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Helps: #1302
This is used when creating the temporary file, or new file from scratch.
I wondered about also allowing the file owner and group to be set, but
that’s not as generally applicable — if your process is operating across
multiple user IDs then it likely has some fairly OS-specific
requirements and will need tighter control of its syscalls anyway.
(Eventually, support for setting the file owner and group atomically
could be added by writing out a file using `O_TMPFILE` so it’s not
addressable, and then linking it into the file system in place of the
old file using something like `renameat2(AT_EMPTY_PATH)` or `linkat()`.
That’s currently not possible without patching the kernel with
https://marc.info/?l=linux-fsdevel&m=152472898003523&w=2, as far as I
know at the moment.)
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Fixes: #1203
This moves `write_to_temp_file()` into `g_file_set_contents_full()` and
coalesces its handling of `do_fsync` with the `rename_file()` call. It
adds support for `G_FILE_SET_CONTENTS_DURABLE` and
`G_FILE_SET_CONTENTS_NONE` — previously only
`G_FILE_SET_CONTENTS_CONSISTENT | G_FILE_SET_CONTENTS_ONLY_EXISTING` was
supported.
In the case that `G_FILE_SET_CONTENTS_CONSISTENT |
G_FILE_SET_CONTENTS_DURABLE` is set, an additional `fsync()` is now done
on the directory after renaming the temporary file.
In the case that `G_FILE_SET_CONTENTS_ONLY_EXISTING` isn’t set, the
`fsync()` after writing the temporary file will always be done (unless
the file system guarantees it never needs to be done).
In the case that only `G_FILE_SET_CONTENTS_DURABLE` is set, the
destination file will be written to directly (using this mode is not
really advised).
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Fixes: #1302
This introduces no functional changes, just makes the code a bit more
modular and reusable.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Helps: #1302
This is a new version of the g_file_set_contents() API which will allow
its safety to be controlled by some flags, allowing the user to choose
their preferred tradeoff between safety (`fsync()` calls) and speed.
Currently, the flags do nothing and the new API behaves like the old
API. This will change in the following commits.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Helps: #1302
The previous implementation of g_uri_unescape_segment() allowed non-utf8
decoded characters. uri_decoder() allows it too with FLAGS_ENCODED (I
think it's abusing a bit the user-facing flags for some internal
decoding behaviour)
However, it didn't allow \0 in the decoded string. Let's have an extra
check for that, outside of uri_decoder().
Fixes: d83d68d64c
Reported-by: Matthias Clasen <mclasen@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
`getenv()` doesn't work well on Windows, f.ex., it can't fetch env
vars set with `SetEnvironmentVariable()`. This also means that it
doesn't work at all when targeting UWP since that's the only way to
set env vars in that case.
This adds really basic validation that `GTimeZone` can successfully
parse a ‘slim’ format timezone file.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Helps: #2129
Since tzcode95f (1995), TZif files have had a trailing
TZ string, used for timestamps after the last transition.
This string is specified in Internet RFC 8536 section 3.3.
init_zone_from_iana_info has ignored this string, causing it
to mishandle timestamps past the year 2038. With zic's new -b
slim flag, init_zone_from_iana_info would even mishandle current
timestamps. Fix this by parsing the trailing TZ string and adding
its transitions.
Closes#2129
Time zone transition times can range from -167:59:59 through
+167:59:59, according to Internet RFC 8536 section 3.3.1;
this is an extension to POSIX. It is needed for proper
support of TZif version 3 files.
It's not clear to me why this argument was excluded in the first place,
and Dan doesn't remember either. At least for consistency with
unescape_string, add it.
See also:
https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1574#note_867283
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The illegal character set used to be applied only to the decoded
characters.
Fixes: https://gitlab.gnome.org/GNOME/glib/-/issues/2160
Fixes: d83d68d64c ("guri: new URI parsing and generating functions")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
g_assert_true(), g_assert_cmpint(), and friends, can return to the
caller if test_nonfatal_assertions is set, but this is normally not the
case. In particular, for the purposes of static analysis, we
specifically don't want to assume that they might return. Clang has an
analyzer_noreturn attribute for this exact purpose, which conveniently
already has a macro in gmacros.h.
Fixes: #1288Fixes: #1200
Creating 1000 threads with the default stack size of 8 MiB will fail on
architectures with a 32-bit address space. Move up the existing THREADS
macro and use that instead, but change its definition to 1000 if
pointers are larger than 32 bits.
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
A query string may have some '=' characters '%'-encoded that could be
split by g_uri_parse_params() incorrectly. Instead, callers should leave
the query part encoded, and let g_uri_parse_params() do the decoding.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
This is a minor convenience, to avoid caller to do further '+' decoding.
According to the W3C HTML specification, space characters are replaced
by '+': https://url.spec.whatwg.org/#urlencoded-parsing
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>