docs: Move the GUri SECTION

Move it to the struct docs.

Signed-off-by: Philip Withnall <pwithnall@gnome.org>

Helps: #3037
This commit is contained in:
Philip Withnall 2023-11-15 16:37:36 +00:00
parent 43588fcdd9
commit be783fdc2a

View File

@ -29,23 +29,74 @@
#include "guriprivate.h"
/**
* SECTION:guri
* @short_description: URI-handling utilities
* @include: glib.h
* GUri:
*
* The #GUri type and related functions can be used to parse URIs into
* The `GUri` type and related functions can be used to parse URIs into
* their components, and build valid URIs from individual components.
*
* Note that #GUri scope is to help manipulate URIs in various applications,
* Since `GUri` only represents absolute URIs, all `GUri`s will have a
* URI scheme, so [method@GLib.Uri.get_scheme] will always return a non-`NULL`
* answer. Likewise, by definition, all URIs have a path component, so
* [method@GLib.Uri.get_path] will always return a non-`NULL` string (which may
* be empty).
*
* If the URI string has an
* [authority component](https://tools.ietf.org/html/rfc3986#section-3) (that
* is, if the scheme is followed by `://` rather than just `:`), then the
* `GUri` will contain a hostname, and possibly a port and userinfo.
* Additionally, depending on how the `GUri` was constructed/parsed (for example,
* using the `G_URI_FLAGS_HAS_PASSWORD` and `G_URI_FLAGS_HAS_AUTH_PARAMS` flags),
* the userinfo may be split out into a username, password, and
* additional authorization-related parameters.
*
* Normally, the components of a `GUri` will have all `%`-encoded
* characters decoded. However, if you construct/parse a `GUri` with
* `G_URI_FLAGS_ENCODED`, then the `%`-encoding will be preserved instead in
* the userinfo, path, and query fields (and in the host field if also
* created with `G_URI_FLAGS_NON_DNS`). In particular, this is necessary if
* the URI may contain binary data or non-UTF-8 text, or if decoding
* the components might change the interpretation of the URI.
*
* For example, with the encoded flag:
*
* ```c
* g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue", G_URI_FLAGS_ENCODED, &err);
* g_assert_cmpstr (g_uri_get_query (uri), ==, "query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue");
* ```
*
* While the default `%`-decoding behaviour would give:
*
* ```c
* g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue", G_URI_FLAGS_NONE, &err);
* g_assert_cmpstr (g_uri_get_query (uri), ==, "query=http://host/path?param=value");
* ```
*
* During decoding, if an invalid UTF-8 string is encountered, parsing will fail
* with an error indicating the bad string location:
*
* ```c
* g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fbad%3D%00alue", G_URI_FLAGS_NONE, &err);
* g_assert_error (err, G_URI_ERROR, G_URI_ERROR_BAD_QUERY);
* ```
*
* You should pass `G_URI_FLAGS_ENCODED` or `G_URI_FLAGS_ENCODED_QUERY` if you
* need to handle that case manually. In particular, if the query string
* contains `=` characters that are `%`-encoded, you should let
* [func@GLib.Uri.parse_params] do the decoding once of the query.
*
* `GUri` is immutable once constructed, and can safely be accessed from
* multiple threads. Its reference counting is atomic.
*
* Note that the scope of `GUri` is to help manipulate URIs in various applications,
* following [RFC 3986](https://tools.ietf.org/html/rfc3986). In particular,
* it doesn't intend to cover web browser needs, and doesn't implement the
* it doesn't intend to cover web browser needs, and doesnt implement the
* [WHATWG URL](https://url.spec.whatwg.org/) standard. No APIs are provided to
* help prevent
* [homograph attacks](https://en.wikipedia.org/wiki/IDN_homograph_attack), so
* #GUri is not suitable for formatting URIs for display to the user for making
* `GUri` is not suitable for formatting URIs for display to the user for making
* security-sensitive decisions.
*
* ## Relative and absolute URIs # {#relative-absolute-uris}
* ## Relative and absolute URIs
*
* As defined in [RFC 3986](https://tools.ietf.org/html/rfc3986#section-4), the
* hierarchical nature of URIs means that they can either be relative
@ -65,145 +116,85 @@
*
* Absolute URIs have a scheme specified. Any other components of the URI which
* are missing are specified as explicitly unset in the URI, rather than being
* resolved relative to a base URI using g_uri_parse_relative().
* resolved relative to a base URI using [method@GLib.Uri.parse_relative].
*
* For example, a valid absolute URI is `file:///home/bob` or
* `https://search.com?query=string`.
*
* A #GUri instance is always an absolute URI. A string may be an absolute URI
* A `GUri` instance is always an absolute URI. A string may be an absolute URI
* or a relative reference; see the documentation for individual functions as to
* what forms they accept.
*
* ## Parsing URIs
*
* The most minimalist APIs for parsing URIs are g_uri_split() and
* g_uri_split_with_user(). These split a URI into its component
* The most minimalist APIs for parsing URIs are [func@GLib.Uri.split] and
* [func@GLib.Uri.split_with_user]. These split a URI into its component
* parts, and return the parts; the difference between the two is that
* g_uri_split() treats the userinfo component of the URI as a
* single element, while g_uri_split_with_user() can (depending on the
* #GUriFlags you pass) treat it as containing a username, password,
* and authentication parameters. Alternatively, g_uri_split_network()
* [func@GLib.Uri.split] treats the userinfo component of the URI as a
* single element, while [func@GLib.Uri.split_with_user] can (depending on the
* [flags@GLib.UriFlags] you pass) treat it as containing a username, password,
* and authentication parameters. Alternatively, [func@GLib.Uri.split_network]
* can be used when you are only interested in the components that are
* needed to initiate a network connection to the service (scheme,
* host, and port).
*
* g_uri_parse() is similar to g_uri_split(), but instead of returning
* individual strings, it returns a #GUri structure (and it requires
* [func@GLib.Uri.parse] is similar to [func@GLib.Uri.split], but instead of
* returning individual strings, it returns a `GUri` structure (and it requires
* that the URI be an absolute URI).
*
* g_uri_resolve_relative() and g_uri_parse_relative() allow you to
* resolve a relative URI relative to a base URI.
* g_uri_resolve_relative() takes two strings and returns a string,
* and g_uri_parse_relative() takes a #GUri and a string and returns a
* #GUri.
* [func@GLib.Uri.resolve_relative] and [method@GLib.Uri.parse_relative] allow
* you to resolve a relative URI relative to a base URI.
* [func@GLib.Uri.resolve_relative] takes two strings and returns a string,
* and [method@GLib.Uri.parse_relative] takes a `GUri` and a string and returns a
* `GUri`.
*
* All of the parsing functions take a #GUriFlags argument describing
* All of the parsing functions take a [flags@GLib.UriFlags] argument describing
* exactly how to parse the URI; see the documentation for that type
* for more details on the specific flags that you can pass. If you
* need to choose different flags based on the type of URI, you can
* use g_uri_peek_scheme() on the URI string to check the scheme
* use [func@GLib.Uri.peek_scheme] on the URI string to check the scheme
* first, and use that to decide what flags to parse it with.
*
* For example, you might want to use %G_URI_PARAMS_WWW_FORM when parsing the
* params for a web URI, so compare the result of g_uri_peek_scheme() against
* `http` and `https`.
* For example, you might want to use `G_URI_PARAMS_WWW_FORM` when parsing the
* params for a web URI, so compare the result of [func@GLib.Uri.peek_scheme]
* against `http` and `https`.
*
* ## Building URIs
*
* g_uri_join() and g_uri_join_with_user() can be used to construct
* [func@GLib.Uri.join] and [func@GLib.Uri.join_with_user] can be used to construct
* valid URI strings from a set of component strings. They are the
* inverse of g_uri_split() and g_uri_split_with_user().
* inverse of [func@GLib.Uri.split] and [func@GLib.Uri.split_with_user].
*
* Similarly, g_uri_build() and g_uri_build_with_user() can be used to
* construct a #GUri from a set of component strings.
* Similarly, [func@GLib.Uri.build] and [func@GLib.Uri.build_with_user] can be
* used to construct a `GUri` from a set of component strings.
*
* As with the parsing functions, the building functions take a
* #GUriFlags argument. In particular, it is important to keep in mind
* [flags@GLib.UriFlags] argument. In particular, it is important to keep in mind
* whether the URI components you are using are already `%`-encoded. If so,
* you must pass the %G_URI_FLAGS_ENCODED flag.
* you must pass the `G_URI_FLAGS_ENCODED` flag.
*
* ## `file://` URIs
*
* Note that Windows and Unix both define special rules for parsing
* `file://` URIs (involving non-UTF-8 character sets on Unix, and the
* interpretation of path separators on Windows). #GUri does not
* implement these rules. Use g_filename_from_uri() and
* g_filename_to_uri() if you want to properly convert between
* interpretation of path separators on Windows). `GUri` does not
* implement these rules. Use [func@GLib.filename_from_uri] and
* [func@GLib.filename_to_uri] if you want to properly convert between
* `file://` URIs and local filenames.
*
* ## URI Equality
*
* Note that there is no `g_uri_equal ()` function, because comparing
* URIs usefully requires scheme-specific knowledge that #GUri does
* not have. #GUri can help with normalization if you use the various
* encoded #GUriFlags as well as %G_URI_FLAGS_SCHEME_NORMALIZE however
* it is not comprehensive.
* URIs usefully requires scheme-specific knowledge that `GUri` does
* not have. `GUri` can help with normalization if you use the various
* encoded [flags@GLib.UriFlags] as well as `G_URI_FLAGS_SCHEME_NORMALIZE`
* however it is not comprehensive.
* For example, `data:,foo` and `data:;base64,Zm9v` resolve to the same
* thing according to the `data:` URI specification which GLib does not
* handle.
*
* Since: 2.66
*/
/**
* GUri:
*
* A parsed absolute URI.
*
* Since #GUri only represents absolute URIs, all #GUris will have a
* URI scheme, so g_uri_get_scheme() will always return a non-%NULL
* answer. Likewise, by definition, all URIs have a path component, so
* g_uri_get_path() will always return a non-%NULL string (which may be empty).
*
* If the URI string has an
* [authority component](https://tools.ietf.org/html/rfc3986#section-3) (that
* is, if the scheme is followed by `://` rather than just `:`), then the
* #GUri will contain a hostname, and possibly a port and userinfo.
* Additionally, depending on how the #GUri was constructed/parsed (for example,
* using the %G_URI_FLAGS_HAS_PASSWORD and %G_URI_FLAGS_HAS_AUTH_PARAMS flags),
* the userinfo may be split out into a username, password, and
* additional authorization-related parameters.
*
* Normally, the components of a #GUri will have all `%`-encoded
* characters decoded. However, if you construct/parse a #GUri with
* %G_URI_FLAGS_ENCODED, then the `%`-encoding will be preserved instead in
* the userinfo, path, and query fields (and in the host field if also
* created with %G_URI_FLAGS_NON_DNS). In particular, this is necessary if
* the URI may contain binary data or non-UTF-8 text, or if decoding
* the components might change the interpretation of the URI.
*
* For example, with the encoded flag:
*
* |[<!-- language="C" -->
* g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue", G_URI_FLAGS_ENCODED, &err);
* g_assert_cmpstr (g_uri_get_query (uri), ==, "query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue");
* ]|
*
* While the default `%`-decoding behaviour would give:
*
* |[<!-- language="C" -->
* g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue", G_URI_FLAGS_NONE, &err);
* g_assert_cmpstr (g_uri_get_query (uri), ==, "query=http://host/path?param=value");
* ]|
*
* During decoding, if an invalid UTF-8 string is encountered, parsing will fail
* with an error indicating the bad string location:
*
* |[<!-- language="C" -->
* g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fbad%3D%00alue", G_URI_FLAGS_NONE, &err);
* g_assert_error (err, G_URI_ERROR, G_URI_ERROR_BAD_QUERY);
* ]|
*
* You should pass %G_URI_FLAGS_ENCODED or %G_URI_FLAGS_ENCODED_QUERY if you
* need to handle that case manually. In particular, if the query string
* contains `=` characters that are `%`-encoded, you should let
* g_uri_parse_params() do the decoding once of the query.
*
* #GUri is immutable once constructed, and can safely be accessed from
* multiple threads. Its reference counting is atomic.
*
* Since: 2.66
*/
struct _GUri {
gchar *scheme;
gchar *userinfo;
@ -1072,7 +1063,7 @@ g_uri_split_internal (const gchar *uri_string,
* @error: #GError for error reporting, or %NULL to ignore.
*
* Parses @uri_ref (which can be an
* [absolute or relative URI][relative-absolute-uris]) according to @flags, and
* [absolute or relative URI](#relative-and-absolute-uris)) according to @flags, and
* returns the pieces. Any component that doesn't appear in @uri_ref will be
* returned as %NULL (but note that all URIs always have a path component,
* though it may be the empty string).
@ -1139,7 +1130,7 @@ g_uri_split (const gchar *uri_ref,
* @error: #GError for error reporting, or %NULL to ignore.
*
* Parses @uri_ref (which can be an
* [absolute or relative URI][relative-absolute-uris]) according to @flags, and
* [absolute or relative URI](#relative-and-absolute-uris)) according to @flags, and
* returns the pieces. Any component that doesn't appear in @uri_ref will be
* returned as %NULL (but note that all URIs always have a path component,
* though it may be the empty string).
@ -1191,7 +1182,7 @@ g_uri_split_with_user (const gchar *uri_ref,
* port, or `-1`
* @error: #GError for error reporting, or %NULL to ignore.
*
* Parses @uri_string (which must be an [absolute URI][relative-absolute-uris])
* Parses @uri_string (which must be an [absolute URI](#relative-and-absolute-uris))
* according to @flags, and returns the pieces relevant to connecting to a host.
* See the documentation for g_uri_split() for more details; this is
* mostly a wrapper around that function with simpler arguments.
@ -1260,7 +1251,7 @@ g_uri_split_network (const gchar *uri_string,
* @error: #GError for error reporting, or %NULL to ignore.
*
* Parses @uri_string according to @flags, to determine whether it is a valid
* [absolute URI][relative-absolute-uris], i.e. it does not need to be resolved
* [absolute URI](#relative-and-absolute-uris), i.e. it does not need to be resolved
* relative to another URI using g_uri_parse_relative().
*
* If its not a valid URI, an error is returned explaining how its invalid.
@ -1398,7 +1389,7 @@ remove_dot_segments (gchar *path)
* @error: #GError for error reporting, or %NULL to ignore.
*
* Parses @uri_string according to @flags. If the result is not a
* valid [absolute URI][relative-absolute-uris], it will be discarded, and an
* valid [absolute URI](#relative-and-absolute-uris), it will be discarded, and an
* error returned.
*
* Return value: (transfer full): a new #GUri, or NULL on error.
@ -1424,7 +1415,7 @@ g_uri_parse (const gchar *uri_string,
* @error: #GError for error reporting, or %NULL to ignore.
*
* Parses @uri_ref according to @flags and, if it is a
* [relative URI][relative-absolute-uris], resolves it relative to @base_uri.
* [relative URI](#relative-and-absolute-uris), resolves it relative to @base_uri.
* If the result is not a valid absolute URI, it will be discarded, and an error
* returned.
*
@ -1555,7 +1546,7 @@ g_uri_parse_relative (GUri *base_uri,
* @error: #GError for error reporting, or %NULL to ignore.
*
* Parses @uri_ref according to @flags and, if it is a
* [relative URI][relative-absolute-uris], resolves it relative to
* [relative URI](#relative-and-absolute-uris), resolves it relative to
* @base_uri_string. If the result is not a valid absolute URI, it will be
* discarded, and an error returned.
*