diff --git a/docs/reference/glib/Makefile.am b/docs/reference/glib/Makefile.am
index 27fdf0b21..4372db562 100644
--- a/docs/reference/glib/Makefile.am
+++ b/docs/reference/glib/Makefile.am
@@ -66,7 +66,8 @@ content_files = \
 	glib-gettextize.xml \
 	gtester.xml \
 	gtester-report.xml \
-	gvariant-varargs.xml
+	gvariant-varargs.xml \
+	gvariant-text.xml
 
 # Extra options to supply to gtkdoc-fixref
 FIXXREF_OPTIONS=
diff --git a/docs/reference/glib/glib-docs.sgml b/docs/reference/glib/glib-docs.sgml
index 9474af97b..f3197d9ed 100644
--- a/docs/reference/glib/glib-docs.sgml
+++ b/docs/reference/glib/glib-docs.sgml
@@ -125,6 +125,7 @@ synchronize their operation.
+
diff --git a/docs/reference/glib/gvariant-text.xml b/docs/reference/glib/gvariant-text.xml
new file mode 100644
index 000000000..3565aa24a
--- /dev/null
+++ b/docs/reference/glib/gvariant-text.xml
@@ -0,0 +1,615 @@

GVariant Text Format

This page attempts to document the GVariant text format as produced by g_variant_print() and parsed by the g_variant_parse() family of functions. In most cases the style closely resembles the formatting of literals in Python, but there are some additions and exceptions.

The functions that deal with GVariant text format always deal in UTF-8. Conceptually, GVariant text format is a string of Unicode characters, not bytes. Non-ASCII but otherwise printable Unicode characters are not treated any differently from normal ASCII characters.

The parser makes two passes. The first pass determines the type of the value being parsed; the second pass does the actual parsing. Because all elements in an array must have the same type, GVariant is able to make some deductions that would not otherwise be possible.
As an example:

  [[1, 2, 3], [4, 5, 6]]

is parsed as an array of arrays of integers (type 'aai'), but

  [[1, 2, 3], [4, 5, 6.0]]

is parsed as an array of arrays of doubles (type 'aad').

As another example, GVariant is able to determine that

  ["hello", nothing]

is an array of maybe strings (type 'ams').

What the parser accepts as valid input depends on context. The API permits out-of-band type information to be supplied to the parser, which changes its behaviour. This can be seen in the GSettings and GDBus command-line utilities, where the type information is available from the schema or from the remote introspection information. The additional information can cause parses to succeed when they would not otherwise have been able to (by resolving ambiguous type information) or can cause them to fail (due to conflicting type information). Unless stated otherwise, the examples given in this section assume that no out-of-band type data has been given to the parser.

Syntax Summary

The following table describes the rough meaning of symbols that may appear inside GVariant text format. Each symbol is described in detail in its own section, including usage examples.

  Symbol          Meaning
  true, false     Booleans.
  "", ''          String literal. See Strings below.
  numbers         See Numbers below.
  ()              Tuples.
  []              Arrays.
  {}              Dictionaries and Dictionary Entries.
  <>              Variants.
  just, nothing   Maybe Types.
  @               Type Annotations.
  type keywords   boolean, byte, int16, uint16, int32, uint32, handle, int64,
                  uint64, double, string, objectpath, signature.
                  See Type Annotations below.
  b"", b''        Bytestrings.
  %               Positional Parameters.

Booleans

The strings true and false are parsed as booleans. This is the only way to specify a boolean value.

Strings

String literals must be quoted using "" or ''. The two are completely equivalent (except that each one cannot contain itself unescaped).

Strings are Unicode strings with no particular encoding. For example, to specify the character é, you just write 'é'. You could also give the Unicode codepoint of that character (U+E9) as the escape sequence '\u00e9'. Since the strings are pure Unicode, you should not attempt to encode the UTF-8 byte sequence corresponding to the string using escapes; it won't work and you'll end up with the individual characters corresponding to each byte.

Unicode escapes of the form \uxxxx and \Uxxxxxxxx are supported, in hexadecimal. The usual control sequence escapes \a, \b, \f, \n, \r, \t and \v are supported. Additionally, a \ before a newline character causes the newline to be ignored. Finally, any other character following \ is copied literally (for example, \" or \\), but for forwards compatibility with future additions you should only use this feature when necessary for escaping backslashes or quotes.

The usual octal and hexadecimal escapes \0nnn and \xnn are not supported here. Those escapes are used to encode byte values, and GVariant strings are Unicode.

Single-character strings are not interpreted as bytes. Bytes must be specified by their numerical value.

Numbers

Numbers are given by default as decimal values. Octal and hex values can be given in the usual way (by prefixing with 0 or 0x). Note that GVariant considers bytes to be unsigned integers and will print them as a two-digit hexadecimal number by default.
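The prefix convention in the paragraph above matches the one the standard C function strtol() applies when it is called with base 0, which makes it easy to experiment with. The following C sketch is illustrative only: parse_int_literal() and format_byte() are hypothetical helpers, nothing here claims that GVariant's parser uses strtol() internally, and the "0x" prefix in format_byte() is an assumption about how the two-digit hexadecimal form is spelled. Choosing between the int32 and double defaults is a separate rule and is not covered.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses an integer literal with strtol() and base 0,
 * which selects hexadecimal for a "0x" prefix, octal for a leading "0",
 * and decimal otherwise.  Returns 0 on success, or -1 if the whole string
 * is not a valid integer or the value is out of range for a long. */
int parse_int_literal (const char *text, long *out)
{
  char *end;
  long value;

  errno = 0;
  value = strtol (text, &end, 0);
  if (end == text || *end != '\0' || errno == ERANGE)
    return -1;   /* empty input, trailing garbage, or overflow */

  *out = value;
  return 0;
}

/* Hypothetical helper: writes a byte as a two-digit hexadecimal number
 * (assumed "0x" prefix), e.g. 7 becomes "0x07".  `out` needs 5 chars. */
void format_byte (unsigned char b, char out[5])
{
  snprintf (out, 5, "0x%02x", (unsigned) b);
}
```

Called on "42", "052" and "0x2a", parse_int_literal() yields 42 in every case, while trailing garbage such as "12abc" is rejected.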
Floating point numbers can also be given in the usual ways, including scientific and hexadecimal notations.

For lack of additional information, integers will be parsed as int32 values by default. If the number has a point or an 'e' in it, then it will be parsed as a double precision floating point number by default. If type information is available (either explicitly or inferred), then that type will be used instead.

Some examples:

  - 5 parses as the int32 value five.
  - 37.5 parses as a floating point value.
  - 3.75e1 parses the same as the value above.
  - uint64 7 parses seven as a uint64. See Type Annotations.

Tuples

Tuples are formed using the same syntax as Python. Here are some examples:

  - () parses as the empty tuple.
  - (5,) is a tuple containing a single value.
  - ("hello", 42) is a pair. Note that values of different types are permitted.

Arrays

Arrays are formed using the same syntax as Python uses for lists (which is arguably the term that GVariant should have used). Note that, unlike Python lists, GVariant arrays are statically typed. This has two implications.

First, all items in the array must have the same type. Second, the type of the array must be known, even in the case that it is empty. This means that (unless there is some other way to infer it) type information will need to be given explicitly for empty arrays.

The parser is able to infer some types based on the fact that all items in an array must have the same type. See the examples below:

  - [1] parses (without additional type information) as a one-item array of signed integers.
  - [1, 2, 3] parses (similarly) as a three-item array.
  - [1, 2, 3.0] parses as an array of doubles. This is the simplest case of type inference in action.
  - [(1, 2), (3, 4.0)] causes the 2 to also be parsed as a double (but the 1 and 3 are still integers).
  - ["", nothing] parses as an array of maybe strings. The presence of "nothing" clearly implies that the array elements are nullable.
  - [[], [""]] will parse properly because the type of the first (empty) array can be inferred to be equal to the type of the second array (both are arrays of strings).
  - [b'hello', []] looks odd but will parse properly. See Bytestrings.

And some examples of errors:

  - ["hello", 42] fails to parse due to conflicting types.
  - [] will fail to parse without additional type information.

Dictionaries and Dictionary Entries

Dictionaries and dictionary entries are both specified using the {} characters.

The dictionary syntax is more commonly used. This is what the printer elects to use in the normal case of dictionary entries appearing in an array (aka "a dictionary"). The separate syntax for dictionary entries is typically only used when the entries appear on their own, outside of an array (which is valid but unusual). Of course, you are free to use the dictionary entry syntax within arrays, but there is no good reason to do so (and the printer itself will never do so). Note that, as with arrays, the type of empty dictionaries must be established (either explicitly or through inference).

The dictionary syntax is the same as Python's syntax for dictionaries. Some examples:

  - @a{sv} {} parses as the empty dictionary of everyone's favourite type.
  - @a{sv} [] is the same as above (owing to the fact that dictionaries are really arrays).
  - {1: "one", 2: "two", 3: "three"} parses as a dictionary mapping integers to strings.

The dictionary entry syntax looks just like a pair (2-tuple) that uses braces instead of parens. The presence of a comma immediately following the key differentiates it from the dictionary syntax (which features a colon after the first key).
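The rule in the paragraph above (a colon after the first key means dictionary syntax, a comma means a dictionary entry) can be checked with a short scan. The C sketch below is hypothetical and far simpler than the real parser: classify_braces() only accepts text that starts with '{' (so type annotations such as @a{sv} are not handled), skips over quoted strings and nested brackets, and reports which separator appears first at the top level.

```c
#include <stddef.h>

typedef enum {
  BRACE_EMPTY_DICT,  /* "{}": empty dictionary; its type must come from elsewhere */
  BRACE_DICT,        /* "{key: value, ...}": dictionary syntax */
  BRACE_DICT_ENTRY,  /* "{key, value}": free-standing dictionary entry */
  BRACE_INVALID      /* not a brace-delimited value this sketch understands */
} BraceKind;

/* Hypothetical classifier: finds the first ':' or ',' at the top level of
 * the outer braces, ignoring separators inside strings or nested brackets. */
BraceKind classify_braces (const char *text)
{
  size_t i = 0;
  int depth = 0;    /* nesting depth of brackets inside the outer braces */
  char quote = 0;   /* quote character of the current string, or 0 */

  while (text[i] == ' ') i++;
  if (text[i] != '{') return BRACE_INVALID;
  i++;
  while (text[i] == ' ') i++;
  if (text[i] == '}') return BRACE_EMPTY_DICT;

  for (; text[i] != '\0'; i++) {
    char c = text[i];

    if (quote != 0) {
      if (c == '\\' && text[i + 1] != '\0')
        i++;              /* skip the escaped character */
      else if (c == quote)
        quote = 0;        /* end of the string literal */
      continue;
    }

    switch (c) {
      case '"': case '\'':
        quote = c;
        break;
      case '(': case '[': case '{': case '<':
        depth++;
        break;
      case ')': case ']': case '}': case '>':
        if (depth == 0)
          return BRACE_INVALID;   /* outer braces closed before any separator */
        depth--;
        break;
      case ':':
        if (depth == 0) return BRACE_DICT;
        break;
      case ',':
        if (depth == 0) return BRACE_DICT_ENTRY;
        break;
      default:
        break;
    }
  }
  return BRACE_INVALID;   /* unterminated input */
}
```

Because the comma in {'a,b': 1} sits inside a string, the sketch still classifies that text as dictionary syntax.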
Some examples of the dictionary entry syntax:

  - {1, "one"} is a free-standing dictionary entry that can be parsed on its own or as part of another container value.
  - [{1, "one"}, {2, "two"}, {3, "three"}] is exactly equivalent to the dictionary example given above.

Variants

Variants are denoted using angle brackets (aka "XML brackets"), <>. They may not be omitted.

Using <> effectively disrupts the type inference that occurs between array elements. This can have positive and negative effects.

  - [<"hello">, <42>] will parse whereas ["hello", 42] would not.
  - [<['']>, <[]>] will fail to parse even though [[''], []] parses successfully. You would need to specify [<['']>, <@as []>].
  - {"title": <"frobit">, "enabled": <true>, "width": <800>} is an example of perhaps the most pervasive use of both dictionaries and variants.

Maybe Types

The syntax for specifying maybe types is inspired by Haskell.

The null case is specified using the keyword nothing and the non-null case is explicitly specified using the keyword just. GVariant allows just to be omitted in every case where it is able to unambiguously determine the intention of the writer. There are two cases where it must be specified:

  - when using nested maybes, in order to specify the just nothing case
  - to establish the nullability of the type of a value without explicitly specifying its full type

Some examples:

  - just 'hello' parses as a non-null nullable string.
  - @ms 'hello' is the same (demonstrating how just can be dropped if the type is already known).
  - nothing will not parse without extra type information.
  - @ms nothing parses as a null nullable string.
  - [just 3, nothing] is an array of nullable integers.
  - [3, nothing] is the same as the above (demonstrating another place where just can be dropped).
  - [3, just nothing] parses as an array of maybe maybe integers (type 'ammi').
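The last three examples follow a pattern that a simplified model can reproduce: every element implies a minimum number of maybe layers and possibly a base type, and the array's element type takes the deepest nesting together with the one base type that all elements agree on. The C sketch below is a hypothetical model, not GVariant's inference algorithm: the Element struct and infer_maybe_array_type() are invented for illustration, only single-character base types are supported, and int32 ('i') is allowed to widen to double ('d') as described in the section on arrays.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one array element: `depth` is the minimum number
 * of maybe layers its literal implies ("3" -> 0, "just 3" -> 1,
 * "nothing" -> 1, "just nothing" -> 2) and `base` is its single-character
 * base type code, or '?' when the literal does not reveal one. */
typedef struct {
  int depth;
  char base;
} Element;

/* Combines two base types: '?' adopts the other side, 'i' and 'd' widen
 * to 'd', and any other mismatch is a conflict (returns 0). */
static char unify_base (char a, char b)
{
  if (a == '?') return b;
  if (b == '?' || a == b) return a;
  if ((a == 'i' && b == 'd') || (a == 'd' && b == 'i')) return 'd';
  return 0;
}

/* Writes the inferred array type (e.g. "ammi") into `out`.  Returns 0 on
 * success, or -1 on conflicting base types, an unknown base type, or a
 * buffer that is too small. */
int infer_maybe_array_type (const Element *items, size_t n,
                            char *out, size_t out_len)
{
  int depth = 0;
  char base = '?';

  for (size_t i = 0; i < n; i++) {
    if (items[i].depth > depth)
      depth = items[i].depth;               /* deepest nesting wins */
    base = unify_base (base, items[i].base);
    if (base == 0)
      return -1;                            /* e.g. ["hello", 42] */
  }
  if (base == '?')
    return -1;   /* e.g. [nothing] or []: needs out-of-band type data */

  /* 'a', then `depth` 'm's, then the base type code, then the terminator */
  if ((size_t) depth + 3 > out_len)
    return -1;
  out[0] = 'a';
  memset (out + 1, 'm', (size_t) depth);
  out[1 + depth] = base;
  out[2 + depth] = '\0';
  return 0;
}
```

Describing [3, just nothing] as the elements {0, 'i'} and {2, '?'} produces "ammi", while [3, nothing] ({0, 'i'} and {1, '?'}) produces "ami", matching the examples above. A lone nothing is rejected because no base type can be determined.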
Type Annotations

Type annotations allow additional type information to be given to the parser. Depending on the context, this type information can change the output of the parser, cause an error when parsing would otherwise have succeeded, or resolve an error when parsing would otherwise have failed.

Type annotations come in two forms: type codes and type keywords.

Type keywords can be seen as more verbose (and more legible) versions of a common subset of the type codes. The type keywords boolean, byte, int16, uint16, int32, uint32, handle, int64, uint64, double, string, objectpath and signature are each exactly equivalent to their corresponding type code.

Type codes are an @ ("at" sign) followed by a definite GVariant type string. Some examples of both forms:

  - uint32 5 causes the number to be parsed unsigned instead of signed (the default).
  - @u 5 is the same.
  - objectpath "/org/gnome/xyz" creates an object path instead of a normal string.
  - @au [] specifies the type of the empty array (which would not parse otherwise).
  - @ms "" indicates that a string value is meant to have a maybe type.

Bytestrings

The bytestring syntax is a piece of syntactic sugar meant to complement the bytestring APIs in GVariant. It constructs arrays of non-nul bytes (type 'ay') with a nul terminator at the end.

Bytestrings are specified with either b"" or b''. As with strings, there is no fundamental difference between the two different types of quotes.

Bytestrings support the full range of escapes that you would expect (i.e. those supported by g_strcompress()). This includes the normal control sequence escapes (as mentioned in the section on strings) as well as octal and hexadecimal escapes of the forms \0nnn and \xnn.

b'abc' is equivalent to [byte 0x61, 0x62, 0x63, 0].
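To make the b'abc' equivalence concrete, the C sketch below expands bytestring content into the equivalent array text, including the implicit nul terminator, and also encodes the printer rule given in the next paragraph. Both functions are hypothetical, escape processing is deliberately omitted (the input is assumed to be already unescaped), and the terminator is written as 0x00 instead of 0; both spell the same byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the array equivalent of a bytestring whose
 * already-unescaped content is `content`, followed by the nul terminator,
 * e.g. "abc" -> "[byte 0x61, 0x62, 0x63, 0x00]".  Returns the number of
 * characters written, or -1 if `out` is too small. */
int bytestring_as_array (const char *content, char *out, size_t out_len)
{
  size_t len = strlen (content);
  size_t pos;
  int n = snprintf (out, out_len, "[byte ");

  if (n < 0 || (size_t) n >= out_len) return -1;
  pos = (size_t) n;

  for (size_t i = 0; i <= len; i++) {   /* i == len emits the terminator */
    n = snprintf (out + pos, out_len - pos, "%s0x%02x",
                  i == 0 ? "" : ", ",
                  (unsigned) (unsigned char) content[i]);
    if (n < 0 || (size_t) n >= out_len - pos) return -1;
    pos += (size_t) n;
  }

  n = snprintf (out + pos, out_len - pos, "]");
  if (n < 0 || (size_t) n >= out_len - pos) return -1;
  return (int) (pos + (size_t) n);
}

/* Hypothetical predicate for the printer rule: an array of bytes is shown
 * as a bytestring only if its last byte is nul and no earlier byte is. */
bool prints_as_bytestring (const unsigned char *data, size_t len)
{
  if (len == 0 || data[len - 1] != 0)
    return false;
  return memchr (data, 0, len - 1) == NULL;
}
```

With this sketch, bytestring_as_array("abc", buf, sizeof buf) fills buf with [byte 0x61, 0x62, 0x63, 0x00], and prints_as_bytestring() accepts those four bytes but rejects "a\0c" followed by its terminator.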
When formatting arrays of bytes, the printer will choose to display the array as a bytestring if it contains a nul character at the end and no other nul bytes within. Otherwise, it is formatted as a normal array.

Positional Parameters

Positional parameters are not a part of the normal GVariant text format, but they are mentioned here because they can be used with g_variant_new_parsed().

A positional parameter is indicated with a % followed by any valid GVariant Format String. Variable arguments are collected as specified by the format string, and the resulting value is inserted at the current position.

This feature is best explained by example:

  , 'enabled': <%b>}", t, en);

This constructs a dictionary mapping strings to variants (type 'a{sv}') with two items in it. The key names are parsed from the string and the values for those keys are taken as variable arguments.

The arguments are always collected in the order that they appear in the string to be parsed. Format strings that collect multiple arguments are permitted, so you may require more varargs parameters than the number of % signs that appear. You can also give format strings that collect no arguments, but there's no good reason to do so.

diff --git a/glib/gvariant-parser.c b/glib/gvariant-parser.c
index ca69dfb85..7cd424b22 100644
--- a/glib/gvariant-parser.c
+++ b/glib/gvariant-parser.c
@@ -2290,6 +2290,8 @@ parse (TokenStream *stream,
  *
  * A single #GVariant is parsed from the content of @text.
  *
+ * The format is described here.
+ *
  * The memory at @limit will never be accessed and the parser behaves as
  * if the character at @limit is the nul terminator. This has the
  * effect of bounding @text.
diff --git a/glib/gvariant.c b/glib/gvariant.c
index 6bfc9007d..fc7461117 100644
--- a/glib/gvariant.c
+++ b/glib/gvariant.c
@@ -2234,6 +2234,8 @@ g_variant_print_string (GVariant *value,
  *
  * Pretty-prints @value in the format understood by g_variant_parse().
  *
+ * The format is described here.
+ *
  * If @type_annotate is %TRUE, then type information is included in
  * the output.
  */