saxon10/saxon10.saxonq.pod

594 lines
24 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

=encoding utf8
=head1 NAME
saxon10q - Saxon XQuery 3.0 processor
=head1 SYNOPSIS
saxon10q [I<options>] [I<params>]
=head1 DESCRIPTION
A command to run a query contained in a file.
Saxon 9.8 and later releases implement the XQuery 3.1 Recommendation dated 18
April 2017. Because XQuery 3.1 is backwards compatible with XQuery 1.0 and
XQuery 3.0, Saxon will also execute queries written according to the 1.0 or 3.0
versions of the specification. Saxon no longer has a mode to execute as an
XQuery 1.0 or 3.0 processor (that is, to reject constructs that were not
present in earlier versions of the language). For information about the
conformance of Saxon to the XQuery 3.1 specification, and about the handling of
implementation-defined features of the specification, see
L<Conformance|https://www.saxonica.com/documentation10/index.html#!conformance>.
Saxon uses the same run-time engine to support both XQuery and XSLT, reflecting
the fact that the two languages have very similar semantics. Most of the
compile-time code (in particular, the type checking logic and the optimizer) is
also common. The XQuery support in Saxon consists essentially of an XQuery
parser (which is itself an extension of the XPath parser); the parser generates
the same internal interpretable code as the XSLT processor. There are also some
constructs in the internal expression tree that will only be generated from
XQuery source rather than XSLT source; examples are the XQuery order by and
group by clauses, which have no direct XSLT equivalent.
The XQuery processor may be invoked either from the operating system command
line, or via an API from a user-written application. There is no graphical user
interface provided.
Saxon is an in-memory processor. Unless you can take advantage of streaming,
Saxon is designed to process source documents that fit in memory. Saxon has
been used successfully to process source documents of 100Mbytes or more without
streaming, but if you attempt anything this large, you need to be aware (a)
that you will need to allocate sufficient memory to the Java VM (at least 5
times the size of the source document), and (b) that complex FLWOR expressions
may be very time-consuming to execute. (In this scenario, Saxon-EE is
recommended, because it has a more powerful optimizer for complex joins.)
=head1 OPTIONS
=over
=item B<-backup>:(C<on>|C<off>)
Only relevant when B<-update>:C<on> is specified. Default is on. When backup is
enabled, any file that is updated by the query will be preserved in its
original state by renaming it, adding “C<.bak>” to the original filename. If
backup is disabled, updated files will be silently overwritten.
=item B<-catalog>:I<filenames>
I<filenames> is either a file name or a list of file names separated by
semicolons; the files are OASIS XML catalogs used to define how public
identifiers and system identifiers (URIs) used in a source document, query, or
schema are to be redirected, typically to resources available locally. For more
details see L<Using XML
catalogs|https://www.saxonica.com/documentation10/index.html#!sourcedocs/xml-catalogs>.
=item B<-config>:I<filename>
Indicates that configuration information should be taken from the supplied
configuration file. Any options supplied on the command line override options
specified in the configuration file.
=item B<-dtd>:(C<on>|C<off>|C<recover>)
Setting B<-dtd>:C<on> requests DTD-based validation of the source file and of
any files read using the C<doc()> function. Requires an XML parser that
supports validation. The setting B<-dtd>:C<off> (which is the default)
suppresses DTD validation. The setting B<-dtd>:C<recover> performs DTD
validation but treats the error as non-fatal if it fails. Note that any
external DTD is likely to be read even if not used for validation, because DTDs
can contain definitions of entities.
=item B<-expand>:(C<on>|C<off>)
Normally, if validation using a DTD or schema is requested, any fixed or
default values defined in the DTD or schema will be expanded. Specifying
B<-expand>:C<off> suppresses this. (In the case of DTD-defined defaults, this
might not work with all XML parsers. It does work with the Xerces parser
(default for Java) and the Microsoft parser (default for .NET).)
=item B<-explain>[:I<filename>]
Display a query execution plan. This is a representation of the expression tree
after rewriting by the optimizer. If no file name is specified the output is
sent to the standard error stream. The output is a tree in XML format.
=item B<-ext>:(C<on>|C<off>)
If B<-ext>:C<off> is specified, suppress calls on dynamically-loaded external
Java functions. This does not affect calls on integrated extension functions,
including Saxon and EXSLT extension functions. This option is useful when
loading an untrusted query, perhaps from a remote site using an C<http://> URL;
it ensures that the query cannot call arbitrary Java methods and thereby gain
privileged access to resources on your machine.
=item B<-init>:I<initializer>
The value is the name of a user-supplied class that implements the interface
C<Initializer>; this initializer will be called during the initialization
process, and may be used to set any options required on the C<Configuration>
programmatically.
=item B<-l>[:(C<on>|C<off>)]
If B<-l> or B<-l>:C<on> is specified, causes line and column numbers to be
maintained for source documents. These are accessible using the extension
functions C<saxon:line-number()> and C<saxon:column-number()>. Line numbers are
useful when the purpose of the query is to find errors or anomalies in the
source XML file. Without this option, line numbers are available while source
documents are being parsed and validated, but they are not retained in the tree
representation of the document.
=item B<-mr>:C<classname>
Use the specified C<ModuleURIResolver> to process all query module URIs. The
C<ModuleURIResolver> is a user-defined class that implements the
C<ModuleURIResolver> interface. It is invoked to process URIs used in the
import module declaration in the query prolog, and (if B<-u> is also specified,
or if the file name begins with C<http:>, C<https:>, C<file:> or C<classpath:>)
to process the URI of the query source file provided on the command line.
=item B<-now>:I<yyyy-mm-ddThh:mm:ss+hh:mm>
Sets the value of C<current-dateTime()> (and C<implicit-timezone()>) for the
query. This is designed for testing, to enable repeatable results to be
obtained for comparison with reference results, or to test that queries can
handle significant dates and times such as end-of-year processing.
=item B<-o>:I<filename>
Send output to named file. In the absence of this option, the results go to
standard output. The output format depends on whether the C<-wrap> option is
present. The file is created if it does not already exist; any necessary
directories will also be created. If the file does exist, it is overwritten
(even if the query fails).
=item B<-opt>:[B<->]I<flags>
Allows individual optimizations to be enabled or disabled selectively. There is
a set of single-letter flags identifying particular optimizations:
=over
=item C<c>:
generate bytecode
=item C<d>:
detect void path expressions
=item C<e>:
cache regular expressions
=item C<f>:
inline functions
=item C<g>:
extract global variables
=item C<j>:
just-in-time compilation of template rules (currently XSLT-only)
=item C<k>:
create keys
=item C<l>:
loop lifting
=item C<m>:
miscellaneous
=item C<n>:
constant folding
=item C<r>:
template rule-sets (not relevant to XQuery)
=item C<s>:
extract common subexpressions
=item C<t>:
tail call optimization
=item C<v>:
inline variables
=item C<w>:
create switch statements
=item C<x>:
index predicates
=back
A value such as B<-opt>:C<gs> runs with only the selected optimizations;
B<-opt>:C<-gs> runs with the selected optimizations disabled and all others
enabled. The value B<-opt>:C<0> suppresses all optimizations. The default is
full optimization; this feature allows optimization to be suppressed in cases
where reducing compile time is important, or where optimization gets in the way
of debugging, or causes extension functions with side-effects to behave
unpredictably. (Note however, that even with no optimization, lazy evaluation
may still cause the evaluation order to be not as expected.)
=item B<-outval>:(C<recover>|C<fatal>)
Normally, if validation of result documents is requested, a validation error is
fatal. Setting the option B<-outval>:C<recover> causes such validation failures
to be treated as warnings. The validation message is written both to the
standard error stream, and (where possible) as a comment in the result document
itself.
=item B<-p>[:(C<on>|C<off>)]
Enable recognition of query parameters (such as C<xinclude=yes>) in the
C<StandardURIResolver>. This option is available in Saxon-PE and Saxon-EE only.
It cannot be used in conjunction with the B<-r> option, and it automatically
switches on the B<-u> and B<-sa> options. The effect is that Saxon-specific
query parameters are recognized in a URI. One query parameter that is
recognized is C<val>. This may take the values C<strict>, C<lax>, or C<strip>.
For example, C<source.xml?val=strict> loads a document with strict schema
validation.
=item B<-projection>:(C<on>|C<off>)
Use (or dont use) document projection. Document Projection is a mechanism that
analyzes a query to determine what parts of a document it can potentially
access, and then while building a tree to represent the document, leaves out
those parts of the tree that cannot make any difference to the result of the
query. Requires Saxon-EE.
=item B<-q>:I<queryfile>
Identifies the file containing the query. If this is the last option then the
B<-q>: prefix may be omitted. The file can be specified as “C<->” to read the
query from standard input: in this case the base URI is that of the current
directory.
=item B<-qs>:I<querystring>
Allows the query to be specified inline (if it contains spaces, you will need
quotes around the expression to keep the command line processor happy). The
static base URI of the query is taken from the current working directory. So,
for example, C<java net.sf.saxon.Query -qs:doc('a.xml')//p[1]> selects elements
within the file F<a.xml> in the current directory.
=item B<-quit>:(C<on>|C<off>)
With the default setting, on, the command will quit the Java VM and return an
exit code if a failure occurs. This is useful when running from an operating
system shell. With the setting B<-quit>:C<off> the command instead throws a
C<RunTimeException>, which is more useful when the command is invoked from
another Java application such as Ant.
=item B<-r>:I<classname>
Use the specified C<URIResolver> to process all URIs. The C<URIResolver> is a
user-defined class, that implements the C<URIResolver> interface defined in
JAXP, whose function is to take a URI supplied as a string, and return a SAX
C<InputSource>. It is invoked to process URIs used in the C<doc()> function,
and (if B<-u> is also specified) to process the URI of the source file provided
on the command line.
=item B<-repeat>:I<integer>
Performs the transformation I<N> times, where I<N> is the specified integer.
This option is useful for performance measurement, since timings for the first
few runs of the query are often dominated by Java warm-up time.
=item B<-s>:I<filename-or-URI>
Take input from the specified file. If the B<-u> option is specified, or if the
name begins with “C<file:>” or “C<http:>”, then the name is assumed to be a URI
rather than a filename. This file must contain an XML document. The document
node of the document is made available to the query as the context item. The
filename can be specified as “C<->” to read the source document from standard
input: in this case the base URI is that of the current directory.
=item B<-sa>
Invoke a schema-aware query. Requires Saxon-EE to be installed.
=item B<-scmin>:I<filename>
Loads a precompiled schema component model from the given file. The file should
be generated in a previous run using the B<-export> option. When this option is
used, the B<-xsd> option should not be present. Schemas loaded from an SCM file
are assumed to be valid, without checking.
This option is retained for compatibility. From Saxon 9.7, SCM files can also
be supplied in the B<-xsd> option.
=item B<-stream>:(C<on>|C<off>)
Use (or dont use) streaming. Streaming allows simple queries to be executed
while the source document is being read, avoiding the need to build a tree
representation of the document in memory. For information about the kind of
queries that can be streamed, see L<Streaming
XQuery|https://www.saxonica.com/documentation10/index.html#!sourcedocs/streaming/streamed-query>.
If the query cannot be streamed, execution will fail with diagnostic errors.
Requires Saxon-EE.
=item B<-strip>:(C<all>|C<none>|C<ignorable>)
Specifies what whitespace is to be stripped from source documents (applies both
to the principal source document and to any documents loaded for example using
the C<doc()> function). The default is none: no whitespace stripping.
=item B<-t>
Display version and timing information to the standard error output. The output
also traces the files that are read and written, and extension modules that are
loaded.
=item B<-T>[:I<classname>]
Notify query tracing information. Also switches line numbering on for the
source document. If a classname is specified, it is a user-defined class, which
must implement C<TraceListener>. If the classname is omitted, a system-supplied
trace listener is used. This traces execution of function calls to the standard
error output. For performance profiling, set classname to
C<net.sf.saxon.trace.TimingTraceListener>. This creates an output file giving
timings for each instruction executed. This output file can subsequently be
analyzed to give an execution time profile for the query. See L<Performance
Analysis|https://www.saxonica.com/documentation10/index.html#!using-xquery/performanceanalysis>.
=item B<-TB>:I<filename>
Monitors generation of hot-spot bytecode and produces an XML report on the
given filename giving, for each expression that was a candidate for bytecode
generation and that was actually executed, data about the number of times and
speed of execution in the interpreter, the number of times and speed of
execution in compiled form, and the cost of compilation. Note that if an
expression I<A> contains an expression I<B> and both are candidates for
bytecode generation, then the statistics for I<B> relate only to the time
before I<A> was compiled in its own right.
=item B<-TJ>
Switches on tracing of the binding of calls to external Java methods. This is
useful when analyzing why Saxon fails to find a Java method to match an
extension function call in the query, or why it chooses one method over another
when several are available.
=item B<-Tlevel>:(C<none>|C<low>|C<normal>|C<high>)
Controls the level of detail of the tracing produced by the C<-T> option. The
values are:
=over
=item C<none>
effectively switches tracing off.
=item C<low>
traces function calls.
=item C<normal>
traces execution of significant expressions such as element constructors.
=item C<high>
traces execution of finer-grained expressions such as the clauses of a FLWOR expression.
=back
=item B<-Tout>:I<filename>
Directs the output of tracing to a specified file (assuming that B<-T> is
enabled).
=item B<-TP>:I<filename>
This is equivalent to setting B<-T>:C<net.sf.saxon.trace.TimedTraceListener>
and B<-traceout>:I<filename>; that is, it causes trace profile information to
be set to the specified file. This output file can subsequently be analyzed to
give an execution time profile for the query. See L<Performance
Analysis|https://www.saxonica.com/documentation10/index.html#!using-xquery/performanceanalysis>.
=item B<-traceout>:I<filename>
Indicates that the output of the CC<trace()> function should be directed to a
specified file. Alternatively, specify C<#out> to direct the output to
C<System.out>, C<#err> to send it to C<System.err> (the default), or C<#null>
to have it discarded. This option is ignored when a trace listener is in use:
in that case, C<trace()> output goes to the registered trace listener.
=item B<-tree>:(C<linked>|C<tiny>|C<tinyc>)
Selects the implementation of the internal tree model. C<tiny> selects the
“tiny tree” model (the default), C<linked> selects the linked tree model,
C<tinyc> selects the “condensed tiny tree” model. See L<Choosing a tree
model|https://www.saxonica.com/documentation10/index.html#!sourcedocs/choosingmodel>.
=item B<-u>
Indicates that the name of the source document is a URI; otherwise it is taken
as a filename, unless it starts with C<http:>, C<https:>, C<file:> or
C<classpath:>, in which case it is taken as a URL.
=item B<-update>:(C<on>|C<off>|C<discard>)
Indicates whether XQuery Update syntax is accepted. This option requires
Saxon-EE. The value on enables XQuery update; any eligible files updated by the
query are written back to filestore. A file is eligible for updating if it was
read using the C<doc()> or C<collection()> functions using a URI that
represents an updateable location. The context document supplied using the
B<-s> option is not eligible for updating. The default value off disables
update (any use of XQuery update syntax is an error). The value discard allows
XQuery Update syntax, but modifications made to files are not saved in
filestore. If the document supplied in the B<-s> option is updated, the updated
document is serialized as the result of the query (writing it to the B<-o>
destination); updates to any other documents are simply discarded. Use of the
B<-t> option is recommended: it gives feedback on which files have been updated
by the query.
=item B<-val>[:(C<strict>|C<lax>)]
Requests schema-based validation of the source file and of any files read using
the C<doc()> function. This option is available only with Saxon-EE, and it
automatically switches on the B<-sa> option. Specify B<-val> or
B<-val>:C<strict> to request strict validation, or B<-val>:C<lax> for lax
validation.
=item B<-wrap>
Wraps the result sequence in an XML element structure that indicates the type
of each node or atomic value in the query result. This format can handle any
type of query result. In the absence of this option, the command effectively
wraps a C<document{}> constructor around the supplied query, so that the result
is a single XML document, which is then serialized. This will fail if the query
result includes constructs that cannot be added to a document node in this way,
notably free-standing attribute nodes.
=item B<-x>:I<classname>
Use the specified SAX parser for the source file and any files loaded using the
C<doc()> function. The parser must be the fully-qualified class name of a Java
class that implements the C<org.xml.sax.XMLReader> or
C<javax.xml.parsers.SAXParserFactory> interface, and it must be instantiable
using a zero-argument public constructor.
=item B<-xi>:(C<on>|C<off>)
Apply XInclude processing to all source XML documents (but not to schema
documents). This currently only works when documents are parsed using the
Xerces parser, which is the default in JDK 1.5 and later.
=item B<-xmlversion>:(C<1.0>|C<1.1>)
If B<-xmlversion>:C<1.1> is specified, allows XML 1.1 and XML Namespaces 1.1
constructs. This option must be set if source documents using XML 1.1 are to be
read, or if result documents are to be serialized as XML 1.1. This option also
enables use of XML 1.1 constructs within the query itself.
=item B<-xsd>:I<file1>;I<file2>;I<file3>…
Loads additional schema documents. The declarations in these schema documents
are available when validating source documents (or for use by the C<validate{}>
expression). This option may also be used to supply the locations of schema
documents that are imported into the query, in the case where the import schema
declaration gives the target namespace of the schema but not its location.
=item B<-xsdversion>:(C<1.0>|C<1.1>)
If B<-xsdversion>:C<1.1> is specified (the default), allows schema documents
using XML Schema 1.1 to be read, and XML Schema 1.1 constructs such as
assertions.
=item B<-xsiloc>:(C<on>|C<off>)
If set to C<on> (the default) the schema processor attempts to load any schema
documents referenced in C<xsi:schemaLocation> and
C<xsi:noNamespaceSchemaLocation> attributes in the instance document, unless a
schema for the specified namespace (or non-namespace) is already available. If
set to C<off>, these attributes are ignored.
=item C<--feature>:I<value>
Set a feature defined in the C<Configuration> interface. The names of features
are defined in the Javadoc for class Feature (alternatively see L<Configuration
Features|https://www.saxonica.com/documentation10/index.html#!configuration/config-features>):
the value used here is the part of the name after the last “C</>”, for example
B<--allow-external-functions>:C<off>. Only features accepting a string or
boolean may be set; for booleans the values C<true>/C<false>, C<on>/C<off>,
C<yes>/C<no>, and C<1>/C<0> are recognized.
=item B<-?>
Display command syntax.
=item B<--?>
Display a list of features that are available using the B<--feature>:I<value>
syntax.
=back
=head2 COMMAND LINE PARAMETERS
A I<param> takes the form I<name>=I<value>, name being the name of the
parameter, and value the value of the parameter. These parameters are
accessible within the query as external variables, using the C<$>I<name>
syntax, provided they are declared in the query prolog. If there is no such
declaration, the supplied parameter value is silently ignored.
A param preceded by a leading question mark (C<?>) is interpreted as an XPath
expression. For example, C<?time=current-dateTime()> sets the value of the
external variable C<$time> to the value of the current date and time, as an
instance of C<xs:dateTime>, while C<?debug=false()> sets the value of the
variable C<$debug> to the boolean value false. If the parameter has a required
type (for example C<declare variable $p as xs:date external;>), then the
supplied value must be compatible with this type according to the standard
rules for converting function arguments (it doesnt need to satisfy the
stricter rules that apply to variable initialization). The static context for
the XPath expression includes only the standard namespaces conventionally bound
to the prefixes C<xs>, C<fn>, C<xsi>, and C<saxon>. The static base URI (used
when calling the C<doc()> function) is the current directory. The dynamic
context contains no context item, position, or size, and no variables.
A param preceded by a leading plus sign (C<+>) is interpreted as a filename or
directory. The content of the file is parsed as XML, and the resulting document
node is passed to the stylesheet as the value of the parameter. If the
parameter value is a directory, then all the immediately contained files are
parsed as XML, and the resulting sequence of document nodes is passed as the
value of the parameter. For example, C<+lookup=lookup.xml> sets the value of
the external variable lookup to the document node at the root of the tree
representing the parsed contents of the file F<lookup.xml>.
A param preceded by a leading exclamation mark (C<!>) is interpreted as a
serialization parameter. For example, C<!indent=yes> requests indented output,
and C<!encoding=iso-8859-1> requests that the serialized output be in ISO
8859/1 encoding. This is equivalent to specifying the option declaration
declare option C<saxon:output "indent=yes">; or declare option C<saxon:output
"encoding=iso-8859-1">; in the query prolog. If you are using the bash shell,
you will need to escape “C<!>” as “C<\!>”.
Under Windows, and some other operating systems, it is possible to supply a
value containing spaces by enclosing it in double quotes, for example
C<name="John Smith">. This is a feature of the operating system shell, not
something Saxon does, so it may not work the same way under every operating
system.
If the parameter name is in a non-null namespace, the parameter can be given a
value using the syntax C<{>I<uri>C<}>I<localname>=I<value>. Here uri is the
namespace URI of the parameters name, and localname is the local part of the
name.
This applies also to output parameters. For example, you can set the
indentation level to 4 by using the parameter
C<!{http://saxon.sf.net/}indent-spaces=4>. In this case, however, lexical
QNames using the prefix saxon are also recognized, for example
C<!saxon:indent-spaces=4>. For the extended set of output parameters supported
by Saxon, see L<Additional serialization
parameters|https://www.saxonica.com/documentation10/index.html#!extensions/output-extras>.
=head1 SEE ALSO
L<https://www.saxonica.com/documentation10/index.html>, L<saxon10(1)>,
L<gizmo10(1)>.
=head1 AUTHOR
Michael H. Kay E<lt>L<mike@saxonica.com>E<gt>