mirror of
git://git.sv.gnu.org/findutils.git
synced 2026-02-17 04:10:34 +01:00
* find/defs.h (struct options): Add mount member and rename stay_on_filesystem to xdev. * find/ftsfind.c (find): Set FTS_MOUNT flag when -mount is enabled. * find/parser.c (parse_table): Use a separate parser for -mount. (parse_mount): Declare and define function. (parse_xdev): Use xdev option flag. * find/util.c (set_option_defaults): Initialize new struct members. * doc/find.texi (node Filesystems): Add new section describing the new behaviour of -mount and specify the current behaviour of -xdev. * find/find.1: Document the new -mount behaviour and specify current behaviour of -xdev. * NEWS (Changes in find): Mention the -mount behaviour change.
6054 lines
230 KiB
Plaintext
\input texinfo @c -*-texinfo-*-
|
||
@c %**start of header
|
||
@setfilename find.info
|
||
@include version.texi
|
||
@settitle GNU Findutils @value{VERSION}
|
||
@c For double-sided printing, uncomment:
|
||
@c @setchapternewpage odd
|
||
@c %**end of header
|
||
|
||
@include dblocation.texi
|
||
|
||
@iftex
|
||
@finalout
|
||
@end iftex
|
||
|
||
@dircategory Basics
|
||
@direntry
|
||
* Finding files: (find). Operating on files matching certain criteria.
|
||
@end direntry
|
||
|
||
@dircategory Individual utilities
|
||
@direntry
|
||
* find: (find)Finding Files. Finding and acting on files.
|
||
* locate: (find)Invoking locate. Finding files in a database.
|
||
* updatedb: (find)Invoking updatedb. Building the locate database.
|
||
* xargs: (find)Invoking xargs. Operating on many files.
|
||
@end direntry
|
||
|
||
@copying
|
||
This manual documents version @value{VERSION} of the GNU utilities for finding
|
||
files that match certain criteria and performing various operations on them.
|
||
|
||
Copyright @copyright{} 1994--2026 Free Software Foundation, Inc.
|
||
|
||
@quotation
|
||
Permission is granted to copy, distribute and/or modify this document
|
||
under the terms of the GNU Free Documentation License, Version 1.3 or
|
||
any later version published by the Free Software Foundation; with no
|
||
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
|
||
A copy of the license is included in the section entitled
|
||
``GNU Free Documentation License''.
|
||
@end quotation
|
||
@end copying
|
||
|
||
@titlepage
|
||
@title GNU Findutils
|
||
@subtitle Finding files
|
||
@subtitle version @value{VERSION}, @value{UPDATED}
|
||
@author by David MacKenzie and James Youngman
|
||
|
||
@page
|
||
@vskip 0pt plus 1filll
|
||
@insertcopying
|
||
@end titlepage
|
||
|
||
@contents
|
||
|
||
@ifnottex
|
||
@node Top
|
||
@top GNU Findutils
|
||
@comment node-name, next, previous, up
|
||
@insertcopying
|
||
@end ifnottex
|
||
|
||
@c The master menu, created with texinfo-master-menu, goes here.
|
||
|
||
@menu
|
||
* Introduction:: Summary of the tasks this manual describes.
|
||
* Finding Files:: Finding files that match certain criteria.
|
||
* Actions:: Doing things to files you have found.
|
||
* Databases:: Maintaining file name databases.
|
||
* File Permissions:: How to control access to files.
|
||
* Date input formats:: Specifying literal times.
|
||
* Configuration:: Options you can select at compile time.
|
||
* Reference:: Summary of how to invoke the programs.
|
||
* Common Tasks:: Solutions to common real-world problems.
|
||
* Worked Examples:: Examples demonstrating more complex points.
|
||
* Security Considerations:: Security issues relating to findutils.
|
||
* Error Messages:: Explanations of some messages you might see.
|
||
* History:: History of find, xargs and locate.
|
||
* GNU Free Documentation License:: Copying and sharing this manual.
|
||
* Primary Index:: The components of @code{find} expressions.
|
||
@end menu
|
||
|
||
@node Introduction
|
||
@chapter Introduction
|
||
|
||
This manual shows how to find files that meet criteria you specify,
|
||
and how to perform various actions on the files that you find. The
|
||
principal programs that you use to perform these tasks are
|
||
@code{find}, @code{locate}, and @code{xargs}. Some of the examples in
|
||
this manual use capabilities specific to the GNU versions of those
|
||
programs.
|
||
|
||
See @ref{History} for a history of @code{find}, @code{locate} and
|
||
@code{xargs}. The current maintainers of GNU findutils (and this
|
||
manual) are Bernhard Voelker and James Youngman. Many other people
|
||
have contributed bug fixes, small improvements, and helpful
|
||
suggestions. Thanks!
|
||
|
||
@findex bugs, reporting
|
||
To report a bug in GNU findutils, please use the form on the Savannah
|
||
web site at
|
||
@code{https://savannah.gnu.org/bugs/?group=findutils}. Reporting bugs
|
||
this way means that you will then be able to track progress in fixing
|
||
the problem.
|
||
|
||
If you don't have web access, you can also just send mail to the
|
||
mailing list. The mailing list @email{bug-findutils@@gnu.org} carries
|
||
discussion of bugs in findutils, questions and answers about the
|
||
software and discussion of the development of the programs. To join
|
||
the list, send email to @email{bug-findutils-request@@gnu.org}.
|
||
|
||
Please read any relevant sections of this manual before asking for
|
||
help on the mailing list. You may also find it helpful to read the
|
||
NON-BUGS section of the @code{find} manual page.
|
||
|
||
If you ask for help on the mailing list, people will be able to help
|
||
you much more effectively if you include the following things:
|
||
|
||
@itemize @bullet
|
||
@item The version of the software you are running. You can find this
|
||
out by running @samp{locate --version}.
|
||
@item What you were trying to do
|
||
@item The @emph{exact} command line you used
|
||
@item The @emph{exact} output you got (if this is very long, try to
|
||
find a smaller example which exhibits the same problem)
|
||
@item The output you expected to get
|
||
@end itemize
|
||
|
||
It may also be the case that the bug you are describing has already
|
||
been fixed, if it is a bug. Please check the most recent findutils
|
||
releases at @url{ftp://ftp.gnu.org/gnu/findutils} and, if possible,
|
||
the development branch at @url{ftp://alpha.gnu.org/gnu/findutils}.
|
||
If you take the time to check that your bug still exists in current
|
||
releases, this will greatly help people who want to help you solve
|
||
your problem.  Please also be aware that if you obtained findutils as
part of a GNU/Linux distribution, distributions often lag seriously
behind findutils releases, even the stable releases.  Please check the
GNU FTP site.
|
||
|
||
@menu
|
||
* Scope::
|
||
* Overview::
|
||
@end menu
|
||
|
||
@node Scope
|
||
@section Scope
|
||
|
||
For brevity, the word @dfn{file} in this manual means a regular file,
|
||
a directory, a symbolic link, or any other kind of node that has a
|
||
directory entry. A directory entry is also called a @dfn{file name}.
|
||
A file name may contain some, all, or none of the directories in a
|
||
path that leads to the file. These are all examples of what this
|
||
manual calls ``file names'':
|
||
|
||
@example
|
||
parser.c
|
||
README
|
||
./budget/may-94.sc
|
||
fred/.cshrc
|
||
/usr/local/include/termcap.h
|
||
@end example
|
||
|
||
A @dfn{directory tree} is a directory and the files it contains, all
|
||
of its subdirectories and the files they contain, etc. It can also be
|
||
a single non-directory file.
|
||
|
||
These programs enable you to find the files in one or more directory
|
||
trees that:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
have names that contain certain text or match a certain pattern;
|
||
@item
|
||
are links to certain files;
|
||
@item
|
||
were last used during a certain period of time;
|
||
@item
|
||
are within a certain size range;
|
||
@item
|
||
are of a certain type (regular file, directory, symbolic link, etc.);
|
||
@item
|
||
are owned by a certain user or group;
|
||
@item
|
||
have certain access permissions or special mode bits;
|
||
@item
|
||
contain text that matches a certain pattern;
|
||
@item
|
||
are within a certain depth in the directory tree;
|
||
@item
|
||
or some combination of the above.
|
||
@end itemize
|
||
|
||
Once you have found the files you're looking for (or files that are
|
||
potentially the ones you're looking for), you can do more to them than
|
||
simply list their names. You can get any combination of the files'
|
||
attributes, or process the files in many ways, either individually or
|
||
in groups of various sizes. Actions that you might want to perform on
|
||
the files you have found include, but are not limited to:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
view or edit
|
||
@item
|
||
store in an archive
|
||
@item
|
||
remove or rename
|
||
@item
|
||
change access permissions
|
||
@item
|
||
classify into groups
|
||
@end itemize
|
||
|
||
This manual describes how to perform each of those tasks, and more.
|
||
|
||
@node Overview
|
||
@section Overview
|
||
|
||
The principal programs used for making lists of files that match given
|
||
criteria and running commands on them are @code{find}, @code{locate},
|
||
and @code{xargs}. An additional command, @code{updatedb}, is used by
|
||
system administrators to create databases for @code{locate} to use.
|
||
|
||
@code{find} searches for files in a directory hierarchy and prints
|
||
information about the files it found. It is run like this:
|
||
|
||
@example
|
||
find @r{[}@var{file}@dots{}@r{]} @r{[}@var{expression}@r{]}
|
||
@end example
|
||
|
||
@noindent
|
||
Here is a typical use of @code{find}. This example prints the names
|
||
of all files in the directory tree rooted in @file{/usr/src} whose
|
||
name ends with @samp{.c} and that are larger than 100 KiB.
|
||
@example
|
||
find /usr/src -name '*.c' -size +100k -print
|
||
@end example
|
||
|
||
Notice that the wildcard must be enclosed in quotes in order to
|
||
protect it from expansion by the shell.
|
||
|
||
@code{locate} searches special file name databases for file names that
|
||
match patterns. The system administrator runs the @code{updatedb}
|
||
program to create the databases. @code{locate} is run like this:
|
||
|
||
@example
|
||
locate @r{[}@var{option}@dots{}@r{]} @var{pattern}@dots{}
|
||
@end example
|
||
|
||
@noindent
|
||
This example prints the names of all files in the default file name
|
||
database whose name ends with @samp{Makefile} or @samp{makefile}.
|
||
Which file names are stored in the database depends on how the system
|
||
administrator ran @code{updatedb}.
|
||
@example
|
||
locate '*[Mm]akefile'
|
||
@end example
|
||
|
||
The name @code{xargs}, pronounced EX-args, means ``combine
|
||
arguments.'' @code{xargs} builds and executes command lines by
|
||
gathering together arguments it reads on the standard input. Most
|
||
often, these arguments are lists of file names generated by
|
||
@code{find}. @code{xargs} is run like this:
|
||
|
||
@example
|
||
xargs @r{[}@var{option}@dots{}@r{]} @r{[}@var{command} @r{[}@var{initial-arguments}@r{]}@r{]}
|
||
@end example
|
||
|
||
@noindent
|
||
The following command searches the files listed in the file
|
||
@file{file-list} and prints all of the lines in them that contain the
|
||
word @samp{typedef}.
|
||
@example
|
||
xargs grep typedef < file-list
|
||
@end example
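As a minimal sketch of how @code{find} and @code{xargs} are typically combined, the following script builds a scratch directory and searches the header files in it for @samp{typedef}.  The directory and file names are hypothetical and are not part of the manual; @samp{-print0} and @samp{-0} are used so that the space in one of the names is handled safely.

```shell
#!/bin/sh
# Hypothetical scratch tree; none of these names come from the manual.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'typedef int a;\n' > "$tmp/one file.h"
printf 'int b;\n' > "$tmp/two.h"

# find emits NUL-terminated names and xargs -0 reads them, so the
# space in "one file.h" does not split the name into two arguments.
matches=$(find "$tmp" -name '*.h' -print0 | xargs -0 grep -l typedef)
printf '%s\n' "$matches"
```

Only the file that contains the word is listed; without @samp{-print0} and @samp{-0}, @code{xargs} would split @file{one file.h} at the space.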
|
||
|
||
@node Finding Files
|
||
@chapter Finding Files
|
||
|
||
By default, @code{find} prints to the standard output the names of the
|
||
files that match the given criteria. @xref{Actions}, for how to get
|
||
more information about the matching files.
|
||
|
||
|
||
@menu
|
||
* find Expressions::
|
||
* Starting points::
|
||
* Name::
|
||
* Links::
|
||
* Time::
|
||
* Size::
|
||
* Type::
|
||
* Owner::
|
||
* Mode Bits::
|
||
* Contents::
|
||
* Directories::
|
||
* Filesystems::
|
||
* Combining Primaries With Operators::
|
||
@end menu
|
||
|
||
@node find Expressions
|
||
@section @code{find} Expressions
|
||
|
||
The expression that @code{find} uses to select files consists of one
|
||
or more @dfn{primaries}, each of which is a separate command line
|
||
argument to @code{find}. @code{find} evaluates the expression each
|
||
time it processes a file. An expression can contain any of the
|
||
following types of primaries:
|
||
|
||
@table @dfn
|
||
@item options
|
||
affect overall operation rather than the processing of a specific
|
||
file;
|
||
@item tests
|
||
return a true or false value, depending on the file's attributes;
|
||
@item actions
|
||
have side effects and return a true or false value; and
|
||
@item operators
|
||
connect the other arguments and affect when and whether they are
|
||
evaluated.
|
||
@end table
|
||
|
||
You can omit the operator between two primaries; it defaults to
|
||
@samp{-and}. @xref{Combining Primaries With Operators}, for ways to
|
||
connect primaries into more complex expressions.
|
||
|
||
The @samp{-print} action is performed on all files for which the
|
||
entire expression is true (@pxref{Print File Name}), unless the
|
||
expression contains an action other than @samp{-prune} or
|
||
@samp{-quit}.
|
||
Actions which inhibit the default @samp{-print} are
|
||
@samp{-delete},
|
||
@samp{-execdir},
|
||
@samp{-exec},
|
||
@samp{-fls},
|
||
@samp{-fprint0},
|
||
@samp{-fprintf},
|
||
@samp{-fprint},
|
||
@samp{-ls},
|
||
@samp{-okdir},
|
||
@samp{-ok},
|
||
@samp{-print0},
|
||
@samp{-printf}
|
||
and @samp{-print}.
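The effect of the default @samp{-print} can be sketched as follows, assuming a scratch directory with made-up file names.  The first two commands should produce the same output, while the third prints nothing because @samp{-exec} inhibits the default action.

```shell
#!/bin/sh
# Hypothetical scratch tree used only for illustration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
: > "$tmp/a.c"
: > "$tmp/b.h"

# No action given: -print is applied implicitly.
implicit=$(find "$tmp" -name '*.c')
# The same expression with the action spelled out.
explicit=$(find "$tmp" -name '*.c' -print)
# -exec is an action, so nothing is printed by find itself.
silent=$(find "$tmp" -name '*.c' -exec true {} \;)

printf 'implicit: %s\nexplicit: %s\nsilent: [%s]\n' \
    "$implicit" "$explicit" "$silent"
```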
|
||
|
||
Options take effect immediately, rather than being evaluated for each
|
||
file when their place in the expression is reached. Therefore, for
|
||
clarity, it is best to place them at the beginning of the expression.
|
||
There are exceptions to this; @samp{-regextype}, @samp{-daystart} and @samp{-follow}
|
||
have different effects depending on where in the command line they
|
||
appear. This can be confusing, so it's best to keep them at the
|
||
beginning, too.
|
||
|
||
Many of the primaries take arguments, which immediately follow them in
|
||
the next command line argument to @code{find}. Some arguments are
|
||
file names, patterns, or other strings; others are numbers. Numeric
|
||
arguments can be specified as
|
||
|
||
@table @code
|
||
@item +@var{n}
|
||
for greater than @var{n},
|
||
@item -@var{n}
|
||
for less than @var{n},
|
||
@item @var{n}
|
||
for exactly @var{n}.
|
||
@end table
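The three numeric forms can be illustrated with the @samp{-size} test.  This is only a sketch: the file names are invented, and the @samp{c} suffix is used so that sizes are counted in bytes rather than rounded up to blocks.

```shell
#!/bin/sh
# Hypothetical scratch files of 3, 5 and 8 bytes.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc'      > "$tmp/three"
printf 'abcde'    > "$tmp/five"
printf 'abcdefgh' > "$tmp/eight"

more=$(find "$tmp" -type f -size +5c)   # greater than 5 bytes
less=$(find "$tmp" -type f -size -5c)   # less than 5 bytes
exact=$(find "$tmp" -type f -size 5c)   # exactly 5 bytes

printf '%s\n' "$more" "$less" "$exact"
```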
|
||
|
||
|
||
@node Starting points
|
||
@section Starting points
|
||
|
||
GNU @code{find} searches the directory tree rooted at each given starting-point
|
||
by evaluating the given expression from left to right, according to the
|
||
rules of operator precedence, until the outcome is known (the left hand side
|
||
is false for @samp{and} operations, true for @samp{or}), at which point
|
||
@code{find} moves on to the next file name.
|
||
|
||
If no starting-point is specified, the current directory @samp{.} is assumed.
|
||
|
||
A double dash @samp{--} could theoretically be used to signal that any remaining
|
||
arguments are not options, but this does not really work due to the way
|
||
@code{find} determines the end of the list of starting point arguments:
|
||
it does that by reading until an expression argument comes (which also starts
|
||
with a @samp{-}).
|
||
Now, if a starting point argument begins with a @samp{-}, @code{find}
treats it as an expression argument instead.
Thus, to ensure that all starting points are taken as such, and especially to
prevent wildcard patterns expanded by the calling shell from being mistakenly
treated as expression arguments, it is generally safer either to prefix wildcards
or dubious path names with @samp{./}, or to use absolute path names
starting with @samp{/}.
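The @samp{./} prefix can be tried out with a directory whose name begins with a dash.  The name @file{-odd} below is hypothetical; the point of the sketch is only that @file{./-odd} is read as a starting point, whereas a bare @file{-odd} would be parsed as part of the expression.

```shell
#!/bin/sh
# Hypothetical scratch tree containing a directory named "-odd".
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/-odd"
: > "$tmp/-odd/file"

# Prefixing with ./ makes the argument unambiguous as a starting point.
found=$(cd "$tmp" && find ./-odd -type f)
printf '%s\n' "$found"
```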
|
||
|
||
Alternatively, it is generally safe though non-portable to use the GNU option
|
||
@samp{-files0-from} to pass arbitrary starting points to @code{find}.
|
||
|
||
@deffn Option -files0-from file
|
||
|
||
Read the starting points from @file{file} instead of getting them on the
|
||
command line.
|
||
Passing starting points as command-line arguments has two known limitations:
the number of file names that fit on a command line is limited, and file names
can clash with option names.
This option avoids both, so an arbitrary number of starting points can be
passed to @code{find} safely.

This option and starting points on the command line are mutually
exclusive; they cannot be used together.
|
||
|
||
The @file{file} argument is mandatory.
|
||
One can use @samp{-files0-from -} to read the list of starting points from the
standard input stream, for example from a pipe.
In this case, the actions @samp{-ok} and @samp{-okdir} are not allowed,
because reading the user's confirmation from standard input would
interfere with reading the list of starting points.
|
||
|
||
The starting points in @file{file} have to be separated by ASCII NUL characters.
Two consecutive NUL characters, i.e., a starting point with a zero-length
file name, are not allowed; they lead to an error diagnostic and, later,
a non-zero exit code.
|
||
|
||
If the given @file{file} is empty, @code{find} does not process any
starting point and therefore exits immediately after parsing the program
arguments.
This is unlike the standard invocation, where @code{find} assumes the current
directory as the starting point if no path argument is passed.
|
||
|
||
The processing of the starting points is otherwise as usual, e.g. @code{find}
|
||
will recurse into subdirectories unless otherwise prevented.
|
||
To process only the starting points, one can additionally pass @samp{-maxdepth 0}.
|
||
|
||
Further notes:
|
||
if a file is listed more than once in the input file, it is unspecified
|
||
whether it is visited more than once.
|
||
If the @file{file} is mutated during the operation of @code{find}, the result
|
||
is unspecified as well.
|
||
Finally, the seek position within the named @file{file} at the time @code{find}
exits, be it with @samp{-quit} or in any other way, is also unspecified.
Here ``unspecified'' means that it may or may not work or do any specific
thing, and that the behavior may change from platform to platform, or from
findutils release to release.
|
||
|
||
Example:
|
||
Given that another program @code{proggy} pre-filters and creates a huge
|
||
NUL-separated list of files, process those as starting points, and find
|
||
all regular, empty files among them:
|
||
@example
|
||
$ proggy | find -files0-from - -maxdepth 0 -type f -empty
|
||
@end example
|
||
The use of @samp{-files0-from -} means to read the names of the starting points
from standard input, i.e., from the pipe; and @samp{-maxdepth 0} ensures that
only those entries themselves are examined, without recursing into any
starting point that is a directory.
|
||
@end deffn
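Where no pre-filtering program is at hand, a NUL-separated list can be produced with @code{printf}.  The sketch below assumes a scratch tree with invented names, and first probes whether the installed @code{find} accepts @samp{-files0-from} at all, since older releases and other implementations do not.

```shell
#!/bin/sh
# Hypothetical scratch tree: an empty file, a non-empty file, a directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
: > "$tmp/empty"
printf 'data\n' > "$tmp/full"
mkdir "$tmp/dir"
: > "$tmp/dir/nested"

# Probe for the option; an empty list makes a supporting find exit at once.
if find -files0-from /dev/null >/dev/null 2>&1; then
    # Each starting point is terminated by a NUL character.
    result=$(printf '%s\0' "$tmp/empty" "$tmp/full" "$tmp/dir" |
             find -files0-from - -maxdepth 0 -type f -empty)
else
    result=unsupported
fi
printf '%s\n' "$result"
```

With @samp{-maxdepth 0}, the empty file inside @file{dir} is not reported, because only the listed starting points themselves are examined.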
|
||
|
||
|
||
@node Name
|
||
@section Name
|
||
|
||
Here are ways to search for files whose name matches a certain
|
||
pattern. @xref{Shell Pattern Matching}, for a description of the
|
||
@var{pattern} arguments to these tests.
|
||
|
||
Each of these tests has a case-sensitive version and a
|
||
case-insensitive version, whose name begins with @samp{i}. In a
|
||
case-insensitive comparison, the patterns @samp{fo*} and @samp{F??}
|
||
match the file names @file{Foo}, @file{FOO}, @file{foo}, @file{fOo},
|
||
etc.
|
||
|
||
@menu
|
||
* Base Name Patterns::
|
||
* Full Name Patterns::
|
||
* Fast Full Name Search::
|
||
* Shell Pattern Matching:: Wildcards used by these programs.
|
||
@end menu
|
||
|
||
@node Base Name Patterns
|
||
@subsection Base Name Patterns
|
||
|
||
@deffn Test -name pattern
|
||
@deffnx Test -iname pattern
|
||
True if the base of the file name (the path with the leading
|
||
directories removed) matches shell pattern @var{pattern}. For
|
||
@samp{-iname}, the match is case-insensitive.@footnote{Because we
|
||
need to perform case-insensitive matching, the GNU fnmatch
|
||
implementation is always used; if the C library includes the GNU
|
||
implementation, we use that and otherwise we use the one from gnulib.}
|
||
To ignore a whole directory tree, use @samp{-prune}
|
||
(@pxref{Directories}). As an example, to find Texinfo source files in
|
||
@file{/usr/local/doc}:
|
||
|
||
@example
|
||
find /usr/local/doc -name '*.texi'
|
||
@end example
|
||
|
||
Notice that the wildcard must be enclosed in quotes in order to
|
||
protect it from expansion by the shell.
|
||
|
||
As of findutils version 4.2.2, patterns for @samp{-name} and
@samp{-iname} match a file name with a leading @samp{.}.  For
example, the command @samp{find /tmp -name \*bar} matches the file
@file{/tmp/.foobar}.  Braces within the pattern (@samp{@{@}}) are not
considered to be special (that is, @code{find . -name 'foo@{1,2@}'}
matches a file named @file{foo@{1,2@}}, not the files @file{foo1} and
@file{foo2}).
|
||
|
||
Because the leading directories are removed, the file names considered
|
||
for a match with @samp{-name} will never include a slash, so
|
||
@samp{-name a/b} will never match anything (you probably need to use
|
||
@samp{-path} instead).
|
||
|
||
The @samp{-iname} test first appeared in POSIX Issue 8 (IEEE Std 1003.1-2024),
while GNU @code{find} has supported it since version 3.8 (1993).
|
||
@end deffn
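The difference between matching the base name and matching the whole name can be sketched like this, assuming a scratch tree containing a file @file{a/b} (a made-up name).  GNU @code{find} may print a warning for the first command, which is discarded here.

```shell
#!/bin/sh
# Hypothetical scratch tree with a single file a/b.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/a"
: > "$tmp/a/b"

# -name sees only the base name "b", so a pattern with a slash never matches.
by_name=$(cd "$tmp" && find . -name 'a/b' 2>/dev/null)
# -path sees the whole name as find builds it, starting at the start point.
by_path=$(cd "$tmp" && find . -path './a/b')

printf 'by_name: [%s]\nby_path: [%s]\n' "$by_name" "$by_path"
```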
|
||
|
||
|
||
@node Full Name Patterns
|
||
@subsection Full Name Patterns
|
||
|
||
@deffn Test -path pattern
|
||
@deffnx Test -wholename pattern
|
||
True if the entire file name, starting with the command line argument
|
||
under which the file was found, matches shell pattern @var{pattern}.
|
||
To ignore a whole directory tree, use @samp{-prune} rather than
|
||
checking every file in the tree (@pxref{Directories}). The ``entire
|
||
file name'' as used by @code{find} starts with the starting-point
|
||
specified on the command line, and is not converted to an absolute
|
||
pathname, so for example @code{cd /; find tmp -wholename /tmp} will
|
||
never match anything.
|
||
|
||
@code{find} compares the @samp{-path} argument with the concatenation of a
|
||
directory name and the base name of the file it's considering.
|
||
Since the concatenation will never end with a slash, @samp{-path}
|
||
arguments ending in @samp{/} will match nothing (except perhaps a
|
||
start point specified on the command line).
|
||
|
||
The name @samp{-wholename} is GNU-specific, but @samp{-path} is more
|
||
portable: the latter is supported by HP-UX @code{find} and is part of the
|
||
POSIX standard (since IEEE Std 1003.1-2008).
|
||
|
||
@end deffn
|
||
|
||
@deffn Test -ipath pattern
|
||
@deffnx Test -iwholename pattern
|
||
These tests are like @samp{-wholename} and @samp{-path}, but the match
|
||
is case-insensitive.
|
||
@end deffn
|
||
|
||
|
||
In the context of the tests @samp{-path}, @samp{-wholename},
|
||
@samp{-ipath} and @samp{-iwholename}, a ``full path'' is the name of
|
||
all the directories traversed from @code{find}'s start point to the
|
||
file being tested, followed by the base name of the file itself.
|
||
These paths are often not absolute paths; for example
|
||
|
||
@example
|
||
$ cd /tmp
|
||
$ mkdir -p foo/bar/baz
|
||
$ find foo -path foo/bar -print
|
||
foo/bar
|
||
$ find foo -path /tmp/foo/bar -print
|
||
$ find /tmp/foo -path /tmp/foo/bar -print
|
||
/tmp/foo/bar
|
||
@end example
|
||
|
||
Notice that the second @code{find} command prints nothing, even though
|
||
@file{/tmp/foo/bar} exists and was examined by @code{find}.
|
||
|
||
Unlike file name expansion on the command line, a @samp{*} in the pattern
|
||
will match both @samp{/} and leading dots in file names:
|
||
|
||
@example
|
||
$ find . -path '*f'
|
||
./quux/bar/baz/f
|
||
$ find . -path '*/*config'
|
||
./quux/bar/baz/.config
|
||
@end example
|
||
|
||
|
||
@deffn Test -regex expr
|
||
@deffnx Test -iregex expr
|
||
True if the entire file name matches regular expression @var{expr}.
|
||
This is a match on the whole path, not a search. For example, to
|
||
match a file named @file{./fubar3}, you can use the regular expression
|
||
@samp{.*bar.} or @samp{.*b.*3}, but not @samp{f.*r3}.
|
||
For @samp{-iregex}, the match is case-insensitive.
|
||
|
||
As for @samp{-path}, the candidate file name never ends with a slash,
|
||
so regular expressions which only match something that ends in slash
|
||
will always fail.
|
||
|
||
There are several varieties of regular expressions; by default this
|
||
test uses GNU Emacs regular expressions, but this can be changed with
|
||
the option @samp{-regextype}.
|
||
@end deffn
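The whole-path matching described for @samp{-regex} can be checked with the @file{./fubar3} example.  The sketch below creates that file in a scratch directory (the directory itself is an assumption of the example) and tries the three expressions mentioned above.

```shell
#!/bin/sh
# Scratch directory containing only the file fubar3.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
: > "$tmp/fubar3"

# Both of these cover the whole name "./fubar3".
whole=$(cd "$tmp" && find . -regex '.*bar.')
also_whole=$(cd "$tmp" && find . -regex '.*b.*3')
# This one would only match a substring, so -regex rejects it.
partial=$(cd "$tmp" && find . -regex 'f.*r3')

printf '[%s] [%s] [%s]\n' "$whole" "$also_whole" "$partial"
```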
|
||
|
||
@deffn Option -regextype name
|
||
This option controls the variety of regular expression syntax
|
||
understood by the @samp{-regex} and @samp{-iregex} tests. This option
|
||
is positional; that is, it only affects regular expressions which
|
||
occur later in the command line. If this option is not given, GNU
|
||
Emacs regular expressions are assumed. Currently-implemented types
|
||
are
|
||
|
||
|
||
@table @samp
|
||
@item emacs
|
||
Regular expressions compatible with GNU Emacs; this is also the
|
||
default behaviour if this option is not used.
|
||
@item posix-awk
|
||
Regular expressions compatible with the POSIX awk command (not GNU awk).
@item posix-basic
POSIX Basic Regular Expressions.
@item posix-egrep
Regular expressions compatible with the POSIX egrep command.
@item posix-extended
POSIX Extended Regular Expressions.
|
||
@end table
|
||
|
||
@xref{Regular Expressions}, for more information on the regular
|
||
expression dialects understood by GNU findutils.
|
||
|
||
|
||
@end deffn
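As an illustration of how the chosen syntax changes the way an expression is written, the sketch below selects the same files twice: once with @samp{posix-extended}, and once with the default Emacs syntax, in which grouping and alternation are written with backslashes.  The file names are invented for the example.

```shell
#!/bin/sh
# Hypothetical scratch files; only foo1 and bar22 should be selected.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
: > "$tmp/foo1"
: > "$tmp/bar22"
: > "$tmp/baz3"

# -regextype is positional, so it must precede the -regex it should affect.
ere=$(cd "$tmp" &&
      find . -regextype posix-extended -regex '\./(foo|bar)[0-9]+' | sort)
# The same selection in the default Emacs syntax.
emacs=$(cd "$tmp" && find . -regex '\./\(foo\|bar\)[0-9]+' | sort)

printf '%s\n' "$ere"
```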
|
||
|
||
@node Fast Full Name Search
|
||
@subsection Fast Full Name Search
|
||
|
||
To search for files by name without having to actually scan the
|
||
directories on the disk (which can be slow), you can use the
|
||
@code{locate} program. For each shell pattern you give it,
|
||
@code{locate} searches one or more databases of file names and
|
||
displays the file names that contain the pattern. @xref{Shell Pattern
|
||
Matching}, for details about shell patterns.
|
||
|
||
If a pattern is a plain string -- it contains no
|
||
metacharacters -- @code{locate} displays all file names in the database
|
||
that contain that string. If a pattern contains
|
||
metacharacters, @code{locate} only displays file names that match the
|
||
pattern exactly. As a result, patterns that contain metacharacters
|
||
should usually begin with a @samp{*}, and will most often end with one
|
||
as well. The exceptions are patterns that are intended to explicitly
|
||
match the beginning or end of a file name.
|
||
|
||
If you only want @code{locate} to match against the last component of
|
||
the file names (the ``base name'' of the files) you can use the
|
||
@samp{--basename} option. The opposite behaviour is the default, but
|
||
can be selected explicitly by using the option @samp{--wholename}.
|
||
|
||
The command
|
||
@example
|
||
locate @var{pattern}
|
||
@end example
|
||
|
||
is almost equivalent to
|
||
@example
|
||
find @var{directories} -name @var{pattern}
|
||
@end example
|
||
|
||
where @var{directories} are the directories for which the file name
|
||
databases contain information. The differences are that the
|
||
@code{locate} information might be out of date, and that by default
|
||
@code{locate} matches wildcards against the whole file name (not just
|
||
its base name) (@pxref{Shell Pattern Matching}).
|
||
|
||
The file name databases contain lists of files that were on the system
|
||
when the databases were last updated. The system administrator can
|
||
choose the file name of the default database, the frequency with which
|
||
the databases are updated, and the directories for which they contain
|
||
entries.
|
||
|
||
Here is how to select which file name databases @code{locate}
|
||
searches. The default is system-dependent. At the time this document
|
||
was generated, the default was @file{@value{LOCATE_DB}}.
|
||
|
||
@table @code
|
||
@item --database=@var{path}
|
||
@itemx -d @var{path}
|
||
Instead of searching the default file name database, search the file
|
||
name databases in @var{path}, which is a colon-separated list of
|
||
database file names. You can also use the environment variable
|
||
@env{LOCATE_PATH} to set the list of database files to search. The
|
||
option overrides the environment variable if both are used.
|
||
@end table
|
||
|
||
GNU @code{locate} can read file name databases generated by the
|
||
@code{slocate} package. However, these generally contain a list of
|
||
all the files on the system, and so when using this database,
|
||
@code{locate} will produce output only for files which are accessible
|
||
to you. @xref{Invoking locate}, for a description of the
|
||
@samp{--existing} option which is used to do this.
|
||
|
||
The @code{updatedb} program can also generate a database in a format
|
||
compatible with @code{slocate}. @xref{Invoking updatedb}, for a
|
||
description of its @samp{--dbformat} and @samp{--output} options.
|
||
|
||
|
||
@node Shell Pattern Matching
|
||
@subsection Shell Pattern Matching
|
||
|
||
@code{find} and @code{locate} can compare file names, or parts of file
|
||
names, to shell patterns. A @dfn{shell pattern} is a string that may
|
||
contain the following special characters, which are known as
|
||
@dfn{wildcards} or @dfn{metacharacters}.
|
||
|
||
You must quote patterns that contain metacharacters to prevent the
|
||
shell from expanding them itself. Double and single quotes both work;
|
||
so does escaping with a backslash.
|
||
|
||
@table @code
|
||
@item *
|
||
Matches any zero or more characters.
|
||
|
||
@item ?
|
||
Matches any one character.
|
||
|
||
@item [@var{string}]
|
||
Matches exactly one character that is a member of the string
|
||
@var{string}. This is called a @dfn{character class}. As a
|
||
shorthand, @var{string} may contain ranges, which consist of two
|
||
characters with a dash between them. For example, the class
|
||
@samp{[a-z0-9_]} matches a lowercase letter, a number, or an
|
||
underscore. You can negate a class by placing a @samp{!} or @samp{^}
|
||
immediately after the opening bracket. Thus, @samp{[^A-Z@@]} matches
|
||
any character except an uppercase letter or an at sign.
|
||
|
||
@item \
|
||
Removes the special meaning of the character that follows it. This
|
||
works even in character classes.
|
||
@end table
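Character classes and their negation can be tried with @samp{-name}.  This is a sketch with invented file names; @env{LC_ALL} is set to @samp{C} as an assumption of the example, so that the ranges @samp{a-z} and @samp{A-Z} cover exactly the ASCII letters.

```shell
#!/bin/sh
# Hypothetical scratch files whose names start with different characters.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
: > "$tmp/a1"
: > "$tmp/_x"
: > "$tmp/Zed"
: > "$tmp/@home"
: > "$tmp/-dash"

# First character is a lowercase letter, a digit or an underscore.
class=$(cd "$tmp" &&
        LC_ALL=C find . -type f -name '[a-z0-9_]*' | LC_ALL=C sort)
# First character is anything except an uppercase letter or an at sign.
negated=$(cd "$tmp" &&
          LC_ALL=C find . -type f -name '[!A-Z@]*' | LC_ALL=C sort)

printf 'class:\n%s\nnegated:\n%s\n' "$class" "$negated"
```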
|
||
|
||
In the @code{find} tests that do shell pattern matching (@samp{-name},
|
||
@samp{-wholename}, etc.), wildcards in the pattern will match a
|
||
@samp{.} at the beginning of a file name. This is also the case for
|
||
@code{locate}. Thus, @samp{find -name '*macs'} will match a file
|
||
named @file{.emacs}, as will @samp{locate '*macs'}.
|
||
|
||
Slash characters have no special significance in the shell pattern
|
||
matching that @code{find} and @code{locate} do, unlike in the shell,
|
||
in which wildcards do not match them. Therefore, a pattern
|
||
@samp{foo*bar} can match a file name @samp{foo3/bar}, and a pattern
|
||
@samp{./sr*sc} can match a file name @samp{./src/misc}.
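
Both properties, wildcards matching a leading @samp{.} and wildcards matching @samp{/}, can be observed with a small scratch tree.  The tree below reuses the names from the text, but its layout is an assumption of this sketch.

```shell
#!/bin/sh
# Scratch tree with a dot file and a nested directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/src/misc"
: > "$tmp/.emacs"

# The * matches the leading dot of ".emacs".
dot=$(cd "$tmp" && find . -name '*macs')
# The * matches across the slash between "src" and "misc".
slash=$(cd "$tmp" && find . -path './sr*sc')

printf '%s\n%s\n' "$dot" "$slash"
```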
|
||
|
||
If you want to locate some files with the @samp{locate} command but
|
||
don't need to see the full list you can use the @samp{--limit} option
|
||
to see just a small number of results, or the @samp{--count} option to
|
||
display only the total number of matches.
|
||
|
||
@node Links
|
||
@section Links
|
||
|
||
There are two ways that files can be linked together. @dfn{Symbolic
|
||
links} are a special type of file whose contents are a portion of the
|
||
name of another file. @dfn{Hard links} are multiple directory entries
|
||
for one file; the file names all have the same index node
|
||
(@dfn{inode}) number on the disk.
|
||
|
||
@menu
|
||
* Symbolic Links::
|
||
* Hard Links::
|
||
@end menu
|
||
|
||
@node Symbolic Links
|
||
@subsection Symbolic Links
|
||
|
||
Symbolic links are names that reference other files. GNU @code{find}
handles symbolic links in one of two ways. Firstly, it can
dereference the links for you: if it comes across a symbolic link, it
examines the file that the link points to, in order to see if it
matches the criteria you have specified. Secondly, it can check the
link itself, in case you are looking for the actual link. If the
file that the symbolic link points to is also within the directory
hierarchy you are searching with the @code{find} command, you may not
see a great deal of difference between these two alternatives.
|
||
|
||
By default, @code{find} examines symbolic links themselves when it
|
||
finds them (and, if it later comes across the linked-to file, it will
|
||
examine that, too). If you would prefer @code{find} to dereference
|
||
the links and examine the file that each link points to, specify the
|
||
@samp{-L} option to @code{find}. You can explicitly specify the
|
||
default behaviour by using the @samp{-P} option. The @samp{-H}
|
||
option is a half-way-between option which ensures that any symbolic
|
||
links listed on the command line are dereferenced, but other symbolic
|
||
links are not.
|
||
|
||
Symbolic links are different from ``hard links'' in the sense that you
|
||
need permission to search the directories
|
||
in the linked-to file name to
|
||
dereference the link. This can mean that even if you specify the
|
||
@samp{-L} option, @code{find} may not be able to determine the
|
||
properties of the file that the link points to (because you don't have
|
||
sufficient permission). In this situation, @code{find} uses the
|
||
properties of the link itself. This also occurs if a symbolic link
|
||
exists but points to a file that is missing, or where the symbolic
|
||
link points to itself (directly or indirectly).
|
||
|
||
The options controlling the behaviour of @code{find} with respect to
|
||
links are as follows:
|
||
|
||
@table @samp
|
||
@item -P
|
||
@code{find} does not dereference symbolic links at all. This is the
|
||
default behaviour. This option must be specified before any of the
|
||
file names on the command line.
|
||
@item -H
|
||
@code{find} does not dereference symbolic links (except in the case of
|
||
file names on the command line, which are dereferenced). If a
|
||
symbolic link cannot be dereferenced, the information for the symbolic
|
||
link itself is used. This option must be specified before any of the
|
||
file names on the command line.
|
||
@item -L
|
||
@code{find} dereferences symbolic links where possible, and where this
|
||
is not possible it uses the properties of the symbolic link itself.
|
||
This option must be specified before any of the file names on the
|
||
command line. Use of this option also implies the same behaviour as
|
||
the @samp{-noleaf} option. If you later use the @samp{-H} or
|
||
@samp{-P} options, this does not turn off @samp{-noleaf}.
|
||
|
||
Actions that can cause symbolic links to become broken while
|
||
@samp{find} is executing (for example @samp{-delete}) can give rise to
|
||
confusing behaviour. Take for example the command line
|
||
@samp{find -L . -type d -delete}. This will delete empty
|
||
directories. If a subtree includes only directories and symbolic
|
||
links to directories, this command may still not successfully delete
|
||
it, since deletion of the target of the symbolic link will cause the
|
||
symbolic link to become broken and @samp{-type d} is false for broken
|
||
symbolic links.
|
||
|
||
@item -follow
|
||
This option forms part of the ``expression'' and must be specified
|
||
after the file names, but it is otherwise equivalent to @samp{-L}.
|
||
The @samp{-follow} option affects only those tests which appear after
|
||
it on the command line. This option is deprecated. Where possible,
|
||
you should use @samp{-L} instead.
|
||
@end table
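
As a rough illustration of how the three options differ, the following
sketch creates a symbolic link to a directory and searches it with each
option in turn. The layout (@file{real}, @file{link}) is invented for
the example, and a POSIX shell with @code{mktemp} is assumed.

@example
# Scratch directory; the names real/ and link are hypothetical.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p real/sub
touch real/sub/file
ln -s real link            # symbolic link to a directory

# -P (the default): the link itself is examined and not followed.
p_out=$(find -P link | sort)

# -H: a link named on the command line is dereferenced.
h_out=$(find -H link | sort)

# -L: links met during the search are dereferenced too, so the
# file is reached both through real/ and through link/.
l_out=$(find -L . -name file | sort)

printf '%s\n' "-P:" "$p_out" "-H:" "$h_out" "-L:" "$l_out"
@end example

With @samp{-P} only @file{link} should be listed, while @samp{-H}
should also list the contents of the directory it points to.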
|
||
|
||
The following differences in behaviour occur when the @samp{-L} option
|
||
is used:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
@code{find} follows symbolic links to directories when searching
|
||
directory trees.
|
||
@item
|
||
@samp{-lname} and @samp{-ilname} always return false (unless they
|
||
happen to match broken symbolic links).
|
||
@item
|
||
@samp{-type} reports the types of the files that symbolic links point
|
||
to. This means that in combination with @samp{-L}, @samp{-type l}
|
||
will be true only for broken symbolic links. To check for symbolic
|
||
links when @samp{-L} has been specified, use @samp{-xtype l}.
|
||
@item
|
||
Implies @samp{-noleaf} (@pxref{Directories}).
|
||
@end itemize
|
||
|
||
If the @samp{-L} option or the @samp{-H} option is used,
|
||
the file names used as arguments to @samp{-newer}, @samp{-anewer}, and
|
||
@samp{-cnewer} are dereferenced and the timestamp from the pointed-to
|
||
file is used instead (if possible -- otherwise the timestamp from the
|
||
symbolic link is used).
|
||
|
||
@deffn Test -lname pattern
|
||
@deffnx Test -ilname pattern
|
||
True if the file is a symbolic link whose contents match shell pattern
|
||
@var{pattern}. For @samp{-ilname}, the match is case-insensitive.
|
||
@xref{Shell Pattern Matching}, for details about the @var{pattern}
|
||
argument. If the @samp{-L} option is in effect, this test will always
|
||
return false for symbolic links unless they are broken. So, to list
|
||
any symbolic links to @file{sysdep.c} in the current directory and its
|
||
subdirectories, you can do:
|
||
|
||
@example
|
||
find . -lname '*sysdep.c'
|
||
@end example
|
||
@end deffn
|
||
|
||
@node Hard Links
|
||
@subsection Hard Links
|
||
|
||
Hard links allow more than one name to refer to the same file on a
|
||
file system, i.e., to the same inode. To find all the names which refer
|
||
to the same file as @var{name}, use @samp{-samefile @var{name}}.
|
||
|
||
@deffn Test -samefile @var{name}
|
||
True if the file is a hard link to the same inode as @var{name}.
|
||
This implies that @var{name} and the file reside on the same file system,
|
||
i.e., they have the same device number.
|
||
|
||
Unless the @samp{-L} option is also given to follow symbolic links, one may
|
||
confine the search to one file system by using the @samp{-xdev} option.
|
||
This is useful because hard links cannot point outside a single file system,
|
||
so this can cut down on needless searching.
|
||
|
||
If the @samp{-L} option is in effect, then dereferencing of symbolic links
|
||
applies both to the @var{name} argument of the @samp{-samefile} primary and
|
||
to each file examined during the traversal of the directory hierarchy.
|
||
Therefore, @samp{find -L -samefile @var{name}} will find both hard links and
|
||
symbolic links pointing to the file referenced by @var{name}.
|
||
@end deffn
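
The following sketch shows the effect of @samp{-samefile} with and
without @samp{-L}. The file names are invented for the example, and a
POSIX shell with @code{mktemp} is assumed.

@example
# Scratch directory; all file names are hypothetical.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
echo data > original
ln original hardlink       # second name for the same inode
ln -s original symlink     # symbolic link to it
echo data > copy           # same contents, but a different inode

# Without -L only the hard links are reported.
same=$(find . -samefile original | sort)

# With -L the symbolic link is dereferenced and reported as well.
same_L=$(find -L . -samefile original | sort)

printf '%s\n' "default:" "$same" "with -L:" "$same_L"
@end example

The file @file{copy} should not appear in either list, because equal
contents do not make two files the same file.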
|
||
|
||
@command{find} also allows searching for files by inode number.
|
||
|
||
This can occasionally be useful in diagnosing problems with file systems;
|
||
for example, @command{fsck} and @command{lsof} tend to print inode numbers.
|
||
Inode numbers also occasionally turn up in log messages for some types of
|
||
software.
|
||
|
||
You can learn a file's inode number and the number of links to it by
|
||
running @samp{ls -li}, @samp{stat} or @samp{find -ls}.
|
||
|
||
You can search for hard links to inode number NUM by using @samp{-inum
|
||
NUM}. If there are any file system mount points below the directory
|
||
where you are starting the search, use the @samp{-xdev} option unless
|
||
you are also using the @samp{-L} option. Using @samp{-xdev} saves
|
||
needless searching, since hard links to a file must be on the
|
||
same file system. @xref{Filesystems}.
|
||
|
||
@deffn Test -inum n
|
||
True if the file has inode number @var{n}. The @samp{+} and @samp{-} qualifiers
|
||
also work, though these are rarely useful.
|
||
|
||
Please note that the @samp{-inum} primary simply compares the inode number
|
||
against the given @var{n}.
|
||
This means that a search for a certain inode number in several file systems
|
||
may return several files with that inode number, but as each file system has
|
||
its own device number, those files are not necessarily hard links to the
|
||
same file.
|
||
|
||
Therefore, it is often easier to use @samp{-samefile} rather than
this test.
|
||
@end deffn
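
A minimal sketch of a search by inode number follows. It takes the
number from the first field printed by @samp{ls -i}; the file names are
invented, and a POSIX shell with @code{mktemp} is assumed.

@example
# Scratch directory; the names a and b are hypothetical.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
echo data > a
ln a b                     # second hard link to the same inode

set -- $(ls -i a)          # first field of 'ls -i' is the inode number
inum=$1

# -xdev keeps the search on one file system, where the number is unique.
byinum=$(find . -xdev -inum "$inum" | sort)
echo "$byinum"
@end example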
|
||
|
||
@command{find} also allows searching for files that have a certain number of
|
||
links, with @samp{-links}.
|
||
|
||
A directory normally has at least two hard links: the entry named in its parent
|
||
directory, and the @file{.} entry inside of the directory.
|
||
If a directory has subdirectories, each of those also has a hard link called
|
||
@file{..} to its parent directory.
|
||
|
||
The @file{.} and @file{..} directory entries are not normally searched unless
|
||
they are mentioned on the @code{find} command line.
|
||
|
||
@deffn Test -links n
|
||
File has @var{n} hard links.
|
||
@end deffn
|
||
|
||
@deffn Test -links +n
|
||
File has more than @var{n} hard links.
|
||
@end deffn
|
||
|
||
@deffn Test -links -n
|
||
File has fewer than @var{n} hard links.
|
||
@end deffn
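
For instance, the sketch below separates regular files that have extra
hard links from those that have only one name. The file names are
invented for the example, and a POSIX shell with @code{mktemp} is
assumed.

@example
# Scratch directory; all file names are hypothetical.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch single multi
ln multi multi2            # multi now has two links

# Regular files with more than one hard link.
multi_out=$(find . -type f -links +1 | sort)

# Regular files with exactly one hard link.
single_out=$(find . -type f -links 1)

printf '%s\n' "more than one link:" "$multi_out" "one link:" "$single_out"
@end example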
|
||
|
||
@node Time
|
||
@section Time
|
||
|
||
Each file has three timestamps, which record the last time that
|
||
certain operations were performed on the file:
|
||
|
||
@enumerate
|
||
@item
|
||
access (read the file's contents)
|
||
@item
|
||
change the status (modify the file or its attributes)
|
||
@item
|
||
modify (change the file's contents)
|
||
@end enumerate
|
||
|
||
Some systems also provide a timestamp that indicates when a file was
|
||
@emph{created}. For example, the UFS2 filesystem under NetBSD-3.1
|
||
records the @emph{birth time} of each file. This information is also
|
||
available under other versions of BSD and some versions of Cygwin.
|
||
However, even on systems which support file birth time, files may
|
||
exist for which this information was not recorded (for example, UFS1
|
||
file systems simply do not contain this information).
|
||
|
||
You can search for files whose timestamps are within a certain age
|
||
range, or compare them to other timestamps.
|
||
|
||
@menu
|
||
* Age Ranges::
|
||
* Comparing Timestamps::
|
||
@end menu
|
||
|
||
@node Age Ranges
|
||
@subsection Age Ranges
|
||
|
||
These tests are mainly useful with ranges (@samp{+@var{n}} and
|
||
@samp{-@var{n}}).
|
||
|
||
@deffn Test -atime n
|
||
@deffnx Test -ctime n
|
||
@deffnx Test -mtime n
|
||
True if the file was last accessed (or its status changed, or it was
|
||
modified) @var{n}*24 hours ago. The number of 24-hour periods since
|
||
the file's timestamp is always rounded down; therefore 0 means ``less
|
||
than 24 hours ago'', 1 means ``between 24 and 48 hours ago'', and so
|
||
forth. Fractional values are supported but this only really makes
|
||
sense for the case where ranges (@samp{+@var{n}} and @samp{-@var{n}})
|
||
are used.
|
||
@end deffn
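
The rounding rule can be seen with a file whose timestamp is set 75
hours into the past, that is, three whole 24-hour periods plus a
remainder. This sketch assumes GNU @code{touch} (for the relative
@samp{-d} argument) and @code{mktemp}; the file names are invented.

@example
# Scratch directory; the names old and new are hypothetical.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch -d '75 hours ago' old    # GNU touch assumed for this syntax
touch new

exact=$(find . -type f -mtime 3)     # 75 hours rounds down to 3
older=$(find . -type f -mtime +2)    # more than 2 whole periods
recent=$(find . -type f -mtime -1)   # less than 24 hours old

printf '%s\n' "-mtime 3: $exact" "-mtime +2: $older" "-mtime -1: $recent"
@end example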
|
||
|
||
@deffn Test -amin n
|
||
@deffnx Test -cmin n
|
||
@deffnx Test -mmin n
|
||
True if the file was last accessed (or its status changed, or it was
|
||
modified) @var{n} minutes ago. These tests provide finer granularity
|
||
of measurement than @samp{-atime} et al., but rounding is done in a
|
||
similar way (again, fractions are supported). For example, to list
|
||
files in @file{/u/bill} that were last read from 2 to 6 minutes ago:
|
||
|
||
@example
|
||
find /u/bill -amin +2 -amin -6
|
||
@end example
|
||
@end deffn
|
||
|
||
@deffn Option -daystart
|
||
Measure times from the beginning of today rather than from 24 hours
|
||
ago. So, to list the regular files in your home directory that were
|
||
modified yesterday, do
|
||
|
||
@example
|
||
find ~/ -daystart -type f -mtime 1
|
||
@end example
|
||
|
||
The @samp{-daystart} option is unlike most other options in that it
|
||
has an effect on the way that other tests are performed. The affected
|
||
tests are @samp{-amin}, @samp{-cmin}, @samp{-mmin}, @samp{-atime},
|
||
@samp{-ctime} and @samp{-mtime}. The @samp{-daystart} option only
|
||
affects the behaviour of any tests which appear after it on the
|
||
command line.
|
||
@end deffn
|
||
|
||
@node Comparing Timestamps
|
||
@subsection Comparing Timestamps
|
||
|
||
@deffn Test -newerXY reference
|
||
Succeeds if timestamp @samp{X} of the file being considered is newer
|
||
than timestamp @samp{Y} of the file @file{reference}.
|
||
Fails if the timestamps are @emph{equal}.
|
||
The letters @samp{X} and @samp{Y} can be any of the following letters:
|
||
|
||
@table @samp
|
||
@item a
|
||
Last-access time of @file{reference}
|
||
@item B
|
||
Birth time of @file{reference} (when this is not known, the test cannot succeed)
|
||
@item c
|
||
Last-change time of @file{reference}
|
||
@item m
|
||
Last-modification time of @file{reference}
|
||
@item t
|
||
The @file{reference} argument is interpreted as a literal time, rather
|
||
than the name of a file. @xref{Date input formats}, for a description
|
||
of how the timestamp is understood. Tests of the form @samp{-newerXt}
|
||
are valid but tests of the form @samp{-newertY} are not.
|
||
@end table
|
||
|
||
For example the test @code{-newerac /tmp/foo} succeeds for all files
|
||
which have been accessed more recently than @file{/tmp/foo} was
|
||
changed. Here @samp{X} is @samp{a} and @samp{Y} is @samp{c}.
|
||
|
||
Not all files have a known birth time. If @samp{Y} is @samp{B} and
|
||
the birth time of @file{reference} is not available, @code{find} exits
|
||
with an explanatory error message. If @samp{X} is @samp{B} and we do
|
||
not know the birth time of the file currently being considered, the test
|
||
simply fails (that is, it behaves like @code{-false} does).
|
||
|
||
Some operating systems (for example, most implementations of Unix) do
|
||
not support file birth times. Some others, for example NetBSD-3.1,
|
||
do. Even on operating systems which support file birth times, the
|
||
information may not be available for specific files. For example,
|
||
under NetBSD, file birth times are supported on UFS2 file systems, but
|
||
not UFS1 file systems.
|
||
|
||
@end deffn
|
||
|
||
|
||
|
||
There are two ways to list files in @file{/usr} modified after
|
||
@emph{the start} of February 1 of the current year. One uses
|
||
@samp{-newermt}:
|
||
|
||
@example
|
||
find /usr -newermt "Feb 1"
|
||
@end example
|
||
|
||
The other way of doing this also works with versions of @code{find} before 4.3.3:
|
||
|
||
@c Idea from Rick Sladkey.
|
||
@example
|
||
touch -t 02010000 /tmp/stamp$$
|
||
find /usr -newer /tmp/stamp$$
|
||
rm -f /tmp/stamp$$
|
||
@end example
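
Both approaches can be tried side by side on files with known
timestamps. The sketch below is only an illustration: it uses an
explicit year so that the result does not depend on today's date,
creates the stamp file with @code{mktemp} rather than a fixed name, and
assumes GNU @code{touch} for the @samp{-d} argument. All file names are
invented.

@example
# Scratch directory; the names january and march are hypothetical.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch -d '2020-01-15 12:00' january
touch -d '2020-03-15 12:00' march

# First approach: compare against a literal time.
newer=$(find . -type f -newermt '2020-02-01')

# Second approach: compare against a stamp file outside the tree.
stamp=$(mktemp)
touch -t 202002010000 "$stamp"
newer2=$(find . -type f -newer "$stamp")
rm -f "$stamp"

printf '%s\n' "$newer" "$newer2"
@end example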
|
||
|
||
@deffn Test -anewer reference
|
||
@deffnx Test -cnewer reference
|
||
@deffnx Test -newer reference
|
||
True if the time of the last access (or status change or data modification)
|
||
of the current file is more recent than that of the last data modification
|
||
of the @var{reference} file.
|
||
False if the timestamps are equal.
|
||
As such, @samp{-anewer} is equivalent to @samp{-neweram},
|
||
@samp{-cnewer} to @samp{-newercm}, and @samp{-newer} to @samp{-newermm}.
|
||
|
||
If @var{reference} is a symbolic link and the @samp{-H} option or the @samp{-L}
|
||
option is in effect, then the time of the last data modification of the file
|
||
it points to is always used.
|
||
|
||
These tests are affected by @samp{-follow} only if @samp{-follow} comes before
|
||
them on the command line. @xref{Symbolic Links}, for more information on
|
||
@samp{-follow}.
|
||
|
||
As an example, to list any files modified since
|
||
@file{/bin/sh} was last modified:
|
||
|
||
@example
|
||
find . -newer /bin/sh
|
||
@end example
|
||
@end deffn
|
||
|
||
@deffn Test -used n
|
||
True if the file was last accessed @var{n} days after its status was
|
||
last changed. Useful for finding files that are not being used, and
|
||
could perhaps be archived or removed to save disk space.
|
||
@end deffn
|
||
|
||
@node Size
|
||
@section Size
|
||
|
||
@deffn Test -size n@r{[}bckwMG@r{]}
|
||
True if the file uses @var{n} units of space, rounding up. The units
|
||
are 512-byte blocks by default, but they can be changed by adding a
|
||
one-character suffix to @var{n}:
|
||
|
||
@table @code
|
||
@item b
|
||
512-byte blocks (never 1024)
|
||
@item c
|
||
bytes
|
||
@item w
|
||
2-byte words
|
||
@item k
|
||
Kibibytes (KiB, units of 1024 bytes)
|
||
@item M
|
||
Mebibytes (MiB, units of 1024 * 1024 = 1048576 bytes)
|
||
@item G
|
||
Gibibytes (GiB, units of 1024 * 1024 * 1024 = 1073741824 bytes)
|
||
@end table
|
||
|
||
The `b' suffix always considers blocks to be 512 bytes. This is not
affected by the setting (or non-setting) of the @env{POSIXLY_CORRECT}
environment variable. (This behaviour is different from the behaviour of
the @samp{-ls} action.) If you want to use 1024-byte units, use the
`k' suffix instead.
|
||
|
||
The number can be prefixed with a `+' or a `-'. A plus sign indicates
that the test should succeed if the file uses more than @var{n} units
of storage (a common use of this test) and a minus sign
indicates that the test should succeed if the file uses less than
@var{n} units of storage; in both cases, an exact size of @var{n} units
does not match. Bear in mind that the size is rounded up to
the next unit. Therefore @samp{-size -1M} is not equivalent to
@samp{-size -1048576c}. The former only matches empty files, the latter
matches files from 0 to 1,048,575 bytes. There is no `=' prefix, because
that's the default anyway.
|
||
|
||
The size is simply the @code{st_size} member of the @code{struct stat}
populated by the @code{lstat} (or @code{stat}) system call, rounded up as
shown above. In other words, it's consistent with the result you get for
@samp{ls -l}. For sparse files, this means that @samp{-size} looks at the
apparent size rather than the disk space actually used, unlike the
@samp{%k} and @samp{%b} format specifiers for the @samp{-printf} predicate.
|
||
|
||
@end deffn
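
The effect of rounding up can be seen with an empty file and a
one-byte file. This is a sketch in a scratch directory; the file names
are invented, and a POSIX shell with @code{mktemp} is assumed.

@example
# Scratch directory; the names empty and onebyte are hypothetical.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
: > empty                  # 0 bytes
printf x > onebyte         # 1 byte, which rounds up to one 1 MiB unit

# Only the empty file uses fewer than 1 unit of 1 MiB.
under_1M=$(find . -type f -size -1M)

# Both files are smaller than 1048576 bytes.
under_bytes=$(find . -type f -size -1048576c | sort)

printf '%s\n' "-size -1M:" "$under_1M" "-size -1048576c:" "$under_bytes"
@end example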
|
||
|
||
@deffn Test -empty
|
||
True if the file is empty and is either a regular file or a directory.
|
||
This might help determine good candidates for deletion. This test is
|
||
useful with @samp{-depth} (@pxref{Directories}) and @samp{-delete}
|
||
(@pxref{Single File}).
|
||
@end deffn
|
||
|
||
@node Type
|
||
@section Type
|
||
|
||
@deffn Test -type c
|
||
True if the file is of type @var{c}:
|
||
|
||
@table @code
|
||
@item b
|
||
block (buffered) special
|
||
@item c
|
||
character (unbuffered) special
|
||
@item d
|
||
directory
|
||
@item p
|
||
named pipe (FIFO)
|
||
@item f
|
||
regular file
|
||
@item l
|
||
symbolic link; if @samp{-L} is in effect, this is true only for broken
|
||
symbolic links. If you want to search for symbolic links when
|
||
@samp{-L} is in effect, use @samp{-xtype} instead of @samp{-type}.
|
||
@item s
|
||
socket
|
||
@item D
|
||
door (Solaris)
|
||
@end table
|
||
|
||
As a GNU extension, multiple file types can be provided as a combined list
separated by commas (@samp{,}). For example, @samp{-type f,d,l} is logically
interpreted as @samp{( -type f -o -type d -o -type l )}.
|
||
@end deffn
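
As an illustration of the comma-separated form, the sketch below
creates one file of each of four types and selects three of them. It
assumes a version of GNU @code{find} that supports the list syntax, as
well as @code{mktemp} and @code{mkfifo}; all names are invented.

@example
# Scratch directory; the names dir, file, link and pipe are hypothetical.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir dir
touch file
ln -s file link
mkfifo pipe

# Regular files, directories and symbolic links, but not the FIFO.
listed=$(find . -mindepth 1 -type f,d,l | sort)

# The same selection written out with -o.
longform=$(find . -mindepth 1 \( -type f -o -type d -o -type l \) | sort)

printf '%s\n' "$listed"
@end example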
|
||
|
||
@deffn Test -xtype c
|
||
This test behaves the same as @samp{-type} unless the file is a
|
||
symbolic link. If the file is a symbolic link, the result is as
|
||
follows (in the table below, @samp{X} should be understood to
|
||
represent any letter except @samp{l}):
|
||
|
||
@table @samp
|
||
@item @samp{-P -xtype l}
|
||
True if the symbolic link is broken or has an infinite loop
|
||
@item @samp{-P -xtype X}
|
||
True if the (ultimate) target file is of type @samp{X}.
|
||
@item @samp{-L -xtype l}
|
||
Always true
|
||
@item @samp{-L -xtype X}
|
||
False unless the symbolic link is broken or has an infinite loop
|
||
@end table
|
||
|
||
In other words, for non-broken symbolic links, @samp{-xtype} checks
|
||
the type of the file that @samp{-type} does not check. For broken
|
||
symbolic links (or loops), @samp{-xtype} behaves like @samp{-type}
|
||
does. Symbolic links pointing to things the user has no access to are
|
||
not considered to be broken.
|
||
|
||
The @samp{-H} option also affects the behaviour of @samp{-xtype}.
|
||
When @samp{-H} is in effect, @samp{-xtype} behaves as if @samp{-L} had
|
||
been specified when examining files listed on the command line, and as
|
||
if @samp{-P} had been specified otherwise. If neither @samp{-H} nor
|
||
@samp{-L} was specified, @samp{-xtype} behaves as if @samp{-P} had
|
||
been specified.
|
||
|
||
@xref{Symbolic Links}, for more information on @samp{-follow} and
|
||
@samp{-L}.
|
||
@end deffn
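
The table above can be checked with one working and one broken
symbolic link. This is a sketch in a scratch directory; the names are
invented, and a POSIX shell with @code{mktemp} is assumed.

@example
# Scratch directory; target, good and broken are hypothetical names.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch target
ln -s target good          # link to an existing file
ln -s missing broken       # link to nothing

P_type=$(find . -type l | sort)        # both links
P_xtype=$(find . -xtype l)             # only the broken link
P_xf=$(find . -xtype f | sort)         # target, and good via its target
L_type=$(find -L . -type l)            # only the broken link
L_xtype=$(find -L . -xtype l | sort)   # both links

printf '%s\n' "-P -xtype l: $P_xtype" "-L -type l: $L_type"
@end example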
|
||
|
||
@node Owner
|
||
@section Owner
|
||
|
||
@deffn Test -user uname
|
||
@deffnx Test -group gname
|
||
True if the file is owned by user @var{uname} (belongs to group
|
||
@var{gname}). A numeric ID is allowed.
|
||
@end deffn
|
||
|
||
@deffn Test -uid n
|
||
@deffnx Test -gid n
|
||
True if the file's numeric user ID (group ID) is @var{n}. These tests
|
||
support ranges (@samp{+@var{n}} and @samp{-@var{n}}), unlike
|
||
@samp{-user} and @samp{-group}.
|
||
@end deffn
|
||
|
||
@deffn Test -nouser
|
||
@deffnx Test -nogroup
|
||
True if no user corresponds to the file's numeric user ID (no group
|
||
corresponds to the numeric group ID). These cases usually mean that
|
||
the files belonged to users who have since been removed from the
|
||
system. You probably should change the ownership of such files to an
|
||
existing user or group, using the @code{chown} or @code{chgrp}
|
||
program.
|
||
@end deffn
|
||
|
||
@node Mode Bits
|
||
@section File Mode Bits
|
||
|
||
@xref{File Permissions}, for information on how file mode bits are
|
||
structured and how to specify them.
|
||
|
||
Four tests determine what users can do with files. These are
|
||
@samp{-readable}, @samp{-writable}, @samp{-executable} and
|
||
@samp{-perm}. The first three tests ask the operating system if the
|
||
current user can perform the relevant operation on a file, while
|
||
@samp{-perm} just examines the file's mode. The file mode may give
|
||
a misleading impression of what the user can actually do, because the
|
||
file may have an access control list, or exist on a read-only
|
||
filesystem, for example. Of these four tests though, only
|
||
@samp{-perm} is specified by the POSIX standard.
|
||
|
||
The @samp{-readable}, @samp{-writable} and @samp{-executable} tests
|
||
are implemented via the @code{access} system call. This is
|
||
implemented within the operating system itself. If the file being
|
||
considered is on an NFS filesystem, the remote system may allow or
|
||
forbid read or write operations for reasons of which the NFS client
|
||
cannot take account. This includes user-ID mapping, either in the
|
||
general sense or the more restricted sense in which remote superusers
|
||
are treated by the NFS server as if they are the local user
|
||
@samp{nobody} on the NFS server.
|
||
|
||
None of the tests in this section should be used to verify that a user
|
||
is authorised to perform any operation (on the file being tested or
|
||
any other file) because of the possibility of a race condition. That
|
||
is, the situation may change between the test and an action being
|
||
taken on the basis of the result of that test.
|
||
|
||
|
||
@deffn Test -readable
|
||
True if the file can be read by the invoking user.
|
||
@end deffn
|
||
|
||
@deffn Test -writable
|
||
True if the file can be written by the invoking user. This is an
|
||
in-principle check, and other things may prevent a successful write
|
||
operation; for example, the filesystem might be full.
|
||
@end deffn
|
||
|
||
@deffn Test -executable
|
||
True if the file can be executed/searched by the invoking user.
|
||
@end deffn
|
||
|
||
@deffn Test -perm pmode
|
||
|
||
True if the file's mode bits match @var{pmode}, which can be
|
||
either a symbolic or numeric @var{mode} (@pxref{File Permissions})
|
||
optionally prefixed by @samp{-} or @samp{/}.
|
||
|
||
Note that @var{pmode} starts with all file mode bits cleared, i.e.,
|
||
does not relate to the process's file creation bit mask (also known
|
||
as @command{umask}).
|
||
|
||
A @var{pmode} that starts with neither @samp{-} nor @samp{/} matches
|
||
if @var{mode} exactly matches the file mode bits.
|
||
(To avoid confusion with an obsolete GNU extension, @var{mode}
|
||
must not start with a @samp{+} immediately followed by an octal digit.)
|
||
|
||
A @var{pmode} that starts with @samp{-} matches if
|
||
@emph{all} the file mode bits set in @var{mode} are set for the file;
|
||
bits not set in @var{mode} are ignored.
|
||
|
||
A @var{pmode} that starts with @samp{/} matches if
|
||
@emph{any} of the file mode bits set in @var{mode} are set for the file;
|
||
bits not set in @var{mode} are ignored.
|
||
This is a GNU extension.
|
||
|
||
If you don't use the @samp{/} or @samp{-} form with a symbolic mode
|
||
string, you may have to specify a rather complex mode string. For
|
||
example @samp{-perm g=w} will only match files that have mode 0020
|
||
(that is, ones for which group write permission is the only file mode bit
|
||
set). It is more likely that you will want to use the @samp{/} or
|
||
@samp{-} forms, for example @samp{-perm -g=w}, which matches any file
|
||
with group write permission.
|
||
|
||
|
||
@table @samp
|
||
@item -perm 664
|
||
Match files that have read and write permission for their owner and
group, and that the rest of the world can read but not write to.
Do not match files that meet these criteria but have other file mode
bits set (for example if someone can execute/search the file).
|
||
|
||
@item -perm -664
|
||
Match files that have at least read and write permission for their
owner and group, and read permission for the rest of the world,
without regard to the presence of any extra file mode bits (for
example the executable bit, or write permission for others). This
matches a file with mode 0777, for example.
|
||
|
||
@item -perm /222
|
||
Match files that are writable by somebody (their owner, or
|
||
their group, or anybody else).
|
||
|
||
@item -perm /022
|
||
Match files that are writable by their group or by everyone else (the
latter is often called @dfn{other}). The files don't have to be writable
by both the group and other to be matched; either will do.
|
||
|
||
@item -perm /g+w,o+w
|
||
As above.
|
||
|
||
@item -perm /g=w,o=w
|
||
As above.
|
||
|
||
@item -perm -022
|
||
Match files that are writable by both their group and everyone else.
|
||
|
||
@item -perm -g+w,o+w
|
||
As above.
|
||
|
||
@item -perm -444 -perm /222 ! -perm /111
|
||
Match files that are readable for everybody, have at least one
|
||
write bit set (i.e., somebody can write to them), but that cannot be
|
||
executed/searched by anybody. Note that in some shells the @samp{!} must be
|
||
escaped.
|
||
|
||
@item -perm -a+r -perm /a+w ! -perm /a+x
|
||
As above.
|
||
|
||
@end table
|
||
|
||
@quotation Warning
|
||
If you specify @samp{-perm /000} or @samp{-perm /mode} where the
|
||
symbolic mode @samp{mode} has no bits set, the test matches all files.
|
||
Versions of GNU @code{find} prior to 4.3.3 matched no files in this
|
||
situation.
|
||
@end quotation
|
||
|
||
@end deffn
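
The three forms of @samp{-perm} can be compared on files with known
modes. This is a sketch in a scratch directory; the file names simply
echo the mode given to each file, and a POSIX shell with @code{mktemp}
is assumed.

@example
# Scratch directory; each hypothetical file is named after its mode.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch f600 f664 f666 f777
chmod 600 f600; chmod 664 f664; chmod 666 f666; chmod 777 f777

exact=$(find . -type f -perm 664)          # mode is exactly 0664
all=$(find . -type f -perm -664 | sort)    # all of the bits in 664 set
any=$(find . -type f -perm /022 | sort)    # group or other can write
both=$(find . -type f -perm -022 | sort)   # group and other can write

printf '%s\n' "664:" "$exact" "-664:" "$all" "/022:" "$any" "-022:" "$both"
@end example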
|
||
|
||
@deffn Test -context pattern
|
||
True if file's SELinux context matches the pattern @var{pattern}.
|
||
The pattern uses shell glob matching.
|
||
|
||
This predicate is supported only on @code{find} versions compiled with
|
||
SELinux support and only when SELinux is enabled.
|
||
@end deffn
|
||
|
||
@node Contents
|
||
@section Contents
|
||
|
||
To search for files based on their contents, you can use the
|
||
@code{grep} program. For example, to find out which C source files in
|
||
the current directory contain the string @samp{thing}, you can do:
|
||
|
||
@example
|
||
grep -l thing *.[ch]
|
||
@end example
|
||
|
||
If you also want to search for the string in files in subdirectories,
|
||
you can combine @code{grep} with @code{find} and @code{xargs}, like
|
||
this:
|
||
|
||
@example
|
||
find . -name '*.[ch]' | xargs grep -l thing
|
||
@end example
|
||
|
||
The @samp{-l} option causes @code{grep} to print only the names of
|
||
files that contain the string, rather than the lines that contain it.
|
||
The string argument (@samp{thing}) is actually a regular expression,
|
||
so it can contain metacharacters. This method can be refined a little
|
||
by using the @samp{-r} option to make @code{xargs} not run @code{grep}
|
||
if @code{find} produces no output, and using the @code{find} action
|
||
@samp{-print0} and the @code{xargs} option @samp{-0} to avoid
|
||
misinterpreting files whose names contain spaces:
|
||
|
||
@example
|
||
find . -name '*.[ch]' -print0 | xargs -r -0 grep -l thing
|
||
@end example
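
To see why the null-terminated form matters, the sketch below runs it
on file names that contain spaces. The names are invented for the
example; it assumes @code{mktemp} and a version of @code{xargs} that
supports @samp{-r} and @samp{-0}.

@example
# Scratch directory; directory and file names are hypothetical.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p 'sub dir'
printf 'int thing;\n' > 'sub dir/has thing.c'
printf 'int other;\n' > 'sub dir/plain.c'

# Names are passed intact despite the embedded spaces.
hits=$(find . -name '*.[ch]' -print0 | xargs -r -0 grep -l thing)
echo "$hits"
@end example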
|
||
|
||
For a fuller treatment of finding files whose contents match a
|
||
pattern, see the manual page for @code{grep}.
|
||
|
||
@node Directories
|
||
@section Directories
|
||
|
||
Here is how to control which directories @code{find} searches, and how
|
||
it searches them. These two options allow you to process a horizontal
|
||
slice of a directory tree.
|
||
|
||
@deffn Option -maxdepth levels
|
||
Descend at most @var{levels} (a non-negative integer) levels of
|
||
directories below the command line arguments. Using @samp{-maxdepth 0}
|
||
means only apply the tests and actions to the command line arguments.
|
||
|
||
@example
|
||
$ mkdir -p dir/d1/d2/d3/d4/d5/d6
|
||
|
||
$ find dir -maxdepth 1
|
||
dir
|
||
dir/d1
|
||
|
||
$ find dir -mindepth 5
|
||
dir/d1/d2/d3/d4/d5
|
||
dir/d1/d2/d3/d4/d5/d6
|
||
|
||
$ find dir -mindepth 2 -maxdepth 4
|
||
dir/d1/d2
|
||
dir/d1/d2/d3
|
||
dir/d1/d2/d3/d4
|
||
@end example
|
||
@end deffn
|
||
|
||
@deffn Option -mindepth levels
|
||
Do not apply any tests or actions at levels less than @var{levels} (a
|
||
non-negative integer). Using @samp{-mindepth 1} means process all files
|
||
except the command line arguments.
|
||
|
||
See @samp{-maxdepth} for examples.
|
||
@end deffn
|
||
|
||
@deffn Option -depth
|
||
Process each directory's contents before the directory itself. Doing
|
||
this is a good idea when producing lists of files to archive with
|
||
@code{cpio} or @code{tar}. If a directory does not have write
|
||
permission for its owner, its contents can still be restored from the
|
||
archive since the directory's permissions are restored after its
|
||
contents.
|
||
@end deffn
|
||
|
||
@deffn Option -d
|
||
This is a deprecated synonym for @samp{-depth}, for compatibility with
|
||
Mac OS X, FreeBSD and OpenBSD. The @samp{-depth} option is a POSIX
|
||
feature, so it is better to use that.
|
||
@end deffn
|
||
|
||
@deffn Action -prune
|
||
If the file is a directory, do not descend into it. The result is
|
||
true. For example, to skip the directory @file{src/emacs} and all
|
||
files and directories under it, and print the names of the other files
|
||
found:
|
||
|
||
@example
|
||
find . -wholename './src/emacs' -prune -o -print
|
||
@end example
|
||
|
||
The above command will not print @file{./src/emacs} among its list of
|
||
results. This however is not due to the effect of the @samp{-prune}
|
||
action (which only prevents further descent, it doesn't make sure we
|
||
ignore that item). Instead, this effect is due to the use of
|
||
@samp{-o}. Since the left hand side of the ``or'' condition has
|
||
succeeded for @file{./src/emacs}, it is not necessary to evaluate the
|
||
right-hand-side (@samp{-print}) at all for this particular file. If
|
||
you wanted to print that directory name you could use either an extra
|
||
@samp{-print} action:
|
||
|
||
@example
|
||
find . -wholename './src/emacs' -prune -print -o -print
|
||
@end example
|
||
|
||
or use the comma operator:
|
||
|
||
@example
|
||
find . -wholename './src/emacs' -prune , -print
|
||
@end example
|
||
|
||
If the @samp{-depth} option is in effect, the subdirectories will have
|
||
already been visited in any case. Hence @samp{-prune} has no effect
|
||
in this case.
|
||
|
||
Because @samp{-delete} implies @samp{-depth}, using @samp{-prune} in
|
||
combination with @samp{-delete} may well result in the deletion of
|
||
more files than you intended.
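
A minimal sketch of the interaction, using a hypothetical @file{demo}
tree: the first command prunes @file{demo/sub}, while in the second the
@samp{-depth} option means that @file{demo/sub/file} has already been
visited by the time @samp{-prune} is evaluated for @file{demo/sub}.

@example
mkdir -p demo/sub
touch demo/sub/file
# Prints demo only; demo/sub is matched and pruned, so it is never entered.
find demo -name sub -prune -o -print
# With -depth, demo/sub/file is printed as well.
find demo -depth -name sub -prune -o -print
@end example

@noindent
Replacing @samp{-print} with @samp{-delete} in the first command would
turn on @samp{-depth} implicitly, so @file{demo/sub/file} would be
deleted too.  It is therefore worth previewing such a command with
@samp{-depth} and @samp{-print} first.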
|
||
@end deffn
|
||
|
||
|
||
@deffn Action -quit
|
||
Exit immediately (with return value zero if no errors have occurred).
|
||
This differs from @samp{-prune} because @samp{-prune} only applies
|
||
to the contents of pruned directories, while @samp{-quit} simply makes
|
||
@code{find} stop immediately. No child processes will be left
|
||
running. Any command lines which have been built by @samp{-exec ... @{@} +}
or @samp{-execdir ... @{@} +} are invoked before the program is
|
||
exited. After @samp{-quit} is executed, no more files specified on
|
||
the command line will be processed. For example, @samp{find /tmp/foo
|
||
/tmp/bar -print -quit} will print only @samp{/tmp/foo}. One common
|
||
use of @samp{-quit} is to stop searching the file system once we have
|
||
found what we want. For example, if we want to find just a single
|
||
file we can do this:
|
||
@example
|
||
find / -name needle -print -quit
|
||
@end example
|
||
@noindent
|
||
@end deffn
|
||
|
||
@deffn Option -noleaf
|
||
Do not optimize by assuming that directories contain 2 fewer
|
||
subdirectories than their hard link count. This option is needed when
|
||
searching filesystems that do not follow the Unix directory-link
|
||
convention, such as CD-ROM or MS-DOS filesystems or AFS volume mount
|
||
points. Each directory on a normal Unix filesystem has at least 2
|
||
hard links: its name and its @file{.} entry. Additionally, its
|
||
subdirectories (if any) each have a @file{..} entry linked to that
|
||
directory. When @code{find} is examining a directory, after it has
|
||
statted 2 fewer subdirectories than the directory's link count, it
|
||
knows that the rest of the entries in the directory are
|
||
non-directories (@dfn{leaf} files in the directory tree). If only the
|
||
files' names need to be examined, there is no need to stat them; this
|
||
gives a significant increase in search speed.
|
||
@end deffn
|
||
|
||
@deffn Option -ignore_readdir_race
|
||
If a file disappears after its name has been read from a directory but
|
||
before @code{find} gets around to examining the file with @code{stat},
|
||
don't issue an error message. If you don't specify this option, an
|
||
error message will be issued.
|
||
|
||
Furthermore, @code{find} with the @samp{-ignore_readdir_race} option
|
||
will ignore errors of the @samp{-delete} action in the case the file
|
||
has disappeared since the parent directory was read: it will not output
|
||
an error diagnostic, and the return code of the @samp{-delete} action
|
||
will be true.
|
||
|
||
This option can be useful in system
|
||
scripts (cron scripts, for example) that examine areas of the
|
||
filesystem that change frequently (mail queues, temporary directories,
|
||
and so forth), because this scenario is common for those sorts of
|
||
directories. Completely silencing error messages from @code{find} is
|
||
undesirable, so this option neatly solves the problem. There is no
|
||
way to search one part of the filesystem with this option on and part
|
||
of it with this option off, though. When this option is turned on and
|
||
@code{find} discovers that one of the starting points specified on the
|
||
command line does not exist, no error message will be issued.
|
||
|
||
@end deffn
|
||
|
||
@deffn Option -noignore_readdir_race
|
||
This option reverses the effect of the @samp{-ignore_readdir_race}
|
||
option.
|
||
@end deffn
|
||
|
||
|
||
@node Filesystems
|
||
@section Filesystems
|
||
|
||
A @dfn{filesystem} is a section of a disk, either on the local host or
|
||
mounted from a remote host over a network. Searching network
|
||
filesystems can be slow, so it is common to make @code{find} avoid
|
||
them.
|
||
|
||
There are two ways to avoid searching certain filesystems. One way is
|
||
to tell @code{find} to only search one filesystem:
|
||
|
||
@deffn Option -mount
|
||
Ignore files on other devices.
|
||
@end deffn
|
||
|
||
@deffn Option -xdev
|
||
Don't descend into directories on other devices.
|
||
@end deffn
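
For instance, assuming that @file{/} and @file{/home} are separate
filesystems on the system in question (an assumption made only for this
sketch), the following searches the root filesystem for regular files
larger than 100 MiB without descending into @file{/home} or into any
other mounted filesystem:

@example
find / -xdev -type f -size +100M -print
@end example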
|
||
|
||
The other way is to check the type of filesystem each file is on, and
|
||
not descend directories that are on undesirable filesystem types:
|
||
|
||
@deffn Test -fstype type
|
||
True if the file is on a filesystem of type @var{type}. The valid
|
||
filesystem types vary among different versions of Unix; an incomplete
|
||
list of filesystem types that are accepted on some version of Unix or
|
||
another is:
|
||
@example
|
||
autofs ext3 ext4 fuse.sshfs nfs proc sshfs sysfs ufs tmpfs xfs
|
||
@end example
|
||
You can use @samp{-printf} with the @samp{%F} directive to see the
|
||
types of your filesystems. The @samp{%D} directive shows the device
|
||
number. @xref{Print File Information}. @samp{-fstype} is usually
|
||
used with @samp{-prune} to avoid searching remote filesystems
|
||
(@pxref{Directories}).
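
As a sketch, the first command below prints the filesystem type of the
current directory, and the second walks the whole tree while skipping
anything that lives on an NFS filesystem (the type name @samp{nfs} is
only an example; use whatever @samp{%F} reports on your system):

@example
find . -maxdepth 0 -printf '%F\n'
find / -fstype nfs -prune -o -print
@end example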
|
||
@end deffn
|
||
|
||
@node Combining Primaries With Operators
|
||
@section Combining Primaries With Operators
|
||
|
||
Operators build a complex expression from tests and actions.
|
||
The operators are, in order of decreasing precedence:
|
||
|
||
@table @code
|
||
@item @asis{( @var{expr} )}
|
||
@findex ()
|
||
Force precedence. True if @var{expr} is true.
|
||
|
||
@item @asis{! @var{expr}}
|
||
@itemx @asis{-not @var{expr}}
|
||
@findex !
|
||
@findex -not
|
||
True if @var{expr} is false. In some shells, it is necessary to
|
||
protect the @samp{!} from shell interpretation by quoting it.
|
||
|
||
@item @asis{@var{expr1 expr2}}
|
||
@itemx @asis{@var{expr1} -a @var{expr2}}
|
||
@itemx @asis{@var{expr1} -and @var{expr2}}
|
||
@findex -and
|
||
@findex -a
|
||
And; @var{expr2} is not evaluated if @var{expr1} is false.
|
||
|
||
@item @asis{@var{expr1} -o @var{expr2}}
|
||
@itemx @asis{@var{expr1} -or @var{expr2}}
|
||
@findex -or
|
||
@findex -o
|
||
Or; @var{expr2} is not evaluated if @var{expr1} is true.
|
||
|
||
@item @asis{@var{expr1} , @var{expr2}}
|
||
@findex ,
|
||
List; both @var{expr1} and @var{expr2} are always evaluated. True if
|
||
@var{expr2} is true. The value of @var{expr1} is discarded. This
|
||
operator lets you do multiple independent operations on one traversal,
|
||
without depending on whether other operations succeeded. The two
|
||
operations @var{expr1} and @var{expr2} are not always fully
|
||
independent, since @var{expr1} might have side effects like touching
|
||
or deleting files, or it might use @samp{-prune} which would also
|
||
affect @var{expr2}.
|
||
@end table
|
||
|
||
@code{find} searches the directory tree rooted at each file name by
|
||
evaluating the expression from left to right, according to the rules
|
||
of precedence, until the outcome is known (the left hand side is false
|
||
for @samp{-and}, true for @samp{-or}), at which point @code{find}
|
||
moves on to the next file name.
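
The following sketch shows why precedence matters; the file names are
hypothetical.  Because the implicit ``and'' binds more tightly than
@samp{-o}, the first command prints only the header files, while the
parentheses in the second make @samp{-print} apply to both patterns.
The third command uses the comma operator to write two lists in a single
traversal.

@example
find . -name '*.c' -o -name '*.h' -print
find . \( -name '*.c' -o -name '*.h' \) -print
find . -name '*.c' -fprint /tmp/c-files , -name '*.h' -fprint /tmp/h-files
@end example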
|
||
|
||
There are two other tests that can be useful in complex expressions:
|
||
|
||
@deffn Test -true
|
||
Always true.
|
||
@end deffn
|
||
|
||
@deffn Test -false
|
||
Always false.
|
||
@end deffn
|
||
|
||
@node Actions
|
||
@chapter Actions
|
||
|
||
There are several ways you can print information about the files that
|
||
match the criteria you gave in the @code{find} expression. You can
|
||
print the information either to the standard output or to a file that
|
||
you name. You can also execute commands that have the file names as
|
||
arguments. You can use those commands as further filters to select
|
||
files.
|
||
|
||
@menu
|
||
* Print File Name::
|
||
* Print File Information::
|
||
* Run Commands::
|
||
* Delete Files::
|
||
* Adding Tests::
|
||
@end menu
|
||
|
||
@node Print File Name
|
||
@section Print File Name
|
||
|
||
@deffn Action -print
|
||
True; print the entire file name on the standard output, followed by a
|
||
newline. If there is the faintest possibility that one of the files
|
||
for which you are searching might contain a newline, you should use
|
||
@samp{-print0} instead.
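
A short sketch of the difference, using a deliberately awkward
(hypothetical) file name that contains a newline.  The consumer here is
@code{xargs -0}, which reads null-terminated names:

@example
touch 'two
lines.txt'
# One file, but -print emits two lines for it.
find . -name '*lines.txt' -print | wc -l
# With -print0 the name reaches the command intact.
find . -name '*lines.txt' -print0 | xargs -0 ls -l
@end example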
|
||
@end deffn
|
||
|
||
@deffn Action -fprint file
|
||
True; print the entire file name into file @var{file}, followed by a
|
||
newline. If @var{file} does not exist when @code{find} is run, it is
|
||
created; if it does exist, it is truncated to 0 bytes. The named
|
||
output file is always created, even if no output is sent to it. The
|
||
file names @file{/dev/stdout} and @file{/dev/stderr} are handled
|
||
specially; they refer to the standard output and standard error
|
||
output, respectively.
|
||
|
||
If there is the faintest possibility that one of the files for which
|
||
you are searching might contain a newline, you should use
|
||
@samp{-fprint0} instead.
|
||
@end deffn
|
||
|
||
|
||
@c @deffn Option -show-control-chars how
|
||
@c This option affects how some of @code{find}'s actions treat
|
||
@c unprintable characters in file names. If @samp{how} is
|
||
@c @samp{literal}, any subsequent actions (i.e., actions further on in the
|
||
@c command line) print file names as-is.
|
||
@c
|
||
@c If this option is not specified, it currently defaults to @samp{safe}.
|
||
@c If @samp{how} is @samp{safe}, C-like backslash escapes are used to
|
||
@c indicate the non-printable characters for @samp{-ls} and @samp{-fls}.
|
||
@c On the other hand, @samp{-print}, @samp{-fprint}, @samp{-fprintf} and
|
||
@c @code{-printf} all quote unprintable characters if the data is going
|
||
@c to a tty, and otherwise the data is emitted literally.
|
||
@c
|
||
@c @table @code
|
||
@c @item -ls
|
||
@c Escaped if @samp{how} is @samp{safe}
|
||
@c @item -fls
|
||
@c Escaped if @samp{how} is @samp{safe}
|
||
@c @item -print
|
||
@c Always quoted if standard output is a tty,
|
||
@c @samp{-show-control-chars} is ignored
|
||
@c @item -print0
|
||
@c Always literal, never escaped
|
||
@c @item -fprint
|
||
@c Always quoted if the destination is a tty;
|
||
@c @samp{-show-control-chars} is ignored
|
||
@c @item -fprint0
|
||
@c Always literal, never escaped
|
||
@c @item -fprintf
|
||
@c If the destination is a tty, the @samp{%f},
|
||
@c @samp{%F}, @samp{%h}, @samp{%l}, @samp{%p},
|
||
@c and @samp{%P} directives produce quoted
|
||
@c strings if standard output is a tty and are treated
|
||
@c literally otherwise.
|
||
@c @item -printf
|
||
@c As for @code{-fprintf}.
|
||
@c @end table
|
||
@c @end deffn
|
||
|
||
|
||
@node Print File Information
|
||
@section Print File Information
|
||
|
||
@deffn Action -ls
|
||
True; list the current file in @samp{ls -dils} format on the standard
|
||
output. The output looks like this:
|
||
|
||
@smallexample
|
||
204744 17 -rw-r--r-- 1 djm staff 17337 Nov 2 1992 ./lwall-quotes
|
||
@end smallexample
|
||
|
||
The fields are:
|
||
|
||
@enumerate
|
||
@item
|
||
The inode number of the file. @xref{Hard Links}, for how to find
|
||
files based on their inode number.
|
||
|
||
@item
|
||
The number of blocks in the file. The block counts are of 1K blocks,
|
||
unless the environment variable @env{POSIXLY_CORRECT} is set, in
|
||
which case 512-byte blocks are used. @xref{Size}, for how to find
|
||
files based on their size.
|
||
|
||
@item
|
||
The file's type and file mode bits. The type is shown as a dash for a
|
||
regular file; for other file types, the same letter as for @samp{-type} is
|
||
used (@pxref{Type}). The file mode bits are read, write, and execute/search for
|
||
the file's owner, its group, and other users, respectively; a dash
|
||
means the permission is not granted. @xref{File Permissions}, for
|
||
more details about file permissions. @xref{Mode Bits}, for how to
|
||
find files based on their file mode bits.
|
||
|
||
@item
|
||
The number of hard links to the file.
|
||
|
||
@item
|
||
The user who owns the file.
|
||
|
||
@item
|
||
The file's group.
|
||
|
||
@item
|
||
The file's size in bytes.
|
||
|
||
@item
|
||
The date the file was last modified.
|
||
|
||
@item
|
||
The file's name. @samp{-ls} quotes non-printable characters in the
|
||
file names using C-like backslash escapes. This may change soon, as
|
||
the treatment of unprintable characters is harmonised for @samp{-ls},
|
||
@samp{-fls}, @samp{-print}, @samp{-fprint}, @samp{-printf} and
|
||
@samp{-fprintf}.
|
||
@end enumerate
|
||
@end deffn
|
||
|
||
@deffn Action -fls file
|
||
True; like @samp{-ls} but write to @var{file} like @samp{-fprint}
|
||
(@pxref{Print File Name}). The named output file is always created,
|
||
even if no output is sent to it.
|
||
@end deffn
|
||
|
||
@deffn Action -printf format
|
||
True; print @var{format} on the standard output, interpreting @samp{\}
|
||
escapes and @samp{%} directives (more details in the following
|
||
sections).
|
||
|
||
Field widths and precisions can be specified as with the @code{printf} C
|
||
function. Format flags (like @samp{#} for example) may not work as you
|
||
expect because many of the fields, even numeric ones, are printed with
|
||
@samp{%s}. Numeric directives which are affected in this way include @samp{G},
|
||
@samp{U}, @samp{b}, @samp{D}, @samp{k} and @samp{n}. This difference in
|
||
behaviour means though that the format flag @samp{-} will work; it
|
||
forces left-alignment of the field. Unlike @samp{-print},
|
||
@samp{-printf} does not add a newline at the end of the string. If you
|
||
want a newline at the end of the string, add a @samp{\n}.
|
||
|
||
As an example, an approximate equivalent of @samp{-ls} with
|
||
null-terminated filenames can be achieved with this @code{-printf}
|
||
format:
|
||
|
||
@example
|
||
find -printf "%i %4k %M %3n %-8u %-8g %8s %T+ %p\n->%l\0" | cat
|
||
@end example
|
||
|
||
A practical reason for doing this would be to get literal filenames in
|
||
the output, instead of @samp{-ls}'s backslash-escaped names. (Using
|
||
@code{cat} here prevents this happening for the @samp{%p} format
|
||
specifier; @pxref{Unusual Characters in File Names}). This format also
|
||
outputs a uniform timestamp format.
|
||
|
||
As for symbolic links, the format above outputs the target of the symbolic link
|
||
on a second line, following @samp{\n->}. There is nothing following the arrow
|
||
for file types other than symbolic links.
|
||
Another approach, for complete consistency, would be to @code{-fprintf} the
|
||
symbolic links into a separate file, so they too can be null-terminated.
|
||
@end deffn
|
||
|
||
@deffn Action -fprintf file format
|
||
True; like @samp{-printf} but write to @var{file} like @samp{-fprint}
|
||
(@pxref{Print File Name}). The output file is always created, even if
|
||
no output is ever sent to it.
|
||
@end deffn
|
||
|
||
@menu
|
||
* Escapes::
|
||
* Format Directives::
|
||
* Time Formats::
|
||
* Formatting Flags::
|
||
@end menu
|
||
|
||
@node Escapes
|
||
@subsection Escapes
|
||
|
||
The escapes that @samp{-printf} and @samp{-fprintf} recognise are:
|
||
|
||
@table @code
|
||
@item \a
|
||
Alarm bell.
|
||
@item \b
|
||
Backspace.
|
||
@item \c
|
||
Stop printing from this format immediately and flush the output.
|
||
@item \f
|
||
Form feed.
|
||
@item \n
|
||
Newline.
|
||
@item \r
|
||
Carriage return.
|
||
@item \t
|
||
Horizontal tab.
|
||
@item \v
|
||
Vertical tab.
|
||
@item \\
|
||
A literal backslash (@samp{\}).
|
||
@item \0
|
||
ASCII NUL.
|
||
@item \NNN
|
||
The character whose ASCII code is NNN (octal).
|
||
@end table
|
||
|
||
A @samp{\} character followed by any other character is treated as an
|
||
ordinary character, so they both are printed, and a warning message is
|
||
printed to the standard error output (because it was probably a typo).
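
For example, this sketch prints each regular file's base name and size
separated by a tab, and then the full names terminated by ASCII NUL,
which is equivalent to what @samp{-print0} produces:

@example
find . -type f -printf '%f\t%s\n'
find . -type f -printf '%p\0'
@end example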
|
||
|
||
@node Format Directives
|
||
@subsection Format Directives
|
||
|
||
@samp{-printf} and @samp{-fprintf} support the following format
|
||
directives to print information about the file being processed. The
field width and precision specifiers of the C @code{printf} function are
supported, as applied to string (@samp{%s}) types. That is, you can
specify @samp{@var{min}.@var{max}}, a minimum and a maximum field width,
for each directive.
|
||
Format flags (like @samp{#} for example) may not work as you expect
|
||
because many of the fields, even numeric ones, are printed with %s.
|
||
The format flag @samp{-} does work; it forces left-alignment of the
|
||
field.
|
||
|
||
@samp{%%} is a literal percent sign. @xref{Reserved and Unknown
|
||
Directives}, for a description of how format directives not mentioned
|
||
below are handled.
|
||
|
||
A @samp{%} at the end of the format argument causes undefined
|
||
behaviour since there is no following character. In some locales, it
|
||
may hide your door keys, while in others it may remove the final page
|
||
from the novel you are reading.
|
||
|
||
@menu
|
||
* Name Directives::
|
||
* Ownership Directives::
|
||
* Size Directives::
|
||
* Location Directives::
|
||
* Time Directives::
|
||
* Other Directives::
|
||
* Reserved and Unknown Directives::
|
||
@end menu
|
||
|
||
@node Name Directives
|
||
@subsubsection Name Directives
|
||
|
||
@table @code
|
||
@item %p
|
||
@c supports %-X.Yp
|
||
File's name (not the absolute path name, but the name of the file as
|
||
it was encountered by @code{find} - that is, as a relative path from
|
||
one of the starting points).
|
||
@item %f
|
||
File's name with any leading directories removed (only the last
|
||
element). That is, the basename of the file.
|
||
@c supports %-X.Yf
|
||
@item %h
|
||
Leading directories of file's name (all but the last element and the
|
||
slash before it). That is, the dirname of the file. If the file's
|
||
name contains no slashes (for example because it was named on the
|
||
command line and is in the current working directory), then @samp{%h}
expands to @samp{.}. This prevents @samp{%h/%f} expanding to @samp{/foo},
|
||
which would be surprising and probably not desirable.
|
||
@c supports %-X.Yh
|
||
@item %P
|
||
File's name with the name of the command line argument under which
|
||
it was found removed from the beginning.
|
||
@c supports %-X.YP
|
||
@item %H
|
||
Command line argument under which file was found.
|
||
@c supports %-X.YH
|
||
@end table
|
||
|
||
For some corner-cases, the interpretation of the @samp{%f} and
|
||
@samp{%h} format directives is not obvious. Here is an example
|
||
including some output:
|
||
|
||
@example
|
||
$ find \
|
||
. .. / /tmp /tmp/TRACE compile compile/64/tests/find \
|
||
-maxdepth 0 -printf '%p: [%h][%f]\n'
|
||
.: [.][.]
|
||
..: [.][..]
|
||
/: [][/]
|
||
/tmp: [][tmp]
|
||
/tmp/TRACE: [/tmp][TRACE]
|
||
compile: [.][compile]
|
||
compile/64/tests/find: [compile/64/tests][find]
|
||
@end example
|
||
|
||
@node Ownership Directives
|
||
@subsubsection Ownership Directives
|
||
|
||
@table @code
|
||
@item %g
|
||
@c supports %-X.Yg
|
||
File's group name, or numeric group ID if the group has no name.
|
||
@item %G
|
||
@c supports %-X.Yg
|
||
@c TODO: Needs to support # flag and 0 flag
|
||
File's numeric group ID.
|
||
@item %u
|
||
@c supports %-X.Yu
|
||
File's user name, or numeric user ID if the user has no name.
|
||
@item %U
|
||
@c supports %-X.Yu
|
||
@c TODO: Needs to support # flag
|
||
File's numeric user ID.
|
||
@item %m
|
||
@c full support, including # and 0.
|
||
File's mode bits (in octal). If you always want to have a leading
|
||
zero on the number, use the @samp{#} format flag, for example @samp{%#m}.
|
||
|
||
The file mode bit numbers used are the traditional Unix
|
||
numbers, which will be as expected on most systems, but if your
|
||
system's file mode bit layout differs from the traditional Unix
|
||
semantics, you will see a difference between the mode as printed by
|
||
@samp{%m} and the mode as it appears in @code{struct stat}.
|
||
|
||
@item %M
|
||
File's type and mode bits (in symbolic form, as for @code{ls}). This
|
||
directive is supported in findutils 4.2.5 and later.
|
||
@end table
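
As a sketch, the following produces a compact ownership listing for the
entries of the current directory; the column widths are arbitrary
choices:

@example
find . -maxdepth 1 -printf '%M %-8u %-8g %#m %p\n'
@end example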
|
||
|
||
@node Size Directives
|
||
@subsubsection Size Directives
|
||
|
||
@table @code
|
||
@item %k
|
||
The amount of disk space used for this file in 1K blocks. Since disk
|
||
space is allocated in multiples of the filesystem block size this is
|
||
usually greater than %s/1024, but it can also be smaller if the file
|
||
is a sparse file (that is, it has ``holes'').
|
||
@item %b
|
||
The amount of disk space used for this file in 512-byte blocks. Since
|
||
disk space is allocated in multiples of the filesystem block size this
|
||
is usually greater than %s/512, but it can also be smaller if the
|
||
file is a sparse file (that is, it has ``holes'').
|
||
@item %s
|
||
File's size in bytes.
|
||
@item %S
|
||
File's sparseness. This is calculated as @code{(BLOCKSIZE*st_blocks /
|
||
st_size)}. The exact value you will get for an ordinary file of a
|
||
certain length is system-dependent. However, normally sparse files
|
||
will have values less than 1.0, and files which use indirect blocks
|
||
and have few holes may have a value which is greater than 1.0. The
|
||
value used for BLOCKSIZE is system-dependent, but is usually 512
|
||
bytes. If the file size is zero, the value printed is undefined. On
|
||
systems which lack support for st_blocks, a file's sparseness is
|
||
assumed to be 1.0.
|
||
@end table
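
A rough illustration, assuming the @code{truncate} utility from GNU
coreutils is available and that the filesystem supports sparse files:
the first command creates a 1 MiB file consisting entirely of a hole,
and the second prints its size, its block count and its sparseness.

@example
truncate -s 1M sparse.img
find sparse.img -printf '%s bytes, %b blocks, sparseness %S\n'
@end example

@noindent
On such a filesystem the reported sparseness is typically well below
1.0, but the exact figures are system-dependent.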
|
||
|
||
@node Location Directives
|
||
@subsubsection Location Directives
|
||
|
||
@table @code
|
||
@item %d
|
||
File's depth in the directory tree (depth below a file named on the
|
||
command line, not depth below the root directory). Files named on the
|
||
command line have a depth of 0. Subdirectories immediately below them
|
||
have a depth of 1, and so on.
|
||
@item %D
|
||
The device number on which the file exists (the @code{st_dev} field of
|
||
@code{struct stat}), in decimal.
|
||
@item %F
|
||
Type of the filesystem the file is on; this value can be used for
|
||
@samp{-fstype} (@pxref{Filesystems}).
|
||
@item %l
|
||
Object of symbolic link (empty string if file is not a symbolic link).
|
||
@item %i
|
||
File's inode number (in decimal).
|
||
@item %n
|
||
Number of hard links to file.
|
||
@item %y
|
||
Type of the file as used with @samp{-type}. If the file is a symbolic
|
||
link, @samp{l} will be printed.
|
||
@item %Y
|
||
Type of the file as used with @samp{-type}. If the file is a symbolic
|
||
link, it is dereferenced. If the file is a broken symbolic link,
|
||
@samp{N} is printed.
|
||
If a loop is encountered while determining the type of the target of a
symbolic link (for example, a symbolic link to itself), @samp{L} is
printed; @samp{?} is printed for any other error (for example,
@samp{permission denied}).
|
||
|
||
@end table
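
The difference between @samp{%y} and @samp{%Y} is easiest to see with a
broken symbolic link.  In this sketch the names @file{demo},
@file{dangling} and @file{missing} are made up:

@example
mkdir demo
ln -s missing demo/dangling
find demo -name dangling -printf '%d %y %Y %p -> %l\n'
@end example

@noindent
This prints @samp{1 l N demo/dangling -> missing}: the link is at depth
1, its own type is @samp{l}, and its target cannot be resolved.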
|
||
|
||
@node Time Directives
|
||
@subsubsection Time Directives
|
||
|
||
Some of these directives use the C @code{ctime} function. Its output
|
||
depends on the current locale, but it typically looks like
|
||
|
||
@example
|
||
Wed Nov 2 00:42:36 1994
|
||
@end example
|
||
|
||
@table @code
|
||
@item %a
|
||
File's last access time in the format returned by the C @code{ctime}
|
||
function.
|
||
|
||
@item %A@var{k}
|
||
File's last access time in the format specified by @var{k}
|
||
(@pxref{Time Formats}).
|
||
|
||
@item %B@var{k}
|
||
File's birth time, i.e., its creation time, in the format specified by @var{k}
|
||
(@pxref{Time Formats}).
|
||
|
||
This directive produces an empty string if the underlying operating system or
|
||
filesystem does not support birth times.
|
||
|
||
@item %c
|
||
File's last status change time in the format returned by the C
|
||
@code{ctime} function.
|
||
|
||
@item %C@var{k}
|
||
File's last status change time in the format specified by @var{k}
|
||
(@pxref{Time Formats}).
|
||
|
||
@item %t
|
||
File's last modification time in the format returned by the C
|
||
@code{ctime} function.
|
||
|
||
@item %T@var{k}
|
||
File's last modification time in the format specified by @var{k}
|
||
(@pxref{Time Formats}).
|
||
@end table
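
For instance, the following sketch prints the modification time of each
regular file twice, once in @code{ctime} style and once as an ISO-like
date assembled from individual components (@pxref{Time Formats}):

@example
find . -type f -printf '%t  %TY-%Tm-%Td %TH:%TM  %p\n'
@end example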
|
||
|
||
@node Other Directives
|
||
@subsubsection Other Directives
|
||
|
||
@table @code
|
||
@item %Z
|
||
File's SELinux context, or empty string if the file has no SELinux context.
|
||
@end table
|
||
|
||
@node Reserved and Unknown Directives
|
||
@subsubsection Reserved and Unknown Directives
|
||
|
||
The @samp{%(}, @samp{%@{} and @samp{%[} format directives, with or
|
||
without field width and precision specifications, are reserved for
future use. Don't use them and don't rely on experiments with the current version to
|
||
predict future behaviour. To print @samp{(}, simply use @samp{(}
|
||
rather than @samp{%(}. Likewise for @samp{@{} and @samp{[}.
|
||
|
||
Similarly, a @samp{%} character followed by any other unrecognised
|
||
character (i.e., not a known directive or @code{printf} field width
|
||
and precision specifier), is discarded (but the unrecognised character
|
||
is printed), and a warning message is printed to the standard error
|
||
output (because it was probably a typo). Don't rely on this
|
||
behaviour, because other directives may be added in the future.
|
||
|
||
|
||
@node Time Formats
|
||
@subsection Time Formats
|
||
|
||
Below is an incomplete list of formats for the directives @samp{%A}, @samp{%B},
|
||
@samp{%C}, and @samp{%T}, which print the file's timestamps.
|
||
Please refer to the documentation of @code{strftime} for the full list.
|
||
Some of these formats might not be available on all systems, due to differences
|
||
in the implementation of the C @code{strftime} function.
|
||
|
||
@menu
|
||
* Time Components::
|
||
* Date Components::
|
||
* Combined Time Formats::
|
||
@end menu
|
||
|
||
@node Time Components
|
||
@subsubsection Time Components
|
||
|
||
The following format directives print single components of the time.
|
||
|
||
@table @code
|
||
@item H
|
||
hour (00..23)
|
||
@item I
|
||
hour (01..12)
|
||
@item k
|
||
hour ( 0..23)
|
||
@item l
|
||
hour ( 1..12)
|
||
@item p
|
||
locale's AM or PM
|
||
@item Z
|
||
time zone (e.g., EDT), or nothing if no time zone is determinable
|
||
@item M
|
||
minute (00..59)
|
||
@item S
|
||
second (00..61). There is a fractional part.
|
||
@item @@
|
||
seconds since Jan. 1, 1970, 00:00 GMT, with fractional part.
|
||
@end table
|
||
|
||
The fractional part of the seconds field is of indeterminate length
|
||
and precision. That is, the length of the fractional part of the
|
||
seconds field will in general vary between findutils releases and
|
||
between systems. This means that it is unwise to assume that field
|
||
has any specific length. The length of this field is not usually a
|
||
guide to the precision of timestamps in the underlying file system.
|
||
|
||
|
||
|
||
@node Date Components
|
||
@subsubsection Date Components
|
||
|
||
The following format directives print single components of the date.
|
||
|
||
@table @code
|
||
@item a
|
||
locale's abbreviated weekday name (Sun..Sat)
|
||
@item A
|
||
locale's full weekday name, variable length (Sunday..Saturday)
|
||
@item b
|
||
@itemx h
|
||
locale's abbreviated month name (Jan..Dec)
|
||
@item B
|
||
locale's full month name, variable length (January..December)
|
||
@item m
|
||
month (01..12)
|
||
@item d
|
||
day of month (01..31)
|
||
@item w
|
||
day of week (0..6)
|
||
@item j
|
||
day of year (001..366)
|
||
@item U
|
||
week number of year with Sunday as first day of week (00..53)
|
||
@item W
|
||
week number of year with Monday as first day of week (00..53)
|
||
@item Y
|
||
year (1970@dots{})
|
||
@item y
|
||
last two digits of year (00..99)
|
||
@end table
|
||
|
||
@node Combined Time Formats
|
||
@subsubsection Combined Time Formats
|
||
|
||
The following format directives print combinations of time and date
|
||
components.
|
||
|
||
@table @code
|
||
@item r
|
||
time, 12-hour (hh:mm:ss [AP]M)
|
||
@item T
|
||
time, 24-hour (hh:mm:ss.xxxxxxxxxx)
|
||
@item X
|
||
locale's time representation (H:M:S). The seconds field includes a
|
||
fractional part.
|
||
@item c
|
||
locale's date and time in ctime format (Sat Nov 04 12:02:33 EST
|
||
1989). This format does not include any fractional part in the
|
||
seconds field.
|
||
@item D
|
||
date (mm/dd/yy)
|
||
@item F
|
||
date (yyyy-mm-dd)
|
||
@item x
|
||
locale's date representation (mm/dd/yy)
|
||
@item +
|
||
Date and time, separated by @samp{+}, for example
@samp{2004-04-28+22:22:05.0000000000}.
|
||
The time is given in the current timezone (which may be affected by
|
||
setting the @env{TZ} environment variable). This is a GNU extension. The
|
||
seconds field includes a fractional part.
|
||
@end table
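
A common use of these formats, sketched here, is to sort files by
modification time.  @samp{%T@@} gives a number that @code{sort -n} can
order, and @samp{%TF} adds a readable date; the last line of output
describes the most recently modified file:

@example
find . -type f -printf '%T@@ %TF %p\n' | sort -n | tail -n 1
@end example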
|
||
|
||
@node Formatting Flags
|
||
@subsection Formatting Flags
|
||
|
||
The @samp{%m} and @samp{%d} directives support the @samp{#}, @samp{0}
|
||
and @samp{+} flags, but the other directives do not, even if they
|
||
print numbers. Numeric directives that do not support these flags
|
||
include
|
||
|
||
@samp{G},
|
||
@samp{U},
|
||
@samp{b},
|
||
@samp{D},
|
||
@samp{k} and
|
||
@samp{n}.
|
||
|
||
All fields support the format flag @samp{-}, which makes fields
|
||
left-aligned. That is, if the field width is greater than the actual
|
||
contents of the field, the requisite number of spaces are printed
|
||
after the field content instead of before it.
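
The following sketch combines these flags; the file name and its
permissions are chosen only for illustration:

@example
$ printf abc > a
$ chmod 644 a
$ find a -printf '[%-10p] [%5s] [%#m]\n'
[a         ] [    3] [0644]
@end example

@noindent
The name is left-aligned in a ten-character field, the size is
right-aligned in five characters, and @samp{#} adds the leading zero to
the mode.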
|
||
|
||
@node Run Commands
|
||
@section Run Commands
|
||
|
||
You can use the list of file names created by @code{find} or
|
||
@code{locate} as arguments to other commands. In this way you can
|
||
perform arbitrary actions on the files.
|
||
|
||
@menu
|
||
* Single File::
|
||
* Multiple Files::
|
||
* Querying::
|
||
@end menu
|
||
|
||
@node Single File
|
||
@subsection Single File
|
||
|
||
Here is how to run a command on one file at a time.
|
||
|
||
@deffn Action -execdir command ;
|
||
Execute @var{command}; true if @var{command} returns zero. @code{find}
|
||
takes all arguments after @samp{-execdir} to be part of the command until
|
||
an argument consisting of @samp{;} is reached.
|
||
It replaces the string @samp{@{@}} by the current file name being processed
|
||
everywhere it occurs in the command.
|
||
Each file name except for the root directory @samp{/} is prepended with
|
||
@samp{./}.
|
||
Both @samp{;} and @samp{@{@}} need to be escaped (with a @samp{\}) or quoted to
protect them from interpretation by the shell.
The command is executed in the directory which @code{find} was searching at the
time the action was executed (that is, @samp{@{@}} will expand to a file in that
directory).
|
||
|
||
For example, to compare each C header file in or below the current
|
||
directory with the file @file{/tmp/master}:
|
||
|
||
@example
|
||
find . -name '*.h' -execdir diff -u '@{@}' /tmp/master ';'
|
||
@end example
|
||
@end deffn
|
||
|
||
If you use @samp{-execdir}, you must ensure that the @env{PATH}
|
||
variable contains only absolute directory names. Having an empty
|
||
element in @env{PATH} or explicitly including @samp{.} (or any other
|
||
non-absolute name) is insecure. GNU find will refuse to run if you
|
||
use @samp{-execdir} and it thinks your @env{PATH} setting is
|
||
insecure. For example:
|
||
|
||
@table @samp
|
||
@item /bin:/usr/bin:
|
||
Insecure; empty path element (at the end)
|
||
@item :/bin:/usr/bin:/usr/local/bin
|
||
Insecure; empty path element (at the start)
|
||
@item /bin:/usr/bin::/usr/local/bin
|
||
Insecure; empty path element (two colons in a row)
|
||
@item /bin:/usr/bin:.:/usr/local/bin
|
||
Insecure; @samp{.} is a path element (@file{.} is not an absolute file name)
|
||
@item /bin:/usr/bin:sbin:/usr/local/bin
|
||
Insecure; @samp{sbin} is not an absolute file name
|
||
@item /bin:/usr/bin:/sbin:/usr/local/bin
|
||
Secure (if you control the contents of those directories and any access to them)
|
||
@end table
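
As a minimal sketch of this check, the following command deliberately
adds @samp{.} to @env{PATH} for a single invocation. GNU @code{find}
is expected to print a diagnostic and exit with a nonzero status
without running @code{true}; the exact wording of the message may vary
between versions.

@example
PATH="$PATH:." find . -maxdepth 0 -execdir true ';'
status=$?
echo "exit status: $status"
@end example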
|
||
|
||
Another similar action, @samp{-exec}, is supported, but is less secure.
|
||
@xref{Security Considerations}, for a discussion of the security
|
||
problems surrounding @samp{-exec}.
|
||
|
||
|
||
@deffn Action -exec command ;
|
||
This insecure variant of the @samp{-execdir} action is specified by
|
||
POSIX. Like @samp{-execdir command ;} it is true if zero is
|
||
returned by @var{command}. The main difference is that the command is
|
||
executed in the directory from which @code{find} was invoked, meaning
|
||
that @samp{@{@}} is expanded to a relative path starting with the name
|
||
of one of the starting directories, rather than just the basename of
|
||
the matched file.
|
||
|
||
While some implementations of @code{find} replace the @samp{@{@}} only
|
||
where it appears on its own in an argument, GNU @code{find} replaces
|
||
@samp{@{@}} wherever it appears.
|
||
@end deffn
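
The difference in working directory can be observed with a command
that takes no file name at all. The following is an illustrative
sketch; the directory layout is invented for the example, and it
assumes that @env{PATH} contains only absolute directory names so that
@samp{-execdir} is permitted.

@example
tmp=$(mktemp -d)
mkdir -p "$tmp/src/include"
: > "$tmp/src/include/a.h"
cd "$tmp"
# -execdir: pwd runs in the directory that contains the matched file
out_execdir=$(find . -name '*.h' -execdir pwd ';')
# -exec: pwd runs in the directory from which find was invoked
out_exec=$(find . -name '*.h' -exec pwd ';')
echo "execdir: $out_execdir"
echo "exec:    $out_exec"
@end example

@noindent
The first line printed should end in @file{src/include}, while the
second names the temporary directory itself.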
|
||
|
||
|
||
@node Multiple Files
|
||
@subsection Multiple Files
|
||
|
||
Sometimes you need to process files one at a time. But usually this
|
||
is not necessary, and it is faster to run a command on as many files
|
||
as possible at a time, rather than once per file. Doing this saves on
|
||
the time it takes to start up the command each time.
|
||
|
||
The @samp{-execdir} and @samp{-exec} actions have variants that build
|
||
command lines containing as many matched files as possible.
|
||
|
||
@deffn Action -execdir command @{@} +
|
||
This works as for @samp{-execdir command ;}, except that the result is always
|
||
true, and the @samp{@{@}} at the end of the command is expanded to a list of
|
||
names of matching files.
|
||
Each file name except for the root directory @samp{/} is prepended with
|
||
@samp{./}.
|
||
This expansion is done in such a way as to avoid exceeding the maximum command
|
||
line length available on the system.
|
||
Only one @samp{@{@}} is allowed within the command, and it must appear at the
|
||
end, immediately before the @samp{+}.
|
||
A @samp{+} appearing in any position other than immediately after @samp{@{@}}
|
||
is not considered to be special (that is, it does not terminate the command).
|
||
@end deffn
|
||
|
||
|
||
@deffn Action -exec command @{@} +
|
||
This insecure variant of the @samp{-execdir} action is specified by
|
||
POSIX. The main difference is that the command is executed in the
|
||
directory from which @code{find} was invoked, meaning that @samp{@{@}}
|
||
is expanded to a relative path starting with the name of one of the
|
||
starting directories, rather than just the basename of the matched
|
||
file. The result is always true.
|
||
@end deffn
|
||
|
||
Before @code{find} exits, any partially-built command lines are
|
||
executed. This happens even if the exit was caused by the
|
||
@samp{-quit} action. However, some types of error (for example not
|
||
being able to invoke @code{stat()} on the current directory) can cause
|
||
an immediate fatal exit. In this situation, any partially-built
|
||
command lines will not be invoked (this prevents possible infinite
|
||
loops).
|
||
|
||
At first sight, it looks like the list of filenames to be processed
|
||
can only be at the end of the command line, and that this might be a
|
||
problem for some commands (@code{cp} and @code{rsync} for example).
|
||
|
||
However, there is a slightly obscure but powerful workaround for this
|
||
problem which takes advantage of the behaviour of @code{sh -c}:
|
||
|
||
@example
|
||
find startpoint -tests @dots{} -exec sh -c 'scp "$@@" remote:/dest' sh @{@} +
|
||
@end example
|
||
|
||
In the example above, the filenames we want to work on need to occur
|
||
on the @code{scp} command line before the name of the destination. We
|
||
use the shell to invoke the command @code{scp "$@@" remote:/dest} and
|
||
the shell expands @code{"$@@"} to the list of filenames we want to
|
||
process.
|
||
|
||
Another, less secure, way to run a command on more than one file
at once is to use the @code{xargs} command, which is invoked like
|
||
this:
|
||
|
||
@example
|
||
xargs @r{[}@var{option}@dots{}@r{]} @r{[}@var{command} @r{[}@var{initial-arguments}@r{]}@r{]}
|
||
@end example
|
||
|
||
@code{xargs} normally reads arguments from the standard input. These
|
||
arguments are delimited by blanks (which can be protected with double
|
||
or single quotes or a backslash) or newlines. It executes the
|
||
@var{command} (the default is @file{echo}) one or more times with any
|
||
@var{initial-arguments} followed by arguments read from standard
|
||
input. Blank lines on the standard input are ignored. If the
|
||
@samp{-L} option is in use, trailing blanks indicate that @code{xargs}
|
||
should consider the following line to be part of this one.
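
For example, the following sketch feeds two lines to @code{xargs}; the
first contains an unquoted blank and the second is protected by double
quotes. With @samp{-n 1}, each resulting argument is passed to a
separate @code{echo}:

@example
printf 'one two\n"three four"\n' | xargs -n 1 echo
@end example

@noindent
This prints @samp{one}, @samp{two} and @samp{three four} on three
separate lines.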
|
||
|
||
Instead of blank-delimited names, it is safer to use @samp{find
|
||
-print0} or @samp{find -fprint0} and process the output by giving the
|
||
@samp{-0} or @samp{--null} option to @code{xargs}, GNU @code{tar},
|
||
GNU @code{cpio}, or @code{perl}. The @code{locate} command also has a
|
||
@samp{-0} or @samp{--null} option which does the same thing.
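
A minimal sketch of the null-terminated approach follows. The file
names are invented for the example; one of them contains a blank,
which would be split into two arguments by @code{xargs} in its default
mode.

@example
tmp=$(mktemp -d)
printf 'sprintf(buf, "x");\n' > "$tmp/has space.c"
printf 'puts("x");\n' > "$tmp/plain.c"
# Names are separated by null bytes, so the blank is harmless.
find "$tmp" -name '*.c' -print0 | xargs -0 grep -l sprintf
@end example

@noindent
Only the name of the file that contains @samp{sprintf} is printed, as
a single intact name.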
|
||
|
||
You can use shell command substitution (backquotes) to process a list
|
||
of arguments, like this:
|
||
|
||
@example
|
||
grep -l sprintf `find $HOME -name '*.c' -print`
|
||
@end example
|
||
|
||
However, that method produces an error if the length of the @samp{.c}
|
||
file names exceeds the operating system's command line length limit.
|
||
@code{xargs} avoids that problem by running the command as many times
|
||
as necessary without exceeding the limit:
|
||
|
||
@example
|
||
find $HOME -name '*.c' -print | xargs grep -l sprintf
|
||
@end example
|
||
|
||
However, if the command needs to have its standard input be a terminal
|
||
(@code{less}, for example), you have to use the shell command
|
||
substitution method or use either the @samp{--arg-file} option or the
|
||
@samp{--open-tty} option of @code{xargs}.
|
||
|
||
The @code{xargs} command will usually process all of its input,
|
||
building command lines and executing them.
|
||
Processing stops immediately if @code{xargs} reads a line containing
|
||
the end-of-file marker string specified with the @samp{--eof} option,
|
||
or if one of the launched commands exits with a status of 255.
|
||
The latter will cause @code{xargs} to issue an error message and exit with
|
||
status 124.
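
The following sketch illustrates the second case with GNU
@code{xargs}. The inline @code{sh} script is hypothetical and exists
only to exit with status 255 on its first invocation:

@example
printf 'a\nb\nc\n' | xargs -n 1 sh -c 'echo "got $1"; exit 255' sh
status=$?
echo "xargs exit status: $status"
@end example

@noindent
Only @samp{got a} is printed before @code{xargs} reports the failure
on standard error and stops; the remaining input is not processed.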
|
||
|
||
@menu
|
||
* Unsafe File Name Handling::
|
||
* Safe File Name Handling::
|
||
* Unusual Characters in File Names::
|
||
* Limiting Command Size::
|
||
* Controlling Parallelism::
|
||
* Interspersing File Names::
|
||
@end menu
|
||
|
||
@node Unsafe File Name Handling
|
||
@subsubsection Unsafe File Name Handling
|
||
|
||
Because file names can contain quotes, backslashes, blank characters,
|
||
and even newlines, it is not safe to process them using @code{xargs}
|
||
in its default mode of operation. But since most files' names do not
|
||
contain blanks, this problem occurs only infrequently. If you are
|
||
only searching through files that you know have safe names, then you
|
||
need not be concerned about it.
|
||
|
||
Error messages issued by @code{find} and @code{locate} quote unusual
|
||
characters in file names in order to prevent unwanted changes in the
|
||
terminal's state.
|
||
|
||
|
||
@c This example is adapted from:
|
||
@c From: pfalstad@stone.Princeton.EDU (Paul John Falstad)
|
||
@c Newsgroups: comp.unix.shell
|
||
@c Subject: Re: Beware xargs security holes
|
||
@c Date: 16 Oct 90 19:12:06 GMT
|
||
@c
|
||
In many applications, if @code{xargs} botches processing a file
|
||
because its name contains special characters, some data might be lost.
|
||
The importance of this problem depends on the importance of the data
|
||
and whether anyone notices the loss soon enough to correct it.
|
||
However, here is an extreme example of the problems that using
|
||
blank-delimited names can cause. If the following command is run
|
||
daily from @code{cron}, then any user can remove any file on the
|
||
system:
|
||
|
||
@example
|
||
find / -name '#*' -atime +7 -print | xargs rm
|
||
@end example
|
||
|
||
For example, you could do something like this:
|
||
|
||
@example
|
||
eg$ echo > '#
|
||
vmunix'
|
||
@end example
|
||
|
||
@noindent
|
||
and then @code{cron} would delete @file{/vmunix}, if it ran
|
||
@code{xargs} with @file{/} as its current directory.
|
||
|
||
To delete other files, for example @file{/u/joeuser/.plan}, you could
|
||
do this:
|
||
|
||
@example
|
||
eg$ mkdir '#
|
||
'
|
||
eg$ cd '#
|
||
'
|
||
eg$ mkdir u u/joeuser u/joeuser/.plan'
|
||
'
|
||
eg$ echo > u/joeuser/.plan'
|
||
/#foo'
|
||
eg$ cd ..
|
||
eg$ find . -name '#*' -print | xargs echo
|
||
./# ./# /u/joeuser/.plan /#foo
|
||
@end example
|
||
|
||
@node Safe File Name Handling
|
||
@subsubsection Safe File Name Handling
|
||
|
||
Here is how to make @code{find} output file names so that they can be
|
||
used by other programs without being mangled or misinterpreted. You
|
||
can process file names generated this way by giving the @samp{-0} or
|
||
@samp{--null} option to GNU @code{xargs}, GNU @code{tar}, GNU
|
||
@code{cpio}, or @code{perl}.
|
||
|
||
Both @code{find . -print0} and @code{xargs -0} are
|
||
POSIX-conforming since Issue 8 (IEEE Std 1003.1-2024).
|
||
|
||
@deffn Action -print0
|
||
True; print the entire file name on the standard output, followed by a
|
||
null character.
|
||
@end deffn
|
||
|
||
@deffn Action -fprint0 file
|
||
True; like @samp{-print0} but write to @var{file} like @samp{-fprint}
|
||
(@pxref{Print File Name}). The output file is always created.
|
||
@end deffn
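
As a small illustration of the last point, the sketch below uses a
test that matches nothing; the list file (a name chosen for this
example) should still be created, with a size of zero.

@example
tmp=$(mktemp -d)
find "$tmp" -name 'no-such-file' -fprint0 "$tmp/list"
ls -l "$tmp/list"
@end example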
|
||
|
||
As of findutils version 4.2.4, the @code{locate} program also has a
|
||
@samp{--null} option which does the same thing. For similarity with
|
||
@code{xargs}, the short form of the option @samp{-0} can also be used.
|
||
|
||
If you want to be able to handle file names safely but need to run
|
||
commands which want to be connected to a terminal on their input, you
|
||
can use the @samp{--open-tty} option to @code{xargs} or the
|
||
@samp{--arg-file} option to @code{xargs} like this:
|
||
|
||
@example
|
||
find / -name xyzzy -print0 > list
|
||
xargs --null --arg-file=list munge
|
||
@end example
|
||
|
||
The example above runs the @code{munge} program on all the files named
|
||
@file{xyzzy} that we can find, but @code{munge}'s input will still be
|
||
the terminal (or whatever the shell was using as standard input). If
|
||
your shell has the ``process substitution'' feature @samp{<(...)}, you
|
||
can do this in just one step:
|
||
|
||
@example
|
||
xargs --null --arg-file=<(find / -name xyzzy -print0) munge
|
||
@end example
|
||
|
||
@node Unusual Characters in File Names
|
||
@subsubsection Unusual Characters in File Names
|
||
As discussed above, you often need to be careful about how the names
|
||
of files are handled by @code{find} and other programs. If the output
|
||
of @code{find} is not going to another program but instead is being
|
||
shown on a terminal, this can still be a problem. For example, some
|
||
character sequences can reprogram the function keys on some terminals.
|
||
@xref{Security Considerations}, for a discussion of other security
|
||
problems relating to @code{find}.
|
||
|
||
Unusual characters are handled differently by various
|
||
actions, as described below.
|
||
|
||
@table @samp
|
||
@item -print0
|
||
@itemx -fprint0
|
||
Always print the exact file name, unchanged, even if the output is
|
||
going to a terminal.
|
||
@item -ok
|
||
@itemx -okdir
|
||
Always print the exact file name, unchanged. This will probably
|
||
change in a future release.
|
||
@item -ls
|
||
@itemx -fls
|
||
Unusual characters are always escaped. White space, backslash, and
|
||
double quote characters are printed using C-style escaping (for
|
||
example @samp{\f}, @samp{\"}). Other unusual characters are printed
|
||
using an octal escape. Other printable characters (for @samp{-ls} and
|
||
@samp{-fls} these are the characters between octal 041 and 0176) are
|
||
printed as-is.
|
||
@item -printf
|
||
@itemx -fprintf
|
||
If the output is not going to a terminal, it is printed as-is.
|
||
Otherwise, the result depends on which directive is in use:
|
||
|
||
@table @asis
|
||
@item %D, %F, %H, %Y, %y
|
||
These expand to values which are not under control of files' owners,
|
||
and so are printed as-is.
|
||
@item %a, %b, %c, %d, %g, %G, %i, %k, %m, %M, %n, %s, %t, %u, %U
|
||
These have values which are under the control of files' owners but
|
||
which cannot be used to send arbitrary data to the terminal, and so
|
||
these are printed as-is.
|
||
@item %f, %h, %l, %p, %P
|
||
The output of these directives is quoted if the output is going to a
|
||
terminal. The setting of the @env{LC_CTYPE} environment
|
||
variable is used to determine which characters need to be quoted.
|
||
|
||
This quoting is performed in the same way as for GNU @code{ls}. This
|
||
is not the same quoting mechanism as the one used for @samp{-ls} and
|
||
@samp{-fls}. If you are able to decide what format to use for the
|
||
output of @code{find} then it is normally better to use @samp{\0} as a
|
||
terminator than to use newline, as file names can contain white space
|
||
and newline characters.
|
||
@end table
|
||
@item -print
|
||
@itemx -fprint
|
||
Quoting is handled in the same way as for the @samp{%p} directive of
|
||
@samp{-printf} and @samp{-fprintf}. If you are using @code{find} in a
|
||
script or in a situation where the matched files might have arbitrary
|
||
names, you should consider using @samp{-print0} instead of
|
||
@samp{-print}.
|
||
@end table
|
||
|
||
|
||
The @code{locate} program quotes and escapes unusual characters in
|
||
file names in the same way as @code{find}'s @samp{-print} action.
|
||
|
||
The behaviours described above may change soon, as the treatment of
|
||
unprintable characters is harmonised for @samp{-ls}, @samp{-fls},
|
||
@samp{-print}, @samp{-fprint}, @samp{-printf} and @samp{-fprintf}.
|
||
|
||
@node Limiting Command Size
|
||
@subsubsection Limiting Command Size
|
||
|
||
@code{xargs} gives you control over how many arguments it passes to
|
||
the command each time it executes it. By default, it uses up to
|
||
@code{ARG_MAX} - 2k, or 128k, whichever is smaller, characters per
|
||
command. It uses as many lines and arguments as fit within that
|
||
limit. The following options modify those values.
|
||
|
||
@table @code
|
||
@item --no-run-if-empty
|
||
@itemx -r
|
||
If the standard input does not contain any nonblanks, do not run the
|
||
command. By default, the command is run once even if there is no
|
||
input. This option is a GNU extension.
|
||
|
||
@item --max-lines@r{[}=@var{max-lines}@r{]}
|
||
@itemx -L @var{max-lines}
|
||
@itemx -l@r{[}@var{max-lines}@r{]}
|
||
Use at most @var{max-lines} nonblank input lines per command line;
|
||
@var{max-lines} defaults to 1 if omitted; omitting the argument is not
|
||
allowed in the case of the @samp{-L} option. Trailing blanks cause an
|
||
input line to be logically continued on the next input line, for the
|
||
purpose of counting the lines. Implies @samp{-x}. The preferred name
|
||
for this option is @samp{-L} as this is specified by POSIX.
|
||
|
||
@item --max-args=@var{max-args}
|
||
@itemx -n @var{max-args}
|
||
Use at most @var{max-args} arguments per command line. Fewer than
|
||
@var{max-args} arguments will be used if the size (see the @samp{-s}
|
||
option) is exceeded, unless the @samp{-x} option is given, in which
|
||
case @code{xargs} will exit.
|
||
|
||
@item --max-chars=@var{max-chars}
|
||
@itemx -s @var{max-chars}
|
||
Use at most @var{max-chars} characters per command line, including the
|
||
command initial arguments and the terminating nulls at the ends of the
|
||
argument strings. If you specify a value for this option which is too
|
||
large or small, a warning message is printed and the appropriate upper
|
||
or lower limit is used instead. You can use the @samp{--show-limits}
|
||
option to understand the command-line limits applying to @code{xargs}
|
||
and how this is affected by any other options. The POSIX limits shown
|
||
when you do this have already been adjusted to take into account the
|
||
size of your environment variables.
|
||
|
||
The largest allowed value is system-dependent, and is calculated as
|
||
the argument length limit for exec, less the size of your environment,
|
||
less 2048 bytes of headroom. If this value is more than 128KiB,
|
||
128KiB is used as the default value; otherwise, the default value is
|
||
the maximum.
|
||
@end table
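
The effect of these options is easiest to see with @code{echo} as the
command. The input in this sketch is generated with @code{printf}
purely for illustration:

@example
# At most two arguments per command line
printf '1 2 3 4 5\n' | xargs -n 2 echo
# At most two input lines per command line
printf 'a b\nc d\ne\n' | xargs -L 2 echo
# No input: with -r the command is not run at all
printf '' | xargs -r echo 'not printed'
@end example

@noindent
The first command prints @samp{1 2}, @samp{3 4} and @samp{5} on
separate lines; the second prints @samp{a b c d} and then @samp{e};
the third prints nothing.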
|
||
|
||
@node Controlling Parallelism
|
||
@subsubsection Controlling Parallelism
|
||
|
||
Normally, @code{xargs} runs one command at a time. This is called
|
||
``serial'' execution; the commands happen in a series, one after another.
If you'd like @code{xargs} to do things in ``parallel'', you can ask it
|
||
to do so, either when you invoke it, or later while it is running.
|
||
Running several commands at one time can make the entire operation
|
||
go more quickly, if the commands are independent, and if your system
|
||
has enough resources to handle the load. When parallelism works in
|
||
your application, @code{xargs} provides an easy way to get your work
|
||
done faster.
|
||
|
||
@table @code
|
||
@item --max-procs=@var{max-procs}
|
||
@itemx -P @var{max-procs}
|
||
Run up to @var{max-procs} processes at a time; the default is 1. If
|
||
@var{max-procs} is 0, @code{xargs} will run as many processes as
|
||
possible at a time. Use the @samp{-n}, @samp{-s}, or @samp{-L} option
|
||
with @samp{-P}; otherwise chances are that the command will be run
|
||
only once. If a child process exits with status 255, @code{xargs} will
|
||
still wait for all child processes to exit (before version 4.9.0 this
|
||
might not happen).
|
||
@end table
|
||
|
||
If @code{xargs} is run without the @samp{-P} option, it will not
|
||
change the handling of the @code{SIGUSR1} and @code{SIGUSR2} signals.
|
||
This means they will terminate the @code{xargs} program unless those
|
||
signals were set to be ignored in the parent process of @code{xargs}.
|
||
If you do not want parallel execution but you also do not want these
|
||
signals to be fatal, you can specify @code{-P 1}.
|
||
|
||
Suppose you have a directory tree of large image files and a
|
||
@code{makeallsizes} script that takes a single file name and creates
|
||
various sized images from it (thumbnail-sized, web-page-sized,
|
||
printer-sized, and the original large file). The script is doing
|
||
enough work that it takes significant time to run, even on a single
|
||
image. You could run:
|
||
|
||
@example
|
||
find originals -name '*.jpg' | xargs -L 1 makeallsizes
|
||
@end example
|
||
|
||
This will run @code{makeallsizes @var{filename}} once for each @code{.jpg}
|
||
file in the @code{originals} directory. However, if your system has
|
||
two central processors, this script will only keep one of them busy.
|
||
Instead, you could probably finish in about half the time by running:
|
||
|
||
@example
|
||
find originals -name '*.jpg' | xargs -L 1 -P 2 makeallsizes
|
||
@end example
|
||
|
||
@code{xargs} will run the first two commands in parallel, and then
|
||
whenever one of them terminates, it will start another one, until
|
||
the entire job is done.
|
||
|
||
The same idea can be generalized to as many processors as you have handy.
|
||
It also generalizes to other resources besides processors. For example,
|
||
if @code{xargs} is running commands that are waiting for a response from a
|
||
distant network connection, running a few in parallel may reduce the
|
||
overall latency by overlapping their waiting time.
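
The overlap can be illustrated with @code{sleep} standing in for a
command that spends its time waiting. This is only a sketch, and the
timings are approximate:

@example
start=$(date +%s)
# Four one-second sleeps, up to four at a time
printf '1\n1\n1\n1\n' | xargs -n 1 -P 4 sleep
end=$(date +%s)
echo "elapsed: $((end - start)) seconds"
@end example

@noindent
Run serially, the four commands need about four seconds; with
@samp{-P 4} the elapsed time should be close to one second.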
|
||
|
||
If you are running commands in parallel, you need to think about how they should
|
||
arbitrate access to any resources that they share.
|
||
For example, if more than one of them tries to print to standard output, the
|
||
output will be produced in an indeterminate order (and very likely mixed up)
|
||
unless the processes collaborate in some way to prevent this.
|
||
Using some kind of locking scheme is one way to prevent such problems.
|
||
In general, using a locking scheme will help ensure correct output but reduce
|
||
performance.
|
||
If you don't want to tolerate the performance difference, simply arrange for
|
||
each process to produce a separate output file (or otherwise use separate
|
||
resources).
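
A minimal sketch of the separate-output-file approach follows. The
worker is an inline @code{sh} script invented for this example, and
@samp{NAME} is an arbitrary replacement string for the @samp{-I}
option:

@example
tmp=$(mktemp -d)
cd "$tmp"
# Each process writes only to its own file, so no locking is needed.
printf 'a\nb\nc\n' |
  xargs -P 2 -I NAME sh -c 'echo "result for $1" > "$1.out"' sh NAME
cat a.out b.out c.out
@end example

@noindent
Because @code{xargs} waits for its child processes before exiting, all
three output files are complete by the time @code{cat} runs.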
|
||
|
||
@code{xargs} also allows ``turning up'' or ``turning down'' its parallelism
|
||
in the middle of a run. Suppose you are keeping your four-processor
|
||
system busy for hours, processing thousands of images using @code{-P 4}.
|
||
Now, in the middle of the run, you or someone else wants you to reduce
|
||
your load on the system, so that something else will run faster.
|
||
If you interrupt @code{xargs}, your job will be half-done, and it
|
||
may take significant manual work to resume it only for the remaining
|
||
images. If you suspend @code{xargs} using your shell's job controls
|
||
(e.g. @code{control-Z}), then it will get no work done while suspended.
|
||
|
||
Find out the process ID of the @code{xargs} process, either from your
|
||
shell or with the @code{ps} command. After you send it the signal
|
||
@code{SIGUSR2}, @code{xargs} will run one fewer command in parallel.
|
||
If you send it the signal @code{SIGUSR1}, it will run one more command
|
||
in parallel. For example:
|
||
|
||
@example
|
||
shell$ xargs <allimages -L 1 -P 4 makeallsizes &
|
||
[4] 27643
|
||
... at some later point ...
|
||
shell$ kill -USR2 27643
|
||
shell$ kill -USR2 %4
|
||
@end example
|
||
|
||
The first @code{kill} command will cause @code{xargs} to wait for
|
||
two commands to terminate before starting the next command (reducing
|
||
the parallelism from 4 to 3). The second @code{kill} will reduce it from
|
||
3 to 2. (@code{%4} works in some shells as a shorthand for the process
|
||
ID of the background job labeled @code{[4]}.)
|
||
|
||
Similarly, if you started a long @code{xargs} job without parallelism, you
|
||
can easily switch it to start running two commands in parallel by sending
|
||
it a @code{SIGUSR1}.
|
||
|
||
@code{xargs} will never terminate any existing commands when you ask it
|
||
to run fewer processes. It merely waits for the excess commands to
|
||
finish. If you ask it to run more commands, it will start the next
|
||
one immediately (if it has more work to do). If the degree of
|
||
parallelism is already 1, sending @code{SIGUSR2} will have no further
|
||
effect (since @code{--max-procs=0} means that there should be no limit
|
||
on the number of processes to run).
|
||
|
||
There is an implementation-defined limit on the number of processes.
|
||
This limit is shown with @code{xargs --show-limits}. The limit is at
|
||
least 127 on all systems (and on the author's system it is
|
||
2147483647).
|
||
|
||
If you send several identical signals quickly, the operating system
|
||
does not guarantee that each of them will be delivered to @code{xargs}.
|
||
This means that you can't rapidly increase or decrease the parallelism by
|
||
more than one command at a time. You can avoid this problem by sending
|
||
a signal, observing the result, then sending the next one; or merely by
|
||
delaying for a few seconds between signals (unless your system is very
|
||
heavily loaded).
|
||
|
||
Whether or not parallel execution will work well for you depends on
|
||
the nature of the command you are running in parallel, on the
|
||
configuration of the system on which you are running the command, and
|
||
on the other work being done on the system at the time.
|
||
|
||
@node Interspersing File Names
|
||
@subsubsection Interspersing File Names
|
||
|
||
@code{xargs} can insert the name of the file it is processing between
|
||
arguments you give for the command. Unless you also give options to
|
||
limit the command size (@pxref{Limiting Command Size}), this mode of
|
||
operation is equivalent to @samp{find -exec} (@pxref{Single File}).
|
||
|
||
@table @code
|
||
@item -I @var{replace-str}
|
||
@itemx --replace@r{[}=@var{replace-str}@r{]}
|
||
@itemx -i@r{[}@var{replace-str}@r{]}
|
||
Replace occurrences of @var{replace-str} in the initial arguments with
|
||
names read from standard input.
|
||
Also, unquoted blanks do not terminate arguments;
|
||
instead, the input is split at newlines only.
|
||
If @var{replace-str} is omitted (omitting it is allowed only for @samp{-i}
|
||
and @samp{--replace}), it defaults to @samp{@{@}} (like for @samp{find -exec}).
|
||
Implies @samp{-x} and @samp{-L 1}.
|
||
The @samp{-i} option is deprecated in favour of the @samp{-I} option.
|
||
@end table
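
For instance, in the following sketch (with @samp{NAME} as an
arbitrary replacement string) the first input line contains a blank
but is still passed as a single argument:

@example
printf 'a b\nc\n' | xargs -I NAME echo '[NAME]'
@end example

@noindent
This prints @samp{[a b]} and then @samp{[c]}, running @code{echo} once
per input line.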
|
||
|
||
As an example, to sort each file in the @file{bills}
|
||
directory, leaving the output in that file name with @file{.sorted}
|
||
appended, you could do:
|
||
|
||
@example
|
||
find bills -type f | xargs -I XX sort -o XX.sorted XX
|
||
@end example
|
||
|
||
@noindent
|
||
The equivalent command using @samp{find -execdir} is:
|
||
|
||
@example
|
||
find bills -type f -execdir sort -o '@{@}.sorted' '@{@}' ';'
|
||
@end example
|
||
|
||
When you use the @samp{-I} option, each line read from the input is
|
||
buffered internally. This means that there is an upper limit on the
|
||
length of input line that @code{xargs} will accept when used with the
|
||
@samp{-I} option. To work around this limitation, you can use the
|
||
@samp{-s} option to increase the amount of buffer space that @code{xargs}
uses, and you can also use an extra invocation of @code{xargs} to ensure that
|
||
very long lines do not occur. For example:
|
||
|
||
@example
|
||
somecommand | xargs -s 50000 echo | xargs -I '@{@}' -s 100000 rm '@{@}'
|
||
@end example
|
||
|
||
Here, the first invocation of @code{xargs} has no input line length
|
||
limit because it doesn't use the @samp{-I} option. The second
|
||
invocation of @code{xargs} does have such a limit, but we have ensured
|
||
that it never encounters a line which is longer than it can
|
||
handle.
|
||
|
||
This is not an ideal solution. Instead, the @samp{-I} option should
|
||
not impose a line length limit (apart from any limit imposed by the
|
||
operating system) and so one might consider this limitation to be a
|
||
bug. A better solution would be to allow @code{xargs -I} to
|
||
automatically move to a larger value for the @samp{-s} option when
|
||
this is needed.
|
||
|
||
This sort of problem doesn't occur with the output of @code{find}
|
||
because it emits just one filename per line.
|
||
|
||
@node Querying
|
||
@subsection Querying
|
||
|
||
To ask the user whether to execute a command on a single file, you can
|
||
use the @code{find} primary @samp{-okdir} instead of @samp{-execdir},
|
||
and the @code{find} primary @samp{-ok} instead of @samp{-exec}:
|
||
|
||
@deffn Action -okdir command ;
|
||
Like @samp{-execdir} (@pxref{Single File}), but ask the user first.
|
||
If the user does not agree to run the command, just return false.
|
||
Otherwise, run it, with standard input redirected from
|
||
@file{/dev/null}.
|
||
|
||
This action may not be specified together with the @samp{-files0-from} option.
|
||
|
||
The response to the prompt is matched against a pair of regular
|
||
expressions to determine if it is a yes or no response. These regular
|
||
expressions are obtained from the system (@code{nl_langinfo} items
|
||
@code{YESEXPR} and @code{NOEXPR} are used) if the @env{POSIXLY_CORRECT} environment
|
||
variable is set and the system has such patterns available. Otherwise,
|
||
@code{find}'s message translations are used. In either case, the
|
||
@env{LC_MESSAGES} environment variable will determine the regular
|
||
expressions used to determine if the answer is affirmative or negative.
|
||
The interpretation of the regular expressions themselves will be
|
||
affected by the environment variables @env{LC_CTYPE} (character
|
||
classes) and @env{LC_COLLATE} (character ranges and equivalence
|
||
classes).
|
||
@end deffn
|
||
|
||
@deffn Action -ok command ;
|
||
This insecure variant of the @samp{-okdir} action is specified by
|
||
POSIX. The main difference is that the command is executed in the
|
||
directory from which @code{find} was invoked, meaning that @samp{@{@}}
|
||
is expanded to a relative path starting with the name of one of the
|
||
starting directories, rather than just the basename of the matched
|
||
file. If the command is run, its standard input is redirected from
|
||
@file{/dev/null}.
|
||
|
||
This action may not be specified together with the @samp{-files0-from} option.
|
||
@end deffn
|
||
|
||
When processing multiple files with a single command, to query the
|
||
user you give @code{xargs} the following option. When using this
|
||
option, you might find it useful to control the number of files
|
||
processed per invocation of the command (@pxref{Limiting Command
|
||
Size}).
|
||
|
||
@table @code
|
||
@item --interactive
|
||
@itemx -p
|
||
Prompt the user about whether to run each command line and read a line
|
||
from the terminal. Only run the command line if the response starts
|
||
with @samp{y} or @samp{Y}. Implies @samp{-t}.
|
||
@end table
|
||
|
||
@node Delete Files
|
||
@section Delete Files
|
||
|
||
@deffn Action -delete
|
||
Delete files or directories; true if removal succeeded. If the
|
||
removal failed, an error message is issued and @code{find}'s exit status
|
||
will be nonzero (when it eventually exits).
|
||
|
||
@quotation Warning
|
||
Don't forget that @code{find} evaluates the command line as an expression,
|
||
so putting @samp{-delete} first will make @code{find} try to delete everything
|
||
below the starting points you specified.
|
||
@end quotation
|
||
|
||
The use of the @samp{-delete} action on the command line automatically
|
||
turns on the @samp{-depth} option.
|
||
As in turn @samp{-depth} makes @samp{-prune} ineffective, the @samp{-delete}
|
||
action cannot usefully be combined with @samp{-prune}.
|
||
|
||
Often, the user might want to test a @code{find} command line with @samp{-print}
|
||
prior to adding @samp{-delete} for the actual removal run. To avoid surprising
|
||
results, it is usually best to remember to use @samp{-depth} explicitly during
|
||
those earlier test runs.
|
||
|
||
See @ref{Cleaning Up} for a deeper discussion about good use cases of the
|
||
@samp{-delete} action and those with surprising results.
|
||
|
||
The @samp{-delete} action will fail to remove a directory unless it is empty.
|
||
|
||
Together with the @samp{-ignore_readdir_race} option, @code{find} will
|
||
ignore errors of the @samp{-delete} action in the case the file has disappeared
|
||
since the parent directory was read: it will not output an error diagnostic, not
|
||
change the exit code to nonzero, and the return code of the @samp{-delete}
|
||
action will be true.
|
||
@end deffn
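
The following sketch shows the test-then-delete workflow described
above on a throwaway directory tree; all names are invented for the
example.

@example
tmp=$(mktemp -d)
mkdir -p "$tmp/cache/sub"
touch "$tmp/cache/sub/old.tmp" "$tmp/keep.txt"
# Dry run: -depth lists directory contents before the directory itself,
# which is the order in which -delete will process them.
find "$tmp/cache" -depth -print
# Actual removal; -delete implies -depth.
find "$tmp/cache" -delete
ls "$tmp"
@end example

@noindent
After the second @code{find} command only @file{keep.txt} should
remain in the temporary directory.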
|
||
|
||
@node Adding Tests
|
||
@section Adding Tests
|
||
|
||
You can test for file attributes that none of the @code{find} builtin
|
||
tests check. To do this, use @code{xargs} to run a program that
|
||
filters a list of files printed by @code{find}. If possible, use
|
||
@code{find} builtin tests to pare down the list, so the program run by
|
||
@code{xargs} has less work to do. The tests built into @code{find}
|
||
will likely run faster than tests that other programs perform.
|
||
|
||
For reasons of efficiency it is often useful to limit the number of
|
||
times an external program has to be run. For this reason, it is often
|
||
a good idea to implement ``extended'' tests by using @code{xargs}.
|
||
|
||
For example, here is a way to print the names of all of the unstripped
|
||
binaries in the @file{/usr/local} directory tree. Builtin tests avoid
|
||
running @code{file} on files that are not regular files or are not
|
||
executable.
|
||
|
||
@example
|
||
find /usr/local -type f -perm /a=x | xargs file |
|
||
grep 'not stripped' | cut -d: -f1
|
||
@end example
|
||
|
||
@noindent
|
||
The @code{cut} program removes everything after the file name from the
|
||
output of @code{file}.
|
||
|
||
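The same pattern applies to other external filters. As a further
sketch (the scratch directory and file names are invented for
illustration, and @samp{-print0} with @samp{xargs -0} is used so that
unusual file names are passed on intact), the builtin tests select
executable regular files and @code{grep -l} then keeps only those
containing a line that starts with @samp{#!/bin/sh}:

@example
dir=$(mktemp -d)
printf '#!/bin/sh\necho hello\n' >"$dir/script"
printf '#!/bin/sh\necho unused\n' >"$dir/not-executable"
printf 'plain data\n' >"$dir/data"
chmod a+x "$dir/script" "$dir/data"

# Builtin tests run first; grep only sees executable regular files.
scripts=$(find "$dir" -type f -perm /a=x -print0 |
    xargs -0 grep -l '^#!/bin/sh')
printf '%s\n' "$scripts"
@end example

@noindent
Only @file{script} is listed: @file{not-executable} is rejected by
@samp{-perm} before @code{grep} is ever run on it, and @file{data}
is rejected by @code{grep}.
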
However, using @code{xargs} can present important security problems
|
||
(@pxref{Security Considerations}). These can be avoided by using
|
||
@samp{-execdir}. The @samp{-execdir} action is also a useful way of
|
||
putting your own test in the middle of a set of other tests or actions
|
||
for @code{find} (for example, you might want to use @samp{-prune}).
|
||
|
||
@c Idea from Martin Weitzel.
|
||
To place a special test somewhere in the middle of a @code{find}
|
||
expression, you can use @samp{-execdir} (or, less securely,
|
||
@samp{-exec}) to run a program that performs the test. Because
|
||
@samp{-execdir} evaluates to the exit status of the executed program,
|
||
you can use a program (which can be a shell script) that tests for a
|
||
special attribute and make it exit with a true (zero) or false
|
||
(non-zero) status. It is a good idea to place such a special test
|
||
@emph{after} the builtin tests, because it starts a new process which
|
||
could be avoided if a builtin test evaluates to false.
|
||
|
||
Here is a shell script called @code{unstripped} that checks whether
|
||
its argument is an unstripped binary file:
|
||
|
||
@example
|
||
#! /bin/sh
|
||
file "$1" | grep "not stripped" >/dev/null
|
||
@end example
|
||
|
||
|
||
This script relies on the shell exiting with the status of
|
||
the last command in the pipeline, in this case @code{grep}. The
|
||
@code{grep} command exits with a true status if it found any matches,
|
||
false if not. Here is an example of using the script (assuming it is
|
||
in your search path). It lists the stripped executables (and shell
|
||
scripts) in the file @file{sbins} and the unstripped ones in
|
||
@file{ubins}.
|
||
|
||
@example
|
||
find /usr/local -type f -perm /a=x \
|
||
\( -execdir unstripped '@{@}' \; -fprint ubins -o -fprint sbins \)
|
||
@end example
|
||
|
||
|
||
@node Databases
|
||
@chapter File Name Databases
|
||
|
||
The file name databases used by @code{locate} contain lists of files
|
||
that were in particular directory trees when the databases were last
|
||
updated. The file name of the default database is determined when
|
||
@code{locate} and @code{updatedb} are configured and installed. The
|
||
frequency with which the databases are updated and the directories for
|
||
which they contain entries depend on how often @code{updatedb} is run,
|
||
and with which arguments.
|
||
|
||
You can obtain some statistics about the databases by using
|
||
@samp{locate --statistics}.
|
||
|
||
@menu
|
||
* Database Locations::
|
||
* Database Formats::
|
||
* Newline Handling::
|
||
@end menu
|
||
|
||
|
||
@node Database Locations
|
||
@section Database Locations
|
||
|
||
There can be multiple file name databases. Users can select which
|
||
databases @code{locate} searches using the @env{LOCATE_PATH}
|
||
environment variable or a command line option. The system
|
||
administrator can choose the file name of the default database, the
|
||
frequency with which the databases are updated, and the directories
|
||
for which they contain entries. File name databases are updated by
|
||
running the @code{updatedb} program, typically nightly.
|
||
|
||
In networked environments, it often makes sense to build a database at
|
||
the root of each filesystem, containing the entries for that
|
||
filesystem. @code{updatedb} is then run for each filesystem on the
|
||
fileserver where that filesystem is on a local disk, to prevent
|
||
thrashing the network.
|
||
|
||
@xref{Invoking updatedb}, for the description of the options to
|
||
@code{updatedb}. These options can be used to specify which
|
||
directories are indexed by each database file.
|
||
|
||
The default location for the locate database depends on how findutils
|
||
is built, but the findutils installation accompanying this manual uses
|
||
the default location @file{@value{LOCATE_DB}}.
|
||
|
||
If no database exists at @file{@value{LOCATE_DB}} but the user did not
|
||
specify where to look (by using @samp{-d} or setting
|
||
@env{LOCATE_PATH}), then @code{locate} will also check for a
|
||
``secure'' database in @file{/var/lib/slocate/slocate.db}.
|
||
|
||
@node Database Formats
|
||
@section Database Formats
|
||
|
||
The file name databases contain lists of files that were in particular
|
||
directory trees when the databases were last updated. The file name
|
||
database format changed starting with GNU @code{locate} version 4.0 to
|
||
allow machines with different byte orderings to share the databases.
|
||
|
||
GNU @code{locate} can read both the old pre-findutils-4.0 database
|
||
format and the @samp{LOCATE02} database format. Support for the old
|
||
database format will shortly be removed from @code{locate}. It has
|
||
already been removed from @code{updatedb}.
|
||
|
||
If you run @samp{locate --statistics}, the resulting summary indicates
|
||
the type of each @code{locate} database. You select which database
|
||
format @code{updatedb} will use with the @samp{--dbformat} option.
|
||
|
||
The @samp{slocate} database format is very similar to @samp{LOCATE02}
|
||
and is also supported (in both @code{updatedb} and @code{locate}).
|
||
|
||
@menu
|
||
* LOCATE02 Database Format::
|
||
* Sample LOCATE02 Database::
|
||
* slocate Database Format::
|
||
* Old Database Format::
|
||
@end menu
|
||
|
||
@node LOCATE02 Database Format
|
||
@subsection LOCATE02 Database Format
|
||
|
||
@code{updatedb} runs a program called @code{frcode} to
|
||
@dfn{front-compress} the list of file names, which reduces the
|
||
database size by a factor of 4 to 5. Front-compression (also known as
|
||
incremental encoding) works as follows.
|
||
|
||
The database entries are a sorted list (case-insensitively, for users'
|
||
convenience). Since the list is sorted, each entry is likely to share
|
||
a prefix (initial string) with the previous entry. Each database
|
||
entry begins with an offset-differential count byte, which is the
|
||
additional number of characters of prefix of the preceding entry to
|
||
use beyond the number that the preceding entry is using of its
|
||
predecessor. (The counts can be negative.) Following the count is a
|
||
null-terminated ASCII remainder -- the part of the name that follows
|
||
the shared prefix.
|
||
|
||
If the offset-differential count is larger than can be stored in a
|
||
byte (+/-127), the byte has the value 0x80 and the count follows in a
|
||
2-byte word, with the high byte first (network byte order).
|
||
|
||
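To make the encoding rule concrete, here is a small sketch that
prints, in hexadecimal, the bytes that would represent a few counts.
It assumes that both the one-byte and the two-byte forms store the
count as a two's-complement signed value; the text above does not
spell this out, so treat the sketch as an illustration rather than a
description of @code{frcode} itself.

@example
encoded=$(
    for count in 6 -9 300
    do
        if [ "$count" -ge -127 ] && [ "$count" -le 127 ]
        then
            # The count fits in a single byte.
            printf '%d -> %02x\n' "$count" "$((count & 255))"
        else
            # Escape byte 0x80, then the count with the high byte first.
            printf '%d -> 80 %02x %02x\n' "$count" \
                "$(((count >> 8) & 255))" "$((count & 255))"
        fi
    done
)
printf '%s\n' "$encoded"
@end example

@noindent
Under those assumptions the three counts are shown as @samp{06},
@samp{f7} and @samp{80 01 2c} respectively.
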
Every database begins with a dummy entry for a file called
|
||
@file{LOCATE02}, which @code{locate} checks for to ensure that the
|
||
database file has the correct format; it ignores the entry in doing
|
||
the search.
|
||
|
||
Databases cannot be concatenated together, even if the first (dummy)
|
||
entry is trimmed from all but the first database. This is because the
|
||
offset-differential count in the first entry of the second and
|
||
following databases will be wrong.
|
||
|
||
In the output of @samp{locate --statistics}, the new database format
|
||
is referred to as @samp{LOCATE02}.
|
||
|
||
@node Sample LOCATE02 Database
|
||
@subsection Sample LOCATE02 Database
|
||
|
||
Sample input to @code{frcode}:
|
||
@c with nulls changed to newlines:
|
||
|
||
@example
|
||
/usr/src
|
||
/usr/src/cmd/aardvark.c
|
||
/usr/src/cmd/armadillo.c
|
||
/usr/tmp/zoo
|
||
@end example
|
||
|
||
Length of the longest prefix of the preceding entry to share:
|
||
|
||
@example
|
||
0 /usr/src
|
||
8 /cmd/aardvark.c
|
||
14 rmadillo.c
|
||
5 tmp/zoo
|
||
@end example
|
||
|
||
Output from @code{frcode}, with trailing nulls changed to newlines
|
||
and count bytes made printable:
|
||
|
||
@example
|
||
0 LOCATE02
|
||
0 /usr/src
|
||
8 /cmd/aardvark.c
|
||
6 rmadillo.c
|
||
-9 tmp/zoo
|
||
@end example
|
||
|
||
(6 = 14 - 8, and -9 = 5 - 14)
|
||
|
||
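The arithmetic above can be reproduced with a short shell sketch. It
is deliberately naive (it compares names one character at a time and
omits the @file{LOCATE02} dummy entry), and it is meant only to
illustrate the counting, not how @code{frcode} is implemented:

@example
encoded=$(
    prev=
    prevcount=0
    printf '%s\n' /usr/src /usr/src/cmd/aardvark.c \
        /usr/src/cmd/armadillo.c /usr/tmp/zoo |
    while IFS= read -r name
    do
        # Strip matching leading characters from both names,
        # counting how many are shared.
        a=$prev
        b=$name
        count=0
        while [ -n "$a" ] &&
            [ "$(printf '%.1s' "$a")" = "$(printf '%.1s' "$b")" ]
        do
            a=$(printf '%s' "$a" | cut -c2-)
            b=$(printf '%s' "$b" | cut -c2-)
            count=$((count + 1))
        done
        # Emit the differential count and the unshared remainder.
        printf '%d %s\n' "$((count - prevcount))" "$b"
        prev=$name
        prevcount=$count
    done
)
printf '%s\n' "$encoded"
@end example

@noindent
The output has the same counts and remainders as the sample
@code{frcode} output, apart from the missing dummy entry.
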
@node slocate Database Format
|
||
@subsection slocate Database Format
|
||
|
||
The @code{slocate} program uses a database format similar to, but not
|
||
quite the same as, GNU @code{locate}. The first byte of the database
|
||
specifies its @dfn{security level}. If the security level is 0,
|
||
@code{slocate} will read, match and print filenames on the basis of
|
||
the information in the database only. However, if the security level
|
||
byte is 1, @code{slocate} omits entries from its output if the
|
||
invoking user is unable to access them. The second byte of the
|
||
database is zero. The second byte is immediately followed by the
|
||
first database entry. The first entry in the database is not preceded
|
||
by any differential count or dummy entry. Instead the differential
|
||
count for the first item is assumed to be zero.
|
||
|
||
Starting with the second entry (if any) in the database, data is
|
||
interpreted as for the GNU LOCATE02 format.
|
||
|
||
@node Old Database Format
|
||
@subsection Old Database Format
|
||
|
||
The old database format is used by Unix @code{locate} and @code{find}
|
||
programs and pre-4.0 releases of GNU findutils. @code{locate}
|
||
understands this format, though @code{updatedb} will no longer produce
|
||
it.
|
||
|
||
The old format differs from @samp{LOCATE02} in the following ways.
|
||
Instead of each entry starting with an offset-differential count byte
|
||
and ending with a null, byte values from 0 through 28 indicate
|
||
offset-differential counts from -14 through 14. The byte value
|
||
indicating that a long offset-differential count follows is 0x1e (30),
|
||
not 0x80. The long counts are stored in host byte order, which is not
|
||
necessarily network byte order, and host integer word size, which is
|
||
usually 4 bytes. They also represent a count 14 less than their
|
||
value. The database lines have no termination byte; the start of the
|
||
next line is indicated by its first byte having a value <= 30.
|
||
|
||
In addition, instead of starting with a dummy entry, the old database
|
||
format starts with a 256 byte table containing the 128 most common
|
||
bigrams in the file list. A bigram is a pair of adjacent bytes.
|
||
Bytes in the database that have the high bit set are indexes (with the
|
||
high bit cleared) into the bigram table. The bigram and
|
||
offset-differential count coding makes these databases 20-25% smaller
|
||
than the new format, but makes them not 8-bit clean. Any byte in a
|
||
file name that is in the ranges used for the special codes is replaced
|
||
in the database by a question mark, which not coincidentally is the
|
||
shell wildcard to match a single character. The old format therefore
|
||
cannot faithfully store entries with non-ASCII characters.
|
||
|
||
Because the long counts are stored as
|
||
native-order machine words, the database format is not easily used in
|
||
environments which differ in terms of byte order. If locate databases
|
||
are to be shared between machines, the @samp{LOCATE02} database format should
|
||
be used. This has other benefits as discussed above. However, the
|
||
length of the filename currently being processed can normally be used
|
||
to place reasonable limits on the long counts and so this information
|
||
is used by @code{locate} to help it guess the byte ordering of the old-format
|
||
database. Unless it finds evidence to the contrary, @code{locate}
|
||
will assume that the byte order of the database is the same as the
|
||
native byte order of the machine running @code{locate}. The output of
|
||
@samp{locate --statistics} also includes information about the byte
|
||
order of old-format databases.
|
||
|
||
The output of @samp{locate --statistics} will give an incorrect count
|
||
of the number of file names containing newlines or high-bit characters
|
||
for old-format databases.
|
||
|
||
Old versions of GNU @code{locate} fail to correctly handle very long
|
||
file names, possibly leading to security problems relating to a heap
|
||
buffer overrun. @xref{Security Considerations for locate}, for a
|
||
detailed explanation.
|
||
|
||
@node Newline Handling
|
||
@section Newline Handling
|
||
|
||
Within the database, file names are terminated with a null character.
|
||
This is the case for both the old and the new format.
|
||
|
||
When the new database format is being used, the compression technique
|
||
used to generate the database relies, however, on the ability to sort the
|
||
list of files before they are presented to @code{frcode}.
|
||
|
||
If the system's sort command allows separating its input list of
|
||
files with null characters via the @samp{-z} option, this option
|
||
is used and therefore @code{updatedb} and @code{locate} will both
|
||
correctly handle file names containing newlines. If the @code{sort}
|
||
command lacks support for this, the list of files is delimited with
|
||
the newline character, meaning that parts of file names containing
|
||
newlines will be incorrectly sorted. This can result in both
|
||
incorrect matches and incorrect failures to match.
|
||
|
||
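The difference between the two delimiters can be seen in a small
sketch (the scratch directory and file names are invented for
illustration, and @samp{sort -z} is assumed to be available, as it is
in GNU @code{sort}):

@example
dir=$(mktemp -d)
touch "$dir/plain" "$dir/$(printf 'with\nnewline')"

# Null-delimited: each of the two file names stays one record.
records=$(find "$dir" -type f -print0 | sort -z | tr -cd '\000' | wc -c)

# Newline-delimited: the embedded newline splits one name in two.
lines=$(find "$dir" -type f -print | sort | wc -l)

echo "null-delimited records: $records, newline-delimited lines: $lines"
@end example

@noindent
Two files yield two null-delimited records but three
newline-delimited lines, and the two halves of the split name are
then sorted independently.
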
@node File Permissions
|
||
@chapter File Permissions
|
||
|
||
@include perm.texi
|
||
|
||
@include parse-datetime.texi
|
||
|
||
@node Configuration
|
||
@chapter Configuration
|
||
|
||
The findutils source distribution includes a @code{configure} script
|
||
which examines the system and generates files required to build
|
||
findutils. See the files @file{README} and @file{INSTALL}.
|
||
|
||
A number of options can be specified on the @code{configure} command
|
||
line, and many of these are straightforward, adequately documented in
|
||
the @code{--help} output, or not normally useful. Options which are
|
||
useful or which are not obvious are explained here.
|
||
|
||
@menu
|
||
* Leaf Optimisation:: Take advantage of Unix file system semantics.
|
||
* d_type Optimisation:: Take advantage of file type information.
|
||
@end menu
|
||
|
||
@node Leaf Optimisation
|
||
@section Leaf Optimisation
|
||
|
||
Files in Unix file systems have a link count which indicates how many
|
||
names point to the same inode. Directories in Unix filesystems have a
|
||
@file{..} entry which functions as a hard link to the parent directory
|
||
and a @file{.} entry which functions as a link to the directory itself.
|
||
The @file{..} entry of the root directory also points to the root.
|
||
This means that @code{find} can deduce the number of subdirectories a
|
||
directory has, simply by subtracting 2 from the directory's link
|
||
count. This allows @code{find} to save @code{stat} calls which would
|
||
otherwise be needed to discover which directory entries are
|
||
subdirectories.
|
||
|
||
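The relationship can be observed directly. The following sketch uses
an invented scratch directory and the @samp{%n} directive of
@samp{-printf} to show the link count. On file systems without the
traditional semantics the reported count may simply be less than 2,
in which case the subtraction does not apply:

@example
dir=$(mktemp -d)
mkdir "$dir/a" "$dir/b" "$dir/c"
touch "$dir/file"

# %n prints the link count of the directory itself.
links=$(find "$dir" -maxdepth 0 -printf '%n\n')

# Count the subdirectories the slow way, by examining each entry.
subdirs=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)

echo "link count: $links, subdirectories: $subdirs"
@end example
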
File systems which don't have these semantics should simply return a
|
||
value less than 2 in the @code{st_nlink} member of @code{struct stat}
|
||
in response to a successful call to @code{stat}.
|
||
|
||
If you are building @code{find} for a system on which the value of
|
||
@code{st_nlink} is unreliable, you can specify
|
||
@code{--disable-leaf-optimisation} to @code{configure} to prevent this
|
||
assumption being made.
|
||
|
||
@node d_type Optimisation
|
||
@section d_type Optimisation
|
||
|
||
When this feature is enabled, @code{find} takes advantage of the fact
|
||
that on some systems @code{readdir} will return the type of a file in
|
||
@code{struct dirent}.
|
||
|
||
|
||
@node Reference
|
||
@chapter Reference
|
||
|
||
Below are summaries of the command line syntax for the programs
|
||
discussed in this manual.
|
||
|
||
@menu
|
||
* Invoking find::
|
||
* Invoking locate::
|
||
* Invoking updatedb::
|
||
* Invoking xargs::
|
||
* Regular Expressions::
|
||
* Environment Variables::
|
||
@end menu
|
||
|
||
@node Invoking find
|
||
@section Invoking @code{find}
|
||
|
||
@example
|
||
find @r{[-H] [-L] [-P] [-D @var{debugoptions}] [-O@var{level}]} @r{[}@var{file}@dots{}@r{]} @r{[}@var{expression}@r{]}
|
||
@end example
|
||
|
||
@code{find} searches the directory tree rooted at each file name
|
||
@var{file} by evaluating the @var{expression} on each file it finds in
|
||
the tree.
|
||
|
||
The command line may begin with the @samp{-H}, @samp{-L}, @samp{-P},
|
||
@samp{-D} and @samp{-O} options. These are followed by a list of
|
||
files or directories that should be searched. If no files to search
|
||
are specified, the current directory (@file{.}) is used.
|
||
|
||
This list of files to search is followed by a list of expressions
|
||
describing the files we wish to search for. The first part of the
|
||
expression is recognised by the fact that it begins with @samp{-}
|
||
followed by some other letters (for example @samp{-print}), or is
|
||
either @samp{(} or @samp{!}. Any arguments after it are the rest of
|
||
the expression.
|
||
|
||
If no expression is given, the expression @samp{-print} is used.
|
||
|
||
The @code{find} command exits with status zero if all files matched
|
||
are processed successfully, greater than zero if errors occur.
|
||
|
||
The @code{find} program also recognises two options for administrative
|
||
use:
|
||
|
||
@table @samp
|
||
@item --help
|
||
Print a summary of the command line usage and exit.
|
||
@item --version
|
||
Print the version number of @code{find} and exit.
|
||
@end table
|
||
|
||
The @samp{-version} option is a synonym for @samp{--version}.
|
||
|
||
|
||
@menu
|
||
* Filesystem Traversal Options::
|
||
* Warning Messages::
|
||
* Optimisation Options::
|
||
* Debug Options::
|
||
* Find Expressions::
|
||
@end menu
|
||
|
||
@node Filesystem Traversal Options
|
||
@subsection Filesystem Traversal Options
|
||
|
||
The options @samp{-H}, @samp{-L} or @samp{-P} may be specified at the
|
||
start of the command line (if none of these is specified, @samp{-P} is
|
||
assumed). If you specify more than one of these options, the last one
|
||
specified takes effect (but note that the @samp{-follow} option is
|
||
equivalent to @samp{-L}).
|
||
|
||
@table @code
|
||
@item -P
|
||
Never follow symbolic links (this is the default), except in the case
|
||
of the @samp{-xtype} predicate.
|
||
@item -L
|
||
Always follow symbolic links, except in the case of the @samp{-xtype}
|
||
predicate.
|
||
@item -H
|
||
Follow symbolic links specified in the list of files to search, or
|
||
which are otherwise specified on the command line.
|
||
@end table
|
||
|
||
If @code{find} would follow a symbolic link, but cannot for any reason
|
||
(for example, because it has insufficient permissions or the link is
|
||
broken), it falls back on using the properties of the symbolic link
|
||
itself. @xref{Symbolic Links}, for a more complete description of how
|
||
symbolic links are handled.
|
||
|
||
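The effect of the three options can be compared with a short sketch
(the scratch directory, @file{real}, @file{link} and @file{file} are
names invented for illustration). Each command counts the names that
@code{find} prints:

@example
dir=$(mktemp -d)
mkdir "$dir/real"
touch "$dir/real/file"
ln -s real "$dir/link"

# -P: the link named on the command line is not followed.
p=$(find -P "$dir/link" | wc -l)
# -H: it is followed, because it was given on the command line.
h=$(find -H "$dir/link" | wc -l)
# -L: links met during the traversal are followed as well.
l=$(find -L "$dir" -name file | wc -l)

echo "-P: $p, -H: $h, -L: $l"
@end example

@noindent
With @samp{-P} only the link itself is printed. With @samp{-H} the
link is treated as the directory it points to, so @file{file} is
printed too. With @samp{-L} the file is found twice, once below
@file{real} and once below @file{link}.
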
@node Warning Messages
|
||
@subsection Warning Messages
|
||
|
||
If there is an error on the @code{find} command line, an error message
|
||
is normally issued. However, there are some usages that are
|
||
inadvisable but which @code{find} should still accept. Under these
|
||
circumstances, @code{find} may issue a warning message.
|
||
|
||
By default, warnings are enabled only if @code{find} is being run
|
||
interactively (specifically, if the standard input is a terminal) and
|
||
the @env{POSIXLY_CORRECT} environment variable is not set. Warning
|
||
messages can be controlled explicitly by the use of options on the
|
||
command line:
|
||
|
||
@table @code
|
||
@item -warn
|
||
Issue warning messages where appropriate.
|
||
@item -nowarn
|
||
Do not issue warning messages.
|
||
@end table
|
||
|
||
These options take effect at the point on the command line where they
|
||
are specified. Therefore it's not useful to specify @samp{-nowarn} at
|
||
the end of the command line. The warning messages affected by the
|
||
above options are triggered by:
|
||
|
||
@itemize @minus
|
||
@item
|
||
Use of the @samp{-d} option which is deprecated; please use
|
||
@samp{-depth} instead, since the latter is POSIX-compliant.
|
||
@item
|
||
Specifying an option (for example @samp{-mindepth}) after a non-option
|
||
(for example @samp{-type} or @samp{-print}) on the command line.
|
||
@item
|
||
Use of the @samp{-name} or @samp{-iname} option with a slash character
|
||
in the pattern. Since the name predicates only compare against the
|
||
basename of the visited files, the only file that can match a slash is
|
||
the root directory itself.
|
||
@end itemize
|
||
|
||
The default behaviour above is designed to work in that way so that
|
||
existing shell scripts don't generate spurious errors, but people will
|
||
be made aware of the problem.
|
||
|
||
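As an illustration of the second case in the list above, the
following sketch (using an invented scratch directory) runs the same
search with the option before and after the test. Both orders select
the same files; only the second is a candidate for a warning, which
is written to standard error:

@example
dir=$(mktemp -d)
touch "$dir/one" "$dir/two"

# Preferred order: the option -maxdepth precedes the test -type.
good=$(find "$dir" -maxdepth 1 -type f | sort)

# Inadvisable order; -warn requests warnings even when standard
# input is not a terminal.  Any warning text is saved in a file.
odd=$(find "$dir" -warn -type f -maxdepth 1 2>"$dir.warnings" | sort)

[ "$good" = "$odd" ] && echo "same files; see $dir.warnings"
@end example
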
Some warning messages are issued for less common or more serious
|
||
problems, and consequently cannot be turned off:
|
||
|
||
@itemize @minus
|
||
@item
|
||
Use of an unrecognised backslash escape sequence with @samp{-fprintf}.
|
||
@item
|
||
Use of an unrecognised formatting directive with @samp{-fprintf}.
|
||
@end itemize
|
||
|
||
@node Optimisation Options
|
||
@subsection Optimisation Options
|
||
|
||
The @samp{-O@var{level}} option sets @code{find}'s optimisation level
|
||
to @var{level}. The default optimisation level is 1.
|
||
|
||
At certain optimisation levels (but not by default), @code{find}
|
||
reorders tests to speed up execution while preserving the overall
|
||
effect; that is, predicates with side effects are not reordered
|
||
relative to each other. The optimisations performed at each
|
||
optimisation level are as follows.
|
||
|
||
@table @asis
|
||
@item 0
|
||
Currently equivalent to optimisation level 1.
|
||
|
||
@item 1
|
||
This is the default optimisation level and corresponds to the
|
||
traditional behaviour. Expressions are reordered so that tests based
|
||
only on the names of files (for example @samp{-name} and
|
||
@samp{-regex}) are performed first.
|
||
|
||
@item 2
|
||
Any @samp{-type} or @samp{-xtype} tests are performed after any tests
|
||
based only on the names of files, but before any tests that require
|
||
information from the inode. On many modern versions of Unix, file
|
||
types are returned by @code{readdir()} and so these predicates are
|
||
faster to evaluate than predicates which need to stat the file first.
|
||
|
||
If you use the @samp{-fstype FOO} predicate and specify a filesystem
|
||
type @samp{FOO} which is not known (that is, present in
|
||
@file{/etc/mtab}) at the time @code{find} starts, that predicate is
|
||
equivalent to @samp{-false}.
|
||
|
||
@item 3
|
||
At this optimisation level, the full cost-based query optimizer is
|
||
enabled. The order of tests is modified so that cheap (i.e., fast)
|
||
tests are performed first and more expensive ones are performed later,
|
||
if necessary. Within each cost band, predicates are evaluated earlier
|
||
or later according to whether they are likely to succeed or not. For
|
||
@samp{-o}, predicates which are likely to succeed are evaluated
|
||
earlier, and for @samp{-a}, predicates which are likely to fail are
|
||
evaluated earlier.
|
||
@end table
|
||
|
||
The re-ordering of operations performed by the cost-based optimizer
|
||
can result in user-visible behaviour change. For example, the
|
||
@samp{-readable} and @samp{-empty} predicates are sensitive to
|
||
re-ordering. If they are run in the order @samp{-empty -readable}, an
|
||
error message will be issued for unreadable directories. If they are
|
||
run in the order @samp{-readable -empty}, no error message will be
|
||
issued. This is the reason why such operation re-ordering is not
|
||
performed at the default optimisation level.
|
||
|
||
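For expressions without such side effects, the optimisation level
changes only the order in which tests are evaluated, not the set of
files selected. A minimal sketch, using an invented scratch
directory:

@example
dir=$(mktemp -d)
mkdir "$dir/src"
touch "$dir/src/main.c" "$dir/src/notes.txt"

# The same expression at the default level and with the cost-based
# optimiser; only the evaluation order of the tests may differ.
plain=$(find "$dir" -type f -name '*.c' | sort)
tuned=$(find -O3 "$dir" -type f -name '*.c' | sort)

[ "$plain" = "$tuned" ] && echo "same result at both levels"
@end example
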
@node Debug Options
|
||
@subsection Debug Options
|
||
|
||
The @samp{-D} option makes @code{find} produce diagnostic output.
|
||
Much of the information is useful only for diagnosing problems, and so
|
||
most people will not find this option helpful.
|
||
|
||
The list of debug options should be comma separated. Compatibility of
|
||
the debug options is not guaranteed between releases of findutils.
|
||
For a complete list of valid debug options, see the output of
|
||
@code{find -D help}.
|
||
|
||
Valid debug options include:
|
||
@table @samp
|
||
@item tree
|
||
Show the expression tree in its original and optimized form.
|
||
@item stat
|
||
Print messages as files are examined with the @code{stat} and @code{lstat} system
|
||
calls. The @code{find} program tries to minimise such calls.
|
||
@item opt
|
||
Print diagnostic information relating to the optimisation of the
|
||
expression tree; see the @samp{-O} option.
|
||
@item rates
|
||
Print a summary indicating how often each predicate succeeded or
|
||
failed.
|
||
@item all
|
||
Enable all of the other debug options (except @samp{help}).
|
||
@item help
|
||
Explain the debugging options.
|
||
@end table
|
||
|
||
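Several debug options can be combined in one comma-separated list.
The sketch below (with an invented scratch directory) assumes that
the diagnostic output is written to standard error, so that it can
be kept apart from the list of matching files:

@example
dir=$(mktemp -d)
touch "$dir/a.c" "$dir/b.txt"

# Diagnostics go to the file named after 2>, matches to the variable.
matches=$(find -D tree,rates "$dir" -name '*.c' 2>"$dir.debug")
printf '%s\n' "$matches"
@end example
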
@node Find Expressions
|
||
@subsection Find Expressions
|
||
|
||
The final part of the @code{find} command line is a list of
|
||
expressions. @xref{Primary Index}, for a summary of all of the tests,
|
||
actions, and options that the expression can contain. If the
|
||
expression is missing, @samp{-print} is assumed.
|
||
|
||
@node Invoking locate
|
||
@section Invoking @code{locate}
|
||
|
||
@example
|
||
locate @r{[}@var{option}@dots{}@r{]} @var{pattern}@dots{}
|
||
@end example
|
||
|
||
For each @var{pattern} given, @code{locate} searches one or more file
|
||
name databases, returning each match of @var{pattern}.
|
||
|
||
@table @code
|
||
@item --all
|
||
@itemx -A
|
||
Print only names which match all non-option arguments, not those
|
||
matching one or more non-option arguments.
|
||
|
||
@item --basename
|
||
@itemx -b
|
||
The specified pattern is matched against just the last component of
|
||
the name of a file in the @code{locate} database. This last
|
||
component is also called the ``base name''. For example, the base
|
||
name of @file{/tmp/mystuff/foo.old.c} is @file{foo.old.c}. If the
|
||
pattern contains metacharacters, it must match the base name exactly.
|
||
If not, it must match part of the base name.
|
||
|
||
@item --count
|
||
@itemx -c
|
||
Instead of printing the matched file names, just print the total
|
||
number of matches found, unless @samp{--print} (@samp{-p}) is also
|
||
present.
|
||
|
||
|
||
@item --database=@var{path}
|
||
@itemx -d @var{path}
|
||
Instead of searching the default @code{locate} database
|
||
@file{@value{LOCATE_DB}}, @code{locate} searches the file
|
||
name databases in @var{path}, which is a colon-separated list of
|
||
database file names. You can also use the environment variable
|
||
@env{LOCATE_PATH} to set the list of database files to search. The
|
||
option overrides the environment variable if both are used. Empty
|
||
elements in @var{path} (that is, a leading or trailing colon, or two
|
||
colons in a row) are taken to stand for the default database.
|
||
A database can be supplied on standard input, using @samp{-} as an element
|
||
of @var{path}. If more than one element of @var{path} is @samp{-},
|
||
later instances are ignored (but a warning message is printed).
|
||
|
||
@item --existing
|
||
@itemx -e
|
||
Only print names of files which currently exist (instead of names
|
||
which existed when the database was created). Note that this may slow
|
||
down the program a lot, if there are many matches in the database.
|
||
The way in which broken symbolic links are treated is affected by the
|
||
@samp{-L}, @samp{-P} and @samp{-H} options. Please note that it is
|
||
possible for the file to be deleted after @code{locate} has checked
|
||
that it exists, but before you use it. This option is automatically
|
||
turned on when reading an @code{slocate} database in secure mode
|
||
(@pxref{slocate Database Format}).
|
||
|
||
@item --non-existing
|
||
@itemx -E
|
||
Only print names of files which currently do not exist (instead of
|
||
names which existed when the database was created). Note that
|
||
this may slow down the program a lot, if there are many matches in the
|
||
database. The way in which broken symbolic links are treated is
|
||
affected by the @samp{-L}, @samp{-P} and @samp{-H} options. Please
|
||
note that @code{locate} checks that the file does not exist, but a
|
||
file of the same name might be created after @code{locate}'s check but
|
||
before you read @code{locate}'s output.
|
||
|
||
@item --follow
|
||
@itemx -L
|
||
If testing for the existence of files (with the @samp{-e} or @samp{-E}
|
||
options), consider broken symbolic links to be non-existing. This is
|
||
the default behaviour.
|
||
|
||
@item --nofollow
|
||
@itemx -P
|
||
@itemx -H
|
||
If testing for the existence of files (with the @samp{-e} or @samp{-E}
|
||
options), treat broken symbolic links as if they were existing files.
|
||
The @samp{-H} form of this option is provided purely for similarity
|
||
with @code{find}; the use of @samp{-P} is recommended over @samp{-H}.
|
||
|
||
@item --ignore-case
|
||
@itemx -i
|
||
Ignore case distinctions in both the pattern and the file names.
|
||
|
||
@item --limit=@var{N}
|
||
@itemx -l @var{N}
|
||
Limit the number of results printed to @var{N}. When used with the
|
||
@samp{--count} option, the value printed will never be larger than
|
||
this limit.
|
||
@item --max-database-age=@var{D}
|
||
Normally, @code{locate} will issue a warning message when it searches
|
||
a database which is more than 8 days old. This option changes that
|
||
value to @var{D} days. The effect of specifying a negative
|
||
value is undefined.
|
||
@item --mmap
|
||
@itemx -m
|
||
Accepted but does nothing. The option is supported only to provide
|
||
compatibility with BSD's @code{locate}.
|
||
|
||
@item --null
|
||
@itemx -0
|
||
Results are separated with the ASCII NUL character rather than the
|
||
newline character. To get the full benefit of this option,
|
||
use the new @code{locate} database format (that is the default
|
||
anyway).
|
||
|
||
@item --print
|
||
@itemx -p
|
||
Print search results when they normally would not be due to
|
||
use of @samp{--statistics} (@samp{-S}) or @samp{--count}
|
||
(@samp{-c}).
|
||
|
||
@item --wholename
|
||
@itemx -w
|
||
The specified pattern is matched against the whole name of the file in
|
||
the @code{locate} database. If the pattern contains metacharacters,
|
||
it must match exactly. If not, it must match part of the whole file
|
||
name. This is the default behaviour.
|
||
|
||
@item --regex
|
||
@itemx -r
|
||
Instead of using substring or shell glob matching, the pattern
|
||
specified on the command line is understood to be a regular
|
||
expression. GNU Emacs-style regular expressions are assumed unless
|
||
the @samp{--regextype} option is also given. File names from the
|
||
@code{locate} database are matched using the specified regular
|
||
expression. If the @samp{-i} flag is also given, matching is
|
||
case-insensitive. Matches are performed against the whole path name,
|
||
and so by default a pathname will be matched if any part of it matches
|
||
the specified regular expression. The regular expression may use
|
||
@samp{^} or @samp{$} to anchor a match at the beginning or end of a
|
||
pathname.
|
||
|
||
@item --regextype
|
||
This option changes the regular expression syntax and behaviour used
|
||
by the @samp{--regex} option. @xref{Regular Expressions}, for more
|
||
information on the regular expression dialects understood by GNU
|
||
findutils.
|
||
|
||
@item --stdio
|
||
@itemx -s
|
||
Accepted but does nothing. The option is supported only to provide
|
||
compatibility with BSD's @code{locate}.
|
||
|
||
@item --statistics
|
||
@itemx -S
|
||
Print some summary information for each @code{locate} database. No
|
||
search is performed unless non-option arguments are given.
|
||
Although the BSD version of @code{locate} also has this option, the format of the
|
||
output is different.
|
||
|
||
@item --help
|
||
Print a summary of the command line usage for @code{locate} and exit.
|
||
|
||
@item --version
|
||
Print the version number of @code{locate} and exit.
|
||
@end table
|
||
|
||
@node Invoking updatedb
|
||
@section Invoking @code{updatedb}
|
||
|
||
@example
|
||
updatedb @r{[}@var{option}@dots{}@r{]}
|
||
@end example
|
||
|
||
@code{updatedb} creates and updates the database of file names used by
|
||
@code{locate}. @code{updatedb} generates a list of files similar to
|
||
the output of @code{find} and then uses utilities for optimizing the
|
||
database for performance. @code{updatedb} is often run periodically
|
||
as a @code{cron} job and configured with environment variables or
|
||
command options. Typically, an operating system provides one shell
script that ``exports'' the variable definitions, and a second shell
script that ``sources'' the first into its environment and then
executes @code{updatedb} in that environment.
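
As a minimal sketch of that arrangement, the wrapper script might look
like the following. The file name @file{/etc/updatedb.conf} and the
sample value are assumptions made for illustration; only the variable
name comes from the options described in this section.

@example
#!/bin/sh
# Hypothetical wrapper; /etc/updatedb.conf is an assumed location.
# That file is expected to contain lines such as:
#   PRUNEPATHS="/tmp /usr/tmp /var/tmp /afs"
#   export PRUNEPATHS
if [ -r /etc/updatedb.conf ]; then
    . /etc/updatedb.conf
fi
exec updatedb
@end example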
|
||
|
||
@table @code
|
||
@item --findoptions='@var{OPTION}@dots{}'
|
||
Global options to pass on to @code{find}.
|
||
The environment variable @env{FINDOPTIONS} also sets this value.
|
||
Default is none.
|
||
|
||
@item --localpaths='@var{path}@dots{}'
|
||
Non-network directories to put in the database.
|
||
Default is @file{/}.
|
||
|
||
@item --netpaths='@var{path}@dots{}'
|
||
Network (NFS, AFS, RFS, etc.) directories to put in the database.
|
||
The environment variable @env{NETPATHS} also sets this value.
|
||
Default is none.
|
||
|
||
@item --prunepaths='@var{path}@dots{}'
|
||
Directories to omit from the database, which would otherwise be
|
||
included. The environment variable @env{PRUNEPATHS} also sets this
|
||
value. Default is @file{/tmp /usr/tmp /var/tmp /afs}. The paths are
|
||
used as regular expressions (with @code{find ... -regex}), so you need
|
||
to specify these paths in the same way that @code{find} will encounter
|
||
them. This means for example that the paths must not include trailing
|
||
slashes.
|
||
|
||
@item --prunefs='@var{path}@dots{}'
|
||
Filesystems to omit from the database, which would otherwise be
|
||
included. Note that files are pruned when a filesystem is reached;
|
||
any filesystem mounted under an undesired filesystem will be ignored.
|
||
The environment variable @env{PRUNEFS} also sets this value. Default
|
||
is @file{nfs NFS proc}.
|
||
|
||
@item --output=@var{dbfile}
|
||
The database file to build. The default is system-dependent, but
|
||
when this document was formatted it was @file{@value{LOCATE_DB}}.
|
||
|
||
@item --localuser=@var{user}
|
||
The user to search the non-network directories as, using @code{su}.
|
||
Default is to search the non-network directories as the current user.
|
||
You can also use the environment variable @env{LOCALUSER} to set this user.
|
||
|
||
@item --netuser=@var{user}
|
||
The user to search network directories as, using @code{su}. The default
@var{user} is @code{daemon}. You can also use the environment variable
|
||
@env{NETUSER} to set this user.
|
||
|
||
@item --dbformat=@var{FORMAT}
|
||
Generate the @code{locate} database in format @var{FORMAT}. Supported
|
||
database formats include @code{LOCATE02} (which is the default) and
|
||
@code{slocate}. The @code{slocate} format exists for compatibility
|
||
with @code{slocate}. @xref{Database Formats}, for a detailed
|
||
description of each format.
|
||
|
||
@item --help
|
||
Print a summary of the command line usage and exit.
|
||
@item --version
|
||
Print the version number of @code{updatedb} and exit.
|
||
@end table
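
The remark about trailing slashes under @samp{--prunepaths} can be
checked with @code{find} directly. The following is only a sketch in a
scratch directory: it assumes GNU @code{find} and a @code{mktemp} that
produces a path free of regular expression operators other than
@samp{.}, and the directory names are made up.

@example
tmp=$(mktemp -d)
mkdir -p "$tmp/keep" "$tmp/cache/sub"

# No trailing slash: the expression equals the name find reports,
# so cache and everything below it is pruned (2 lines printed).
find "$tmp" -regex "$tmp/cache" -prune -o -print | sort

# Trailing slash: the expression never matches, so nothing is
# pruned (4 lines printed).
find "$tmp" -regex "$tmp/cache/" -prune -o -print | sort

rm -rf "$tmp"
@end example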
|
||
|
||
@node Invoking xargs
|
||
@section Invoking @code{xargs}
|
||
|
||
@example
|
||
xargs @r{[}@var{option}@dots{}@r{]} @r{[}@var{command} @r{[}@var{initial-arguments}@r{]}@r{]}
|
||
@end example
|
||
|
||
@code{xargs} executes the @var{command} (the default is @file{echo}) one or
|
||
more times with any @var{initial-arguments} followed by arguments read from
|
||
standard input.
|
||
|
||
@code{xargs} exits with the following status:
|
||
|
||
@table @asis
|
||
@item 0
|
||
if it succeeds
|
||
@item 123
|
||
if any invocation of the command exited with status 1-125
|
||
@item 124
|
||
if the command exited with status 255
|
||
@item 125
|
||
if the command is killed by a signal
|
||
@item 126
|
||
if the command cannot be run
|
||
@item 127
|
||
if the command is not found
|
||
@item 1
|
||
if some other error occurred.
|
||
@end table
|
||
|
||
Exit codes greater than 128 are used by the shell to indicate that
|
||
a program died due to a fatal signal.
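
The following sketch shows how some of these statuses arise in
practice. It assumes a POSIX @code{sh}; the path
@file{/nonexistent/command} is a placeholder chosen because it should
not exist.

@example
# Every invocation succeeds: xargs exits with 0.
printf '1\n2\n' | xargs -n 1 true && echo "status=0"

# An invocation exits with a status in 1-125: xargs carries on
# with the remaining input and finally exits with 123.
printf '1\n2\n' | xargs -n 1 sh -c 'exit 3' sh || echo "status=$?"

# An invocation exits with 255: xargs stops at once with 124.
printf '1\n2\n' | xargs -n 1 sh -c 'exit 255' sh || echo "status=$?"

# The command does not exist: xargs exits with 127.
echo 1 | xargs /nonexistent/command || echo "status=$?"
@end example

A calling script can therefore tell ``some commands failed'' (123)
apart from ``processing was abandoned'' (124).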
|
||
|
||
|
||
@menu
|
||
* xargs options::
|
||
* Conflicting xargs options::
|
||
* Invoking the shell from xargs::
|
||
@end menu
|
||
|
||
@node xargs options
|
||
@subsection xargs options
|
||
|
||
@table @code
|
||
@item --
|
||
@findex option delimiter, --
|
||
Delimit the option list. Later arguments, if any, are treated as operands even
|
||
if they begin with @samp{-}. For example, @samp{xargs -- --help} runs the
|
||
command @samp{--help} (found in @env{PATH}) instead of printing the usage text,
|
||
and @samp{xargs -- --mycommand} runs the command @samp{--mycommand} instead of
|
||
rejecting this as an unrecognized option.
|
||
|
||
@item --arg-file@r{=@var{inputfile}}
|
||
@itemx -a @r{@var{inputfile}}
|
||
Read names from the file @var{inputfile} instead of standard input.
|
||
If you use this option, the standard input stream remains unchanged
|
||
when commands are run.
|
||
Otherwise, standard input is redirected from @file{/dev/null}.
|
||
|
||
@item --null
|
||
@itemx -0
|
||
Input file names are terminated by a null character instead of by
|
||
whitespace, and any quotes and backslash characters are not considered
|
||
special (every character is taken literally).
|
||
Disables the end-of-file string, which is treated like any other argument.
|
||
|
||
@item --delimiter @var{delim}
|
||
@itemx -d @var{delim}
|
||
|
||
Input file names are terminated by the specified character @var{delim}
|
||
instead of by whitespace, and any quotes and backslash characters are
|
||
not considered special (every character is taken literally). Disables
|
||
the logical end-of-file marker string, which is treated like any other
|
||
argument.
|
||
|
||
The specified delimiter may be a single character, a C-style character
|
||
escape such as @samp{\n}, or an octal or hexadecimal escape code.
|
||
Octal and hexadecimal escape codes are understood as for the
|
||
@code{printf} command. Multibyte characters are not supported.
|
||
|
||
@item -E @var{eof-str}
|
||
@itemx --eof@r{[}=@var{eof-str}@r{]}
|
||
@itemx -e@r{[}@var{eof-str}@r{]}
|
||
|
||
Set the logical end-of-file marker string to @var{eof-str}. If the
|
||
logical end-of-file marker string occurs as a line of input, the rest of
|
||
the input is ignored. If @var{eof-str} is omitted (@samp{-e}) or blank
|
||
(either @samp{-e} or @samp{-E}), there is no logical end-of-file marker
|
||
string. The @samp{-e} form of this option is deprecated in favour of
|
||
the POSIX-compliant @samp{-E} option, which you should use instead. As
|
||
of GNU @code{xargs} version 4.2.9, the default behaviour of @code{xargs}
|
||
is not to have a logical end-of-file marker string. The POSIX standard
|
||
(IEEE Std 1003.1, 2004 Edition) allows this.
|
||
|
||
The logical end-of-file marker string is not treated specially if the
|
||
@samp{-d} or the @samp{-0} option is in effect. That is, when either
of these options is in effect, the whole input file will be read even
|
||
if @samp{-E} was used.
|
||
|
||
@item --help
|
||
Print a summary of the options to @code{xargs} and exit.
|
||
|
||
@item -I @var{replace-str}
|
||
@itemx --replace@r{[}=@var{replace-str}@r{]}
|
||
@itemx -i@r{[}@var{replace-str}@r{]}
|
||
Replace occurrences of @var{replace-str} in the initial arguments with
|
||
names read from standard input.
|
||
Also, unquoted blanks do not terminate arguments;
|
||
instead, the input is split at newlines only.
|
||
If @var{replace-str} is omitted (omitting it is allowed only for @samp{-i}
|
||
and @samp{--replace}), it defaults to @samp{@{@}} (as for @samp{find -exec}).
|
||
Implies @samp{-x} and @samp{-L 1}.
|
||
The @samp{-i} option is deprecated in favour of the @samp{-I} option.
|
||
|
||
@item -L @var{max-lines}
|
||
@itemx --max-lines@r{[}=@var{max-lines}@r{]}
|
||
@itemx -l@r{[}@var{max-lines}@r{]}
|
||
Use at most @var{max-lines} non-blank input lines per command line.
|
||
For @samp{-l}, @var{max-lines} defaults to 1 if omitted. For
|
||
@samp{-L}, the argument is mandatory. Trailing blanks cause an input
|
||
line to be logically continued on the next input line, for the purpose
|
||
of counting the lines. Implies @samp{-x}. The @samp{-l} form of this
|
||
option is deprecated in favour of the POSIX-compliant @samp{-L}
|
||
option.
|
||
|
||
@item --max-args=@var{max-args}
|
||
@itemx -n @var{max-args}
|
||
Use at most @var{max-args} arguments per command line. Fewer than
|
||
@var{max-args} arguments will be used if the size (see the @samp{-s}
|
||
option) is exceeded, unless the @samp{-x} option is given, in which
|
||
case @code{xargs} will exit.
|
||
|
||
@item --open-tty
|
||
@itemx -o
|
||
Reopen standard input as @file{/dev/tty} in the child process before executing
|
||
the command, thus allowing that command to be associated with the terminal
|
||
while @code{xargs} reads from a different stream, e.g. from a pipe.
|
||
This is useful if you want @code{xargs} to run an interactive application.
|
||
@example
|
||
grep -lZ PATTERN * | xargs -0o -n1 vi
|
||
@end example
|
||
|
||
|
||
@item --interactive
|
||
@itemx -p
|
||
Prompt the user about whether to run each command line and read a line
|
||
from the terminal. Only run the command line if the response starts
|
||
with @samp{y} or @samp{Y}. Implies @samp{-t}.
|
||
|
||
@item --no-run-if-empty
|
||
@itemx -r
|
||
If the standard input is completely empty, do not run the
|
||
command. By default, the command is run once even if there is no
|
||
input.
|
||
|
||
@item --max-chars=@var{max-chars}
|
||
@itemx -s @var{max-chars}
|
||
Use at most @var{max-chars} characters per command line, including the
|
||
command, initial arguments and any terminating nulls at the ends of
|
||
the argument strings.
|
||
|
||
@item --show-limits
|
||
Display the limits on the command-line length which are imposed by the
|
||
operating system, @code{xargs}' choice of buffer size and the
|
||
@samp{-s} option. Pipe the input from @file{/dev/null} (and perhaps
|
||
specify @samp{--no-run-if-empty}) if you don't want @code{xargs} to do
|
||
anything.
|
||
|
||
@item --verbose
|
||
@itemx -t
|
||
Print the command line on the standard error output before executing
|
||
it.
|
||
|
||
@item --version
|
||
Print the version number of @code{xargs} and exit.
|
||
|
||
@item --exit
|
||
@itemx -x
|
||
Exit if the size (see the @samp{-s} option) is exceeded.
|
||
|
||
|
||
@item --max-procs=@var{max-procs}
|
||
@itemx -P @var{max-procs}
|
||
Run up to @var{max-procs} processes at a time; the default is 1. If
|
||
@var{max-procs} is 0, @code{xargs} will run as many processes as
|
||
possible simultaneously. @xref{Controlling Parallelism}, for
|
||
information on dynamically controlling parallelism.
|
||
|
||
@item --process-slot-var=@var{environment-variable-name}
|
||
Set the environment variable @var{environment-variable-name} to a
|
||
unique value in each running child process. Each value is a decimal
|
||
integer. Values are reused once child processes exit. This can be
|
||
used in a rudimentary load distribution scheme, for example.
|
||
@end table
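
To make the input-splitting options above more concrete, here is a
short sketch that can be pasted into a shell. The replacement string
@samp{%} and the sample input are arbitrary choices; @samp{-d} is
assumed to be available, as it is in GNU @code{xargs}.

@example
# -n 2: at most two arguments per command line.
# Prints "a b", "c d" and "e" on separate lines.
printf 'a b c d e\n' | xargs -n 2 echo

# -I %: one command per input line; blanks inside a line are kept.
# Prints "[one two]" and "[three]".
printf 'one two\nthree\n' | xargs -I % echo '[%]'

# -0: items end at a null character, so the blank is not special.
# Prints "x y" and "z".
printf 'x y\0z\0' | xargs -0 -n 1 echo

# -d ,: items end at a comma.
# Prints "p", "q" and "r".
printf 'p,q,r' | xargs -d , -n 1 echo
@end example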
|
||
|
||
@node Conflicting xargs options
|
||
@subsection Conflicting options
|
||
The options @samp{--max-lines} (@samp{-L}, @samp{-l}), @samp{--replace}
|
||
(@samp{-I}, @samp{-i}) and @samp{--max-args} (@samp{-n}) are mutually exclusive.
|
||
|
||
If some of them are specified at the same time, then @code{xargs} will
|
||
generally use the option specified last on the command line, i.e., it will
|
||
reset the value of the offending option (given before) to its default value.
|
||
Additionally, @code{xargs} will issue a warning diagnostic on standard error.
|
||
|
||
@example
|
||
$ seq 4 | xargs -L2 -n3
|
||
xargs: warning: options --max-lines and --max-args/-n are \
|
||
mutually exclusive, ignoring previous --max-lines value
|
||
1 2 3
|
||
4
|
||
@end example
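
Reversing the order of the two options reverses the outcome. In this
sketch @samp{-L2} comes last and therefore wins; @code{xargs} still
prints a warning on standard error, which is not reproduced here.

@example
# Standard output is "1 2" followed by "3 4".
seq 4 | xargs -n3 -L2
@end example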
|
||
|
||
The exception to this rule is that the special @var{max-args} value @samp{1} is
|
||
ignored after the @samp{--replace} option and its short-option aliases @samp{-I}
|
||
and @samp{-i}, because it would not actually conflict.
|
||
@example
|
||
$ seq 2 | xargs --replace -n1 echo a-@{@}-b
|
||
a-1-b
|
||
a-2-b
|
||
@end example
|
||
|
||
@node Invoking the shell from xargs
|
||
@subsection Invoking the shell from xargs
|
||
|
||
Normally, @code{xargs} will exec the command you specified directly,
|
||
without invoking a shell. This is usually the behaviour one would
|
||
want. It's somewhat more efficient and avoids problems with shell
|
||
metacharacters, for example. However, sometimes it is necessary to
|
||
manipulate the environment of a command before it is run, in a way
|
||
that @code{xargs} does not directly support.
|
||
|
||
Invoking a shell from @code{xargs} is a good way of performing such
|
||
manipulations. However, some care must be taken to prevent problems,
|
||
for example unwanted interpretation of shell metacharacters.
|
||
|
||
This command moves a set of files into an archive directory:
|
||
|
||
@example
|
||
find /foo -maxdepth 1 -atime +366 -exec mv @{@} /archive \;
|
||
@end example
|
||
|
||
However, this will only move one file at a time. We cannot in this
|
||
case use @code{-exec ... +} because the matched file names are added
|
||
at the end of the command line, while the destination directory would
|
||
need to be specified last. We also can't use @code{xargs} in the
|
||
obvious way for the same reason. One way of working around this
|
||
problem is to make use of the special properties of GNU @code{mv}; it
|
||
has a @code{-t} option that allows specifying the target directory
|
||
before the list of files to be moved. However, while this
|
||
technique works for GNU @code{mv}, it doesn't solve the more general
|
||
problem.
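
For completeness, the @code{mv -t} approach might be sketched as
follows. This assumes GNU @code{mv}; the scratch directories and file
names stand in for @file{/foo} and @file{/archive} and are purely
illustrative.

@example
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/report one" "$src/report two"

# -t names the destination first, so xargs can append the file
# names at the end of the command line as usual.
find "$src" -maxdepth 1 -type f -print0 |
    xargs -r0 mv -t "$dst"

# Lists "report one" and "report two".
ls "$dst"

rm -rf "$src" "$dst"
@end example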
|
||
|
||
Here is a more general technique for solving this problem:
|
||
|
||
@example
|
||
find /foo -maxdepth 1 -atime +366 -print0 |
|
||
xargs -r0 sh -c 'mv "$@@" /archive' move
|
||
@end example
|
||
|
||
Here, a shell is being invoked. There are two shell instances to think
|
||
about. The first is the shell which launches the @code{xargs} command
|
||
(this might be the shell into which you are typing, for example). The
|
||
second is the shell launched by @code{xargs} (in fact it will probably
|
||
launch several, one after the other, depending on how many files need to
|
||
be archived). We'll refer to this second shell as a subshell.
|
||
|
||
Our example uses the @code{-c} option of @code{sh}. Its argument is a
|
||
shell command to be executed by the subshell. Along with the rest of
|
||
that command, the @code{$@@} is enclosed by single quotes to make sure it is
|
||
passed to the subshell without being expanded by the parent shell. It
|
||
is also enclosed with double quotes so that the subshell will expand
|
||
@code{$@@} correctly even if one of the file names contains a space or
|
||
newline.
|
||
|
||
The subshell will use any non-option arguments as positional
|
||
parameters (that is, in the expansion of @code{$@@}). Because
|
||
@code{xargs} launches the @code{sh -c} subshell with a list of files,
|
||
those files will end up as the expansion of @code{$@@}.
|
||
|
||
You may also notice the @samp{move} at the end of the command line.
|
||
This is used as the value of @code{$0} by the subshell. We include it
|
||
because otherwise the name of the first file to be moved would be used
|
||
instead. If that happened it would not be included in the subshell's
|
||
expansion of @code{$@@}, and so it wouldn't actually get moved.
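
The effect of the extra word can be observed without moving any files.
In this sketch the subshell only reports its @code{$0} and the number
of positional parameters; the input items @samp{a} and @samp{b} are
arbitrary.

@example
# With the extra word: prints "name=move count=2".
printf 'a\nb\n' | xargs sh -c 'echo "name=$0 count=$#"' move

# Without it, the first item is consumed as $0:
# prints "name=a count=1".
printf 'a\nb\n' | xargs sh -c 'echo "name=$0 count=$#"'
@end example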
|
||
|
||
|
||
Another reason to use the @code{sh -c} construct could be to
|
||
perform redirection:
|
||
|
||
@example
|
||
find /usr/include -name '*.h' | xargs grep -wl mode_t |
|
||
xargs -r sh -c 'exec emacs "$@@" < /dev/tty' Emacs
|
||
@end example
|
||
|
||
Notice that we use the shell builtin @code{exec} here. That's simply
|
||
because the subshell needs to do nothing once Emacs has been invoked.
|
||
Therefore instead of keeping a @code{sh} process around for no reason,
|
||
we just arrange for the subshell to exec Emacs, saving an extra
|
||
process creation.
|
||
|
||
Although GNU @code{xargs} and the implementations on some other platforms
|
||
like BSD support the @samp{-o} option to achieve the same, the above is
|
||
the portable way to redirect standard input to @file{/dev/tty}.
|
||
|
||
Sometimes, though, it can be helpful to keep the shell process around:
|
||
|
||
@example
|
||
find /foo -maxdepth 1 -atime +366 -print0 |
|
||
xargs -r0 sh -c 'mv "$@@" /archive || exit 255' move
|
||
@end example
|
||
|
||
Here, the shell will exit with status 255 if any @code{mv} failed.
|
||
This causes @code{xargs} to stop immediately.
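
The same mechanism can be tried without touching any files. This
sketch uses a made-up failure condition (the item @samp{2}) in place of
a failing @code{mv}; the third item is never processed.

@example
# Prints "run 1", "run 2" and then "status=124".
printf '1\n2\n3\n' |
    xargs -n 1 sh -c 'echo "run $1"; [ "$1" != 2 ] || exit 255' sh ||
    echo "status=$?"
@end example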
|
||
|
||
|
||
@node Regular Expressions
|
||
@section Regular Expressions
|
||
|
||
The @samp{-regex} and @samp{-iregex} tests of @code{find} allow
|
||
matching by regular expression, as does the @samp{--regex} option of
|
||
@code{locate}.
|
||
|
||
Your locale configuration affects how regular expressions are
|
||
interpreted. @xref{Environment Variables}, for a description of how
|
||
your locale setup affects the interpretation of regular expressions.
|
||
|
||
There are also several different types of regular expression, and
|
||
these are interpreted differently. Normally, the type of regular
|
||
expression used by @code{find} and @code{locate} is almost identical to
|
||
that used in GNU Emacs. The single difference is that in @code{find}
|
||
and @code{locate}, a @samp{.} will match a newline character.
|
||
|
||
Both @code{find} and @code{locate} provide an option which allows
|
||
selecting an alternative regular expression syntax; for @code{find}
|
||
this is the @samp{-regextype} option, and for @code{locate} this is
|
||
the @samp{--regextype} option.
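
As a brief illustration, the two @code{find} commands below select the
same files using different syntaxes. This is a sketch with made-up
file names in a scratch directory; note that @samp{-regextype} must
appear before the @samp{-regex} test it is meant to affect, and that
the expression has to match the whole path.

@example
tmp=$(mktemp -d)
touch "$tmp/a1.txt" "$tmp/b22.txt" "$tmp/c.txt"

# Default (Emacs-like) syntax: + is an operator, while grouping
# and alternation are written \( \) and \|.
find "$tmp" -regex '.*/\(a\|b\)[0-9]+\.txt' | sort

# POSIX extended syntax: grouping and alternation are unescaped.
find "$tmp" -regextype posix-extended \
    -regex '.*/(a|b)[0-9]+\.txt' | sort

rm -rf "$tmp"
@end example

Both commands print the paths of @file{a1.txt} and @file{b22.txt}, but
not @file{c.txt}.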
|
||
|
||
These options take a single argument, which indicates the specific
|
||
regular expression syntax and behaviour that should be used. This
|
||
should be one of the following:
|
||
|
||
@include regexprops.texi
|
||
|
||
@node Environment Variables
|
||
@section Environment Variables
|
||
@c TODO: check the variable index still contains references to these
|
||
@table @code
|
||
@item LANG
|
||
Provides a default value for the internationalisation variables that
|
||
are unset or null.
|
||
|
||
@item LC_ALL
|
||
If set to a non-empty string value, override the values of all the
|
||
other internationalisation variables.
|
||
|
||
@item LC_COLLATE
|
||
The POSIX standard specifies that this variable affects the pattern
|
||
matching to be used for the @samp{-name} option. GNU find uses the
|
||
GNU version of the @code{fnmatch} library function.
|
||
|
||
This variable also affects the interpretation of the response to
|
||
@code{-ok}; while the @env{LC_MESSAGES} variable selects the actual
|
||
pattern used to interpret the response to @code{-ok}, the interpretation
|
||
of any bracket expressions in the pattern will be affected by the
|
||
@env{LC_COLLATE} variable.
|
||
|
||
@item LC_CTYPE
|
||
This variable affects the treatment of character classes used in
|
||
regular expressions and with
|
||
the @samp{-name} test, if the @code{fnmatch} function supports this.
|
||
|
||
This variable also affects the interpretation of any character classes
|
||
in the regular expressions used to interpret the response to the
|
||
prompt issued by @code{-ok}. The @env{LC_CTYPE} environment variable will
|
||
also affect which characters are considered to be unprintable when
|
||
filenames are printed (@pxref{Unusual Characters in File Names}).
|
||
|
||
@item LC_MESSAGES
|
||
Determines the locale to be used for internationalised messages,
|
||
including the interpretation of the response to the prompt made by the
|
||
@code{-ok} action.
|
||
|
||
@item NLSPATH
|
||
Determines the location of the internationalisation message catalogues.
|
||
|
||
@item PATH
|
||
Affects the directories which are searched to find the executables
|
||
invoked by @samp{-exec}, @samp{-execdir}, @samp{-ok} and @samp{-okdir}.
|
||
If the @env{PATH} environment variable includes the current directory
|
||
(by explicitly including @samp{.} or by having an empty element), and
|
||
the find command line includes @samp{-execdir} or @samp{-okdir},
|
||
@code{find} will refuse to run. @xref{Security Considerations}, for a
|
||
more detailed discussion of security matters.
|
||
|
||
@item POSIXLY_CORRECT
|
||
Determines the block size used by @samp{-ls} and @samp{-fls}. If
|
||
@env{POSIXLY_CORRECT} is set, blocks are units of 512 bytes. Otherwise
|
||
they are units of 1024 bytes.
|
||
|
||
Setting this variable also turns off warning messages (that is, implies
|
||
@samp{-nowarn}) by default, because POSIX requires that apart from
|
||
the output for @samp{-ok}, all messages printed on standard error are
|
||
diagnostics and must result in a non-zero exit status.
|
||
|
||
When @env{POSIXLY_CORRECT} is set, the response to the prompt made by the
|
||
@code{-ok} action is interpreted according to the system's message
|
||
catalogue, as opposed to according to @code{find}'s own message
|
||
translations.
|
||
|
||
@item TZ
|
||
Affects the time zone used for some of the time-related format
|
||
directives of @samp{-printf} and @samp{-fprintf}.
|
||
@end table
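
The influence of @env{TZ} can be seen with a file whose modification
time is known. The sketch below assumes GNU @code{touch} for the
@samp{-d} date syntax and uses POSIX-style time zone strings
(@samp{UTC0} and @samp{JST-9}, that is, nine hours ahead of UTC) so
that it does not depend on an installed time zone database.

@example
tmp=$(mktemp -d)
touch -d '1970-01-01 00:00:00 UTC' "$tmp/epoch"

# Prints "1970-01-01 00:00".
TZ=UTC0 find "$tmp/epoch" -printf '%TY-%Tm-%Td %TH:%TM\n'

# Prints "1970-01-01 09:00".
TZ=JST-9 find "$tmp/epoch" -printf '%TY-%Tm-%Td %TH:%TM\n'

rm -rf "$tmp"
@end example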
|
||
|
||
|
||
|
||
@node Common Tasks
|
||
@chapter Common Tasks
|
||
|
||
The sections that follow contain some extended examples that both give
|
||
a good idea of the power of these programs, and show you how to solve
|
||
common real-world problems.
|
||
|
||
@menu
|
||
* Viewing And Editing::
|
||
* Archiving::
|
||
* Cleaning Up::
|
||
* Strange File Names::
|
||
* Fixing Permissions::
|
||
* Classifying Files::
|
||
@end menu
|
||
|
||
@node Viewing And Editing
|
||
@section Viewing And Editing
|
||
|
||
To view a list of files that meet certain criteria, simply run your
|
||
file viewing program with the file names as arguments. Shells
|
||
substitute a command enclosed in backquotes with its output, so the
|
||
whole command looks like this:
|
||
|
||
@example
|
||
less `find /usr/include -name '*.h' | xargs grep -l mode_t`
|
||
@end example
|
||
|
||
@noindent
|
||
You can edit those files by giving an editor name instead of a file
|
||
viewing program:
|
||
|
||
@example
|
||
emacs `find /usr/include -name '*.h' | xargs grep -l mode_t`
|
||
@end example
|
||
|
||
Because there is a limit to the length of any individual command line,
|
||
there is a limit to the number of files that can be handled in this way.
|
||
We can get around this difficulty by using @code{xargs} like this:
|
||
|
||
@example
|
||
find /usr/include -name '*.h' | xargs grep -l mode_t > todo
|
||
xargs --arg-file=todo emacs
|
||
@end example
|
||
|
||
Here, @code{xargs} will run @code{emacs} as many times as necessary to
|
||
visit all of the files listed in the file @file{todo}. Generating a
|
||
temporary file is not always convenient, though. This command does
|
||
much the same thing without needing one:
|
||
|
||
@example
|
||
find /usr/include -name '*.h' | xargs grep -l mode_t |
|
||
xargs sh -c 'emacs "$@@" < /dev/tty' Emacs
|
||
@end example
|
||
|
||
The example above illustrates a useful trick: using @code{sh -c} you
|
||
can invoke a shell command from @code{xargs}. The @code{$@@} in the
|
||
command line is expanded by the shell to a list of arguments as
|
||
provided by @code{xargs}. The single quotes in the command line
|
||
protect the @code{$@@} against expansion by your interactive shell
|
||
(which will normally have no arguments and thus expand @code{$@@} to
|
||
nothing). The capitalised @samp{Emacs} on the command line is used as
|
||
@code{$0} by the shell that @code{xargs} launches.
|
||
|
||
Please note that the implementations in GNU @code{xargs} and at least BSD
|
||
support the @samp{-o} option as an extension to achieve the same, while the
|
||
above is the portable way to redirect standard input to @file{/dev/tty}.
|
||
|
||
@node Archiving
|
||
@section Archiving
|
||
|
||
You can pass a list of files produced by @code{find} to a file
|
||
archiving program. GNU @code{tar} and @code{cpio} can both read lists
|
||
of file names from the standard input -- either delimited by nulls (the
|
||
safe way) or by blanks (the lazy, risky default way). To use
|
||
null-delimited names, give them the @samp{--null} option. You can
|
||
store a file archive in a file, write it on a tape, or send it over a
|
||
network to extract on another machine.
|
||
|
||
One common use of @code{find} to archive files is to send a list of
|
||
the files in a directory tree to @code{cpio}. Use @samp{-depth} so if
|
||
a directory does not have write permission for its owner, its contents
|
||
can still be restored from the archive since the directory's
|
||
permissions are restored after its contents. Here is an example of
|
||
doing this using @code{cpio}; you could use a more complex @code{find}
|
||
expression to archive only certain files.
|
||
|
||
@example
|
||
find . -depth -print0 |
|
||
cpio --create --null --format=crc --file=/dev/nrst0
|
||
@end example
|
||
|
||
You could restore that archive using this command:
|
||
|
||
@example
|
||
cpio --extract --null --make-dir --unconditional \
|
||
--preserve --file=/dev/nrst0
|
||
@end example
|
||
|
||
Here are the commands to do the same things using @code{tar}:
|
||
|
||
@example
|
||
find . -depth -print0 |
|
||
tar --create --null --files-from=- --file=/dev/nrst0
|
||
|
||
tar --extract --null --preserve-perm --same-owner \
|
||
--file=/dev/nrst0
|
||
@end example
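
If no tape device is at hand, the @code{tar} variant can be tried
against an ordinary file. This is a sketch assuming GNU @code{tar};
the scratch directory, the archive file and the file names are
invented for the purpose.

@example
src=$(mktemp -d)
archive=$(mktemp)
touch "$src/plain" "$src/with space"

# Create the archive from a null-delimited list on standard input.
(cd "$src" && find . -type f -print0 |
    tar --create --null --files-from=- --file="$archive")

# List the members: "./plain" and "./with space".
tar --list --file="$archive" | sort

rm -rf "$src" "$archive"
@end example

The name containing a space arrives in the archive as a single member,
which is the point of using null-delimited names.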
|
||
|
||
@c Idea from Rick Sladkey.
|
||
Here is an example of copying a directory from one machine to another:
|
||
|
||
@example
|
||
find . -depth -print0 | cpio -0o -Hnewc |
|
||
rsh @var{other-machine} "cd `pwd` && cpio -i0dum"
|
||
@end example
|
||
|
||
@node Cleaning Up
|
||
@section Cleaning Up
|
||
|
||
@c Idea from Jim Meyering.
|
||
This section gives examples of removing unwanted files in various
|
||
situations. Here is a command to remove the CVS backup files created
|
||
when an update requires a merge:
|
||
|
||
@example
|
||
find . -name '.#*' -print0 | xargs -0r rm -f
|
||
@end example
|
||
|
||
If your @code{find} command removes directories, you may find that
|
||
you get a spurious error message when @code{find} tries to recurse
|
||
into a directory that has now been removed. Using the @samp{-depth}
|
||
option will normally resolve this problem.
|
||
|
||
@c What does the following sentence mean? Why is -delete safer? --kasal
|
||
@c The command above works, but the following is safer:
|
||
|
||
It is also possible to use the @samp{-delete} action:
|
||
|
||
@example
|
||
find . -depth -name '.#*' -delete
|
||
@end example
|
||
|
||
@c Idea from Franc,ois Pinard.
|
||
You can run this command to clean out your clutter in @file{/tmp}.
|
||
You might place it in the file your shell runs when you log out
|
||
(@file{.bash_logout}, @file{.logout}, or @file{.zlogout}, depending on
|
||
which shell you use).
|
||
|
||
@example
|
||
find /tmp -depth -user "$LOGNAME" -type f -delete
|
||
@end example
|
||
|
||
@c Idea from Noah Friedman.
|
||
To remove old Emacs backup and auto-save files, you can use a command
|
||
like the following. It is especially important in this case to use
|
||
null-terminated file names because Emacs packages like the VM mailer
|
||
often create temporary file names with spaces in them, like
|
||
@file{#reply to David J. MacKenzie<1>#}.
|
||
|
||
@example
|
||
find ~ \( -name '*~' -o -name '#*#' \) -print0 |
|
||
xargs --no-run-if-empty --null rm -vf
|
||
@end example
|
||
|
||
Removing old files from @file{/tmp} is commonly done from @code{cron}:
|
||
|
||
@c Idea from Kaveh Ghazi.
|
||
@example
|
||
find /tmp /var/tmp -depth -not -type d -mtime +3 -delete
|
||
find /tmp /var/tmp -depth -mindepth 1 -type d -empty -delete
|
||
@end example
|
||
|
||
The second @code{find} command above cleans out empty directories
|
||
depth-first (@samp{-delete} implies @samp{-depth} anyway), hoping that
|
||
the parents become empty and can be removed too. It uses
|
||
@samp{-mindepth} to avoid removing @file{/tmp} itself if it becomes
|
||
totally empty.
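
The cascading removal of empty directories can be observed safely in a
scratch directory. The directory layout in this sketch is made up.

@example
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b" "$tmp/c"
touch "$tmp/c/file"

# b is removed first; a is then empty and is removed as well.
# c still contains a file and is kept.
find "$tmp" -depth -mindepth 1 -type d -empty -delete

# Prints "c".
ls "$tmp"

rm -rf "$tmp"
@end example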
|
||
|
||
|
||
Lastly, an example of a program that almost certainly does not do what
|
||
the user intended:
|
||
|
||
@c inspired by Savannah bug #20865 (Bruno De Fraine)
|
||
@example
|
||
find dirname -delete -name quux
|
||
@end example
|
||
|
||
If the user hoped to delete only files named @file{quux} they will get
|
||
an unpleasant surprise; this command will attempt to delete everything
|
||
at or below the starting point @file{dirname}. This is because
|
||
@code{find} evaluates the items on the command line as an expression.
|
||
The @code{find} program will normally execute an action if the
|
||
preceding action succeeds. Here, there is no action or test before
|
||
the @samp{-delete} so it will always be executed. The @samp{-name
|
||
quux} test will be performed for files we successfully deleted, but
|
||
that test has no effect since @samp{-delete} also disables the default
|
||
@samp{-print} operation. So the above example will probably delete a
|
||
lot of files the user didn't want to delete.
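
What the user presumably meant is to put the test first, so that
@samp{-delete} is reached only for matching files. The sketch below
demonstrates this in a scratch directory with invented file names.

@example
tmp=$(mktemp -d)
touch "$tmp/quux" "$tmp/keep"

# The test comes first; -delete runs only where -name succeeded.
find "$tmp" -name quux -delete

# Prints "keep".
ls "$tmp"

rm -rf "$tmp"
@end example

When in doubt, replacing @samp{-delete} with @samp{-print} for a trial
run shows which files would be affected.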
|
||
|
||
This command is also likely to do something you did not intend:
|
||
@example
|
||
find dirname -path dirname/foo -prune -o -delete
|
||
@end example
|
||
|
||
Because @samp{-delete} turns on @samp{-depth}, the @samp{-prune}
|
||
action has no effect and files in @file{dirname/foo} will be deleted
|
||
too.
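
One possible way to spare a subtree, sketched here in a scratch
directory, is to exclude it with @samp{-path} tests instead of
@samp{-prune}. Unlike @samp{-prune}, this still makes @code{find}
descend into the excluded directory; it merely skips the deletion
there. The names @file{foo} and @file{bar} are placeholders, and
@samp{-mindepth 1} keeps the starting point itself out of the way.

@example
tmp=$(mktemp -d)
mkdir -p "$tmp/foo" "$tmp/bar"
touch "$tmp/foo/x" "$tmp/bar/y"

find "$tmp" -mindepth 1 \
    -not -path "$tmp/foo" -not -path "$tmp/foo/*" -delete

# Only foo and foo/x are left.
find "$tmp" -mindepth 1 | sort

rm -rf "$tmp"
@end example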
|
||
|
||
|
||
@node Strange File Names
|
||
@section Strange File Names
|
||
|
||
@c Idea from:
|
||
@c From: tmatimar@isgtec.com (Ted Timar)
|
||
@c Newsgroups: comp.unix.questions,comp.unix.shell,comp.answers,news.answers
|
||
@c Subject: Unix - Frequently Asked Questions (2/7) [Frequent posting]
|
||
@c Subject: How do I remove a file with funny characters in the filename ?
|
||
@c Date: Thu Mar 18 17:16:55 EST 1993
|
||
@code{find} can help you remove or rename a file with strange
|
||
characters in its name. People are sometimes stymied by files whose
|
||
names contain characters such as spaces, tabs, control characters, or
|
||
characters with the high bit set. The simplest way to remove such
|
||
files is:
|
||
|
||
@example
|
||
rm -i @var{some*pattern*that*matches*the*problem*file}
|
||
@end example
|
||
|
||
@code{rm} asks you whether to remove each file matching the given
|
||
pattern. If you are using an old shell, this approach might not work
|
||
if the file name contains a character with the high bit set; the shell
|
||
may strip it off. A more reliable way is:
|
||
|
||
@example
|
||
find . -maxdepth 1 @var{tests} -okdir rm '@{@}' \;
|
||
@end example
|
||
|
||
@noindent
|
||
where @var{tests} uniquely identify the file. The @samp{-maxdepth 1}
|
||
option prevents @code{find} from wasting time searching for the file
|
||
in any subdirectories; if there are no subdirectories, you may omit
|
||
it. A good way to uniquely identify the problem file is to figure out
|
||
its inode number; use
|
||
|
||
@example
|
||
ls -i
|
||
@end example
|
||
|
||
Suppose you have a file whose name contains control characters, and
|
||
you have found that its inode number is 12345. This command prompts
|
||
you for whether to remove it:
|
||
|
||
@example
|
||
find . -maxdepth 1 -inum 12345 -okdir rm -f '@{@}' \;
|
||
@end example
|
||
|
||
If you don't want to be asked, perhaps because the file name may
|
||
contain a strange character sequence that will mess up your screen
|
||
when printed, then use @samp{-execdir} instead of @samp{-okdir}.
|
||
|
||
If you want to rename the file instead, you can use @code{mv} instead
|
||
of @code{rm}:
|
||
|
||
@example
|
||
find . -maxdepth 1 -inum 12345 -okdir mv '@{@}' @var{new-file-name} \;
|
||
@end example
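
The inode-based approach can be rehearsed safely in a scratch directory.
The following is a minimal sketch rather than part of the recipe above:
the directory comes from @code{mktemp}, the awkward file name is
invented for illustration, and @samp{-execdir} is used in place of
@samp{-okdir} so that the commands run without prompting. It assumes
GNU @code{find}.

```shell
# Scratch directory so that nothing of value is at risk.
dir=$(mktemp -d)

# Hypothetical awkward name: it contains a tab and a control character.
name=$(printf 'bad\tname\001')
: > "$dir/$name"

# ls -i prints the inode number first; the directory holds only one entry.
inode=$(ls -i "$dir" | awk 'NR == 1 { print $1 }')

# Remove the file by inode number without ever typing its name.
find "$dir" -maxdepth 1 -inum "$inode" -execdir rm -f '{}' \;

remaining=$(ls -A "$dir" | wc -l)
echo "entries left: $remaining"
rmdir "$dir"
```

The final @code{rmdir} succeeds only if the directory is empty, so it
doubles as a check that the file really was removed.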
|
||
|
||
@node Fixing Permissions
|
||
@section Fixing Permissions
|
||
|
||
Suppose you want to make sure that everyone can write to the
|
||
directories in a certain directory tree. Here is a way to find
|
||
directories lacking either user or group write permission (or both),
|
||
and fix their permissions:
|
||
|
||
@example
|
||
find . -type d -not -perm -ug=w | xargs chmod ug+w
|
||
@end example
|
||
|
||
@noindent
|
||
You could also reverse the operations, if you want to make sure that
|
||
directories do @emph{not} have world write permission.
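
The pipeline above splits file names at white space, so a directory
whose name contains a space would be passed to @code{chmod} in pieces.
The sketch below shows one possible variant that passes NUL-separated
names instead. The scratch tree and its directory names are assumptions
made for the demonstration, and the @samp{-r} option assumes an
@code{xargs} that supports it (GNU @code{xargs} does).

```shell
top=$(mktemp -d)
mkdir -p "$top/with space/sub"
chmod 555 "$top/with space"          # no write permission for anyone

# NUL-separated names survive spaces; -r skips chmod if nothing matches.
find "$top" -type d -not -perm -ug=w -print0 | xargs -0 -r chmod ug+w

left=$(find "$top" -type d -not -perm -ug=w | wc -l)
echo "directories still lacking ug+w: $left"
rm -rf "$top"
```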
|
||
|
||
@node Classifying Files
|
||
@section Classifying Files
|
||
|
||
@c Idea from:
|
||
@c From: martin@mwtech.UUCP (Martin Weitzel)
|
||
@c Newsgroups: comp.unix.wizards,comp.unix.questions
|
||
@c Subject: Advanced usage of 'find' (Re: Unix security automating script)
|
||
@c Date: 22 Mar 90 15:05:19 GMT
|
||
If you want to classify a set of files into several groups based on
|
||
different criteria, you can use the comma operator to perform multiple
|
||
independent tests on the files. Here is an example:
|
||
|
||
@example
|
||
find / -type d \( -perm -o=w -fprint allwrite , \
|
||
-perm -o=x -fprint allexec \)
|
||
|
||
echo "Directories that can be written to by everyone:"
|
||
cat allwrite
|
||
echo ""
|
||
echo "Directories with search permissions for everyone:"
|
||
cat allexec
|
||
@end example
|
||
|
||
@code{find} has only to make one scan through the directory tree
|
||
(which is one of the most time-consuming parts of its work).
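
To see the comma operator at work without scanning the whole
filesystem, the same expression can be run against a small scratch
tree. This is only a sketch: the directory names and permission modes
are invented, the report files are kept outside the scanned tree, and
GNU @code{find} is assumed.

```shell
top=$(mktemp -d)      # tree to scan
out=$(mktemp -d)      # report files live outside the scanned tree
mkdir "$top/open" "$top/shared" "$top/private"
chmod 777 "$top/open"
chmod 755 "$top/shared"
chmod 700 "$top/private"

# Both tests are applied to every directory in a single pass.
find "$top" -mindepth 1 -type d \( -perm -o=w -fprint "$out/allwrite" , \
                                   -perm -o=x -fprint "$out/allexec" \)

nwrite=$(wc -l < "$out/allwrite")
nexec=$(wc -l < "$out/allexec")
echo "world-writable: $nwrite, world-searchable: $nexec"
rm -rf "$top" "$out"
```

Only @file{open} is world-writable, while both @file{open} and
@file{shared} are world-searchable, so the two report files end up with
different contents even though the tree was walked once.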
|
||
|
||
@node Worked Examples
|
||
@chapter Worked Examples
|
||
|
||
The tools in the findutils package, and in particular @code{find},
|
||
have a large number of options. This means that quite often,
|
||
there is more than one way to do things. Some of the options
|
||
and facilities only exist for compatibility with other tools, and
|
||
findutils provides improved ways of doing things.
|
||
|
||
This chapter describes a number of useful tasks that are commonly
|
||
performed, and compares the different ways of achieving them.
|
||
|
||
@menu
|
||
* Deleting Files::
|
||
* Copying A Subset of Files::
|
||
* Updating A Timestamp File::
|
||
* Finding the Shallowest Instance::
|
||
@end menu
|
||
|
||
@node Deleting Files
|
||
@section Deleting Files
|
||
|
||
One of the most common tasks that @code{find} is used for is locating
|
||
files that can be deleted. This might include:
|
||
|
||
@itemize
|
||
@item
|
||
Files last modified more than 3 years ago which haven't been accessed
|
||
for at least 2 years
|
||
@item
|
||
Files belonging to a certain user
|
||
@item
|
||
Temporary files which are no longer required
|
||
@end itemize
|
||
|
||
This example concentrates on the actual deletion task rather than on
|
||
sophisticated ways of locating the files that need to be deleted.
|
||
We'll assume that the files we want to delete are old files underneath
|
||
@file{/var/tmp/stuff}.
|
||
|
||
@subsection The Traditional Way
|
||
|
||
The traditional way to delete files in @file{/var/tmp/stuff} that have
|
||
not been modified in over 90 days would have been:
|
||
|
||
@smallexample
|
||
find /var/tmp/stuff -mtime +90 -exec /bin/rm @{@} \;
|
||
@end smallexample
|
||
|
||
The above command uses @samp{-exec} to run the @code{/bin/rm} command
|
||
to remove each file. This approach works and in fact would have
|
||
worked in Version 7 Unix in 1979. However, there are a number of
|
||
problems with this approach.
|
||
|
||
|
||
The most obvious problem with the approach above is that it causes
|
||
@code{find} to fork every time it finds a file that it needs to delete,
|
||
and the child process then has to use the @code{exec} system call to
|
||
launch @code{/bin/rm}. All this is quite inefficient. If we are
|
||
going to use @code{/bin/rm} to do this job, it is better to make it
|
||
delete more than one file at a time.
|
||
|
||
The most obvious way of doing this is to use the shell's command
|
||
expansion feature:
|
||
|
||
@smallexample
|
||
/bin/rm `find /var/tmp/stuff -mtime +90 -print`
|
||
@end smallexample
|
||
or you could use the more modern form:
|
||
@smallexample
|
||
/bin/rm $(find /var/tmp/stuff -mtime +90 -print)
|
||
@end smallexample
|
||
|
||
The commands above are much more efficient than the first attempt.
|
||
However, there is a problem with them. The shell has a maximum
|
||
command length which is imposed by the operating system (the actual
|
||
limit varies between systems). This means that while the command
|
||
expansion technique will usually work, it will suddenly fail when
|
||
there are lots of files to delete. Since the task is to delete
|
||
unwanted files, this is precisely the time we don't want things to go
|
||
wrong.
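
The limit in question can be inspected on most systems. As a small
sketch, assuming a POSIX @code{getconf} is available:

```shell
# Upper bound, in bytes, on the combined size of the arguments and
# environment that the system accepts when a program is executed.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"
```

GNU @code{xargs} can also report the limits it will actually use when
it is run with the @samp{--show-limits} option.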
|
||
|
||
There is also a second problem with this method. We will discuss that
|
||
below.
|
||
|
||
@subsection Making Use of @code{xargs}
|
||
|
||
So, is there a way to be more efficient in the use of @code{fork()}
|
||
and @code{exec()} without running up against this limit?
|
||
Yes, we can be almost optimally efficient by making use
|
||
of the @code{xargs} command. The @code{xargs} command reads arguments
|
||
from its standard input and builds them into command lines. We can
|
||
use it like this:
|
||
|
||
@smallexample
|
||
find /var/tmp/stuff -mtime +90 -print | xargs /bin/rm
|
||
@end smallexample
|
||
|
||
For example if the files found by @code{find} are
|
||
@file{/var/tmp/stuff/A},
|
||
@file{/var/tmp/stuff/B} and
|
||
@file{/var/tmp/stuff/C} then @code{xargs} might issue the commands
|
||
|
||
@smallexample
|
||
/bin/rm /var/tmp/stuff/A /var/tmp/stuff/B
|
||
/bin/rm /var/tmp/stuff/C
|
||
@end smallexample
|
||
|
||
The above assumes that @code{xargs} has a very small maximum command
|
||
line length. The real limit is much larger but the idea is that
|
||
@code{xargs} will run @code{/bin/rm} as many times as necessary to get
|
||
the job done, given the limits on command line length.
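
The splitting behaviour is easy to observe without deleting anything.
In the sketch below, @code{echo} stands in for the real command and
@samp{-n 2} artificially caps each command line at two arguments; both
are assumptions made purely for the demonstration.

```shell
# echo stands in for the real /bin/rm so that nothing is deleted;
# -n 2 caps each command at two arguments to force a split.
commands=$(printf '%s\n' /var/tmp/stuff/A /var/tmp/stuff/B /var/tmp/stuff/C |
           xargs -n 2 echo /bin/rm)
printf '%s\n' "$commands"
```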
|
||
|
||
This usage of @code{xargs} is pretty efficient, and the @code{xargs}
|
||
command is widely implemented (all modern versions of Unix offer it).
|
||
So far then, the news is all good. However, there is bad news too.
|
||
|
||
@subsection Unusual characters in filenames
|
||
|
||
Unix-like systems allow any characters to appear in file names with
|
||
the exception of the ASCII NUL character and the slash.
|
||
Slashes can occur in path names (as the directory separator) but
|
||
not in the names of actual directory entries. This means that the
|
||
list of files that @code{xargs} reads could in fact contain white space
|
||
characters -- spaces, tabs and newline characters. Since by default,
|
||
@code{xargs} assumes that the list of files it is reading uses white
|
||
space as an argument separator, it cannot correctly handle the case
|
||
where a filename actually includes white space. This makes the
|
||
default behaviour of @code{xargs} almost useless for handling
|
||
arbitrary data.
|
||
|
||
To solve this problem, GNU findutils introduced the @samp{-print0}
|
||
action for @code{find}. This uses the ASCII NUL character to separate
|
||
the entries in the file list that it produces. This is the ideal
|
||
choice of separator since it is the only character that cannot appear
|
||
within a path name. The @samp{-0} option to @code{xargs} makes it
|
||
assume that arguments are separated with ASCII NUL instead of white
|
||
space. It also turns off another misfeature in the default behaviour
|
||
of @code{xargs}, which is that it pays attention to quote characters
|
||
in its input. Some versions of @code{xargs} also terminate when they
|
||
see a lone @samp{_} in the input, but GNU @code{xargs} no longer does
|
||
that (since it has become an optional behaviour in the Unix standard).
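
The difference between the two input formats can be seen in a small
experiment. This is a sketch under stated assumptions: the file name is
invented, and the path returned by @code{mktemp} is assumed to contain
no white space or quote characters itself.

```shell
dir=$(mktemp -d)
: > "$dir/old report.txt"      # one file whose name contains a space

# Default mode: the name is split at the space into two arguments.
plain=$(find "$dir" -type f -print | xargs -n 1 printf '%s\n' | wc -l)

# NUL-separated mode: the name arrives as a single argument.
safe=$(find "$dir" -type f -print0 | xargs -0 -n 1 printf '%s\n' | wc -l)

echo "arguments seen: default=$plain, NUL-separated=$safe"
rm -rf "$dir"
```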
|
||
|
||
So, putting @code{find -print0} together with @code{xargs -0} we get
|
||
this command:
|
||
|
||
@smallexample
|
||
find /var/tmp/stuff -mtime +90 -print0 | xargs -0 /bin/rm
|
||
@end smallexample
|
||
|
||
The result is an efficient way of proceeding that correctly handles
|
||
all the possible characters that could appear in the list of files to
|
||
delete. This is good news. However, there is, as I'm sure you're
|
||
expecting, also more bad news. The problem is that this is not a
|
||
portable construct. Support for @samp{-print0} is not universal.
|
||
|
||
Although some other versions of Unix (notably BSD-derived ones)
|
||
support @samp{-print0}, this is only required in POSIX from Issue 8
|
||
(IEEE Std 1003.1-2024). So, is there a more universal mechanism?
|
||
|
||
@subsection Going back to @code{-exec}
|
||
|
||
There is indeed a more universal mechanism, which is a slight
|
||
modification to the @samp{-exec} action. The normal @samp{-exec}
|
||
action assumes that the command to run is terminated with a semicolon
|
||
(the semicolon normally has to be quoted in order to protect it from
|
||
interpretation as the shell command separator). The SVR4 edition of
|
||
Unix introduced a slight variation, which involves terminating the
|
||
command with @samp{+} instead:
|
||
|
||
@smallexample
|
||
find /var/tmp/stuff -mtime +90 -exec /bin/rm @{@} \+
|
||
@end smallexample
|
||
|
||
The above use of @samp{-exec} causes @code{find} to build up a long
|
||
command line and then issue it. This can be less efficient than some
|
||
uses of @code{xargs}; for example @code{xargs} allows building up
|
||
new command lines while the previous command is still executing, and
|
||
allows specifying a number of commands to run in parallel.
|
||
However, the @code{find @dots{} -exec @dots{} +} construct has the advantage
|
||
of wide portability. GNU findutils did not support @samp{-exec @dots{} +}
|
||
until version 4.2.12; one of the reasons for this is that it already
|
||
had the @samp{-print0} action in any case.
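
One way to see the effect of the two terminators is to count how often
the command is started. In this sketch the inline @code{sh} script and
the three empty files are assumptions made for illustration; the script
simply prints the number of file names it was given.

```shell
dir=$(mktemp -d)
: > "$dir/a"; : > "$dir/b"; : > "$dir/c"

# Each invocation of the inline script prints how many names it received,
# so the number of output lines equals the number of invocations.
per_file=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} \; | wc -l)
batched=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} + | wc -l)

echo "invocations with ';': $per_file, with '+': $batched"
rm -rf "$dir"
```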
|
||
|
||
|
||
@subsection A more secure version of @code{-exec}
|
||
|
||
The command above seems to be efficient and portable. However,
|
||
within it lurks a security problem. The problem is shared with
|
||
all the commands we've tried in this worked example so far, too. The
|
||
security problem is a race condition; that is, if it is possible for
|
||
somebody to manipulate the filesystem that you are searching while you
|
||
are searching it, it is possible for them to persuade your @code{find}
|
||
command to cause the deletion of a file that you can delete but they
|
||
normally cannot.
|
||
|
||
The problem occurs because the @samp{-exec} action is defined by the
|
||
POSIX standard to invoke its command with the same working directory
|
||
as @code{find} had when it was started. This means that the arguments
|
||
which replace the @{@} include a relative path from @code{find}'s
|
||
starting point down to the file that needs to be deleted.  For example,
|
||
|
||
@smallexample
|
||
find /var/tmp/stuff -mtime +90 -exec /bin/rm @{@} \+
|
||
@end smallexample
|
||
|
||
might actually issue the command:
|
||
|
||
@smallexample
|
||
/bin/rm /var/tmp/stuff/A /var/tmp/stuff/B /var/tmp/stuff/passwd
|
||
@end smallexample
|
||
|
||
Notice the file @file{/var/tmp/stuff/passwd}. Likewise, the command:
|
||
|
||
@smallexample
|
||
cd /var/tmp && find stuff -mtime +90 -exec /bin/rm @{@} \+
|
||
@end smallexample
|
||
|
||
might actually issue the command:
|
||
|
||
@smallexample
|
||
/bin/rm stuff/A stuff/B stuff/passwd
|
||
@end smallexample
|
||
|
||
If an attacker can rename @file{stuff} to something else (making use
|
||
of their write permissions in @file{/var/tmp}) they can replace it
|
||
with a symbolic link to @file{/etc}. That means that the
|
||
@code{/bin/rm} command will be invoked on @file{/etc/passwd}. If you
|
||
are running your @code{find} command as root, the attacker has just managed
|
||
to delete a vital file. All they needed to do to achieve this was
|
||
replace a subdirectory with a symbolic link at the vital moment.
|
||
|
||
There is, however, a simple solution to the problem.  This is an action
|
||
which works a lot like @code{-exec} but doesn't need to traverse a
|
||
chain of directories to reach the file that it needs to work on. This
|
||
is the @samp{-execdir} action, which was introduced by the BSD family
|
||
of operating systems. The command,
|
||
|
||
@smallexample
|
||
find /var/tmp/stuff -mtime +90 -execdir /bin/rm @{@} \+
|
||
@end smallexample
|
||
|
||
might delete a set of files by performing these actions:
|
||
|
||
@enumerate
|
||
@item
|
||
Change directory to /var/tmp/stuff/foo
|
||
@item
|
||
Invoke @code{/bin/rm ./file1 ./file2 ./file3}
|
||
@item
|
||
Change directory to /var/tmp/stuff/bar
|
||
@item
|
||
Invoke @code{/bin/rm ./file99 ./file100 ./file101}
|
||
@end enumerate
|
||
|
||
This is a much more secure method. We are no longer exposed to a race
|
||
condition. For many typical uses of @code{find}, this is the best
|
||
strategy. It's reasonably efficient, but the length of the command
|
||
line is limited not just by the operating system limits, but also by
|
||
how many files we actually need to delete from each directory.
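
The difference in what the command receives can be observed directly.
The following sketch assumes GNU @code{find}, which passes names
prefixed with @samp{./} to @samp{-execdir}; the directory and file names
are invented, and @code{printf} stands in for a real command.

```shell
dir=$(mktemp -d)
mkdir "$dir/foo"
: > "$dir/foo/file1"

# With -exec the command receives the path from the starting point;
# with -execdir it receives a name relative to the file's own directory.
via_exec=$(find "$dir" -name file1 -exec printf '%s\n' {} \;)
via_execdir=$(find "$dir" -name file1 -execdir printf '%s\n' {} \;)

echo "-exec passes:    $via_exec"
echo "-execdir passes: $via_execdir"
rm -rf "$dir"
```

Note that GNU @code{find} may refuse to run @samp{-execdir} at all if
the @env{PATH} environment variable contains the current directory or
other relative entries, since that would reintroduce a similar risk.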
|
||
|
||
Is it possible to do any better? In the case of general file
|
||
processing, no. However, in the specific case of deleting files it is
|
||
indeed possible to do better.
|
||
|
||
@subsection Using the @code{-delete} action
|
||
|
||
The most efficient and secure method of solving this problem is to use
|
||
the @samp{-delete} action:
|
||
|
||
@smallexample
|
||
find /var/tmp/stuff -mtime +90 -delete
|
||
@end smallexample
|
||
|
||
This alternative is more efficient than any of the @samp{-exec} or
|
||
@samp{-execdir} actions, since it entirely avoids the overhead of
|
||
forking a new process and using @code{exec} to run @code{/bin/rm}. It
|
||
is also normally more efficient than @code{xargs} for the same
|
||
reason. The file deletion is performed from the directory containing
|
||
the entry to be deleted, so the @samp{-delete} action has the same
|
||
security advantages as the @samp{-execdir} action has.
|
||
|
||
The @samp{-delete} action was introduced by the BSD family of
|
||
operating systems.
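
A cautious way to become familiar with @samp{-delete} is to try it on a
scratch directory first. This sketch makes up two files, one of which
is given an old modification time with @code{touch -t}; the names and
the date are assumptions for the demonstration.

```shell
dir=$(mktemp -d)
: > "$dir/new"
touch -t 200001010000 "$dir/old"     # modification time in the year 2000

# The tests come first; -delete acts only on files that passed them.
find "$dir" -type f -mtime +90 -delete

left=$(ls -A "$dir")
echo "left behind: $left"
rm -rf "$dir"
```

The order of the expression matters: since @samp{-delete} is an action,
anything placed after it does not restrict what is deleted.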
|
||
|
||
@subsection Improving things still further
|
||
|
||
Is it possible to improve things still further? Not without either
|
||
modifying the system library or the operating system, or having more specific
|
||
knowledge of the layout of the filesystem and disk I/O subsystem, or
|
||
both.
|
||
|
||
The @code{find} command traverses the filesystem, reading
|
||
directories. It then issues a separate system call for each file to
|
||
be deleted. If we could modify the operating system, there are
|
||
potential gains that could be made:
|
||
|
||
@itemize
|
||
@item
|
||
We could have a system call to which we pass more than one filename
|
||
for deletion
|
||
@item
|
||
Alternatively, we could pass in a list of inode numbers (on GNU/Linux
|
||
systems, @code{readdir()} also returns the inode number of each
|
||
directory entry) to be deleted.
|
||
@end itemize
|
||
|
||
The above possibilities sound interesting, but from the kernel's point
|
||
of view it is difficult to enforce standard Unix access controls for
|
||
such processing by inode number. Such a facility would probably
|
||
need to be restricted to the superuser.
|
||
|
||
Another way of improving performance would be to increase the
|
||
parallelism of the process. For example if the directory hierarchy we
|
||
are searching is actually spread across a number of disks, we might
|
||
somehow be able to arrange for @code{find} to process each disk in
|
||
parallel. In practice GNU @code{find} doesn't have such an intimate
|
||
understanding of the system's filesystem layout and disk I/O
|
||
subsystem.
|
||
|
||
However, since the system administrator can have such an understanding
|
||
they can take advantage of it like so:
|
||
|
||
@smallexample
|
||
find /var/tmp/stuff1 -mtime +90 -delete &
|
||
find /var/tmp/stuff2 -mtime +90 -delete &
|
||
find /var/tmp/stuff3 -mtime +90 -delete &
|
||
find /var/tmp/stuff4 -mtime +90 -delete &
|
||
wait
|
||
@end smallexample
|
||
|
||
In the example above, four separate instances of @code{find} are used
|
||
to search four subdirectories in parallel. The @code{wait} command
|
||
simply waits for all of these to complete. Whether this approach is
|
||
more or less efficient than a single instance of @code{find} depends
|
||
on a number of things:
|
||
|
||
@itemize
|
||
@item
|
||
Are the directories being searched in parallel actually on separate
|
||
disks? If not, this parallel search might just result in a lot of
|
||
disk head movement and so the speed might even be slower.
|
||
@item
|
||
Other activity - are other programs also doing things on those disks?
|
||
@end itemize
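
Where the number of directories is not known in advance, the same idea
can be expressed with the @samp{-P} option of @code{xargs}, which bounds
how many commands run at once. The sketch below is one possible
arrangement, not a recommendation: the scratch tree, the limit of two
parallel jobs and the inline @code{sh} wrapper are all assumptions, and
@samp{-P} and @samp{-r} are assumed to be supported as they are in GNU
@code{xargs}.

```shell
top=$(mktemp -d)
for d in stuff1 stuff2 stuff3 stuff4; do
    mkdir "$top/$d"
    touch -t 200001010000 "$top/$d/old"   # old enough to be deleted
    : > "$top/$d/new"
done

# One find per top-level directory, at most two running at any time.
find "$top" -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -0 -r -n 1 -P 2 sh -c 'find "$1" -type f -mtime +90 -delete' sh

old_left=$(find "$top" -name old | wc -l)
new_left=$(find "$top" -name new | wc -l)
echo "old files left: $old_left, new files left: $new_left"
rm -rf "$top"
```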
|
||
|
||
|
||
@subsection Conclusion
|
||
|
||
The fastest and most secure way to delete files with the help of
|
||
@code{find} is to use @samp{-delete}. Using @code{xargs -0 -P N} can
|
||
also make effective use of the disk, but it is not as secure.
|
||
|
||
In the case where we're doing things other than deleting files, the
|
||
most secure alternative is @samp{-execdir @dots{} +}, but this is not as
|
||
portable as the insecure action @samp{-exec @dots{} +}.
|
||
|
||
The @samp{-delete} action is not completely portable, but the only
|
||
other possibility which is as secure (@samp{-execdir}) is no more
|
||
portable. The most efficient portable alternative is @samp{-exec
|
||
@dots{}+}, but this is insecure and isn't supported by versions of GNU
|
||
findutils prior to 4.2.12.
|
||
|
||
@node Copying A Subset of Files
|
||
@section Copying A Subset of Files
|
||
|
||
Suppose you want to copy some files from @file{/source-dir} to
|
||
@file{/dest-dir}, but there are a small number of files in
|
||
@file{/source-dir} you don't want to copy.
|
||
|
||
One option of course is @code{cp -r /source-dir /dest-dir} followed by
|
||
deletion of the unwanted material under @file{/dest-dir}. But often
|
||
that can be inconvenient, because for example we would have copied a
|
||
large amount of extraneous material, or because @file{/dest-dir} is
|
||
too small. Naturally there are many other possible reasons why this
|
||
strategy may be unsuitable.
|
||
|
||
So we need to have some way of identifying which files we want to
|
||
copy, and we need to have a way of copying that file list. The second
|
||
part of this condition is met by @code{cpio -p}. Of course, we can
|
||
identify the files we wish to copy by using @code{find}. Here is a
|
||
command that solves our problem:
|
||
|
||
@example
|
||
cd /source-dir
|
||
find . -name '.snapshot' -prune -o \( \! -name '*~' -print0 \) |
|
||
cpio -pmd0 /dest-dir
|
||
@end example
|
||
|
||
The first part of the @code{find} command here identifies files or
|
||
directories named @file{.snapshot} and tells @code{find} not to
|
||
recurse into them (since they do not need to be copied). The
|
||
combination @code{-name '.snapshot' -prune} yields false for anything
|
||
that didn't get pruned, but it is exactly those files we want to
|
||
copy. Therefore we need to use an OR (@samp{-o}) condition to
|
||
introduce the rest of our expression. The remainder of the expression
|
||
simply arranges for the name of any file not ending in @samp{~} to be
|
||
printed.
|
||
|
||
Using @code{-print0} ensures that white space characters in file names
|
||
do not pose a problem. The @code{cpio} command does the actual work
|
||
of copying files. The program as a whole fails if the @code{cpio}
|
||
program returns nonzero.  If the @code{find} command returns nonzero
|
||
on the other hand, the Unix shell will not diagnose a problem (since
|
||
@code{find} is not the last command in the pipeline).
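
The point about exit statuses can be checked with a trivial pipeline,
and one way around it is to run @code{find} on its own first. The
following is a sketch only: the scratch directory, its file names and
the temporary list file are assumptions, and the @code{cpio} step is
left as a comment so that the example has nothing to copy to.

```shell
# A failing first stage goes unnoticed: the pipeline reports the
# status of its last command, which is "true" here.
false | true
pipeline_status=$?

src=$(mktemp -d)
list=$(mktemp)
mkdir "$src/.snapshot"
: > "$src/keep"; : > "$src/backup~"; : > "$src/.snapshot/old"

# Run find on its own so that its exit status can be checked before copying.
if (cd "$src" &&
    find . -name '.snapshot' -prune -o \( \! -name '*~' -print0 \)) > "$list"
then
    # A real script would now run, from inside $src:
    #   cpio -pmd0 /dest-dir < "$list"
    entries=$(tr -cd '\0' < "$list" | wc -c)
    echo "find succeeded; $entries names ready to copy"
else
    echo "find failed; nothing copied" >&2
fi
rm -rf "$src" "$list"
```

The list contains the starting directory @file{.} and @file{./keep};
the backup file and everything below @file{.snapshot} are left out.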
|
||
|
||
|
||
@node Updating A Timestamp File
|
||
@section Updating A Timestamp File
|
||
|
||
Suppose we have a directory full of files which is maintained with a
|
||
set of automated tools; perhaps one set of tools updates them and
|
||
another set of tools uses the result. In this situation, it might be
|
||
useful for the second set of tools to know if the files have recently
|
||
been changed. It might be useful, for example, to have a 'timestamp'
|
||
file which gives the timestamp on the newest file in the collection.
|
||
|
||
We can use @code{find} to achieve this, but there are several
|
||
different ways to do it.
|
||
|
||
@subsection Updating the Timestamp The Wrong Way
|
||
|
||
The obvious but wrong answer is just to use @samp{-newer}:
|
||
|
||
@smallexample
|
||
find subdir -type f -newer timestamp -exec touch -r @{@} timestamp \;
|
||
@end smallexample
|
||
|
||
This does the right sort of thing but has a bug. Suppose that two
|
||
files in the subdirectory have been updated, and that these are called
|
||
@file{file1} and @file{file2}. The command above will update
|
||
@file{timestamp} with the modification time of @file{file1} or that of
|
||
@file{file2}, but we don't know which one. Since the timestamps on
|
||
@file{file1} and @file{file2} will in general be different, this could
|
||
well be the wrong value.
|
||
|
||
One solution to this problem is to modify @code{find} to recheck the
|
||
modification time of @file{timestamp} every time a file is to be
|
||
compared against it, but that will reduce the performance of
|
||
@code{find}.
|
||
|
||
@subsection Using the test utility to compare timestamps
|
||
|
||
The @code{test} command can be used to compare timestamps:
|
||
|
||
@smallexample
|
||
find subdir -type f -exec test @{@} -nt timestamp \; -exec touch -r @{@} timestamp \;
|
||
@end smallexample
|
||
|
||
This will ensure that any changes made to the modification time of
|
||
@file{timestamp} that take place during the execution of @code{find}
|
||
are taken into account. This resolves our earlier problem, but
|
||
unfortunately this runs much more slowly.
|
||
|
||
@subsection A combined approach
|
||
|
||
We can of course still use @samp{-newer} to cut down on the number of
|
||
calls to @code{test}:
|
||
|
||
@smallexample
|
||
find subdir -type f -newer timestamp -and \
|
||
-exec test @{@} -nt timestamp \; -and \
|
||
-exec touch -r @{@} timestamp \;
|
||
@end smallexample
|
||
|
||
Here, the @samp{-newer} test excludes all the files which are
|
||
definitely older than or the same age as
|
||
the timestamp, but all the files which are newer
|
||
than the old value of the timestamp are compared against the current
|
||
updated timestamp.
|
||
|
||
This is indeed faster in general, but the speed difference will depend
|
||
on how many updated files there are.
|
||
|
||
@subsection Using @code{-printf} and @code{sort} to compare timestamps
|
||
|
||
It is possible to use the @samp{-printf} action to abandon the use of
|
||
@code{test} entirely:
|
||
|
||
@smallexample
|
||
newest="$(find subdir -type f -newer timestamp -printf "%T@@:%p\n" |
|
||
env LC_ALL=C sh -c 'sort -n | tail -n1 | cut -d: -f2-' )"
|
||
touch -r "$@{newest:-timestamp@}" timestamp
|
||
@end smallexample
|
||
|
||
The command above works by generating a list of the timestamps and
|
||
names of all the files which are newer than the timestamp. The
|
||
@code{sort}, @code{tail} and @code{cut} commands simply pull out the
|
||
name of the file with the largest timestamp value (that is, the latest
|
||
file). We run those programs (which normally read and write text)
|
||
with the @env{LC_ALL} environment variable set to @samp{C} in
|
||
order to avoid character encoding problems; file names are not
|
||
guaranteed to have a valid (or consistent) character encoding. The
|
||
@code{touch} command is then used to update the timestamp.
|
||
|
||
The @code{"$@{newest:-timestamp@}"} expression simply expands to the
|
||
value of @code{$newest} if that variable is set and not empty, but to
|
||
@file{timestamp} otherwise. This ensures that an argument is always
|
||
given to the @samp{-r} option of the @code{touch} command.
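
The behaviour of the @code{$@{newest:-timestamp@}} expansion can be
checked in isolation. In this sketch the file name
@file{subdir/report.txt} is hypothetical; the variables @code{fallback}
and @code{chosen} exist only to capture the two results.

```shell
newest=""                          # as if find had produced no output
fallback="${newest:-timestamp}"    # empty value, so the default is used

newest="subdir/report.txt"         # hypothetical newest file
chosen="${newest:-timestamp}"      # non-empty value is used as it is

echo "no newer file: $fallback"
echo "newer file:    $chosen"
```

The colon in @samp{:-} is what makes an empty value count as missing;
without it, the default would be used only if the variable were unset.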
|
||
|
||
@c We used to warn the reader about older versions of Find where %A@
|
||
@c didn't include a fractional part, but since Findutils 4.3.3 was
|
||
@c released in 2007, people are unlikely to have a problem today.
|
||
|
||
@subsection Solving the problem with @code{make}
|
||
|
||
Another tool which often works with timestamps is @code{make}. We can
|
||
use @code{find} to generate a @file{Makefile} file on the fly and then
|
||
use @code{make} to update the timestamps:
|
||
|
||
@smallexample
|
||
makefile="$(mktemp)"
|
||
find subdir \
|
||
-type f \
|
||
\( \! -xtype l \) \
|
||
-newer timestamp \
|
||
-printf "timestamp:: %p\n\ttouch -r %p timestamp\n\n" > "$makefile"
|
||
make -f "$makefile"
|
||
rm -f "$makefile"
|
||
@end smallexample
|
||
|
||
Unfortunately although the solution above is quite elegant, it fails
|
||
to cope with white space within file names, and adjusting it to do so
|
||
would require a rather complex shell script.
|
||
|
||
|
||
@subsection Coping with odd filenames too
|
||
|
||
We can fix both of these problems (looping and problems with white
|
||
space), and do things more efficiently too. The following command
|
||
works with newlines and doesn't need to sort the list of filenames.
|
||
|
||
@smallexample
|
||
find subdir -type f -newer timestamp -printf "%T@@:%p\0" |
|
||
perl -0 newest.pl |
|
||
xargs --no-run-if-empty --null --replace \
|
||
find @{@} -maxdepth 0 -newer timestamp -exec touch -r @{@} timestamp \;
|
||
@end smallexample
|
||
|
||
The first @code{find} command generates a list of files which are
|
||
newer than (and not the same age as) the original timestamp file,
|
||
and prints a list of them with
|
||
their timestamps. The @file{newest.pl} script simply filters out all
|
||
the filenames which have timestamps which are older than whatever the
|
||
newest file is:
|
||
|
||
@smallexample
|
||
@verbatim
|
||
#! /usr/bin/perl -0
# Read NUL-terminated "stamp:name" records and print the name(s)
# sharing the newest timestamp, separated by NUL.
my @newest = ();
my $latest_stamp = undef;
while (<>) {
    chomp;    # with -0, this strips the trailing NUL
    # Split on the first colon only; the name may contain colons.
    my ($stamp, $name) = split(/:/, $_, 2);
    if (!defined($latest_stamp) || ($stamp > $latest_stamp)) {
        $latest_stamp = $stamp;
        @newest = ();
    }
    if ($stamp >= $latest_stamp) {
        push @newest, $name;
    }
}
print join("\0", @newest);
|
||
@end verbatim
|
||
@end smallexample
|
||
|
||
This prints a list of zero or more files, all of which are newer than
|
||
the original timestamp file, and which have the same timestamp as each
|
||
other, to the nearest second. The second @code{find} command takes
|
||
each resulting file one at a time, and if that is newer than the
|
||
timestamp file, the timestamp is updated.
|
||
|
||
@node Finding the Shallowest Instance
|
||
@section Finding the Shallowest Instance
|
||
|
||
Suppose you maintain local copies of sources from various projects,
|
||
each with their own choice of directory organisation and source code
|
||
management (SCM) tool. You need to periodically synchronize each
|
||
project with its upstream tree.  As the number of local repositories
|
||
grows, so does the work involved in maintaining synchronization. SCM
|
||
utilities typically create some sort of administrative directory: .svn
|
||
for Subversion, CVS for CVS, and so on. These directories can be used
|
||
as a key to search for the bases of the project source trees. Suppose
|
||
we have the following directory structure:
|
||
|
||
@smallexample
|
||
repo/project1/CVS
|
||
repo/gnu/project2/.svn
|
||
repo/gnu/project3/.svn
|
||
repo/gnu/project3/src/.svn
|
||
repo/gnu/project3/doc/.svn
|
||
repo/project4/.git
|
||
@end smallexample
|
||
|
||
One would expect to update each of the @file{projectX} directories,
|
||
but not their subdirectories (src, doc, etc.). To locate the project
|
||
roots, we would need to find the least deeply nested directories
|
||
containing an SCM-related subdirectory. The following command
|
||
discovers those roots efficiently. It is efficient because it avoids
|
||
searching subdirectories inside projects whose SCM directory we
|
||
already found.
|
||
|
||
@smallexample
|
||
find repo/ \
|
||
\( \
|
||
-exec test -d @{@}/.svn \; -or \
|
||
-exec test -d @{@}/.git \; -or \
|
||
-exec test -d @{@}/CVS \; \
|
||
\) -print -prune
|
||
@end smallexample
|
||
|
||
Output:
|
||
|
||
@smallexample
|
||
repo/project1
|
||
repo/gnu/project2
|
||
repo/gnu/project3
|
||
repo/project4
|
||
@end smallexample
|
||
|
||
In this example, @command{test} is used to tell if we are currently
|
||
examining a directory which appears to be a project's root directory
|
||
(because it has an SCM subdirectory). When we find a project root,
|
||
there is no need to search inside it, and @code{-prune} makes sure
|
||
that we descend no further.
|
||
|
||
For large, complex trees like the Linux kernel, this will prevent
|
||
searching a large portion of the structure, saving a good deal of
|
||
time.
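
The command can be tried against a scratch copy of the layout shown
earlier. This is a sketch: the tree is created empty under a directory
from @code{mktemp}, @samp{-o} is used as the portable spelling of
@samp{-or}, and substituting @samp{@{@}} inside a longer argument is
assumed to work as it does in GNU @code{find}.

```shell
base=$(mktemp -d)
mkdir -p "$base/repo/project1/CVS" \
         "$base/repo/gnu/project2/.svn" \
         "$base/repo/gnu/project3/.svn" \
         "$base/repo/gnu/project3/src/.svn" \
         "$base/repo/gnu/project3/doc/.svn" \
         "$base/repo/project4/.git"

# Print each directory that has an SCM subdirectory, then stop descending.
roots=$(cd "$base" &&
        find repo/ \( -exec test -d {}/.svn \; -o \
                      -exec test -d {}/.git \; -o \
                      -exec test -d {}/CVS \; \) -print -prune | sort)
printf '%s\n' "$roots"
count=$(printf '%s\n' "$roots" | wc -l)
rm -rf "$base"
```

Four project roots are reported; the @file{src} and @file{doc}
directories inside @file{project3} are never examined because the
search stops at the project root.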
|
||
|
||
|
||
@node Security Considerations
|
||
@chapter Security Considerations
|
||
|
||
Security considerations are important if you are using @code{find} or
|
||
@code{xargs} to search for or process files that don't belong to you
|
||
or over which other people have control.  Security considerations
|
||
relating to @code{locate} may also apply if you have files which you
|
||
do not want others to see.
|
||
|
||
The most severe forms of security problems affecting
|
||
@code{find} and related programs are when third parties bring
|
||
about a situation allowing them to do something
|
||
they would normally not be able to accomplish. This is called @emph{privilege
|
||
elevation}. This might include deleting files they would not normally
|
||
be able to delete. It is common for the operating system to periodically
|
||
invoke @code{find} for self-maintenance purposes. These invocations of
|
||
@code{find} are particularly problematic from a security point of view
|
||
as these are often invoked by the superuser and search the entire
|
||
filesystem hierarchy. Generally, the severity of any associated problem depends
|
||
on what the system is going to do with the files found by @code{find}.
|
||
|
||
@menu
|
||
* Levels of Risk:: What is your level of exposure to security problems?
|
||
* Security Considerations for find:: Security problems with find
|
||
* Security Considerations for xargs:: Security problems with xargs
|
||
* Security Considerations for locate:: Security problems with locate
|
||
* Security Summary:: That was all very complex, what does it boil down to?
|
||
* Further Reading on Security::
|
||
@end menu
|
||
|
||
|
||
@node Levels of Risk
|
||
@section Levels of Risk
|
||
|
||
There are some security risks inherent in the use of @code{find},
|
||
@code{xargs} and (to a lesser extent) @code{locate}. The severity of
|
||
these risks depends on what sort of system you are using:
|
||
|
||
@table @strong
|
||
@item High Risk
|
||
Multi-user systems where you do not control (or trust) the other
|
||
users, and on which you execute @code{find}, including areas where
|
||
those other users can manipulate the filesystem (for example beneath
|
||
@file{/home} or @file{/tmp}).
|
||
|
||
@item Medium Risk
|
||
Systems where the actions of other users can create file names chosen
|
||
by them, but to which they don't have access while @code{find} is
|
||
being run. This access might include leaving programs running (shell
|
||
background jobs, @code{at} or @code{cron} tasks, for example). On
|
||
these sorts of systems, carefully written commands (avoiding use of
|
||
@samp{-print} for example) should not expose you to a high degree of
|
||
risk. Most systems fall into this category.
|
||
|
||
@item Low Risk
|
||
Systems to which untrusted parties do not have access, cannot create
|
||
file names of their own choice (even remotely) and which contain no
|
||
security flaws which might enable an untrusted third party to gain
|
||
access. Most systems do not fall into this category because there are
|
||
many ways in which external parties can affect the names of files that
|
||
are created on your system. The system on which I am writing this for
|
||
example automatically downloads software updates from the Internet;
|
||
the names of the files in which these updates exist are chosen by
|
||
third parties@footnote{Of course, I trust these parties to a large
|
||
extent anyway, because I install software provided by them; I choose
|
||
to trust them in this way, and that's a deliberate choice}.
|
||
@end table
|
||
|
||
In the discussion above, ``risk'' denotes the likelihood that someone
|
||
can cause @code{find}, @code{xargs}, @code{locate} or some other
|
||
program which is controlled by them to do something you did not
|
||
intend. The levels of risk suggested do not take any account of the
|
||
consequences of this sort of event. That is, if you operate a ``low
|
||
risk'' type system, but the consequences of a security problem are
|
||
disastrous, then you should still give serious thought to all the
|
||
possible security problems, many of which of course will not be
|
||
discussed here -- this section of the manual is intended to be
|
||
informative but not comprehensive or exhaustive.
|
||
|
||
If you are responsible for the operation of a system where the
|
||
consequences of a security problem could be very important, you should
|
||
do two things:
|
||
|
||
@enumerate
|
||
@item Write a security policy which defines who is allowed to do what
|
||
on your system.
|
||
@item Seek competent advice on how to enforce your policy, detect
|
||
breaches of that policy, and take account of any potential problems
|
||
that might fall outside the scope of your policy.
|
||
@end enumerate
|
||
|
||
|
||
@node Security Considerations for find
|
||
@section Security Considerations for @code{find}
|
||
|
||
|
||
Some of the actions @code{find} might take have a direct effect;
|
||
these include @code{-exec} and @code{-delete}. However, it is also
|
||
common to use @code{-print} explicitly or implicitly, and so if
|
||
@code{find} produces the wrong list of file names, that can also be a
|
||
security problem; consider the case for example where @code{find} is
|
||
producing a list of files to be deleted.
|
||
|
||
We normally assume that the @code{find} command line expresses the
|
||
file selection criteria and actions that the user had in mind -- that
|
||
is, the command line is ``trusted'' data.
|
||
|
||
From a security analysis point of view, the output of @code{find}
|
||
should be correct; that is, the output should contain only the names
|
||
of those files which meet the user's criteria specified on the command
|
||
line. This applies for the @code{-exec} and @code{-delete} actions;
|
||
one can consider these to be part of the output.
|
||
|
||
On the other hand, the contents of the filesystem can be manipulated
|
||
by other people, and hence we regard this as ``untrusted'' data. This
|
||
implies that the @code{find} command line is a filter which converts
|
||
the untrusted contents of the filesystem into a correct list of output
|
||
files.
|
||
|
||
The filesystem will in general change while @code{find} is searching
|
||
it; in fact, most of the potential security problems with @code{find}
|
||
relate to this issue in some way.
|
||
|
||
@dfn{Race conditions} are a general class of security problem where the
relative ordering of actions taken by @code{find} (for example) and
something else is critically important in getting the correct and
expected result@footnote{This is more or less the definition of the
term ``race condition''}.
|
||
|
||
For @code{find}, an attacker might move or rename files or directories in
|
||
the hope that an action might be taken against a file which was not
|
||
normally intended to be affected. Alternatively, this sort of attack
|
||
might be intended to persuade @code{find} to search part of the
|
||
filesystem which would not normally be included in the search
|
||
(defeating the @code{-prune} action for example).
|
||
|
||
@menu
|
||
* Problems with -exec and filenames::
|
||
* Changing the Current Working Directory::
|
||
* Race Conditions with -exec::
|
||
* Race Conditions with -print and -print0::
|
||
@end menu
|
||
|
||
@node Problems with -exec and filenames
|
||
@subsection Problems with @code{-exec} and filenames
|
||
|
||
It is safe in many cases to use the @samp{-execdir} action with any
|
||
file name. Because @samp{-execdir} prefixes the arguments it passes
|
||
to programs with @samp{./}, you will not accidentally pass an argument
|
||
which is interpreted as an option. For example the file @file{-f}
|
||
would be passed to @code{rm} as @file{./-f}, which is harmless.
|
||
|
||
However, your degree of safety does depend on the nature of the
|
||
program you are running. For example constructs such as these two commands
|
||
|
||
@example
|
||
# risky
|
||
find -exec sh -c "something @{@}" \;
|
||
find -execdir sh -c "something @{@}" \;
|
||
@end example
|
||
|
||
are very dangerous. The reason for this is that the @samp{@{@}} is
|
||
expanded to a filename which might contain a semicolon or other
|
||
characters special to the shell. If for example someone creates the
|
||
file @file{/tmp/foo; rm -rf $HOME} then the two commands above could
|
||
delete someone's home directory.
|
||
|
||
So for this reason do not run any command which will pass untrusted
|
||
data (such as the names of files) to commands which interpret
|
||
arguments as commands to be further interpreted (for example
|
||
@samp{sh}).
|
||
|
||
In the case of the shell, there is a clever workaround for this
|
||
problem:
|
||
|
||
@example
|
||
# safer
|
||
find -exec sh -c 'something "$@@"' sh @{@} \;
|
||
find -execdir sh -c 'something "$@@"' sh @{@} \;
|
||
@end example
|
||
|
||
This approach is not guaranteed to avoid every problem, but it is much
|
||
safer than substituting data of an attacker's choice into the text of
|
||
a shell command.
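
The difference between the risky and the safer forms can be seen
without @code{find} at all. The following is a minimal sketch, not
part of findutils; the variable @code{name} stands in for a file name
chosen by an attacker and is purely hypothetical.

@example
# Hypothetical file name of an attacker's choosing.
name='demo; echo INJECTED'

# Risky: the name becomes part of the command text and is parsed by sh.
sh -c "echo $name"

# Safer: the name is passed as a positional parameter and never parsed.
sh -c 'echo "$1"' sh "$name"
@end example

The first invocation runs two commands because @code{sh} treats the
semicolon as a command separator; the second prints the name
unchanged.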
|
||
|
||
@node Changing the Current Working Directory
|
||
@subsection Changing the Current Working Directory
|
||
|
||
As @code{find} searches the filesystem, it finds subdirectories and
|
||
then searches within them by changing its working directory. First,
|
||
@code{find} reaches and recognises a subdirectory. It then decides if that
|
||
subdirectory meets the criteria for being searched; that is, any
|
||
@samp{-xdev} or @samp{-prune} expressions are taken into account. The
|
||
@code{find} program will then change working directory and proceed to
|
||
search the directory.
|
||
|
||
In a race condition attack of this kind, once the checks relevant to
@samp{-xdev} and @samp{-prune} have been done, an attacker renames the
directory that was being considered and puts in its place a symbolic
link that actually points somewhere else.
|
||
|
||
The idea behind this attack is to fool @code{find} into going into the
|
||
wrong directory. This would leave @code{find} with a working
|
||
directory chosen by an attacker, bypassing any protection apparently
|
||
provided by @samp{-xdev} and @samp{-prune}, and any protection
|
||
provided by being able to @emph{not} list particular directories on
|
||
the @code{find} command line. This form of attack is particularly
|
||
problematic if the attacker can predict when the @code{find} command
|
||
will be run, as is the case with @code{cron} tasks for example.
|
||
|
||
GNU @code{find} has specific safeguards to prevent this general class
|
||
of problem. The exact form of these safeguards depends on the
|
||
properties of your system.
|
||
|
||
@menu
|
||
* O_NOFOLLOW:: Safely changing directory using @code{fchdir}.
|
||
* Systems without O_NOFOLLOW:: Checking for symbolic links after @code{chdir}.
|
||
@end menu
|
||
|
||
@node O_NOFOLLOW
|
||
@subsubsection @code{O_NOFOLLOW}
|
||
|
||
If your system supports the @code{O_NOFOLLOW} flag@footnote{GNU/Linux
|
||
(kernel version 2.1.126 and later) and FreeBSD (3.0-CURRENT and later)
|
||
support this} to the @code{open(2)} system call, @code{find} uses it
|
||
to safely change directories. The target directory is first opened
|
||
and then @code{find} changes working directory with the
|
||
@code{fchdir()} system call. This ensures that symbolic links are not
followed, preventing race condition attacks that rely on symbolic
links.
|
||
|
||
If for any reason this approach does not work, @code{find} will fall
|
||
back on the method which is normally used if @code{O_NOFOLLOW} is not
|
||
supported.
|
||
|
||
You can tell if your system supports @code{O_NOFOLLOW} by running
|
||
|
||
@example
|
||
find --version | grep Features
|
||
@end example
|
||
|
||
This prints the line of the version output that lists which features
are enabled. For example, on my system this gives:
|
||
@example
|
||
Features enabled: D_TYPE O_NOFOLLOW(enabled) LEAF_OPTIMISATION \
|
||
FTS(FTS_CWDFD) CBO(level=2)
|
||
@end example
|
||
|
||
This output came from a version of @code{find} which was built from
the development (git) code prior to the release of findutils-4.5.12.
It shows that several features including @code{O_NOFOLLOW} are
|
||
present. @code{O_NOFOLLOW} is qualified with ``enabled''. This simply means
|
||
that the current system seems to support @code{O_NOFOLLOW}. This check is
|
||
needed because it is possible to build @code{find} on a system that
|
||
defines @code{O_NOFOLLOW} and then run it on a system that ignores the
|
||
@code{O_NOFOLLOW} flag. We try to detect such cases at startup by checking
|
||
the operating system and version number; when this happens you will
|
||
see @samp{O_NOFOLLOW(disabled)} instead.
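
A script that needs to know this can look for the string directly.
The sketch below assumes a @code{find} that accepts @samp{--version}
and prints a features line in the format shown above; the variable
name @code{nofollow} is invented for this example.

@example
# Look for the qualified feature name in the version output.
# If find does not support --version, the else branch is taken.
if find --version 2>/dev/null | grep -q 'O_NOFOLLOW(enabled)'; then
  nofollow=yes
else
  nofollow=no
fi
echo "O_NOFOLLOW in use: $nofollow"
@end example

A result of @samp{no} only means that the string was not reported; it
does not distinguish a disabled feature from a @code{find} that does
not print this information.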
|
||
|
||
@node Systems without O_NOFOLLOW
|
||
@subsubsection Systems without @code{O_NOFOLLOW}
|
||
|
||
The strategy for preventing this type of problem on systems that lack
|
||
support for the @code{O_NOFOLLOW} flag is more complex. Each time
|
||
@code{find} changes directory, it examines the directory it is about
|
||
to move to, issues the @code{chdir()} system call, and then checks
|
||
that it has ended up in the subdirectory it expected. If all is as
|
||
expected, processing continues as normal. However, there are two main
|
||
reasons why the directory might change: the use of an automounter and
|
||
someone removing the old directory and replacing it with something
|
||
else while @code{find} is trying to descend into it.
|
||
|
||
Where a filesystem ``automounter'' is in use, the @code{chdir()}
system call can itself cause a new filesystem to be mounted at that
point. On systems that do not support @code{O_NOFOLLOW}, this will
cause @code{find}'s security check to fail.
|
||
|
||
However, this does not normally represent a security problem, since
|
||
the automounter configuration is normally set up by the system
|
||
administrator. Therefore, if the @code{chdir()} sanity check fails,
|
||
@code{find} will make one more attempt@footnote{This may not be the
|
||
case for the fts-based executable}. If that succeeds, execution
|
||
carries on as normal. This is the usual case for automounters.
|
||
|
||
Where an attacker is trying to exploit a race condition, the problem
|
||
may not have gone away on the second attempt. If this is the case,
|
||
@code{find} will issue a warning message and then ignore that
|
||
subdirectory. When this happens, actions such as @samp{-exec} or
|
||
@samp{-print} may already have taken place for the problematic
|
||
subdirectory. This is because @code{find} applies tests and actions
|
||
to directories before searching within them (unless @samp{-depth} was
|
||
specified).
|
||
|
||
Because of the nature of the directory-change operation and security
|
||
check, in the worst case the only things that @code{find} would have
|
||
done with the directory are to move into it and back out to the
|
||
original parent. No operations would have been performed within that
|
||
directory.
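
The examine, move, then verify sequence described above can be
imitated in the shell. This is only a rough sketch under stated
assumptions: it compares inode numbers alone (whereas @code{find}
also takes the device number into account), and the variable names
@code{target}, @code{before} and @code{after} are invented for the
example.

@example
# Hypothetical directory that is about to be entered.
target=$(mktemp -d)

# Record the inode number before the move (first field of ls -di).
set -- $(ls -di "$target"); before=$1

cd "$target" || exit 1

# Record the inode number of the directory we actually ended up in.
set -- $(ls -di .); after=$1

if [ "$before" = "$after" ]; then
  echo "arrived in the expected directory"
else
  echo "warning: directory changed during chdir" >&2
fi
@end example

A mismatch between the two values is the situation in which
@code{find} would retry once and then skip the subdirectory.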
|
||
|
||
@node Race Conditions with -exec
|
||
@subsection Race Conditions with @code{-exec}
|
||
|
||
The @samp{-exec} action causes another program to be run. It passes
|
||
to the program the name of the file which is being considered at the
|
||
time. The invoked program will typically then perform some action
|
||
on that file. Once again, there is a race condition which can be
|
||
exploited here. We shall take as a specific example the command
|
||
|
||
@example
|
||
find /tmp -path /tmp/umsp/passwd -exec /bin/rm @{@} \;
|
||
@end example
|
||
|
||
In this simple example, we are identifying just one file to be deleted
|
||
and invoking @code{/bin/rm} to delete it. A problem exists because
|
||
there is a time gap between the point where @code{find} decides that
|
||
it needs to process the @samp{-exec} action and the point where the
|
||
@code{/bin/rm} command actually issues the @code{unlink()} system
|
||
call to delete the file from the filesystem. Within this time period, an attacker can rename the
|
||
@file{/tmp/umsp} directory, replacing it with a symbolic link to
|
||
@file{/etc}. There is no way for @code{/bin/rm} to determine that it
|
||
is working on the same file that @code{find} had in mind. Once the
|
||
symbolic link is in place, the attacker has persuaded @code{find} to
|
||
cause the deletion of the @file{/etc/passwd} file, which is not the
|
||
effect intended by the command which was actually invoked.
|
||
|
||
One possible defence against this type of attack is to modify the
|
||
behaviour of @samp{-exec} so that the @code{/bin/rm} command is run
|
||
with the argument @file{./passwd} and a suitable choice of working
|
||
directory. This would allow the normal sanity check that @code{find}
|
||
performs to protect against this form of attack too. Unfortunately,
|
||
this strategy cannot be used as the POSIX standard specifies that the
|
||
current working directory for commands invoked with @samp{-exec} must
|
||
be the same as the current working directory from which @code{find}
|
||
was invoked. This means that the @samp{-exec} action is inherently
|
||
insecure and can't be fixed.
|
||
|
||
GNU @code{find} implements a more secure variant of the @samp{-exec}
|
||
action, @samp{-execdir}. The @samp{-execdir} action
|
||
ensures that it is not necessary to dereference subdirectories to
|
||
process target files. The current directory used to invoke programs
|
||
is the same as the directory in which the file to be processed exists
|
||
(@file{/tmp/umsp} in our example), and only the basename of the file to
|
||
be processed is passed to the invoked command, with a @samp{./}
|
||
prepended (giving @file{./passwd} in our example).
|
||
|
||
The @samp{-execdir} action refuses to do anything if the current
|
||
directory is included in the @env{PATH} environment variable. This
|
||
is necessary because @samp{-execdir} runs programs in the same
|
||
directory in which it finds files -- in general, such a directory
|
||
might be writable by untrusted users. For similar reasons,
|
||
@samp{-execdir} does not allow @samp{@{@}} to appear in the name of
|
||
the command to be run.
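
Before relying on @samp{-execdir} in a script, it can be useful to
check @env{PATH} yourself. The sketch below is an approximation and
not the exact test that @code{find} performs; it assumes that a
@samp{.} element or an empty element (a doubled, leading or trailing
colon) denotes the current directory. The helper name
@code{path_has_cwd} is hypothetical.

@example
# Hypothetical helper: succeed if a PATH-style string names the
# current directory, either as "." or as an empty element.
path_has_cwd() (
  case ":$1:" in
    *:.:* | *::*) exit 0 ;;
    *) exit 1 ;;
  esac
)

if path_has_cwd "$PATH"; then
  echo "PATH includes the current directory"
else
  echo "PATH has no '.' or empty element"
fi
@end example

The function body runs in a subshell, so the @code{exit} calls only
set the return status of the helper.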
|
||
|
||
@node Race Conditions with -print and -print0
|
||
@subsection Race Conditions with @code{-print} and @code{-print0}
|
||
|
||
The @samp{-print} and @samp{-print0} actions can be used to produce a
|
||
list of files matching some criteria, which can then be used with some
|
||
other command, perhaps with @code{xargs}. Unfortunately, this means
|
||
that there is an unavoidable time gap between @code{find} deciding
|
||
that one or more files meet its criteria and the relevant command
|
||
being executed. For this reason, the @samp{-print} and @samp{-print0}
|
||
actions are just as insecure as @samp{-exec}.
|
||
|
||
In fact, since the construction
|
||
|
||
@example
|
||
find @dots{} -print | xargs @enddots{}
|
||
@end example
|
||
|
||
does not cope correctly with newlines or other ``white space'' in
|
||
file names, and copes poorly with file names containing quotes, the
|
||
@samp{-print} action is less secure even than @samp{-print0}.
|
||
|
||
|
||
@comment node-name, next, previous, up
|
||
@comment @node Security Considerations for xargs
|
||
@node Security Considerations for xargs
|
||
@section Security Considerations for @code{xargs}
|
||
|
||
The description of the race conditions affecting the @samp{-print}
|
||
action of @code{find} shows that @code{xargs} cannot be secure if it
|
||
is possible for an attacker to modify a filesystem after @code{find}
|
||
has started but before @code{xargs} has completed all its actions.
|
||
|
||
However, there are other security issues that exist even if it is not
|
||
possible for an attacker to have access to the filesystem in real
|
||
time. Firstly, if it is possible for an attacker to create files with
|
||
names of their choice on the filesystem, then @code{xargs} is
|
||
insecure unless the @samp{-0} option is used. If a file with the name
|
||
@file{/home/someuser/foo/bar\n\n/etc/passwd} exists (assume that
|
||
@samp{\n} stands for a newline character), then @code{find @dots{} -print}
|
||
can be persuaded to print three separate lines:
|
||
|
||
@example
|
||
/home/someuser/foo/bar
|
||
|
||
/etc/passwd
|
||
@end example
|
||
|
||
If it finds a blank line in the input, @code{xargs} will ignore it.
|
||
Therefore, if some action is to be taken on the basis of this list of
|
||
files, the @file{/etc/passwd} file would be included even if this was
|
||
not the intent of the person running find. There are circumstances in
|
||
which an attacker can use this to their advantage. The same
|
||
consideration applies to file names containing ordinary spaces rather
|
||
than newlines, except that of course the list of file names will no
|
||
longer contain an ``extra'' newline.
|
||
|
||
This problem is an unavoidable consequence of the default behaviour of
|
||
the @code{xargs} command, which is specified by the POSIX standard.
|
||
The only ways to avoid this problem are either to avoid all use of
|
||
@code{xargs} in favour for example of @samp{find -exec} or (where
|
||
available) @samp{find -execdir}, or to use the @samp{-0} option, which
|
||
ensures that @code{xargs} considers file names to be separated by
|
||
ASCII NUL characters rather than whitespace. However, useful as this
|
||
option is, the POSIX standard did not make it mandatory prior to
|
||
Issue 8 (IEEE Std 1003.1-2024).
|
||
|
||
POSIX also specifies that @code{xargs} without @code{-0} interprets quoting and trailing
|
||
whitespace specially in filenames, too. This means that using
|
||
@code{find @dots{} -print | xargs @dots{}} can cause the commands run by
|
||
@code{xargs} to receive a list of file names which is not the same as
|
||
the list printed by @code{find}. The interpretation of quotes and
|
||
trailing whitespace is turned off by the @samp{-0} argument to
|
||
@code{xargs}, which is another reason to use that option.
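
The effect of the delimiter can be observed with a harmless command.
The following sketch assumes a @code{find} and @code{xargs} that
support @samp{-print0} and @samp{-0}, and a temporary directory whose
own path contains no whitespace; the names @code{dir}, @code{split}
and @code{whole} are invented for the example.

@example
# Scratch directory holding one file whose name contains a space.
dir=$(mktemp -d)
touch "$dir/one two"

# Whitespace-delimited: xargs splits the single name into two arguments.
split=$(find "$dir" -type f -print | xargs -n 1 echo | wc -l)

# NUL-delimited: the name arrives as one argument.
whole=$(find "$dir" -type f -print0 | xargs -0 -n 1 echo | wc -l)

echo "arguments without -0: $split; with -0: $whole"
rm -rf "$dir"
@end example

Here @code{echo} is run once per argument, so the line counts show how
many arguments @code{xargs} constructed from the one file name.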
|
||
|
||
@comment node-name, next, previous, up
|
||
@node Security Considerations for locate
|
||
@section Security Considerations for @code{locate}
|
||
|
||
@subsection Race Conditions
|
||
It is fairly unusual for the output of @code{locate} to be fed into
|
||
another command. However, if this were to be done, this would raise
|
||
the same set of security issues as the use of @samp{find @dots{} -print}.
|
||
Although the problems relating to whitespace in file names can be
|
||
resolved by using @code{locate}'s @samp{-0} option, this still leaves
|
||
the race condition problems associated with @samp{find @dots{} -print0}.
|
||
There is no way to avoid these problems in the case of @code{locate}.
|
||
|
||
@node Security Summary
|
||
@section Summary
|
||
|
||
Where untrusted parties can create files on the system, or affect the
|
||
names of files that are created, all uses for @code{find},
|
||
@code{locate} and @code{xargs} have known security problems except the
|
||
following:
|
||
|
||
@table @asis
|
||
@item Informational use only
|
||
Uses where the programs are used to prepare lists of file names upon
|
||
which no further action will ever be taken.
|
||
|
||
@item @samp{-delete}
|
||
Use of the @samp{-delete} action with @code{find} to delete files
|
||
which meet specified criteria.
|
||
|
||
@item @samp{-execdir}
|
||
Use of the @samp{-execdir} action with @code{find} where the
|
||
@env{PATH} environment variable contains directories which contain
|
||
only trusted programs.
|
||
@end table
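
As an illustration of the @samp{-delete} case, the following sketch
removes files from a scratch directory without handing any file name
to another program. It assumes a @code{find} that supports
@samp{-delete}; the directory and file names are invented for the
example.

@example
# Scratch directory with one file to keep and one to remove.
dir=$(mktemp -d)
touch "$dir/keep.txt" "$dir/old.tmp"

# find removes the matching files itself.
find "$dir" -type f -name '*.tmp' -delete

ls "$dir"
@end example

Only @file{keep.txt} should remain in the scratch directory afterwards.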
|
||
|
||
|
||
@node Further Reading on Security
|
||
@section Further Reading on Security
|
||
|
||
While there are a number of books on computer security, there are also
|
||
useful articles on the web that touch on the issues described above:
|
||
|
||
@table @url
|
||
@item https://goo.gl/DAvh
|
||
@c https://www.securecoding.cert.org/confluence/display/seccode/MSC09-C.+Character+Encoding+-+Use+Subset+of+ASCII+for+Safety
|
||
This article describes some of the unfortunate effects of allowing
|
||
free choice of file names.
|
||
@item https://cwe.mitre.org/data/definitions/78.html
|
||
Describes OS Command Injection.
|
||
@item https://cwe.mitre.org/data/definitions/73.html
|
||
Describes problems arising from allowing remote computers to send
|
||
requests which specify file names of their choice.
|
||
@item https://cwe.mitre.org/data/definitions/116.html
|
||
Describes problems relating to encoding file names and escaping
|
||
characters. This article is relevant to findutils because for command
|
||
lines processed via the shell, the encoding and escaping rules are
|
||
already set by the shell. For example command lines like @code{find
|
||
... -print | some-shell-script} require specific care.
|
||
@item https://xkcd.com/327/
|
||
A humorous and pithy summary of the broader problem.
|
||
@end table
|
||
|
||
@comment node-name, next, previous, up
|
||
@node Error Messages
|
||
@chapter Error Messages
|
||
|
||
This section describes some of the error messages sometimes produced by
@code{find}, @code{xargs}, or @code{locate}, explains them and in some
cases provides advice as to what you should do about them.
|
||
|
||
This manual is written in English. The GNU findutils software
|
||
features translations of error messages for many languages. For this
|
||
reason the error messages produced by the programs are made to be as
|
||
self-explanatory as possible. This approach avoids leaving people to
|
||
figure out which test an English-language error message corresponds
|
||
to. Error messages which are self-explanatory will not normally be
|
||
mentioned in this document. For those messages mentioned in this
|
||
document, only the English-language version of the message will be
|
||
listed.
|
||
|
||
@menu
|
||
* Error Messages From find::
|
||
* Error Messages From xargs::
|
||
* Error Messages From locate::
|
||
* Error Messages From updatedb::
|
||
@end menu
|
||
|
||
@node Error Messages From find
|
||
@section Error Messages From @code{find}
|
||
|
||
Most error messages produced by @code{find} are self-explanatory. Error
|
||
messages sometimes include a filename. When this happens, the
|
||
filename is quoted in order to prevent any unusual characters in the
|
||
filename making unwanted changes in the state of the terminal.
|
||
|
||
@table @samp
|
||
@item invalid predicate `-foo'
|
||
This means that the @code{find} command line included something that
|
||
started with a dash or other special character. The @code{find}
|
||
program tried to interpret this as a test, action or option, but
|
||
didn't recognise it. If it was intended to be a test, check what was
|
||
specified against the documentation. If, on the other hand, the
|
||
string is the name of a file which has been expanded from a wildcard
|
||
(for example because you have a @samp{*} on the command line),
|
||
consider using @samp{./*} or just @samp{.} instead.
|
||
|
||
@item unexpected extra predicate
|
||
This usually happens if you have an extra bracket on the command line
|
||
(for example @samp{find . -print \)}).
|
||
|
||
@item Warning: filesystem /path/foo has recently been mounted
|
||
@itemx Warning: filesystem /path/foo has recently been unmounted
|
||
These messages might appear when @code{find} moves into a directory
|
||
and finds that the device number and inode are different from what it
|
||
expected them to be. If the directory @code{find} has moved into is
|
||
on a network filesystem (NFS), it will not issue this message, because
|
||
@code{automount} frequently mounts new filesystems on directories as
|
||
you move into them (that is how it knows you want to use the
|
||
filesystem). So, if you do see this message, be wary --
|
||
@code{automount} may not have been responsible. Consider the
|
||
possibility that someone else is manipulating the filesystem while
|
||
@code{find} is running. Some people might do this in order to mislead
|
||
@code{find} or persuade it to look at one set of files when it thought
|
||
it was looking at another set.
|
||
|
||
@item /path/foo changed during execution of find (old device number 12345, new device number 6789, filesystem type is <whatever>) [ref XXX]
|
||
This message is issued when @code{find} moves into a directory and ends up
|
||
somewhere it didn't expect to be. This happens in one of two
|
||
circumstances. Firstly, this happens when @code{automount} intervenes
|
||
on a system where @code{find} doesn't know how to determine what
|
||
the current set of mounted filesystems is.
|
||
|
||
Secondly, this can happen when the device number of a directory
|
||
appears to change during a change of current directory, but
|
||
@code{find} is moving up the filesystem hierarchy rather than down into it.
|
||
In order to prevent @code{find} wandering off into some unexpected
|
||
part of the filesystem, we stop it at this point.
|
||
|
||
@item Don't know how to use getmntent() to read `/etc/mtab'. This is a bug.
|
||
This message is issued when a problem similar to the above occurs on a
|
||
system where @code{find} doesn't know how to figure out the current
|
||
list of mount points. Ask for help on @email{bug-findutils@@gnu.org}.
|
||
|
||
@item /path/foo/bar changed during execution of find (old inode number 12345, new inode number 67893, filesystem type is <whatever>) [ref XXX]
|
||
This message is issued when @code{find} moves into a directory and
|
||
discovers that the inode number of that directory
|
||
is different from the inode number that it obtained when it examined the
|
||
directory previously. This usually means that while
|
||
@code{find} was deep in a directory hierarchy doing a
|
||
time consuming operation, somebody has moved one of the parent directories to
|
||
another location in the same filesystem. This may or may not have been done
|
||
maliciously. In any case, @code{find} stops at this point
|
||
to avoid traversing parts of the filesystem that it wasn't
|
||
intended to. You can use @code{ls -li} or @code{find /path -inum
|
||
12345 -o -inum 67893} to find out more about what has happened.
|
||
|
||
@item sanity check of the fnmatch() library function failed.
|
||
Please submit a bug report. You may well be asked questions about
|
||
your system, and if you compiled the @code{findutils} code yourself,
|
||
you should keep your copy of the build tree around. The likely
|
||
explanation is that your system has a buggy implementation of
|
||
@code{fnmatch} that looks enough like the GNU version to fool
|
||
@code{configure}, but which doesn't work properly.
|
||
|
||
@item cannot fork
|
||
This normally happens if you use the @code{-exec} action or
|
||
something similar (@code{-ok} and so forth) but the system has run out
|
||
of free process slots. This is either because the system is very busy
|
||
and the system has reached its maximum process limit, or because you
|
||
have a resource limit in place and you've reached it. Check the
|
||
system for runaway processes (with @code{ps}, if possible). Some process
|
||
slots are normally reserved for use by @samp{root}.
|
||
|
||
@item some-program terminated by signal 99
|
||
Some program which was launched with @code{-exec} or similar was killed
|
||
with a fatal signal. This is just an advisory message.
|
||
@end table
|
||
|
||
|
||
@node Error Messages From xargs
|
||
@section Error Messages From @code{xargs}
|
||
|
||
@table @samp
|
||
@item environment is too large for exec
|
||
This message means that you have so many environment variables set (or
|
||
such large values for them) that there is no room within the
|
||
system-imposed limits on program command line argument length to
|
||
invoke any program. This is an unlikely situation and is more likely
the result of an attempt to test the limits of @code{xargs}, or break it.
|
||
Please try unsetting some environment variables, or exiting the
|
||
current shell. You can also use @samp{xargs --show-limits} to
|
||
understand the relevant sizes.
|
||
|
||
@item argument list too long
|
||
You are using the @samp{-I} option and @code{xargs} doesn't have
|
||
enough space to build a command line because it has read a really
|
||
large item and it doesn't fit. You may be able to work around this
|
||
problem with the @samp{-s} option, but the default size is pretty
|
||
large. This is a rare situation and is more likely an attempt to test
|
||
the limits of @code{xargs}, or break it. Otherwise, you will need to
|
||
try to shorten the problematic argument or not use @code{xargs}.
|
||
|
||
@item argument line too long
|
||
You are using the @samp{-L} or @samp{-l} option and one of the input
|
||
lines is too long. You may be able to work around this problem with
|
||
the @samp{-s} option, but the default size is pretty large. If you
|
||
can modify your @code{xargs} command not to use @samp{-L} or
|
||
@samp{-l}, that will be more likely to result in success.
|
||
|
||
@item cannot fork
|
||
See the description of the similar message for @code{find}.
|
||
|
||
@item <program>: exited with status 255; aborting
|
||
When a command run by @code{xargs} exits with status 255, @code{xargs}
|
||
is supposed to stop. If this is not what you intended, wrap the
|
||
program you are trying to invoke in a shell script which doesn't
|
||
return status 255.
|
||
|
||
@item <program>: terminated by signal 99
|
||
See the description of the similar message for @code{find}.
|
||
|
||
@item cannot set SIGUSR1 signal handler
|
||
@code{xargs} is having trouble preparing for you to be able to send it
|
||
signals to increase or decrease the parallelism of its processing.
|
||
If you don't plan to send it those signals, this warning can be ignored
|
||
(though if you're a programmer, you may want to help us figure out
|
||
why @code{xargs} is confused by your operating system).
|
||
|
||
@item failed to redirect standard input of the child process
|
||
@code{xargs} redirects the standard input stream of the command to be
run to either @file{/dev/null} or, for the @samp{-o} option,
@file{/dev/tty}. This message is issued when that redirection fails.
|
||
See the manual of the system call @code{dup2(2)}.
|
||
@end table

@node Error Messages From locate
@section Error Messages From @code{locate}

@table @samp
@item warning: database @file{@value{LOCATE_DB}} is more than 8 days old
The @code{locate} program relies on a database which is periodically
built by the @code{updatedb} program. That hasn't happened in a long
time. To fix this problem, run @code{updatedb} manually. This can
often happen on systems that are generally not left on, so the
periodic ``cron'' task which normally does this doesn't get a chance
to run.

@item locate database @file{@value{LOCATE_DB}} is corrupt or invalid
This should not happen. Re-run @code{updatedb}. If that works, but
@code{locate} still produces this error, run @code{locate --version}
and @code{updatedb --version}. These should produce the same output.
If not, you are using a mixed toolset; check your @env{PATH}
environment variable and your shell aliases (if you have any). If
both programs claim to be GNU versions, this is a bug; all versions of
these programs should interoperate without problem. Ask for help on
@email{bug-findutils@@gnu.org}.
@end table
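
The check for a mixed toolset described under the @samp{corrupt or
invalid} message above can be sketched as a few shell commands. This
is only an illustration. It assumes that both installed programs
accept @samp{--version}, as the GNU versions do, and it uses the
standard shell builtin @code{type} to show how the shell resolves each
name.

@example
# Print the first line of each version banner for comparison.
locate --version | head -n 1
updatedb --version | head -n 1

# Show whether each name is an alias, a function, or a file,
# and where that file lives.
type locate updatedb
@end example

If the banners name different packages or versions, or @code{type}
reports an alias or files in unrelated directories, that suggests the
two programs do not come from the same installation.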


@node Error Messages From updatedb
@section Error Messages From @code{updatedb}

The @code{updatedb} program and the programs it invokes do issue
error messages, but none of them seems to need explanation here. If
you are having a problem understanding one of these, ask for help on
@email{bug-findutils@@gnu.org}.


@node History
@chapter History

The @code{xargs} and @code{find} programs have separate origins but
are collected together in Findutils because they are often used
together. While today they also share a small amount of
implementation, this wasn't originally the case. The @code{locate}
program started out as a feature of @code{find} but today it is a
separate program.

@section History of @code{find}

A @code{find} program appeared in Version 5 Unix as part of the
Programmer's Workbench project and was written by Dick Haight. Doug
McIlroy's @cite{A Research UNIX Reader: Annotated Excerpts from the
Programmer’s Manual, 1971-1986} provides some additional details; you
can read it on-line at
@url{https://www.cs.dartmouth.edu/~doug/reader.pdf}.

GNU @code{find} was originally written by Eric Decker, with
enhancements by David MacKenzie, Jay Plett, and Tim Wood. The idea
for @samp{find -print0} and @samp{xargs -0} came from Dan Bernstein.

@section History of @code{xargs}

The @code{xargs} program was invented by Herb Gellis at Bell Labs. In
his own words:

@quotation
Hi James, Thanks for reaching out. Yes I invented @code{xargs} way
back before we even released UNIX to the general public when it was
running on PDP-11 machines with little memory, and capable shell
programs were not there yet - kind of like early IBM PC-DOS command
lines. The name came about, first, by noticing at the time there were
no commands beginning with @samp{x} (silly reason, I know), and then
came up with, basically, "eXecute command with ARGumentS". This
obviously allowed one to process files, sequentially, including
batches of files, while the UNIX command line buffer was very tiny. I
don't remember exactly how small but possibly only 512 bytes. The very
first use intended was to allow compiling C programs that were broken
into many small routines whose total name length would exceed the
command line buffer. Hope this settles the matter! Oh, another arcane
factoid about @code{xargs} at the beginning was that I was able to
keep it smaller than 4k (I think that was the amount) which at the
time was the maximum size of a file segment on PDP-11/UNIX, so that
the program could be loaded completely on the first segment, without
having to go back and get chains of further segments - Hence fast!
@end quotation

In 2023, GNU @code{xargs} is unfortunately much larger, around 75KiB
when stripped.

GNU @code{xargs} isn't derived from the original Bell Labs program.
It was originally written by Mike Rendell, with enhancements by David
MacKenzie.

@section History of @code{locate}

4.3-BSD introduced the @dfn{fast-find} feature, in which the command
@code{find needle} would look for a file named @samp{needle}. This
took advantage of the fact that, at the time, there was no valid
two-argument @code{find} invocation. The implementation was much
faster than searching the whole file system in real time, because it
used a pre-built file name database. This functionality is described
in more detail in @cite{Finding Files Fast} by James Woods (Usenix
;login, Volume 8 Issue 1, pages 8-10, 1983).

Standardisation of @code{find} led to this functionality being moved
into the @code{locate} program in 4.4-BSD. The command @code{find
needle} now unambiguously means ``start searching at the file
@code{needle} and print the names of the files you encounter''.

GNU @code{locate} and its associated utilities were originally written
by James Woods, with enhancements by David MacKenzie.

@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include fdl.texi

@node Primary Index
@unnumbered @code{find} Primary Index

This is a list of all of the primaries (tests, actions, and options)
that make up @code{find} expressions for selecting files. @xref{find
Expressions}, for more information on expressions.

@printindex fn

@bye

@comment texi related words used by Emacs' spell checker ispell.el

@comment LocalWords: texinfo setfilename settitle setchapternewpage
@comment LocalWords: iftex finalout ifinfo DIR titlepage vskip pt
@comment LocalWords: filll dir samp dfn noindent xref pxref
@comment LocalWords: var deffn texi deffnx itemx emph asis
@comment LocalWords: findex smallexample subsubsection cindex
@comment LocalWords: dircategory direntry itemize

@comment other words used by Emacs' spell checker ispell.el
@comment LocalWords: README fred updatedb xargs Plett Rendell akefile
@comment LocalWords: args grep Filesystems fo foo fOo wildcards iname
@comment LocalWords: ipath regex iregex expr fubar regexps
@comment LocalWords: metacharacters macs sr sc inode lname ilname
@comment LocalWords: sysdep noleaf ls inum xdev filesystems usr atime
@comment LocalWords: ctime mtime amin cmin mmin al daystart Sladkey rm
@comment LocalWords: anewer cnewer bckw rf xtype uname gname uid gid
@comment LocalWords: nouser nogroup chown chgrp perm ch maxdepth
@comment LocalWords: mindepth cpio src CD AFS statted stat fstype ufs
@comment LocalWords: nfs tmp mfs printf fprint dils rw djm Nov lwall
@comment LocalWords: POSIXLY fls fprintf strftime locale's EDT GMT AP
@comment LocalWords: EST diff perl backquotes sprintf Falstad Oct cron
@comment LocalWords: eg vmunix mkdir afs allexec allwrite ARG bigram
@comment LocalWords: bigrams cd chmod comp crc CVS dbfile eof
@comment LocalWords: fileserver filesystem fn frcode Ghazi Hnewc iXX
@comment LocalWords: joeuser Kaveh localpaths localuser LOGNAME
@comment LocalWords: Meyering mv netpaths netuser nonblank nonblanks
@comment LocalWords: ois ok Pinard printindex proc procs prunefs
@comment LocalWords: prunepaths pwd RFS rmadillo rmdir rsh sbins str
@comment LocalWords: su Timar ubins ug unstripped vf VM Weitzel
@comment LocalWords: wildcard zlogout basename execdir wholename iwholename
@comment LocalWords: timestamp timestamps Solaris FreeBSD OpenBSD POSIX