.\" Mirror of git://git.sv.gnu.org/findutils.git, synced 2026-02-19 21:15:24 +01:00.
.\" These instances represent hyphen-minus characters, so should be spelled
.\" in man(7) source as `\-`.
.\" * locate/locatedb.5: Do the above.
.\" Discussed at: https://lists.gnu.org/r/bug-findutils/2025-11/msg00094.html
.\" Copyright-paperwork-exempt: Yes
.TH LOCATEDB 5 2020-12-27 findutils \" -*- nroff -*-
.ie \n(.g \{\
. ds en \(en
.\}
.el \{\
. ds en \-
.\}
.SH NAME
locatedb \- front-compressed file name database
.
.SH DESCRIPTION
This manual page documents the format of file name databases for the
GNU version of
.BR locate .
The file name databases contain lists of files that were in
particular directory trees when the databases were last updated.
.P
There can be multiple databases. Users can select which databases
\fBlocate\fP searches using an environment variable or command line
option; see \fBlocate\fP(1). The system administrator can choose the
file name of the default database, the frequency with which the
databases are updated, and the directories for which they contain
entries. Normally, file name databases are updated by running the
\fBupdatedb\fP program periodically, typically nightly; see
\fBupdatedb\fP(1).
.
.SH GNU LOCATE02 database format
This is the default format of databases produced by
.BR updatedb .
The
.B updatedb
program runs
.B frcode
to compress the list of file names using front-compression, which
reduces the database size by a factor of 4 to 5. Front-compression
(also known as incremental encoding) works as follows.
.P
The database entries form a list sorted case-insensitively, for users'
convenience. Since the list is sorted, each entry is likely to share
a prefix (initial string) with the previous entry. Each database
entry begins with a signed offset-differential count byte: the number
of prefix characters the entry shares with the preceding entry, minus
the number that the preceding entry shares with its own predecessor.
(The counts can be negative.) Following the count is a
null-terminated ASCII remainder \(em the part of the name that follows
the shared prefix.
.P
If the offset-differential count is larger than can be stored in a
signed byte (\(+-127), the byte has the value 0x80 (binary 10000000)
and the actual count follows in a 2-byte word, with the high byte
first (network byte order). This count can also be negative (the sign
bit being in the first of the two bytes).
.P
Every database begins with a dummy entry for a file called `LOCATE02',
which \fBlocate\fP checks for to ensure that the database file has the
correct format; it ignores the entry in doing the search.
.P
Databases cannot be concatenated together, even if the first
(dummy) entry is trimmed from all but the first database. This
is because the offset-differential count in the first entry of the
second and following databases will be wrong.
.P
In the future, the data within the locate database may not be sorted
in any particular order. To obtain sorted results, pipe the output of
.B locate
through
.BR "sort \-f" .
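.P
The count handling described in this section can be sketched in Python.
This is only an illustrative reading of the format as documented here,
not the code used by \fBlocate\fP.
The function name \fBdecode_locate02\fP is hypothetical, and the sketch
assumes that the whole database has already been read into memory as a
bytes object.
.nf
\&
```python
def decode_locate02(data):
    """Return the file names stored in a LOCATE02 database (a bytes object)."""
    names = []
    shared = 0        # bytes currently shared with the previous name
    previous = b""
    pos = 0
    while pos < len(data):
        if data[pos] == 0x80:
            # Escape byte: a signed two byte count follows, high byte first.
            delta = int.from_bytes(data[pos + 1:pos + 3], "big", signed=True)
            pos += 3
        else:
            # Ordinary case: the count is a single signed byte.
            delta = int.from_bytes(data[pos:pos + 1], "big", signed=True)
            pos += 1
        shared += delta
        end = data.index(0, pos)      # the remainder ends at a null byte
        previous = previous[:shared] + data[pos:end]
        names.append(previous)
        pos = end + 1
    if not names or names[0] != b"LOCATE02":
        raise ValueError("missing LOCATE02 dummy entry")
    return names[1:]                  # the dummy entry is not a result
```
\&
.fi
Given the bytes that correspond to the \fBfrcode\fP output shown under
EXAMPLE, this sketch should yield the four input file names.
It keeps a running total of the differential counts, which is why
a database cannot simply be appended to another one.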
.
.SH slocate database format
The
.B slocate
program uses a database format similar to, but not quite the same as,
GNU
.BR locate .
The first byte of the database specifies its
.I security
.IR level .
If the security level is 0,
.B slocate
will read, match and print file names on the basis of the information
in the database only. However, if the security level byte is 1,
.B slocate
omits entries from its output if the invoking user is unable to access
them. The second byte of the database is zero. The second byte is
followed by the first database entry. The first entry in the database
is not preceded by any differential count or dummy entry. Instead,
the differential count for the first item is assumed to be zero.
.P
Starting with the second entry (if any) in the database, data is
interpreted as for the GNU LOCATE02 format.
.
.SH Old Locate Database format
There is also an old database format, used by Unix
.B locate
and
.B find
programs and earlier releases of the GNU ones. \fBupdatedb\fP runs
programs called \fBbigram\fP and \fBcode\fP to produce old-format
databases. The old format differs from the above description in the
following ways. Instead of each entry starting with an
offset-differential count byte and ending with a null, byte values
from 0 through 28 indicate offset-differential counts from \-14 through
14. The byte value indicating that a long offset-differential count
follows is 0x1e (30), not 0x80. The long counts are stored in host
byte order, which is not necessarily network byte order, and host
integer word size, which is usually 4 bytes. They also represent a
count 14 less than their value. The database lines have no
termination byte; the start of the next line is indicated by its first
byte having a value \(<= 30.
.P
In addition, instead of starting with a dummy entry, the old database
format starts with a 256-byte table containing the 128 most common
bigrams in the file list. A bigram is a pair of adjacent bytes.
Bytes in the database that have the high bit set are indexes (with the
high bit cleared) into the bigram table. The bigram and
offset-differential count coding makes these databases 20\*(en25% smaller
than the new format, but makes them not 8-bit clean. Any byte in a
file name that is in the ranges used for the special codes is replaced
in the database by a question mark, which not coincidentally is the
shell wildcard to match a single character.
.
.SH EXAMPLE
.nf
\&
Input to \fBfrcode\fP:
.\" with nulls changed to newlines:
/usr/src
/usr/src/cmd/aardvark.c
/usr/src/cmd/armadillo.c
/usr/tmp/zoo
\&
Length of the longest prefix of the preceding entry to share:
0 /usr/src
8 /cmd/aardvark.c
14 rmadillo.c
5 tmp/zoo
\&
.fi
Output from \fBfrcode\fP, with trailing nulls changed to newlines
and count bytes made printable:
.nf
0 LOCATE02
0 /usr/src
8 /cmd/aardvark.c
6 rmadillo.c
\-9 tmp/zoo
\&
(6 = 14 \- 8, and \-9 = 5 \- 14)
.fi
.
.SH "REPORTING BUGS"
GNU findutils online help: <https://www.gnu.org/software/findutils/#get-help>
.br
Report any translation bugs to <https://translationproject.org/team/>
.PP
Report any other issue via the form at the GNU Savannah bug tracker:
.RS
<https://savannah.gnu.org/bugs/?group=findutils>
.RE
General topics about the GNU findutils package are discussed at the
.I bug\-findutils
mailing list:
.RS
<https://lists.gnu.org/mailman/listinfo/bug-findutils>
.RE
.
.SH COPYRIGHT
Copyright \(co 1994\*(en2026 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
.br
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
.
.SH "SEE ALSO"
.BR find (1),
.BR locate (1),
.BR xargs (1),
.BR updatedb (1)
.PP
Full documentation <https://www.gnu.org/software/findutils/locatedb>
.br
or available locally via:
.B info locatedb