- update to 1.4.21:
* Stop trying to check for incompatible C++ ABI between the compiler used to
build xapian-core and the compiler used to build code using xapian-core.
* Fix new warnings from GCC 12.
* Avoid undefined value use when unpacking a key in a corrupted glass docdata
table. We now skip further checks on the entry in this case.
* Merge allocations in MSVC directory reading compatibility code so we can
allocate in a single malloc() call.
* Add accept() wrapper which checks an assumption that Microsoft's SOCKET type
only actually holds 32-bit values even on 64-bit platforms and throws an
exception if violated.
* Eliminate a use of sprintf.
* Squash some unhelpful MSVC deprecation warnings.
* Declare dummy invalid parameter handler noexcept to fix a warning from MSVC.
* Include <stdlib.h> in configure check for sys_errlist as that's where it is
with mingw and MSVC.
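The accept() wrapper item above describes a checked narrowing conversion. A minimal sketch of that idea in portable C++ follows. The `socket_handle` alias and the `socket_to_int()` helper are hypothetical stand-ins, not xapian-core names, and the exact range check and exception type used upstream may differ.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for a 64-bit SOCKET handle value.
using socket_handle = std::uint64_t;

// Narrow a 64-bit handle to int, throwing if the value doesn't fit.
// The all-ones value (how INVALID_SOCKET is defined) maps to -1.
int socket_to_int(socket_handle s) {
    if (s == std::numeric_limits<socket_handle>::max())
        return -1;
    if (s > static_cast<socket_handle>(std::numeric_limits<int>::max()))
        throw std::range_error("socket handle doesn't fit in an int");
    return static_cast<int>(s);
}
```

Throwing rather than silently truncating means a violated assumption surfaces at the point of the accept() call instead of as a mystery failure on an unrelated descriptor.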
OBS-URL: https://build.opensuse.org/request/show/1007094
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=101
- update to 1.4.20:
* Throw DatabaseNotFoundError when the database directory doesn't exist or
when it doesn't contain a Xapian database. Patch from Germán Méndez Bravo
in https://github.com/xapian/xapian/pull/258
* Improve exception message for attempting to remove an empty term (the
exception type is still InvalidArgumentError). Reported by David Bremner.
* Optimise when a value range is a superset of the slot bounds but the value
slot frequency is not equal to the document count by replacing the lower
bound with an empty string to make the bounds check very cheap.
* Avoid creating a PostList tree for an empty shard. This avoids pointless
work in an uncommon case, but also by handling this up front the code in
PostList subclasses for query operators can assume the shard isn't empty
which simplifies the code in several places.
* Remove lingering handling for database backends without slot bounds since
all backends have been required to support these since 1.4.11.
* Fix collection frequency estimates for positional operators. This affects
the weighting of positional operators in subqueries of OP_SYNONYM with
weighting schemes which use the collection frequency.
* xapian-check: Try decompressing data in the spelling and synonym tables.
We don't have structure checking for these tables, but we can at least fetch
each entry and check for decompression problems.
* Improve error if a block is detected as overwritten in WritableDatabase.
Drop "are there multiple writers?" as it's rarely a useful question to ask
since we started using fcntl() locking as it's now very hard to get multiple
concurrent writers on a database. Instead suggest running xapian-check,
which is probably the best next step for a user who hits this problem.
OBS-URL: https://build.opensuse.org/request/show/989717
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=99
- update to 1.4.19:
* New QueryParser::FLAG_NO_POSITIONS flag. With this flag enabled, any query
operations which would use positional information are replaced by the nearest
equivalent which doesn't (so phrase searches, NEAR and ADJ will result in
OP_AND). This is intended to replace the automatic conversion of OP_PHRASE,
etc to OP_AND when a database has no positional information, which will no
longer happen in the release series after 1.4.
* Give a compile error for code which adds a Database to WritableDatabase.
Prior to 1.4.19, this compiled and effectively created a "black-hole" shard
which quietly discarded any changes made to it.
In 1.4.19 it's still possible to perform this operation by assigning the
WritableDatabase to a Database first, which is harder to fix. This case
throws an exception on git master where it's easier to address.
* Fix TermIterator::skip_to() with sharded databases which sometimes was
failing to advance all the way to the requested term. Uncovered while
addressing a warning from GCC's -Wduplicated-cond, reported by dcb in #816.
* Clamp edit distance to one less than the length of the word we've been asked
to correct, which makes the algorithm we use more efficient. We already
require suggestions to have at least one character in common, so the only
change to suggestions is we'll no longer suggest corrections which are
twice as long or longer even if the edit distance would allow it, which
seems like an improvement in itself.
* Minor optimisation expanding wildcards.
* PostingIterator::get_description(): For an all-docs iterator on a glass
database, get_description() would call get_docid() which isn't valid to
do once the iterator has reached the end.
* Expand allterms test coverage.
* Fetch wdf upper bound from postlist which avoids an extra postlist table
cursor seek per weighted query term, and also means we now use a per-shard
wdf upper bound for local shards which will typically give a tighter
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=98
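The edit distance clamp described in the entry above can be sketched as follows. `clamp_edit_distance()` is a hypothetical helper, not a xapian-core function, and it assumes the word length is counted in characters.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: limit the requested maximum edit distance to one
// less than the length of the word being corrected.  An empty word gets
// a limit of 0 to avoid unsigned wrap-around.
unsigned clamp_edit_distance(unsigned requested, std::size_t word_length) {
    if (word_length == 0)
        return 0;
    return static_cast<unsigned>(
        std::min<std::size_t>(requested, word_length - 1));
}
```

For a three-letter word and a requested distance of 5, the limit becomes 2, so a candidate needing every character replaced plus extra insertions is never considered.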
- update to 1.4.18:
* QueryParser::FLAG_ACCUMULATE: New flag. Previously the unstem and stoplist
data was always reset by a call to QueryParser::parse_query(), which makes
sense if you use the same QueryParser object to parse a series of independent
queries. If you're using the same QueryParser object to parse several fields
on the same query form, you may want to have the unstem and stoplist data
combined for all of them, in which case you can use this flag to prevent this
data from being reset.
* QueryParser::unstem_begin(): Eliminate unnecessary copying of the data.
* Fix typo in Swedish stopword list, syncing change made to Snowball by Daniel
Gómez Villanueva.
* Remove some French stop words with other meanings, syncing change made to
Snowball by PhilippeOuellet.
testsuite:
* Run testcase testlock4 using backend chert, not just using glass
* Skip testcase testlock4 on platforms that don't allow us to implement
Database::locked() (which notably include GNU Hurd and Microsoft Windows).
documentation:
* List DB_NO_TERMLIST in the WritableDatabase constructor API documentation
where we already list the other DB_* constants.
portability:
* Eliminate single use of std::mem_fun() which was deprecated in C++11 and
removed in C++17. Reported by Mateusz Pusz in #806.
* Add missing includes for std::numeric_limits<>. Reported by stac47 in #805.
* Work around mingw.org header issue. MSVC seems to implicitly include
<winerror.h> but mingw.org's headers don't, leading to ERROR_PIPE_CONNECTED
not being defined. Fixes https://github.com/xapian/xapian/pull/318, reported
by Alex Sandro.
* Suppress MSVC warnings about possible loss of data. The values involved are
the number of set bits in a value of integer type, so these warnings are
OBS-URL: https://build.opensuse.org/request/show/864450
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=96
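For code hitting the same std::mem_fun() removal noted in the portability list above, one common replacement is std::mem_fn() from C++11. The changelog doesn't say which replacement xapian-core chose, so this is only an illustration, and the `Task` type and `count_done()` function are invented for it.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Invented example type with a const member function to call.
struct Task {
    bool done;
    bool is_done() const { return done; }
};

// Count finished tasks.  std::mem_fn() wraps the member function pointer
// so it can be called on each element, which is the job std::mem_fun_ref()
// and std::mem_fun() did before C++17 removed them.
std::size_t count_done(const std::vector<Task>& tasks) {
    return static_cast<std::size_t>(
        std::count_if(tasks.begin(), tasks.end(),
                      std::mem_fn(&Task::is_done)));
}
```

A lambda such as `[](const Task& t) { return t.is_done(); }` works equally well and is often easier to read.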
- Update to 1.4.17:
+ API:
* Database::get_average_length(): Add this as an alias for Database::get_avlength().
In git master we've added this as a preferred new name - adding it to 1.4.x too
will make it easier for users to update to using this.
* Database::get_spelling_suggestion(): Optimise edit distance initialisation
loop to significantly reduce the cost of a typical edit distance calculation.
* Fix query expansion on sharded databases. The mechanism for passing in which
shard a TermList is from wasn't hooked up and as a result we'd always think
it's from the first shard, meaning the statistics would be wrong and that our
suggested terms may not have been as good as they should be in this
situation.
* Enquire::get_eset(): Use string::compare() to avoid 1/3 of the string compares
on average.
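The string::compare() item above relies on a three-way comparison answering in one call what can otherwise take two. The actual comparison code in Enquire::get_eset() isn't shown in this changelog; the two functions below are hypothetical and only contrast the approaches.

```cpp
#include <string>

// Three-way ordering using operator<: when a is not less than b, the
// strings are scanned a second time to tell "equal" from "greater".
int order_two_pass(const std::string& a, const std::string& b) {
    if (a < b) return -1;
    if (b < a) return 1;
    return 0;
}

// Same result from a single call: compare() reports less, equal or
// greater at once.  The result is normalised to -1, 0 or 1.
int order_one_pass(const std::string& a, const std::string& b) {
    int c = a.compare(b);
    return (c > 0) - (c < 0);
}
```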
+ documentation:
* Update doxygen HTML headers and footers to resolve issues with some
interactive features of the API docs not working. Reported by Enrico Zini.
* Stop specifying obsolete doxygen settings PERL_PATH and MSCGEN_PATH.
* Clarify API docs for MSet::get_termfreq() to make it clear that this
considers all documents in the database, not only those that matched the
search (it would sometimes be useful to be able to report the number of
occurrences of a term in the matched documents, but it's not something we
currently keep track of). Reported by Tadeusz Sośnierz and Peter Salomonsen.
OBS-URL: https://build.opensuse.org/request/show/829895
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=94
* MSet::snippet(): The snippet now includes trailing punctuation which carries
meaning or gives useful context. See
https://github.com/xapian/xapian/pull/180, reported by Robert Stepanek.
* MSet::snippet(): Fix segfault generating snippet from default-constructed
MSet. This probably isn't something you'd typically do, but it shouldn't
crash. Found during extended testing of #803 (which only affected git
master) which was reported by Robert Stepanek.
* Remove trailing full stop from exception messages. We conventionally don't
include one, but a few cases didn't follow that convention.
testsuite:
* Replace direct use of ftime() which gives deprecation warnings with recent
mingw. Reported by srinivasyadav22.
matcher:
* Fix segfault in rare cases in the query optimiser. We keep a pointer to the
most recent posting list to use as a hint for opening the next posting list,
but the existing mechanism to take ownership of this hint had a flaw. We now
invalidate the hint in situations where it might be indirectly deleted which
is safe, but somewhat conservative.
* Improve the optimisation of an always-matching OP_VALUE_GE to also take
effect when the value slot's lower bound is equal to the limit of the
OP_VALUE_GE. Patch from boda sadalla.
glass backend:
* Report the correct errno value if commit() fails. We were potentially
reporting ENOENT from an unlink() call cleaning up a temporary file prior to
throwing the exception instead.
documentation:
* Fix missing menus in API documentation. Newer doxygen generates .js files
which we also need to distribute and install. Reported by sec^nd on #xapian.
* Note OP_FILTER ignored subquery bug fixed in 1.4.15 as present in 1.4.14 and
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=93
- Update to 1.4.14:
* API:
+ Xapian::QueryParser: Handle "" inside a quoted phrase better. In a quoted
boolean term, "" is treated as an escaped ", so handle it in a compatible way
for quoted phrases. Previously we'd drop out of the phrase and start a new
phrase. Fixes #630, reported by Austin Clements.
+ Xapian::Stem: The constructor which takes a stemmer name now takes an
optional second bool parameter - if this is true, then an unknown stemmer
name falls back to using the "none" stemmer instead of throwing an exception.
This allows simply constructing a stemmer from an ISO language code without
having to worry about whether there's a stemmer for that language, and
without having to handle an exception if there isn't.
+ Xapian::Stem: Fix a bug with handling 4-byte UTF-8 sequences which
potentially affects most of the stemmers. None of the stemmers work in
languages where 4-byte UTF-8 sequences are part of the alphabet, but this
bug could result in invalid UTF-8 sequences in terms generated from text
containing high Unicode codepoints such as emoji, which can cause issues (for
example, in some language bindings). Fix synced from Snowball git post
2.0.0.
+ Xapian::Stem: Add a new is_none() method which tests if this is a "none"
stemmer.
+ Xapian::Weight: The total length of all documents is now made available to
Xapian::Weight subclasses, and this is now used by DLHWeight, DPHWeight and
LMWeight. To maintain ABI compatibility, internally this still fetches the
average length and the number of documents, multiplies them, then rounds the
result, but in the next release series this will be handled directly.
+ Xapian::Database::locked() on an inmemory database used to always return
false, but an inmemory Database is always actually a WritableDatabase
underneath, so now we always report true in this case because it really is
always locked for writing.
OBS-URL: https://build.opensuse.org/request/show/764595
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=89
- Update to 1.4.9:
* API:
+ Document::add_posting(): Fix bugs with the change in 1.4.8 to more
efficiently handle insertion of a batch of extra positions in ascending
order. These could lead to missing positions and corrupted encoded
positional data.
* remote backend:
+ Avoid hang if remote connection shutdown fails by not waiting for the
connection to close in this situation. Seems to fix occasional hangs seen on
macOS. Patch from Germán M. Bravo.
- Update to 1.4.8:
* API:
+ QueryParser,TermGenerator: Add new stemming mode STEM_SOME_FULL_POS.
This stores positional information for both stemmed and unstemmed terms,
allowing NEAR and ADJ to work with stemmed terms. The extra positional
information is likely to take up a significant amount of extra disk space so
the default STEM_SOME is likely to be a better choice for most users.
+ Database::check(): Fetch and decompress the document data to catch problems
with the splitting of large data into multiple entries, corruption of the
compressed data, etc. Also check that empty document data isn't explicitly
stored for glass.
+ Fix an incorrect type being used for term positions in the TermGenerator API.
These were Xapian::termcount but should be Xapian::termpos. Both are
typedefs for the same 32-bit unsigned integer type by default (almost always
"unsigned int") so this change is entirely compatible, except that if you
were configuring 1.4.7 or earlier with --enable-64bit-termcount you need to
also use the new --enable-64bit-termpos configure option with 1.4.8 and up or
rebuild your applications. This change was necessary to make
--enable-64bit-termpos actually useful.
+ Add Document::remove_postings() method which removes all postings in a
OBS-URL: https://build.opensuse.org/request/show/650355
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=87
- Update to 1.4.7:
* API:
+ Database::check(): Fix bogus error reports for documents with length
zero due to a new check added in 1.4.6 that the doclength was between the
stored upper and lower bounds, which failed to allow for the lower bound
ignoring documents with length zero (since documents indexed only by
boolean terms aren't involved in weighted searches).
+ Query: Use of Query::MatchAll in multithreaded code causes problems
because the reference counting gets messed up by concurrent updates.
Document that Query(string()) should be used instead of MatchAll in
multithreaded code, and avoid using it in library code.
* Stem:
+ Stemming algorithms added for Irish, Lithuanian, Nepali and Tamil.
+ Merge Snowball compiler changes which improve code generation.
+ Merge optimisations to the Arabic and Turkish stemmers.
* testsuite:
+ Fix duplicate test in apitest closedb10 testcase.
* See also https://xapian.org/docs/xapian-core-1.4.7/NEWS
OBS-URL: https://build.opensuse.org/request/show/644270
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=85
- Update to 1.4.6:
* API classes now support C++11 move semantics when using a compiler which
we are confident supports them (currently compilers which define
__cplusplus >= 201103 plus a special check for MSVC 2015 or later).
C++11 move semantics provide a clean and efficient way for threaded code to
hand-off Xapian objects to worker threads, but in this case it's very
unhelpful for availability of these semantics to vary by compiler as it
quietly leads to a build with non-threadsafe behaviour. To address this,
user code can #define XAPIAN_MOVE_SEMANTICS before #include <xapian.h> to
force this on, and will then get a compilation failure if the compiler
lacks suitable support.
* MSet::snippet():
+ We were only escaping output for HTML/XML in some cases, which would
potentially allow HTML to be injected into output (this fixes
bnc#1099925, CVE-2018-0499).
+ Include certain leading non-word characters in snippets. Previously we
started the snippet at the start of the first actual word, but there are
various cases where including non-word characters in front of the actual
word adds useful context or otherwise aids comprehension.
* Add MSetIterator::get_sort_key() method. The sort key has always been
available internally, but wasn't exposed via the public API before, which
seems like an oversight as the collapse key has long been available.
* Database::compact():
+ Allow Compactor::resolve_duplicate_metadata() implementations to delete
entries. Previously if an implementation returned an empty string this
would result in a user meta-data entry with an empty value, which isn't
normally achievable (empty meta-data values aren't stored), and so will
cause odd behaviour. We now handle an empty returned value by
interpreting it in the natural way - it means that the merged result is
to not set a value for that key in the output database.
OBS-URL: https://build.opensuse.org/request/show/620422
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=80
- Update to 1.4.5:
* Add Database::get_total_length() method. Previously you had to calculate
this from get_avlength() and get_doccount(), taking into account rounding
issues. But even then you couldn't reliably get the exact value when total
length is large since a double's mantissa has more limited precision than an
unsigned long long.
* Add Xapian::iterator_rewound() for bidirectional iterators, to test if the
iterator is at the start (useful for testing whether we're done when
iterating backwards).
* DatabaseOpeningError exceptions now provide errno via get_error_string()
rather than turning it into a string and including it in the exception
message.
* WritableDatabase::replace_document(): when passed a Document object which
came from a database and has unmodified values, we used to always read
those values into a memory structure. Now we only do this if the document
is being replaced to the same document ID which it came from, which should
make other cases a bit more efficient.
* Enquire::get_eset(): When approximating term frequencies we now round to the
nearest integer - previously we always rounded down.
* See also https://xapian.org/docs/xapian-core-1.4.5/NEWS
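The precision point behind Database::get_total_length() in the list above can be shown without Xapian. In this sketch `total_from_average()` and `survives_double()` are hypothetical helpers, and an IEEE 754 double with a 53-bit significand is assumed.

```cpp
#include <cmath>
#include <cstdint>

// Reconstruct a total length from an average length and a document count,
// rounding to the nearest integer, as callers had to do previously.
std::uint64_t total_from_average(double avg_length, std::uint64_t doc_count) {
    return static_cast<std::uint64_t>(
        std::llround(avg_length * static_cast<double>(doc_count)));
}

// True if n converts to double and back unchanged.  Intended for values
// well below 2^64, where the conversion back is always in range.
bool survives_double(std::uint64_t n) {
    return static_cast<std::uint64_t>(static_cast<double>(n)) == n;
}
```

Under that assumption 2^53 survives the round trip but 2^53 + 1 does not, so any total above that size can come back wrong however carefully the rounding is done.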
OBS-URL: https://build.opensuse.org/request/show/557120
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=78
- Update to 1.4.4:
* Database::check():
+ Fix checking a single table - changes in 1.4.2 broke such checks unless
you specified the table without any extension.
+ Errors from failing to find the file specified are now thrown as
DatabaseOpeningError (was DatabaseError, of which DatabaseOpeningError is
a subclass so existing code should continue to work). Also improved the
error message when the file doesn't exist.
* Drop OP_SCALE_WEIGHT over OP_VALUE_RANGE, OP_VALUE_GE and OP_VALUE_LE in
the Query constructor. These operators always return weight 0 so
OP_SCALE_WEIGHT over them has no effect. Eliminating it at query
construction time is cheap (we only need to check the type of the
subquery), eliminates the confusing "0 * " from the query description,
and means the OP_SCALE_WEIGHT Query object can be released sooner.
Inspired by Shivanshu Chauhan asking about the query description on IRC.
* Drop OP_SCALE_WEIGHT on the right side of OP_AND_NOT in the Query
constructor. OP_AND_NOT takes no weight from the right so OP_SCALE_WEIGHT
has no effect there. Eliminating it at query construction time is cheap
(just need to check the subquery's type), eliminates the confusing "0 * "
from the query description, and means the OP_SCALE_WEIGHT object can be
released sooner.
* See also https://xapian.org/docs/xapian-core-1.4.4/NEWS
OBS-URL: https://build.opensuse.org/request/show/507409
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=76
- Update to 1.4.3:
* MSet::snippet(): Favour candidate snippets which contain more of a diversity
of matching terms by discounting the relevance of repeated terms using an
exponential decay. A snippet which contains more terms from the query is
likely to be better than one which contains the same term or terms multiple
times, but a repeated term is still interesting, just less with each
additional appearance. Diversity issue highlighted by Robert Stepanek's
patch in https://github.com/xapian/xapian/pull/117 - testcases taken from his
patch.
* MSet::snippet(): New flag SNIPPET_EMPTY_WITHOUT_MATCH to get an empty snippet
if there are no matches in the text passed in. Implemented by Robert
Stepanek.
* Round MSet::get_matches_estimated() to an appropriate number of significant
figures. The algorithm used looks at the lower and upper bound and where the
estimate sits between them, and then picks an appropriate number of
significant figures. Thanks to Sébastien Le Callonnec for help sorting out a
portability issue on OS X.
* Add Database::locked() method - where possible this non-invasively checks if
the database is currently open for writing, which can be useful for
dashboards and other status reporting tools.
* See also https://xapian.org/docs/xapian-core-1.4.3/NEWS
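The exponential decay used to favour diverse snippets in the MSet::snippet() item above might look roughly like the following. This is a sketch only: `snippet_score()`, the per-term `weights` map and the decay factor of 0.5 are all illustrative assumptions, not values or names taken from xapian-core.

```cpp
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Score a candidate snippet from the matching terms it contains, in order.
// Each repeat of a term contributes `decay` times what the previous
// occurrence of that term did.  Terms without a weight are ignored.
double snippet_score(const std::vector<std::string>& matched_terms,
                     const std::map<std::string, double>& weights,
                     double decay = 0.5) {
    std::map<std::string, unsigned> seen;
    double score = 0.0;
    for (const auto& term : matched_terms) {
        auto w = weights.find(term);
        if (w == weights.end())
            continue;
        unsigned k = seen[term]++;  // occurrences of this term so far
        score += w->second * std::pow(decay, static_cast<double>(k));
    }
    return score;
}
```

With equal weights, a snippet matching two different terms outscores one matching the same term twice, while the repeat still counts for something.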
- Update to 1.4.2:
* Add XAPIAN_AT_LEAST(A,B,C) macro.
* MSet::snippet(): Optimise snippet generation - it's now ~46% faster in a
simple test.
* Add Xapian::DOC_ASSUME_VALID flag which tells Database::get_document() that
it doesn't need to check that the passed docid is valid. Fixes #739,
reported by Germán M. Bravo.
OBS-URL: https://build.opensuse.org/request/show/453942
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=74
- Update to 1.4.1:
* Constructing a Query for a non-reference counted PostingSource object will
now try to clone the PostingSource object (as happened in 1.3.4 and
earlier). This clone code was removed as part of the changes in 1.3.5 to
support optional reference counting of PostingSource objects, but that breaks
the case when the PostingSource object is on the stack and goes out of scope
before the Query object is used. Issue reported by Till Schäfer and analysed
by Daniel Vrátil in a bug report against Akonadi:
https://bugs.kde.org/show_bug.cgi?id=363741
* Add BM25PlusWeight class implementing the BM25+ weighting scheme, implemented
by Vivek Pal (https://github.com/xapian/xapian/pull/104).
* Add PL2PlusWeight class implementing the PL2+ weighting scheme, implemented
by Vivek Pal (https://github.com/xapian/xapian/pull/108).
* LMWeight: Implement Dir+ weighting scheme as DIRICHLET_PLUS_SMOOTHING.
Patch from Vivek Pal.
* Add CoordWeight class implementing coordinate matching. This can be useful
for specialised uses - e.g. to implement sorting by the number of matching
filters.
* DLHWeight,DPHWeight,PL2Weight: With these weighting schemes, the formulae
can give a negative weight contribution for a term in extreme cases. We
used to try to handle this by calculating a per-term lower bound on the
contribution and subtracting this from the contribution, but this idea
is fundamentally flawed as the total offset it adds to a document depends on
what combination of terms that document matches, meaning in general the
offset isn't the same for every matching document. So instead we now clamp
each term's weight contribution to be >= 0.
* TfIdfWeight: Always scale term weight by wqf - this seems the logical
approach as it matches the weighting we'd get if we weighted every non-unique
term in the query, as well as being explicit in the Piv+ formula.
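The reasoning in the DLHWeight/DPHWeight/PL2Weight item above can be illustrated with plain numbers. Both functions below are hypothetical and not part of the Xapian::Weight API. The first shows the clamping now used; the second computes the total shift a document received under the old lower-bound scheme.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sum per-term weight contributions for one document, clamping each
// contribution to be >= 0.
double clamped_score(const std::vector<double>& contributions) {
    double total = 0.0;
    for (double c : contributions)
        total += std::max(c, 0.0);
    return total;
}

// Old scheme: each matching term's lower bound is subtracted from its
// contribution, so the total shift depends on which terms matched.
// `matched` holds indexes into `lower_bounds`.
double old_scheme_offset(const std::vector<std::size_t>& matched,
                         const std::vector<double>& lower_bounds) {
    double offset = 0.0;
    for (std::size_t i : matched)
        offset -= lower_bounds.at(i);
    return offset;
}
```

With lower bounds of -0.5 and -0.25, a document matching only the first term is shifted by 0.5 while one matching both is shifted by 0.75, so the offsets change the relative ranking rather than just translating every score equally.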
OBS-URL: https://build.opensuse.org/request/show/439920
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=72
- update to 1.2.8:
* Add support to TermGenerator and QueryParser for indexing and searching CJK
text using n-grams. Currently this is only enabled when the environment
variable XAPIAN_CJK_NGRAM is set to a non-empty value.
* overview.html,quickstart.html: Fix several factual errors.
* Improve documentation comments for several methods.
* Add documentation for function parameters which didn't have it.
OBS-URL: https://build.opensuse.org/request/show/98786
OBS-URL: https://build.opensuse.org/package/show/server:search/xapian-core?expand=0&rev=42