Point people to the official PCRE documentation instead, which is going
to be more up to date. This saves us periodically having to copy in and
reformat the PCRE documentation.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: #3037
Previous default used was 32KiB (the library default) which caused some
complex patterns to fail, see #2824. The memory will not be allocated
unless used.
There is no point to enable jit in g_regex_new, since JIT will be only
used when we do a first match, and at that point
enable_jit_with_match_options will be called again already and will
update the options set in g_regex_new. Instead just run it at first
match for the first time, to the same end result.
The flag is still ignored but this way we properly deprecate
at compile time without raising an unexpected criticals at runtime:
g_regex_new: assertion '(compile_options & ~G_REGEX_COMPILE_MASK) == 0' failed
and then failing to create the regex completely.
Fixes 8d5a44dc8 ("replace pcre1 with pcre2")
In case we got a compilation or match error we should try to provide
some useful error message, if possible, before returning a quite obscure
"internal error" or "unknown error" string.
So rely on PCRE2 strings even if they're not translated they can provide
better information than the ones we're currently giving.
Related to: https://gitlab.gnome.org/GNOME/glib/-/issues/2691
Related to: https://gitlab.gnome.org/GNOME/glib/-/issues/2760
Do not store jit status for regex unless during initial compilation.
After that, decide whether to use it depending on matching options.
In fact there are some matching options that are incompatible with JIT,
as the PCRE2 docs states:
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not
supported by the just-in-time (JIT) compiler. If it is set, JIT
matching is disabled and the interpretive code in pcre2_match() is
run. Apart from PCRE2_NO_JIT (obviously), the remaining options are
supported for JIT matching.
Fixes: GNOME/gtksourceview#283
There's no much point of pre-allocating offsets given that we're doing
this when needed if only have matches to store.
So let's just allocate the spaces for the dummy offset we depend on,
while allocate the others on demand.
While the ovector count would include all the allocated space, we only
care about the actual match values, so avoid wasting allocations and
just use the ones we need to hold the offsets.
In case PCRE2 returns an empty match
This can be easily tested by initializing the initial match data to a
value that is less than the expected match values (e.g. by calling
pcre2_match_data_create (1, NULL)), but we can't do it in our tests
without bigger changes.
We're using int for every size value while PCRE uses uint_32t or
PCRE2_SIZE (size_t in most platforms), let's use the same types to avoid
using different signs.
In case recalc_match_offsets() failed we were just returning, but in
such case, per the documentation we should still set the match_info (if
provided) and free the pcre2 code instance.
So let's just break the loop we're in it, as if we we've no matches set.
This also avoids re-allocating the offsets array and potentially
accessing to unset data.
In case we're getting NO-MATCH "errors", we were still recomputing the
match offsets and taking decisions based on that, that might lead to
undefined behavior.
Avoid this by just returning early a FALSE result (but with no error) in
case there's no result to proceed on.
Fixes: #2741
As per the PCRE2 port we still used to try to map the old GRegex flags
(PCRE1 based) with the new PCRE2 ones, but doing that we were also
mixing flags with enums, leading to unexpected behaviors when trying to
get new line and BSR options out of bigger flags arrays.
So, avoid doing any mapping and store the values as native PCRE2 flags
internally and converting them back only when requested.
This fixes some regressions on newline handling.
Fixes: #2729Fixes: #2688Fixes: GNOME/gtksourceview#278
man pcre2_pattern_info says that the 3rd argument must
point to uint32_t variable (except for some 2nd argument value),
so correctly use it. Especially using wrong size can cause
unexpected result on big endian.
closes: #2699
In case JIT is not available in pcre2 we printed warning about it. This
warning broke tests on systems which don't have JIT support in pcre2
(e.g. macos).
Since we ported gregex to pcre2, the JIT compiler is now available to be
used. Let's undeprecate G_REGEX_OPTIMIZE flag to control whether the JIT
compilation is requested, since using JIT is itself an optimization.
See [1] for details on its implementation in pcre2.
[1] http://pcre.org/current/doc/html/pcre2jit.htmlFixes: #566
Since commit 8d5a44dc in order to ensure that we were setting the errcode in
translate_compile_error(), we did an assert checking whether it was a
valid value, but we assumed that 0 was not a valid error, while it is as
it's the generic G_REGEX_ERROR_COMPILE.
So, set errcode and errmsg to invalid values before translating and
ensure we've change them.
Fixes: #2694
Add SPDX license (but not copyright) headers to all files which follow a
certain pattern in their existing non-machine-readable header comment.
This commit was entirely generated using the command:
```
git ls-files glib/*.[ch] | xargs perl -0777 -pi -e 's/\n \*\n \* This library is free software; you can redistribute it and\/or\n \* modify it under the terms of the GNU Lesser General Public/\n \*\n \* SPDX-License-Identifier: LGPL-2.1-or-later\n \*\n \* This library is free software; you can redistribute it and\/or\n \* modify it under the terms of the GNU Lesser General Public/igs'
```
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Helps: #1415
This should maintain equivalent functionality, apart from that now you
have to pass `--force-fallback-for libpcre` to `meson configure` in
order to use the subproject; rather than specifying
`-Dinternal_pcre=true` to use the internal copy.
This also fixes#642, as the wrapdb copy of libpcre is version 8.37.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Helps: #962Fixes: #642
If there were more subpatterns in the regex than matches (which can
happen if one or more of the subpatterns are optional),
`g_match_info_fetch()` was erroneously returning `NULL` rather than the
empty string. It should only return `NULL` when the `match_num`
specifies a subpattern which doesn’t exist in the regex.
This is complicated slightly by the fact that when using
`g_regex_match_all()`, more matches can be returned than there are
subpatterns, due to one or more subpatterns matching multiple times at
different offsets in the string.
This includes a fix for a unit test which was erroneously checking the
broken behaviour.
Thanks to Allison Karlitskaya for the minimal reproducer.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Fixes: #229
These variables were already (correctly) accessed atomically. The
`volatile` qualifier doesn’t help with that.
Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Helps: #600
This was mostly machine generated with the following command:
```
codespell \
--builtin clear,rare,usage \
--skip './po/*' --skip './.git/*' --skip './NEWS*' \
--write-changes .
```
using the latest git version of `codespell` as per [these
instructions](https://github.com/codespell-project/codespell#user-content-updating).
Then I manually checked each change using `git add -p`, made a few
manual fixups and dropped a load of incorrect changes.
There are still some outdated or loaded terms used in GLib, mostly to do
with git branch terminology. They will need to be changed later as part
of a wider migration of git terminology.
If I’ve missed anything, please file an issue!
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Make it a bit clearer that all lengths passed to GRegex methods are in
bytes (not characters). This is mentioned in the section overview, but
who reads that?
Signed-off-by: Philip Withnall <withnall@endlessm.com>
https://bugzilla.gnome.org/show_bug.cgi?id=748620