154 Commits

Author SHA1 Message Date
James Haggerty
71ea08d5fa gregex: clean up usage of _GRegex.jit_status 2023-12-18 11:23:56 +00:00
Philip Withnall
d07c59ed4e glib: Add (scope call) to a load of sort/equal callbacks
This fixes a load of g-ir-scanner warnings.

Signed-off-by: Philip Withnall <pwithnall@gnome.org>

Helps: #3037
2023-11-29 11:59:47 +00:00
Philip Withnall
2c4a0c83e1 docs: Drop the regex-syntax page
Point people to the official PCRE documentation instead, which is going
to be more up to date. This saves us periodically having to copy in and
reformat the PCRE documentation.

Signed-off-by: Philip Withnall <pwithnall@gnome.org>

Helps: #3037
2023-11-28 13:52:05 +00:00
Matthias Clasen
a47bdb2638 docs: Move GRegex SECTION
Move the contents to the struct docs.

Helps: #3037
2023-10-11 17:38:31 +01:00
Aleksei Rybalkin
5921ea112d gregex: if JIT stack limit is reached, fall back to interpretive matching
Helps: #2824
2023-08-21 10:39:27 +00:00
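The fallback named in 5921ea112d can be sketched without PCRE2. This is a minimal illustration, not the GRegex implementation: `DemoMatchFn`, `demo_match_with_fallback()`, the two stub matchers and the `DEMO_ERROR_JIT_STACKLIMIT` value are all hypothetical names introduced here. The idea shown is only that a JIT stack-limit error triggers one retry with the interpretive matcher, while every other result is passed through unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error code standing in for a "JIT stack limit reached" result. */
#define DEMO_ERROR_JIT_STACKLIMIT (-1000)

/* Hypothetical matcher signature: returns a match count or a negative error. */
typedef int (*DemoMatchFn)(const char *subject);

/* Try the JIT matcher first; retry with the interpreter only when the JIT
 * run failed because its stack limit was reached. */
static int demo_match_with_fallback(DemoMatchFn jit_match,
                                    DemoMatchFn interp_match,
                                    const char *subject)
{
    int rc = jit_match(subject);

    if (rc == DEMO_ERROR_JIT_STACKLIMIT)
        rc = interp_match(subject);

    return rc;
}

/* Stub: a JIT matcher that always runs out of stack. */
static int demo_jit_overflows(const char *subject)
{
    (void) subject;
    return DEMO_ERROR_JIT_STACKLIMIT;
}

/* Stub: a JIT matcher that succeeds with two matches. */
static int demo_jit_two_matches(const char *subject)
{
    (void) subject;
    return 2;
}

/* Stub: an interpretive matcher that succeeds with one match. */
static int demo_interp_one_match(const char *subject)
{
    (void) subject;
    return 1;
}
```

With these stubs, a call that passes `demo_jit_overflows` ends up returning the interpreter's result, and a call that passes `demo_jit_two_matches` never reaches the interpreter.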
Aleksei Rybalkin
c3ff5b8eb3 gregex: set default max stack size for PCRE2 JIT compiler to 512KiB
Previous default used was 32KiB (the library default) which caused some
complex patterns to fail, see #2824. The memory will not be allocated
unless used.
2023-08-14 20:43:15 +02:00
Aleksei Rybalkin
842a105464 gregex: remove redundant call to enable_jit_with_match_options
There is no point in enabling JIT in g_regex_new, since JIT is only
used once we do a first match, and at that point
enable_jit_with_match_options will be called again anyway and will
update the options set in g_regex_new. Instead, just run it for the
first time at the first match, with the same end result.
2023-08-14 20:32:48 +02:00
Bart Jacobs
2f0ec59c4a g_regex_escape_string: bad GIR: utf8[] -> utf8 2023-01-26 10:18:05 +00:00
Philip Withnall
fdd2d706b2 gregex: Prevent invalid memory access for unmatched subpatterns
Based on a test by Emmanuel Pacaud.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>

Fixes: #2881
2023-01-11 12:14:04 +00:00
Philip Withnall
4fca3bba8f gregex: Remove an unreachable return statement
Spotted by Coverity.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>

Coverity CID: #1497916
2022-10-18 15:05:30 +01:00
Philip Withnall
0cdbc530ca Merge branch 'regex-errors-msg-cleanups' into 'main'
gregex: Use pcre2 error messages if we don't provide a specific one

See merge request GNOME/glib!2913
2022-10-12 13:48:29 +00:00
Guido Günther
664ee9ca6a gregex: Drop explanation G_REGEX_JAVASCRIPT_COMPAT
It's not supported as of glib 2.74
2022-09-27 13:52:05 +02:00
Guido Günther
a164b49532 gregex: Allow G_REGEX_JAVASCRIPT_COMPAT in compile mask for g_regex_new
The flag is still ignored, but this way we properly deprecate it
at compile time without raising unexpected criticals at runtime:

   g_regex_new: assertion '(compile_options & ~G_REGEX_COMPILE_MASK) == 0' failed

and then failing to create the regex completely.

Fixes 8d5a44dc8 ("replace pcre1 with pcre2")
2022-09-27 13:52:05 +02:00
Marco Trevisan (Treviño)
0f869ec5c6 regex: Use critical messages if an unexpected NULL parameter is provided
As these are programmer errors, we should be consistent in using criticals.
2022-09-21 13:48:18 +02:00
Marco Trevisan (Treviño)
6caf952e48 gregex: Use pcre2 error messages if we don't provide a specific one
If we get a compilation or match error, we should try to provide
a useful error message where possible, before falling back to a quite
obscure "internal error" or "unknown error" string.

So rely on the PCRE2 strings: even though they're not translated, they
can provide better information than the ones we're currently giving.

Related to: https://gitlab.gnome.org/GNOME/glib/-/issues/2691
Related to: https://gitlab.gnome.org/GNOME/glib/-/issues/2760
2022-09-21 13:47:56 +02:00
Marco Trevisan (Treviño)
bec68b2d74 glib/regex: Do not use JIT when using unsupported match options
Do not store the JIT status for a regex except during initial compilation.
After that, decide whether to use it depending on the matching options.

In fact there are some matching options that are incompatible with JIT,
as the PCRE2 docs state:

  Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not
  supported by the just-in-time (JIT) compiler. If it is set, JIT
  matching is disabled and the interpretive code in pcre2_match() is
  run. Apart from PCRE2_NO_JIT (obviously), the remaining options are
  supported for JIT matching.

Fixes: GNOME/gtksourceview#283
2022-09-12 14:08:13 +02:00
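The per-match decision described in bec68b2d74 can be sketched as a small predicate. This is a minimal sketch and not the GRegex code: the `DEMO_*` flag values and `demo_can_use_jit()` are hypothetical stand-ins, chosen only to mirror the rule quoted from the PCRE2 documentation (anchoring options set at match time, and an explicit no-JIT request, rule out JIT matching).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical match-time option bits; the values are NOT the PCRE2 ones. */
#define DEMO_MATCH_ANCHORED    (1u << 0)
#define DEMO_MATCH_ENDANCHORED (1u << 1)
#define DEMO_MATCH_NO_JIT      (1u << 2)
#define DEMO_MATCH_NOTBOL      (1u << 3)

/* Decide per match whether the JIT code path may be used.
 * jit_compiled: whether JIT compilation succeeded for this pattern.
 * match_options: the options passed for this particular match. */
static bool demo_can_use_jit(bool jit_compiled, uint32_t match_options)
{
    const uint32_t jit_incompatible =
        DEMO_MATCH_ANCHORED | DEMO_MATCH_ENDANCHORED | DEMO_MATCH_NO_JIT;

    if (!jit_compiled)
        return false;

    /* Any incompatible option forces the interpretive matcher. */
    return (match_options & jit_incompatible) == 0;
}
```

Keeping the decision a pure function of the compile result and the current match options means the same compiled pattern can use JIT for one match and the interpreter for the next.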
Marco Trevisan (Treviño)
5e76cde5ff regex: Handle JIT errors more explicitly 2022-09-12 13:55:39 +02:00
Marco Trevisan (Treviño)
fe1c2628d5 regex: Avoid allocating offsets until we've a match
There's not much point in pre-allocating offsets, given that we already
allocate them when needed, and only if we have matches to store.

So let's just allocate the space for the dummy offsets we depend on,
and allocate the others on demand.
2022-09-12 13:55:39 +02:00
Marco Trevisan (Treviño)
e8628a7ed5 regex: Compute the offsets size based on match results
While the ovector count would include all the allocated space, we only
care about the actual match values, so avoid wasting allocations and
just use the ones we need to hold the offsets.
2022-09-12 13:55:39 +02:00
Marco Trevisan (Treviño)
aee84cb45c gregex: Avoid re-allocating if we have no size change
This is handled by the syscall underneath, but we can just avoid a call
cheaply.
2022-09-12 13:55:39 +02:00
Marco Trevisan (Treviño)
11521972f4 gregex: Handle the case we need to re-allocate the match data
In case PCRE2 returns an empty match

This can be easily tested by initializing the initial match data to a
value that is less than the expected match values (e.g. by calling
pcre2_match_data_create (1, NULL)), but we can't do it in our tests
without bigger changes.
2022-09-12 13:55:39 +02:00
Marco Trevisan (Treviño)
1d628dac92 regex: Use size types more in line with PCRE2 returned values
We're using int for every size value, while PCRE2 uses uint32_t or
PCRE2_SIZE (size_t on most platforms). Let's use the same types to avoid
mixing signedness.
2022-09-12 13:55:39 +02:00
Marco Trevisan (Treviño)
13ad4296ea gregex: Fix a potential PCRE2 code leak on reallocation failures
If recalc_match_offsets() failed we were just returning, but in
that case, per the documentation, we should still set the match_info (if
provided) and free the pcre2 code instance.

So let's just break out of the loop we're in, as if we had no matches set.
This also avoids re-allocating the offsets array and potentially
accessing unset data.
2022-09-12 13:55:39 +02:00
Marco Trevisan (Treviño)
1f88976610 gregex: Do not try access the undefined match offsets if we have no match
When we got NO-MATCH "errors", we were still recomputing the
match offsets and making decisions based on them, which might lead to
undefined behavior.

Avoid this by returning a FALSE result early (but with no error) when
there's no result to proceed on.

Fixes: #2741
2022-09-12 13:55:39 +02:00
Marco Trevisan (Treviño)
1185a1304a gregex: Mark g_match_info_get_regex as transfer none
Since it had no explicit annotation, g-i was defaulting to transfer-full
while in this case the GRegex is owned by the GMatchInfo.
2022-09-12 13:55:39 +02:00
Marco Trevisan (Treviño)
d639c4ec00 regex: Do not mix PCRE2 Compile, Match, Newline and BSR flags
After the PCRE2 port we were still trying to map the old GRegex flags
(PCRE1 based) to the new PCRE2 ones, but in doing so we were also
mixing flags with enums, leading to unexpected behaviors when trying to
get newline and BSR options out of bigger flags arrays.

So, avoid doing any mapping: store the values as native PCRE2 flags
internally and convert them back only when requested.

This fixes some regressions on newline handling.

Fixes: #2729
Fixes: #2688
Fixes: GNOME/gtksourceview#278
2022-09-12 13:55:39 +02:00
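One way that mixing flags with enums can go wrong, as d639c4ec00 describes, is shown below. This is a minimal sketch under stated assumptions: `DemoNewline` and both helper functions are hypothetical, and the enum values are picked only so that one value happens to share bits with another. It does not reproduce the GRegex code paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical newline setting: an enumeration, not a set of bit flags.
 * Note that CRLF (3) shares bits with both CR (1) and LF (2). */
typedef enum {
    DEMO_NEWLINE_CR   = 1,
    DEMO_NEWLINE_LF   = 2,
    DEMO_NEWLINE_CRLF = 3
} DemoNewline;

/* Buggy: treats the enum value as if it were a bit inside a flags word. */
static bool demo_is_cr_bitmask_test(uint32_t stored)
{
    return (stored & DEMO_NEWLINE_CR) != 0;
}

/* Correct: the enum is kept in its own field and compared for equality. */
static bool demo_is_cr_enum_test(DemoNewline stored)
{
    return stored == DEMO_NEWLINE_CR;
}
```

The bitmask test reports "CR" for a stored value of `DEMO_NEWLINE_CRLF`, while the equality test does not, which is why keeping such values out of the flags word avoids the confusion.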
Mamoru TASAKA
710ccee65c gregex: use correct size for pcre2_pattern_info
man pcre2_pattern_info says that the 3rd argument must
point to a uint32_t variable (except for some values of the 2nd argument),
so use the correct type. In particular, using the wrong size can cause
unexpected results on big-endian systems.

closes: #2699
2022-07-26 21:51:45 +09:00
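The big-endian hazard mentioned in 710ccee65c can be reproduced without PCRE2. The sketch below assumes a hypothetical `demo_pattern_info()` that, like the real call for most queries, writes a 32-bit result through an untyped pointer. All names here are illustrative, and mixed-endian hosts are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: writes a 32-bit result through an untyped pointer. */
static void demo_pattern_info(void *where, uint32_t value)
{
    memcpy(where, &value, sizeof value);
}

/* Correct: the destination has exactly the width that gets written. */
static uint32_t demo_read_matching_type(uint32_t value)
{
    uint32_t out = 0;
    demo_pattern_info(&out, value);
    return out;
}

/* Buggy: the destination is wider than what gets written. Only the first
 * four bytes in memory are filled. On little-endian hosts those are the
 * low-order bytes, so the result looks right; on big-endian hosts they are
 * the high-order bytes, so the result is shifted left by 32 bits. */
static uint64_t demo_read_wider_type(uint32_t value)
{
    uint64_t out = 0;
    demo_pattern_info(&out, value);
    return out;
}

/* Returns 1 when the lowest-addressed byte holds the least significant bits. */
static int demo_host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

Because the wrong width happens to give the right answer on little-endian machines, a bug of this kind can go unnoticed until the code runs on a big-endian architecture.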
Aleksei Rybalkin
6535c77b00 gregex: do not set match and recursion limits on match context
These are not really necessary, and cause breakages (e.g. #2700).
pcre2_set_recursion_limit is also deprecated.

Fixes: #2700
2022-07-25 16:48:03 +02:00
Aleksei Rybalkin
2c2e059cd3 gregex: use g_debug instead of g_warning in case JIT is not available
If JIT is not available in pcre2, we printed a warning about it. This
warning broke tests on systems which don't have JIT support in pcre2
(e.g. macos).
2022-07-22 20:29:07 +02:00
Aleksei Rybalkin
bcd8cb3e14 gregex: use G_REGEX_OPTIMIZE flag to enable JIT compilation
Since we ported gregex to pcre2, the JIT compiler is now available to be
used. Let's undeprecate G_REGEX_OPTIMIZE flag to control whether the JIT
compilation is requested, since using JIT is itself an optimization.
See [1] for details on its implementation in pcre2.

[1] http://pcre.org/current/doc/html/pcre2jit.html

Fixes: #566
2022-07-20 20:48:17 +00:00
Philip Withnall
34e5bb8b43 Merge branch 'gregex-match-info-leak-fix' into 'main'
gregex: Free match info if offset matching recalc failed

See merge request GNOME/glib!2827
2022-07-20 13:56:23 +00:00
Marco Trevisan (Treviño)
6c93ac876f gregex: Free match info if offset matching recalc failed
It probably never happens in practice, but Coverity found it and
it's easy enough to fix.

Coverity CID: #1490730
2022-07-20 06:32:30 +02:00
Marco Trevisan (Treviño)
c05d09044f gregex: Ensure we translate the errcode without asserting on G_REGEX_ERROR_COMPILE
Since commit 8d5a44dc, in order to ensure that we were setting the errcode in
translate_compile_error(), we asserted that it was a valid value.
However, we assumed that 0 was not a valid error, while it is:
it's the generic G_REGEX_ERROR_COMPILE.

So, set errcode and errmsg to invalid values before translating and
ensure we've changed them.

Fixes: #2694
2022-07-15 01:46:11 +02:00
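The sentinel approach described in c05d09044f can be sketched as follows. This is an illustration under stated assumptions, not the GRegex source: `DemoError`, `DEMO_ERROR_UNSET`, `demo_translate()` and `demo_translate_checked()` are hypothetical names. The point shown is that when 0 is a legitimate error code, "was it set?" has to be checked against a value outside the valid range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error enum in which the generic code is 0, so 0 is valid. */
typedef enum {
    DEMO_ERROR_COMPILE = 0,
    DEMO_ERROR_STRAY_BACKSLASH = 101
} DemoError;

/* Sentinel outside the range of valid error codes. */
#define DEMO_ERROR_UNSET (-1)

/* Hypothetical translator from a library error code to our code and message. */
static void demo_translate(int lib_code, int *errcode, const char **errmsg)
{
    switch (lib_code) {
    case 1:
        *errcode = DEMO_ERROR_STRAY_BACKSLASH;
        *errmsg = "\\ at end of pattern";
        break;
    default:
        /* Generic fallback: a valid result even though its value is 0. */
        *errcode = DEMO_ERROR_COMPILE;
        *errmsg = "unknown error";
        break;
    }
}

/* Initialise both outputs to invalid values, translate, then verify that the
 * translator really assigned them. Comparing against 0 instead would reject
 * the valid generic code. */
static int demo_translate_checked(int lib_code, const char **errmsg)
{
    int errcode = DEMO_ERROR_UNSET;

    *errmsg = NULL;
    demo_translate(lib_code, &errcode, errmsg);

    assert(errcode != DEMO_ERROR_UNSET);
    assert(*errmsg != NULL);

    return errcode;
}
```

An unrecognised library code then yields `DEMO_ERROR_COMPILE` without tripping the assertion.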
Aleksei Rybalkin
5cd94a0982 gregex: use %s format specifier for localized error message 2022-07-14 13:14:31 +00:00
Aleksei Rybalkin
8d5a44dc8f replace pcre1 with pcre2 2022-07-12 11:46:34 +00:00
Simon McVittie
879b9cd669 gregex: Add G_REGEX_DEFAULT, G_REGEX_MATCH_DEFAULT
Signed-off-by: Simon McVittie <smcv@collabora.com>
2022-06-23 10:47:39 +01:00
Philip Withnall
70ee43f1e9 glib: Add SPDX license headers automatically
Add SPDX license (but not copyright) headers to all files which follow a
certain pattern in their existing non-machine-readable header comment.

This commit was entirely generated using the command:
```
git ls-files glib/*.[ch] | xargs perl -0777 -pi -e 's/\n \*\n \* This library is free software; you can redistribute it and\/or\n \* modify it under the terms of the GNU Lesser General Public/\n \*\n \* SPDX-License-Identifier: LGPL-2.1-or-later\n \*\n \* This library is free software; you can redistribute it and\/or\n \* modify it under the terms of the GNU Lesser General Public/igs'
```

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>

Helps: #1415
2022-05-18 09:19:02 +01:00
Gabor Karsay
7e64004db0 docs: mark macros, flags, enums with percent sign 2022-03-04 16:21:55 +00:00
Philip Withnall
74595ab64a Merge branch 'wip/pwithnall/962-drop-embedded-pcre' into 'main'
pcre: Drop internal libpcre copy

Closes #962 and #642

See merge request GNOME/glib!2144
2021-06-21 14:07:45 +00:00
Philip Withnall
9fbd7f3dc1 build: Drop the internal_pcre option in favour of the subproject
This should maintain equivalent functionality, apart from that now you
have to pass `--force-fallback-for libpcre` to `meson configure` in
order to use the subproject; rather than specifying
`-Dinternal_pcre=true` to use the internal copy.

This also fixes #642, as the wrapdb copy of libpcre is version 8.37.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>

Helps: #962
Fixes: #642
2021-06-16 16:45:10 +01:00
Philip Withnall
b052620398 gregex: Fix return from g_match_info_fetch() for unmatched subpatterns
If there were more subpatterns in the regex than matches (which can
happen if one or more of the subpatterns are optional),
`g_match_info_fetch()` was erroneously returning `NULL` rather than the
empty string. It should only return `NULL` when the `match_num`
specifies a subpattern which doesn’t exist in the regex.

This is complicated slightly by the fact that when using
`g_regex_match_all()`, more matches can be returned than there are
subpatterns, due to one or more subpatterns matching multiple times at
different offsets in the string.

This includes a fix for a unit test which was erroneously checking the
broken behaviour.

Thanks to Allison Karlitskaya for the minimal reproducer.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>

Fixes: #229
2021-06-09 14:39:20 +01:00
Philip Withnall
1314ff93fc glib: Drop unnecessary volatile qualifiers from internal variables
These variables were already (correctly) accessed atomically. The
`volatile` qualifier doesn’t help with that.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>

Helps: #600
2020-11-20 14:40:19 +00:00
Philip Withnall
00bfb3ab44 tree: Fix various typos and outdated terminology
This was mostly machine generated with the following command:
```
codespell \
    --builtin clear,rare,usage \
    --skip './po/*' --skip './.git/*' --skip './NEWS*' \
    --write-changes .
```
using the latest git version of `codespell` as per [these
instructions](https://github.com/codespell-project/codespell#user-content-updating).

Then I manually checked each change using `git add -p`, made a few
manual fixups and dropped a load of incorrect changes.

There are still some outdated or loaded terms used in GLib, mostly to do
with git branch terminology. They will need to be changed later as part
of a wider migration of git terminology.

If I’ve missed anything, please file an issue!

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-06-12 15:01:08 +01:00
Дилян Палаузов
512655aa12 minor typos in the documentation (a/an) 2019-08-24 19:14:05 +00:00
Benjamin Otte
3aff811d13 Use G_GNUC_FALLTHROUGH where appropriate 2018-09-04 20:24:25 +02:00
Philip Withnall
0adbeacd01 gregex: Highlight in the docs that input must be in UTF-8
Signed-off-by: Philip Withnall <withnall@endlessm.com>

https://bugzilla.gnome.org/show_bug.cgi?id=748620
2018-05-08 12:27:55 +01:00
Philip Withnall
f75624f593 gregex: Highlight some argument names in the documentation
Signed-off-by: Philip Withnall <withnall@endlessm.com>

https://bugzilla.gnome.org/show_bug.cgi?id=748620
2018-05-08 12:27:55 +01:00
Philip Withnall
fe35f577b0 gregex: Clarify units in documentation
Make it a bit clearer that all lengths passed to GRegex methods are in
bytes (not characters). This is mentioned in the section overview, but
who reads that?

Signed-off-by: Philip Withnall <withnall@endlessm.com>

https://bugzilla.gnome.org/show_bug.cgi?id=748620
2018-05-08 12:27:55 +01:00
Volker Sobek
c7dc81ce78 docs: Escape some backslashes for markdown
These no longer showed up correctly in the documentation.

https://bugzilla.gnome.org/show_bug.cgi?id=727346
2017-10-05 15:07:09 +01:00
Philip Withnall
2c35acff7b gregex: Fix an assignment-after-free error
The match_info is freed just above this line, so this would result in a
write to freed memory.

Spotted by Leslie Zhai <xiangzhai83@gmail.com>.

https://bugzilla.gnome.org/show_bug.cgi?id=777077
2017-01-12 09:04:39 +00:00