add UTF-8 support.

        * glib/gpattern.c: add UTF-8 support.

        * tests/patterntest.c: add UTF-8 and equality tests.

        * docs/reference/glib/Makefile.am (MKDB_OPTIONS): Add --sgml-mode.

        * docs/reference/glib/tmpl/patterns.sgml: Document UTF-8 support.
Matthias Clasen
2001-11-14 22:22:34 +00:00
parent 2180c1ed05
commit a49a78a3b3
13 changed files with 206 additions and 121 deletions

@@ -1,3 +1,9 @@
2001-11-14 Matthias Clasen <matthiasc@poet.de>
* glib/gpattern.c: add UTF-8 support.
* tests/patterntest.c: add UTF-8 and equality tests.
Wed Nov 14 07:34:24 2001 Tim Janik <timj@gtk.org> Wed Nov 14 07:34:24 2001 Tim Janik <timj@gtk.org>
* glib/galloca.h (g_newa): provide g_newa(ctype, count) on top of * glib/galloca.h (g_newa): provide g_newa(ctype, count) on top of

@@ -1,10 +1,16 @@
2001-11-14 Matthias Clasen <matthiasc@poet.de>
* glib/Makefile.am (MKDB_OPTIONS): Add --sgml-mode.
* glib/tmpl/patterns.sgml: Document UTF-8 support.
Wed Nov 14 03:19:49 2001 Tim Janik <timj@gtk.org> Wed Nov 14 03:19:49 2001 Tim Janik <timj@gtk.org>
* gobject/tmp/param_value_types.sgml: more docs. * gobject/tmpl/param_value_types.sgml: more docs.
Tue Nov 13 21:31:58 2001 Tim Janik <timj@gtk.org> Tue Nov 13 21:31:58 2001 Tim Janik <timj@gtk.org>
* gobject/tmp/param_value_types.sgml: list parameter and * gobject/tmpl/param_value_types.sgml: list parameter and
value types. value types.
* gobject/tmpl/gparamspec.sgml: more docs for g_param_spec*() * gobject/tmpl/gparamspec.sgml: more docs for g_param_spec*()

@@ -13,7 +13,7 @@ DOC_SOURCE_DIR=../../..
SCAN_OPTIONS=--deprecated-guards="G_DISABLE_DEPRECATED" SCAN_OPTIONS=--deprecated-guards="G_DISABLE_DEPRECATED"
# Extra options to supply to gtkdoc-mkdb # Extra options to supply to gtkdoc-mkdb
MKDB_OPTIONS= MKDB_OPTIONS=--sgml-mode
# Extra options to supply to gtkdoc-fixref # Extra options to supply to gtkdoc-fixref
FIXXREF_OPTIONS= FIXXREF_OPTIONS=

@@ -19,26 +19,16 @@ Note that in contrast to <function>glob()</function>, the '/' character
be escaped to include them literally in a pattern. be escaped to include them literally in a pattern.
</para> </para>
<para> <para>
The pattern matcher is restricted to ASCII and will not work correctly with
multibyte UTF-8 characters in the pattern or in the string to match.
</para>
<para>
When multiple strings must be matched against the same pattern, it When multiple strings must be matched against the same pattern, it
is better to compile the pattern to a #GPatternSpec using is better to compile the pattern to a #GPatternSpec using
g_pattern_spec_new() and use g_pattern_match_string() instead of g_pattern_spec_new() and use g_pattern_match_string() instead of
g_pattern_match_simple(). This avoids the overhead of repeated g_pattern_match_simple(). This avoids the overhead of repeated
pattern compilation. pattern compilation.
</para>
<!-- ##### SECTION See_Also ##### -->
<para>
g_strreverse()
</para>
<!-- ##### STRUCT GPatternSpec ##### --> <!-- ##### STRUCT GPatternSpec ##### -->
<para> <para>
A <structname>GPatternSpec</structname> is the 'compiled' form of a pattern. A <structname>GPatternSpec</structname> is the 'compiled' form of a pattern.
This structure is opaque and its fields and cannot be accessed. This structure is opaque and its fields cannot be accessed directly.
</para> </para>
@@ -47,7 +37,7 @@ This structure is opaque and its fields and cannot be accessed.
Compiles a pattern to a #GPatternSpec. Compiles a pattern to a #GPatternSpec.
</para> </para>
@pattern: a zero terminated string. @pattern: a zero-terminated UTF-8 encoded string.
@Returns: a newly-allocated #GPatternSpec. @Returns: a newly-allocated #GPatternSpec.
@@ -77,16 +67,24 @@ string given is mandatory. The reversed string can be omitted by passing %NULL,
this is more efficient if the reversed version of the string to be matched is this is more efficient if the reversed version of the string to be matched is
not at hand, as g_pattern_match() will only construct it if the compiled pattern not at hand, as g_pattern_match() will only construct it if the compiled pattern
requires reverse matches. requires reverse matches.
Note that, if the user code will (possibly) match a string against a multitude of </para>
patterns containing wildcards, chances are high that some patterns will require <para>
a reversed string. In this case, it's more efficient to provide the reversed Note that, if the user code will (possibly) match a string against a multitude
string to avoid multiple constructions thereof in the various calls to of patterns containing wildcards, chances are high that some patterns will
require a reversed string. In this case, it's more efficient to provide the
reversed string to avoid multiple constructions thereof in the various calls to
g_pattern_match(). g_pattern_match().
</para> </para>
<para>
Note also that the reverse of a UTF-8 encoded string can in general
<emphasis>not</emphasis> be obtained by <function>g_strreverse()</function>.
This works only if the string doesn't contain any multibyte characters.
Glib doesn't currently offer a function to reverse UTF-8 encoded strings.
</para>
@pspec: a #GPatternSpec. @pspec: a #GPatternSpec.
@string_length: the length of @string. @string_length: the length of @string.
@string: the string to match. @string: the UTF-8 encoded string to match.
@string_reversed: the reverse of @string or %NULL. @string_reversed: the reverse of @string or %NULL.
@Returns: %TRUE if @string matches @pspec. @Returns: %TRUE if @string matches @pspec.
@@ -99,7 +97,7 @@ g_pattern_match() instead while supplying the reversed string.
</para> </para>
@pspec: a #GPatternSpec. @pspec: a #GPatternSpec.
@string: the string to match. @string: the UTF-8 encoded string to match.
@Returns: %TRUE if @string matches @pspec. @Returns: %TRUE if @string matches @pspec.
@@ -111,8 +109,8 @@ the pattern once with g_pattern_spec_new() and call g_pattern_match_string()
repetively. repetively.
</para> </para>
@pattern: the pattern. @pattern: the UTF-8 encoded pattern.
@string: the string to match. @string: the UTF-8 encoded string to match.
@Returns: %TRUE if @string matches @pspec. @Returns: %TRUE if @string matches @pspec.
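
The documentation change above notes that g_strreverse() cannot in general reverse a UTF-8 encoded string. As a rough illustration of why, and of what a character-wise reversal might look like, here is a minimal standalone sketch. The names utf8_seq_len and utf8_reverse_sketch are hypothetical and invented for this example; they are not GLib API, and the sketch assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Byte length of the UTF-8 sequence introduced by lead byte c.
 * Hypothetical helper (not GLib API); invalid lead bytes count as 1. */
size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Returns a newly malloc()ed string holding the characters (not the
 * bytes) of s in reverse order, or NULL on allocation failure.
 * The caller frees the result with free(). */
char *utf8_reverse_sketch(const char *s)
{
    size_t len;
    char *result, *r;

    assert(s != NULL);
    len = strlen(s);
    result = malloc(len + 1);
    if (result == NULL)
        return NULL;

    r = result + len;   /* fill the output from the back */
    *r = '\0';
    while (*s) {
        size_t n = utf8_seq_len((unsigned char)*s);
        size_t left = (size_t)(r - result);
        if (n > left)   /* guard against a truncated final sequence */
            n = left;
        r -= n;
        memcpy(r, s, n); /* bytes of one character keep their order */
        s += n;
    }
    return result;
}
```

For the three-byte input "\xc3\xa4x" (a two-byte character followed by x), this yields x followed by the intact two-byte character. A byte-wise reversal would instead produce the bytes 78 a4 c3, which is not valid UTF-8.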

@@ -21,7 +21,8 @@
#include "gmacros.h" #include "gmacros.h"
#include "gmessages.h" #include "gmessages.h"
#include "gmem.h" #include "gmem.h"
#include "gutils.h" /* inline hassle */ #include "gunicode.h"
#include "gutils.h"
#include <string.h> #include <string.h>
/* keep enum and structure of gpattern.c and patterntest.c in sync */ /* keep enum and structure of gpattern.c and patterntest.c in sync */
@@ -34,44 +35,51 @@ typedef enum
G_MATCH_EXACT, /* "AAAAA" */ G_MATCH_EXACT, /* "AAAAA" */
G_MATCH_LAST G_MATCH_LAST
} GMatchType; } GMatchType;
struct _GPatternSpec struct _GPatternSpec
{ {
GMatchType match_type; GMatchType match_type;
guint pattern_length; guint pattern_length;
guint min_length;
gchar *pattern; gchar *pattern;
}; };
/* --- functions --- */ /* --- functions --- */
static inline void /**
instring_reverse (guint length, * g_utf8_reverse:
gchar *str) * string: a UTF-8 string.
*
* Reverses a UTF-8 string. The @string must be valid UTF-8 encoded text.
* (Use g_utf8_validate() on all text before trying to use UTF-8
* utility functions with it.)
*
* Note that unlike g_strreverse(), this function returns
* newly-allocated memory, which should be freed with g_free() when
* no longer needed.
*
* Returns: a newly-allocated string which is the reverse of @string.
*/
static gchar *
g_utf8_reverse (guint len, const gchar *string)
{ {
gchar *f, *l, *b; gchar *result;
const gchar *p;
gchar *m, *r, skip;
f = str; result = g_new (gchar, len + 1);
l = str + length - 1; r = result + len;
b = str + length / 2; p = string;
while (f < b) while (*p)
{ {
gchar tmp = *l; skip = g_utf8_skip[*(guchar*)p];
r -= skip;
*l-- = *f; for (m = r; skip; skip--)
*f++ = tmp; *m++ = *p++;
} }
} result[len] = 0;
static inline gchar* return result;
strdup_reverse (guint length,
const gchar *str)
{
gchar *t, *dest = g_new (gchar, length + 1);
t = dest + length;
*t-- = 0;
while (t >= dest)
*t-- = *str++;
return dest;
} }
static inline gboolean static inline gboolean
@@ -93,7 +101,7 @@ g_pattern_ph_match (const gchar *match_pattern,
case '?': case '?':
if (!*string) if (!*string)
return FALSE; return FALSE;
string++; string = g_utf8_next_char (string);
break; break;
case '*': case '*':
@@ -105,7 +113,7 @@ g_pattern_ph_match (const gchar *match_pattern,
{ {
if (!*string) if (!*string)
return FALSE; return FALSE;
string++; string = g_utf8_next_char (string);
} }
} }
while (ch == '*' || ch == '?'); while (ch == '*' || ch == '?');
@@ -117,7 +125,7 @@ g_pattern_ph_match (const gchar *match_pattern,
{ {
if (!*string) if (!*string)
return FALSE; return FALSE;
string++; string = g_utf8_next_char (string);
} }
string++; string++;
if (g_pattern_ph_match (pattern, string)) if (g_pattern_ph_match (pattern, string))
@@ -150,10 +158,11 @@ g_pattern_match (GPatternSpec *pspec,
g_return_val_if_fail (pspec != NULL, FALSE); g_return_val_if_fail (pspec != NULL, FALSE);
g_return_val_if_fail (string != NULL, FALSE); g_return_val_if_fail (string != NULL, FALSE);
if (pspec->min_length > string_length)
return FALSE;
switch (pspec->match_type) switch (pspec->match_type)
{ {
gboolean result;
gchar *tmp;
case G_MATCH_ALL: case G_MATCH_ALL:
return g_pattern_ph_match (pspec->pattern, string); return g_pattern_ph_match (pspec->pattern, string);
case G_MATCH_ALL_TAIL: case G_MATCH_ALL_TAIL:
@@ -161,47 +170,23 @@ g_pattern_match (GPatternSpec *pspec,
return g_pattern_ph_match (pspec->pattern, string_reversed); return g_pattern_ph_match (pspec->pattern, string_reversed);
else else
{ {
tmp = strdup_reverse (string_length, string); gboolean result;
gchar *tmp;
tmp = g_utf8_reverse (string_length, string);
result = g_pattern_ph_match (pspec->pattern, tmp); result = g_pattern_ph_match (pspec->pattern, tmp);
g_free (tmp); g_free (tmp);
return result; return result;
} }
case G_MATCH_HEAD: case G_MATCH_HEAD:
if (pspec->pattern_length > string_length) if (pspec->pattern_length == string_length)
return FALSE;
else if (pspec->pattern_length == string_length)
return strcmp (pspec->pattern, string) == 0; return strcmp (pspec->pattern, string) == 0;
else if (pspec->pattern_length) else if (pspec->pattern_length)
return strncmp (pspec->pattern, string, pspec->pattern_length) == 0; return strncmp (pspec->pattern, string, pspec->pattern_length) == 0;
else else
return TRUE; return TRUE;
case G_MATCH_TAIL: case G_MATCH_TAIL:
if (pspec->pattern_length > string_length) if (pspec->pattern_length)
return FALSE; return strcmp (pspec->pattern, string + (string_length - pspec->pattern_length)) == 0;
else if (pspec->pattern_length == string_length)
{
if (string_reversed)
return strcmp (pspec->pattern, string_reversed) == 0;
else
{
tmp = strdup_reverse (string_length, string);
result = strcmp (pspec->pattern, tmp) == 0;
g_free (tmp);
return result;
}
}
else if (pspec->pattern_length)
{
if (string_reversed)
return strncmp (pspec->pattern, string_reversed, pspec->pattern_length) == 0;
else
{
tmp = strdup_reverse (string_length, string);
result = strncmp (pspec->pattern, tmp, pspec->pattern_length) == 0;
g_free (tmp);
return result;
}
}
else else
return TRUE; return TRUE;
case G_MATCH_EXACT: case G_MATCH_EXACT:
@@ -222,6 +207,7 @@ g_pattern_spec_new (const gchar *pattern)
gboolean seen_joker = FALSE, seen_wildcard = FALSE, more_wildcards = FALSE; gboolean seen_joker = FALSE, seen_wildcard = FALSE, more_wildcards = FALSE;
gint hw_pos = -1, tw_pos = -1, hj_pos = -1, tj_pos = -1; gint hw_pos = -1, tw_pos = -1, hj_pos = -1, tj_pos = -1;
gboolean follows_wildcard = FALSE; gboolean follows_wildcard = FALSE;
guint pending_jokers = 0;
const gchar *s; const gchar *s;
gchar *d; gchar *d;
guint i; guint i;
@@ -231,6 +217,7 @@ g_pattern_spec_new (const gchar *pattern)
/* canonicalize pattern and collect necessary stats */ /* canonicalize pattern and collect necessary stats */
pspec = g_new (GPatternSpec, 1); pspec = g_new (GPatternSpec, 1);
pspec->pattern_length = strlen (pattern); pspec->pattern_length = strlen (pattern);
pspec->min_length = 0;
pspec->pattern = g_new (gchar, pspec->pattern_length + 1); pspec->pattern = g_new (gchar, pspec->pattern_length + 1);
d = pspec->pattern; d = pspec->pattern;
for (i = 0, s = pattern; *s != 0; s++) for (i = 0, s = pattern; *s != 0; s++)
@@ -249,17 +236,29 @@ g_pattern_spec_new (const gchar *pattern)
tw_pos = i; tw_pos = i;
break; break;
case '?': case '?':
if (hj_pos < 0) pending_jokers++;
hj_pos = i; pspec->min_length++;
tj_pos = i; continue;
/* fall through */
default: default:
for (; pending_jokers; pending_jokers--, i++) {
*d++ = '?';
if (hj_pos < 0)
hj_pos = i;
tj_pos = i;
}
follows_wildcard = FALSE; follows_wildcard = FALSE;
pspec->min_length++;
break; break;
} }
*d++ = *s; *d++ = *s;
i++; i++;
} }
for (; pending_jokers; pending_jokers--) {
*d++ = '?';
if (hj_pos < 0)
hj_pos = i;
tj_pos = i;
}
*d++ = 0; *d++ = 0;
seen_joker = hj_pos >= 0; seen_joker = hj_pos >= 0;
seen_wildcard = hw_pos >= 0; seen_wildcard = hw_pos >= 0;
@@ -271,8 +270,8 @@ g_pattern_spec_new (const gchar *pattern)
if (pspec->pattern[0] == '*') if (pspec->pattern[0] == '*')
{ {
pspec->match_type = G_MATCH_TAIL; pspec->match_type = G_MATCH_TAIL;
instring_reverse (pspec->pattern_length, pspec->pattern); memmove (pspec->pattern, pspec->pattern + 1, --pspec->pattern_length);
pspec->pattern[--pspec->pattern_length] = 0; pspec->pattern[pspec->pattern_length] = 0;
return pspec; return pspec;
} }
if (pspec->pattern[pspec->pattern_length - 1] == '*') if (pspec->pattern[pspec->pattern_length - 1] == '*')
@@ -295,8 +294,11 @@ g_pattern_spec_new (const gchar *pattern)
pspec->match_type = tw_pos > hw_pos ? G_MATCH_ALL_TAIL : G_MATCH_ALL; pspec->match_type = tw_pos > hw_pos ? G_MATCH_ALL_TAIL : G_MATCH_ALL;
else /* seen_joker */ else /* seen_joker */
pspec->match_type = tj_pos > hj_pos ? G_MATCH_ALL_TAIL : G_MATCH_ALL; pspec->match_type = tj_pos > hj_pos ? G_MATCH_ALL_TAIL : G_MATCH_ALL;
if (pspec->match_type == G_MATCH_ALL_TAIL) if (pspec->match_type == G_MATCH_ALL_TAIL) {
instring_reverse (pspec->pattern_length, pspec->pattern); gchar *tmp = pspec->pattern;
pspec->pattern = g_utf8_reverse (pspec->pattern_length, pspec->pattern);
g_free (tmp);
}
return pspec; return pspec;
} }
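
The key behavioural change in the matcher above is that '?' now consumes one UTF-8 character rather than one byte. The following standalone sketch shows that idea in a deliberately simple recursive matcher. It is not the algorithm in gpattern.c (it has no match types, no reversed matching, and no min_length shortcut), and utf8_next and utf8_glob_match_sketch are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

/* Advance to the start of the next UTF-8 character by skipping the
 * lead byte and any continuation bytes (10xxxxxx). Stops at NUL. */
const char *utf8_next(const char *p)
{
    if (*p)
        p++;
    while (((unsigned char)*p & 0xC0) == 0x80)
        p++;
    return p;
}

/* Returns 1 if string matches pattern, 0 otherwise.
 * '*' matches any run of characters, '?' matches exactly one
 * UTF-8 character. Both arguments are assumed to be valid UTF-8. */
int utf8_glob_match_sketch(const char *pattern, const char *string)
{
    assert(pattern != NULL && string != NULL);

    for (;;) {
        unsigned char ch = (unsigned char)*pattern;

        if (ch == '\0')
            return *string == '\0';

        if (ch == '*') {
            pattern++;
            /* try the rest of the pattern at every character boundary */
            for (;;) {
                if (utf8_glob_match_sketch(pattern, string))
                    return 1;
                if (*string == '\0')
                    return 0;
                string = utf8_next(string);
            }
        }

        if (*string == '\0')
            return 0;

        if (ch == '?') {
            string = utf8_next(string);  /* one character, 1 to 4 bytes */
        } else {
            if (ch != (unsigned char)*string)
                return 0;
            string++;                    /* literal bytes compare one by one */
        }
        pattern++;
    }
}
```

With this sketch, the pattern "?x" matches a string made of one two-byte character followed by x, while "??x" does not, because the second '?' has to consume the x itself.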

@@ -16,10 +16,9 @@
* Free Software Foundation, Inc., 59 Temple Place - Suite 330, * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 02111-1307, USA. * Boston, MA 02111-1307, USA.
*/ */
#include <string.h>
#include "glib.h" #include "glib.h"
#include "glib/gpattern.h"
/* keep enum and structure of gpattern.c and patterntest.c in sync */ /* keep enum and structure of gpattern.c and patterntest.c in sync */
typedef enum typedef enum
@@ -31,10 +30,12 @@ typedef enum
G_MATCH_EXACT, /* "AAAAA" */ G_MATCH_EXACT, /* "AAAAA" */
G_MATCH_LAST G_MATCH_LAST
} GMatchType; } GMatchType;
struct _GPatternSpec struct _GPatternSpec
{ {
GMatchType match_type; GMatchType match_type;
guint pattern_length; guint pattern_length;
guint min_length;
gchar *pattern; gchar *pattern;
}; };
@@ -65,16 +66,22 @@ match_type_name (GMatchType match_type)
} }
} }
/* this leakes memory, but we don't care */
#define utf8(str) g_convert (str, -1, "Latin1", "UTF-8", NULL, NULL, NULL)
#define latin1(str) g_convert (str, -1, "UTF-8", "Latin1", NULL, NULL, NULL)
static gboolean static gboolean
test_compilation (gchar *src, test_compilation (gchar *src,
GMatchType match_type, GMatchType match_type,
gchar *pattern) gchar *pattern,
guint min)
{ {
GPatternSpec *spec; GPatternSpec *spec;
g_print ("compiling \"%s\" \t", src); g_print ("compiling \"%s\" \t", utf8(src));
spec = g_pattern_spec_new (src); spec = g_pattern_spec_new (src);
if (spec->match_type != match_type) if (spec->match_type != match_type)
{ {
g_print ("failed \t(match_type: %s, expected %s)\n", g_print ("failed \t(match_type: %s, expected %s)\n",
@@ -86,8 +93,8 @@ test_compilation (gchar *src,
if (strcmp (spec->pattern, pattern) != 0) if (strcmp (spec->pattern, pattern) != 0)
{ {
g_print ("failed \t(pattern: \"%s\", expected \"%s\")\n", g_print ("failed \t(pattern: \"%s\", expected \"%s\")\n",
spec->pattern, utf8(spec->pattern),
pattern); utf8(pattern));
return FALSE; return FALSE;
} }
@@ -99,6 +106,14 @@ test_compilation (gchar *src,
return FALSE; return FALSE;
} }
if (spec->min_length != min)
{
g_print ("failed \t(min_length: %d, expected %d)\n",
spec->min_length,
min);
return FALSE;
}
g_print ("passed (%s: \"%s\")\n", g_print ("passed (%s: \"%s\")\n",
match_type_name (spec->match_type), match_type_name (spec->match_type),
spec->pattern); spec->pattern);
@@ -111,7 +126,7 @@ test_match (gchar *pattern,
gchar *string, gchar *string,
gboolean match) gboolean match)
{ {
g_print ("matching \"%s\" against \"%s\" \t", string, pattern); g_print ("matching \"%s\" against \"%s\" \t", utf8(string), utf8(pattern));
if (g_pattern_match_simple (pattern, string) != match) if (g_pattern_match_simple (pattern, string) != match)
{ {
@@ -132,14 +147,14 @@ test_equal (gchar *pattern1,
GPatternSpec *p2 = g_pattern_spec_new (pattern2); GPatternSpec *p2 = g_pattern_spec_new (pattern2);
gboolean equal = g_pattern_spec_equal (p1, p2); gboolean equal = g_pattern_spec_equal (p1, p2);
g_print ("comparing \"%s\" with \"%s\" \t", pattern1, pattern2); g_print ("comparing \"%s\" with \"%s\" \t", utf8(pattern1), utf8(pattern2));
if (expected != equal) if (expected != equal)
{ {
g_print ("failed \t{%s, %u, \"%s\"} %s {%s, %u, \"%s\"}\n", g_print ("failed \t{%s, %u, \"%s\"} %s {%s, %u, \"%s\"}\n",
match_type_name (p1->match_type), p1->pattern_length, p1->pattern, match_type_name (p1->match_type), p1->pattern_length, utf8(p1->pattern),
expected ? "!=" : "==", expected ? "!=" : "==",
match_type_name (p2->match_type), p2->pattern_length, p2->pattern); match_type_name (p2->match_type), p2->pattern_length, utf8(p2->pattern));
} }
else else
g_print ("passed (%s)\n", equal ? "equal" : "unequal"); g_print ("passed (%s)\n", equal ? "equal" : "unequal");
@@ -150,9 +165,9 @@ test_equal (gchar *pattern1,
return expected == equal; return expected == equal;
} }
#define TEST_COMPILATION(src, type, pattern) { \ #define TEST_COMPILATION(src, type, pattern, min) { \
total++; \ total++; \
if (test_compilation (src, type, pattern)) \ if (test_compilation (latin1(src), type, latin1(pattern), min)) \
passed++; \ passed++; \
else \ else \
failed++; \ failed++; \
@@ -160,7 +175,7 @@ test_equal (gchar *pattern1,
#define TEST_MATCH(pattern, string, match) { \ #define TEST_MATCH(pattern, string, match) { \
total++; \ total++; \
if (test_match (pattern, string, match)) \ if (test_match (latin1(pattern), latin1(string), match)) \
passed++; \ passed++; \
else \ else \
failed++; \ failed++; \
@@ -168,7 +183,7 @@ test_equal (gchar *pattern1,
#define TEST_EQUAL(pattern1, pattern2, match) { \ #define TEST_EQUAL(pattern1, pattern2, match) { \
total++; \ total++; \
if (test_equal (pattern1, pattern2, match)) \ if (test_equal (latin1(pattern1), latin1(pattern2), match)) \
passed++; \ passed++; \
else \ else \
failed++; \ failed++; \
@@ -180,16 +195,21 @@ main (int argc, char** argv)
gint total = 0; gint total = 0;
gint passed = 0; gint passed = 0;
gint failed = 0; gint failed = 0;
gchar *string; gchar *string, *pattern;
TEST_COMPILATION("*A?B*", G_MATCH_ALL, "*A?B*"); TEST_COMPILATION("*A?B*", G_MATCH_ALL, "*A?B*", 3);
TEST_COMPILATION("ABC*DEFGH", G_MATCH_ALL_TAIL, "HGFED*CBA"); TEST_COMPILATION("ABC*DEFGH", G_MATCH_ALL_TAIL, "HGFED*CBA", 8);
TEST_COMPILATION("ABCDEF*GH", G_MATCH_ALL, "ABCDEF*GH"); TEST_COMPILATION("ABCDEF*GH", G_MATCH_ALL, "ABCDEF*GH", 8);
TEST_COMPILATION("ABC**?***??**DEF*GH", G_MATCH_ALL, "ABC*?*??*DEF*GH"); TEST_COMPILATION("ABC**?***??**DEF*GH", G_MATCH_ALL, "ABC*???DEF*GH", 11);
TEST_COMPILATION("*A?AA", G_MATCH_ALL_TAIL, "AA?A*"); TEST_COMPILATION("*A?AA", G_MATCH_ALL_TAIL, "AA?A*", 4);
TEST_COMPILATION("ABCD*", G_MATCH_HEAD, "ABCD"); TEST_COMPILATION("ABCD*", G_MATCH_HEAD, "ABCD", 4);
TEST_COMPILATION("*ABCD", G_MATCH_TAIL, "DCBA"); TEST_COMPILATION("*ABCD", G_MATCH_TAIL, "ABCD", 4);
TEST_COMPILATION("ABCDE", G_MATCH_EXACT, "ABCDE"); TEST_COMPILATION("ABCDE", G_MATCH_EXACT, "ABCDE", 5);
TEST_COMPILATION("A?C?E", G_MATCH_ALL, "A?C?E", 5);
TEST_COMPILATION("*?x", G_MATCH_ALL_TAIL, "x?*", 2);
TEST_COMPILATION("?*x", G_MATCH_ALL_TAIL, "x?*", 2);
TEST_COMPILATION("*?*x", G_MATCH_ALL_TAIL, "x?*", 2);
TEST_COMPILATION("x*??", G_MATCH_ALL_TAIL, "??*x", 3);
TEST_EQUAL ("*A?B*", "*A?B*", TRUE); TEST_EQUAL ("*A?B*", "*A?B*", TRUE);
TEST_EQUAL ("A*BCD", "A*BCD", TRUE); TEST_EQUAL ("A*BCD", "A*BCD", TRUE);
@@ -198,6 +218,8 @@ main (int argc, char** argv)
TEST_EQUAL ("*YZ", "*YZ", TRUE); TEST_EQUAL ("*YZ", "*YZ", TRUE);
TEST_EQUAL ("A1x", "A1x", TRUE); TEST_EQUAL ("A1x", "A1x", TRUE);
TEST_EQUAL ("AB*CD", "AB**CD", TRUE); TEST_EQUAL ("AB*CD", "AB**CD", TRUE);
TEST_EQUAL ("AB*?*CD", "AB*?CD", TRUE);
TEST_EQUAL ("AB*?CD", "AB?*CD", TRUE);
TEST_EQUAL ("AB*CD", "AB*?*CD", FALSE); TEST_EQUAL ("AB*CD", "AB*?*CD", FALSE);
TEST_EQUAL ("ABC*", "ABC?", FALSE); TEST_EQUAL ("ABC*", "ABC?", FALSE);
@@ -215,11 +237,20 @@ main (int argc, char** argv)
TEST_MATCH("?*x", "x", FALSE); TEST_MATCH("?*x", "x", FALSE);
TEST_MATCH("*?*x", "yx", TRUE); TEST_MATCH("*?*x", "yx", TRUE);
TEST_MATCH("*?*x", "xxxx", TRUE); TEST_MATCH("*?*x", "xxxx", TRUE);
TEST_MATCH("x*??", "xyzw", TRUE);
string = g_convert ("Äx", -1, "UTF-8", "Latin1", NULL, NULL, NULL); TEST_MATCH("*x", "Äx", TRUE);
TEST_MATCH("*x", string, TRUE); TEST_MATCH("?x", "Äx", TRUE);
TEST_MATCH("?x", string, TRUE); TEST_MATCH("??x", "Äx", FALSE);
TEST_MATCH("??x", string, FALSE); TEST_MATCH("abäö", "abäö", TRUE);
TEST_MATCH("abäö", "abao", FALSE);
TEST_MATCH("ab?ö", "abäö", TRUE);
TEST_MATCH("ab?ö", "abao", FALSE);
TEST_MATCH("abä?", "abäö", TRUE);
TEST_MATCH("abä?", "abao", FALSE);
TEST_MATCH("ab??", "abäö", TRUE);
TEST_MATCH("ab*", "abäö", TRUE);
TEST_MATCH("ab*ö", "abäö", TRUE);
TEST_MATCH("ab*ö", "abaöxö", TRUE);
g_print ("\n%u tests passed, %u failed\n", passed, failed); g_print ("\n%u tests passed, %u failed\n", passed, failed);
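
The new TEST_COMPILATION expectations above (for example "ABC**?***??**DEF*GH" compiling to "ABC*???DEF*GH" with a minimum length of 11) follow from the canonicalization loop in g_pattern_spec_new(): runs of '*' collapse to one, and '?' jokers are held back until the next literal or the end of the pattern. The sketch below reproduces only that step as a standalone function. canonicalize_pattern_sketch is a hypothetical name; the match-type classification and the reversal for G_MATCH_ALL_TAIL are intentionally left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes the canonical form of pattern into out, which must have room
 * for strlen(pattern) + 1 bytes, and returns the minimum number of
 * bytes a matching string needs (each '?' counts as one byte, which is
 * a lower bound because a UTF-8 character is at least one byte). */
size_t canonicalize_pattern_sketch(const char *pattern, char *out)
{
    size_t min_length = 0;
    size_t pending_jokers = 0;
    int follows_wildcard = 0;
    const char *s;
    char *d = out;

    assert(pattern != NULL && out != NULL);

    for (s = pattern; *s != '\0'; s++) {
        if (*s == '*') {
            if (follows_wildcard)
                continue;            /* collapse "**" into "*" */
            follows_wildcard = 1;
            *d++ = '*';
        } else if (*s == '?') {
            pending_jokers++;        /* emitted later, after any '*' */
            min_length++;
        } else {
            for (; pending_jokers > 0; pending_jokers--)
                *d++ = '?';
            follows_wildcard = 0;
            min_length++;
            *d++ = *s;
        }
    }
    /* jokers at the very end of the pattern */
    for (; pending_jokers > 0; pending_jokers--)
        *d++ = '?';
    *d = '\0';

    return min_length;
}
```

Because jokers are deferred past an adjacent wildcard, "AB*?CD" and "AB?*CD" end up with the same canonical form, which is what the added TEST_EQUAL cases check.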