Commit Graph

200 Commits

Author SHA1 Message Date
Tobias Stoeckmann
4bc794e42a ghash: Handle hash tables with millions of entries
Multiplying a guint value with BIG_ENTRY_SIZE (8) can overflow the guint
data type if size reaches 2^29. Use the correct size_t type for 64 bit
systems to support such allocations.

A 32 bit system should fail its reallocation way earlier before reaching
such a large "size", i.e. item count, especially when reallocating.
Also, it would multiply with 4.

Closes #3724
2025-08-21 10:47:31 +02:00
Mavroudis Chatzilazaridis
cb1502ed66 docs: Fix invalid references and broken links
This commit fixes quite a few broken links and references, minor typos,
and improves wording in some sections.
2025-02-05 22:28:26 +02:00
Philip Withnall
7a7d8d548a Merge branch 'wfloat-conversion' into 'main'
build: Enable -Wfloat-conversion and fix warnings

See merge request GNOME/glib!4126
2024-09-17 17:57:11 +00:00
Adrien Plazas
46ec058d2a ghash: Fix the documentation of GHRFunc
This function type isn't only used by g_hash_table_foreach_remove(), and
what happens to the data when we return TRUE depends on the calling
function.

Includes a port to modern gi-docgen syntax by Emmanuele Bassi.

Signed-off-by: Adrien Plazas <adrien.plazas@codethink.co.uk>
2024-08-15 10:22:07 +00:00
Thomas Haller
f6cf2bcfd2 ghash: fix g_hash_table_steal_extended() when requesting key and value of a set
GHashTable optimizes for the "set" case, where key and value are the same.
See g_hash_table_add().

A user cannot see from outside, whether a GHashTable internally is a set
and shares the keys and values array. Adding one key/value pair with
differing key and value, will expand the GHashTable.

In all other cases, the GHashTable API hides this implementation detail
correctly. Except with g_hash_table_steal_extended(), when stealing both the
key and the value.

Fix that. This bug fix is obviously a change in behavior. In practice,
it's unlikely that somebody would notice, because GHashTable contains
opaque pointers and the user must know what the keys/values are and
be aware of their ownership semantics when stealing them. That means,
the change in behavior only affects instances that are internally a set,
of what the user most likely is aware and fills the table with
g_hash_table_add(). Such a user would not steal both the key and
values at the same time. Even if they do, then previously stealing the
value was pointless and would not give them what they wanted. It would
not have meaningfully worked, and since nobody reported a bug about this
yet, it's unlikely somebody noticed.

The more problematic case when the user exhibits the bug is when the
dictionary is unexpected a set internally. Imagine a mapping from numbers
to numbers (e.g. a permutation). If "unexpectedly" the dictionary contains
the identity permutation, steal-extended gives always NULL for the target
number.

The example is far fetched. In practice, it's unlikely that somebody is
gonna notice either way. That is not an argument for fixing anything.
The argument for fixing this, is that the bug breaks the illusion that
the set is only an internal optimization. That is ugly and inconsistent.
2024-08-09 19:24:08 +02:00
Philip Withnall
2713255574 glib: Add explicit casts for some double → other numeric type conversions
If we enable `-Wfloat-conversion`, these warn about a possible loss of
precision due to an implicit conversion from `double` to some other
numeric type.

The warning is correct: there is a possible loss of precision here. In
these instances, we don’t care, as the floating point arithmetic is
being done to do some imprecise scaling or imprecise timing. A loss of
precision is not a problem.

So, add an explicit cast to squash the warning.

Signed-off-by: Philip Withnall <pwithnall@gnome.org>

Helps: #3405
2024-06-28 14:24:55 +01:00
António Fernandes
c9cd385b17 hash: Explicitly annotate key in iter_next as nullable
There is no reason for key and value to have different annotations.
Both may return NULL as a valid value.

gpointer typed parameters are nullable by convention, so there is
no change here. The (nullable) annotation is just for humans really.
2023-12-04 09:53:35 +00:00
Philip Withnall
02725b9bbd ghashtable: Add various missing (transfer) annotations
Fixes some g-ir-scanner warnings.

Signed-off-by: Philip Withnall <pwithnall@gnome.org>

Helps: #3037
2023-11-29 12:02:49 +00:00
Philip Withnall
d07c59ed4e glib: Add (scope call) to a load of sort/equal callbacks
This fixes a load of g-ir-scanner warnings.

Signed-off-by: Philip Withnall <pwithnall@gnome.org>

Helps: #3037
2023-11-29 11:59:47 +00:00
Thomas Haller
3c09257ea1 glib: add internal g_uint_hash()/g_uint_equal()
We have g_int_hash()/g_int_equal(), which in practice might also work
with with pointers to unsigned integers. However, according to strict
interpretation of C, I think it is not valid to conflate the two.

Even if it were valid in all cases that we want to support, we should
still have separate g_uint_{hash,equal} functions (e.g. by just #define
them to their underlying g_int_{hash,equal} implementations).

Add instead internal hash/equal functions for guint.
2023-11-06 08:48:12 +01:00
Matthias Clasen
f88d96e2ad docs: Move the GHashTable SECTION
Move the content to the new data-structures.md file.

Helps: #3037
2023-10-11 17:38:30 +01:00
Alex Richardson
147777e342 GHash: Don't use SMALL_ENTRY_SIZE for CHERI
The code for SMALL_ENTRY_SIZE assumes pointers are no larger than 8 bytes,
so instead of trying to make it work disable the optimization for now.

Helps: https://gitlab.gnome.org/GNOME/glib/-/issues/2842
Co-authored-by: Graeme Jenkinson <graeme@capabilitieslimited.co.uk>
2023-02-10 18:22:20 +00:00
Alex Richardson
2aed76fc94 Fix GHashTableIter layout for CHERI targets
Last field needs to be pointer-size to match GHashTableIter. This happened
to work for most architecture due to alignment padding/pointer size, but
for CHERI targets with 128-bit pointers RealIter ends up being smaller
than GHashTableIter.

Helps: https://gitlab.gnome.org/GNOME/glib/-/issues/2842
2023-02-10 18:22:20 +00:00
Marco Trevisan (Treviño)
8465c1a055 ghash: Use unsigned types for number of nodes and occupied ones
It has always been considered an unsigned value, and we also returned it
straight as int in g_hash_table_size(), but it was actually used as an
int.

So use the same type of g_hash_table_size(). Not using more standard
unsigned not to risk that it may different from the guint typedef.
2022-12-16 19:33:01 +01:00
Marco Trevisan (Treviño)
d68e7bc84a ghash: Add functions to steal all keys and values preserving ownership
Add functions to steal all the keys or values from a ghash (especially
useful when it's used as a set), passing the ownership of then to a
GPtrArray container that preserves the destroy notify functions.
2022-12-16 18:45:36 +01:00
Marco Trevisan (Treviño)
d2c3f7f513 ghash: Add APIs to get hash table keys and values as GPtrArray
GPtrArray's are faster than lists and provide more flexibility, so add
APIs to get hash keys and values using these containers too.

Given that we know the size at array initialization we can optimize the
allocation quite a bit, making it faster than the API using GList both at
creation time and for consumers.
2022-12-16 18:45:36 +01:00
Simon McVittie
de7abf54c7 ghash: Correctly retrieve low 32 bits of 64-bit values
Dereferencing a pointer to a 64-bit object as though it was a 32-bit
object is the same as taking the least significant 32 bits on a
little-endian machine, but on a big-endian machine it is the same as
taking the *most* significant 32 bits, which would result in these hash
functions always hashing to zero. The /hash/int64/collisions and
/hash/double/collisions test-cases in glib/tests/hash.c detect this
and fail.

Instead, fetch the whole 64-bit quantity and do the bit-manipulation
to xor the more significant half with the less significant half in an
architecture-neutral way.

Fixes: dd1f4f70 "Optimize g_double_hash implementation"
Fixes: c1af4b2b "Optional optimization for `g_int64_hash`"
Resolves: https://gitlab.gnome.org/GNOME/glib/-/issues/2787
Signed-off-by: Simon McVittie <smcv@collabora.com>
2022-10-24 10:46:57 +01:00
Thomas Haller
4a709d0b46 ghash: comment g_hash_table_steal_extended() about not destroying key/value
The previous text was technically correct, but not very clear what
happens with the ownership of the key/value if it was not returned.
Elaborate on the fact, that the key/value is never destroyed, even if
not requested by the user.

I intuitively expected the function to behave differently, that is, to
destroy the key/value if (and only if) it was not returned. That is,
when the function does not return a pointer, then it would destroy it.
That would seem more consistent to me, where ownership is either
transferred to the caller, or the resource destroyed during the steal.

On the other hand, the existing behaviors is:

- is consistent with g_hash_table_steal() and never destroys key/value.
- behaves the same, regardless whether the key/value was returned.

So the existing behavior may be better.

Just elaborate on that detail in the doc.
2022-10-18 13:57:43 +01:00
Thomas Haller
99bedd110c ghash: document g_hash_table_steal_extended() behavior for sets 2022-10-14 08:03:43 +00:00
Matthias Clasen
db1aadaa5e Merge branch 'str-equal-only' into 'main'
g_str_equal: Provide macro for optimization

Closes #2775

See merge request GNOME/glib!2940
2022-10-11 18:19:25 +00:00
Xavier Claessens
6e341750df g_str_equal: Provide macro for optimization
g_str_equal() is a nicer API than strcmp()==0, and less error prone.
However, forcing a function call prevents compiler from doing
optimizations. In the case it is not used as callback to GHashTable,
provide a macro that calls strcmp directly. This also has the side
effect that it forces arguments to be `const char *` instead of
`gconstpointer` in the case it is not used as callback, which adds type
safety.

Fixes: #2775
2022-10-11 10:55:56 -04:00
wszqkzqk
c1af4b2b88 Optional optimization for g_int64_hash 2022-10-11 13:12:20 +02:00
wszqkzqk
dd1f4f709e Optimize g_double_hash implementation 2022-10-11 13:12:20 +02:00
Philip Withnall
70ee43f1e9 glib: Add SPDX license headers automatically
Add SPDX license (but not copyright) headers to all files which follow a
certain pattern in their existing non-machine-readable header comment.

This commit was entirely generated using the command:
```
git ls-files glib/*.[ch] | xargs perl -0777 -pi -e 's/\n \*\n \* This library is free software; you can redistribute it and\/or\n \* modify it under the terms of the GNU Lesser General Public/\n \*\n \* SPDX-License-Identifier: LGPL-2.1-or-later\n \*\n \* This library is free software; you can redistribute it and\/or\n \* modify it under the terms of the GNU Lesser General Public/igs'
```

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>

Helps: #1415
2022-05-18 09:19:02 +01:00
Jonas Ådahl
283d9e0c15 ghash: Add g_hash_table_new_similar()
This function creates a new hash table, but inherits the functions used
for the hash, comparison, and key/value memory management functions from
another hash table.

The primary use case is to implement a behaviour where you maintain a
hash table by regenerating it, letting the values not migrated be freed.
See the following pseudo code:

```
GHashTable *ht;

init(GList *resources) {
  ht = g_hash_table_new (g_str_hash, g_str_equal, g_free, g_free);
  for (r in resources)
    g_hash_table_insert (ht, strdup (resource_get_key (r)), create_value (r));
}

update(GList *resources) {
  GHashTable *new_ht = g_hash_table_new_similar (ht);

  for (r in resources) {
    if (g_hash_table_steal_extended (ht, resource_get_key (r), &key, &value))
      g_hash_table_insert (new_ht, key, value);
    else
      g_hash_table_insert (new_ht, strdup (resource_get_key (r)), create_value (r));
  }
  g_hash_table_unref (ht);
  ht = new_ht;
}
```
2022-01-18 22:19:55 +01:00
Evangelos Ribeiro Tzaras
708100c0a2 docs: Fix annotations for optional arguments
The length parameter in g_hash_table_get_keys_as_arrays() is optional and
this should be reflected in the gtk-doc annotations.
2021-06-11 15:19:17 +02:00
Philip Withnall
19470722b3 glib: Use g_memdup2() instead of g_memdup() in obvious places
Convert all the call sites which use `g_memdup()`’s length argument
trivially (for example, by passing a `sizeof()` or an existing `gsize`
variable), so that they use `g_memdup2()` instead.

In almost all of these cases the use of `g_memdup()` would not have
caused problems, but it will soon be deprecated, so best port away from
it

In particular, this fixes an overflow within `g_bytes_new()`, identified
as GHSL-2021-045 by GHSL team member Kevin Backhouse.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Fixes: GHSL-2021-045
Helps: #2319
2021-02-04 16:04:10 +00:00
Philip Withnall
271db1f409 ghash: Move initialisation to declaration
This introduces no functional changes, but should squash a warning from
`scan-build`:
```
../../../glib/ghash.c:575:3: warning: Value stored to 'small' is never read
  small = FALSE;
```

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
2020-10-14 12:58:25 +01:00
Emmanuel Fleury
cbae555a98 Add some notes on complexity in glib/ghash.c
Related to issue #3
2020-09-02 14:38:10 +02:00
Philip Withnall
00bfb3ab44 tree: Fix various typos and outdated terminology
This was mostly machine generated with the following command:
```
codespell \
    --builtin clear,rare,usage \
    --skip './po/*' --skip './.git/*' --skip './NEWS*' \
    --write-changes .
```
using the latest git version of `codespell` as per [these
instructions](https://github.com/codespell-project/codespell#user-content-updating).

Then I manually checked each change using `git add -p`, made a few
manual fixups and dropped a load of incorrect changes.

There are still some outdated or loaded terms used in GLib, mostly to do
with git branch terminology. They will need to be changed later as part
of a wider migration of git terminology.

If I’ve missed anything, please file an issue!

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-06-12 15:01:08 +01:00
Philip Withnall
d630c58021 ghash: Document the iteration order over a hash table is not defined
It’s quite surprising that this wasn’t documented already. Hash tables
are unordered, and any recognisable iteration ordering is not guaranteed
and might change in future releases.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-02-24 10:21:40 +00:00
Philip Withnall
b52bb75327 ghash: Clarify that g_hash_table_add() always consumes the key
Even if the key already exists in the table, `g_hash_table_add()` will
call the hash table’s key free func on the old key and will then replace
the old key with the newly-passed-in key. So `key` is always `(transfer
full)`.

In particular, `key` should never need to be freed by the caller if
`g_hash_table_add()` returns `FALSE`.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2020-01-31 12:13:23 +00:00
Matthias Clasen
1fb3628fb3 hash: Remove an assertion from the hot path
This assert is using atomics and was showing up
in some cache-heavy GTK profiles. Remove it.
2019-10-10 14:24:42 +01:00
Дилян Палаузов
512655aa12 minor typos in the documentation (a/an) 2019-08-24 19:14:05 +00:00
Allison Karlitskaya
115033338b ghash: fix small array handling in g_hash_table_remove_all_nodes()
Factor out the code for setting up the hash table size, mask and mod,
detecting valgrind and allocating the arrays for hashes, keys, and
values.

Make use of this new function from g_hash_table_remove_all_nodes().

The handling of have_big_keys and have_big_values was never correct in
this function because it reallocated the array without changing the
flags in the struct.  Any calls in to the hashtable from destroy
notifies would find the table in an inconsistent state.

Many thanks to Thomas Haller who is essentially responsible for all the
real work in this patch: both discovering and identifying the original
problem, as well as finding the solution to it.
2019-05-20 17:03:49 +02:00
Allison Karlitskaya
9add93e5a4 ghash: Be more explicit about memory in g_hash_table_destroy_all_nodes()
Make it clear that there is a reference transfer going on here, rather
than relying on the fields being overwritten on each branch of the
conditional below.
2019-05-20 17:03:49 +02:00
Allison Karlitskaya
b3fbf6c18b ghash: do less work when destroying the table
We were calling g_hash_table_set_shift() to reinitialise the hash table
even in the case of destroying it.  Only do that for the non-destruction
case, and fill the relevant fields with zeros for the destruction case.
This has a nice side effect of causing more certain crashes in case of
invalid reuse of the table after (or during) destruction.
2019-05-20 17:03:49 +02:00
Allison Karlitskaya
c5462cb3c1 ghash: Improve internal documentation
The changes introduced by 18745ff674 made
the comment at the top of g_hash_table_remove_all_nodes() no longer
correct.  Fix that inaccuracy and add more documentation all-around.
2019-05-20 17:03:49 +02:00
Allison Karlitskaya
96ce92025d ghash: fix bug introduced by valgrind fix
g_hash_table_new_full() had an invocation of
g_hash_table_realloc_key_or_value_array() with the @is_big argument
incorrectly hardcoded to FALSE, even though later in the function the
values of have_big_keys and have_big_values would be set conditionally.

This never caused problems before because on 64bit platforms, this would
result in the allocation of a guint-sized array (which would be fine, as
have_big_keys and have_big_values would always start out as false) and
on 32bit platforms, this function ignored the value and always allocated
a gpointer-sized array.

Since merge request GNOME/glib!845 we have the possibility for
have_big_keys and have_big_values to start out as TRUE on 64bit
platforms.  We need to make sure we pass the argument through correctly.
2019-05-20 17:03:49 +02:00
Allison Karlitskaya
436ca1f376 ghash: Disable small-arrays under valgrind
Valgrind can't find 64bit pointers when we pack them into an array of
32bit values.  Disable this optimisation if we detect that we are
running under valgrind.

Fixes #1749
2019-05-14 16:31:37 +02:00
Emmanuele Bassi
6cb6b418bf Add a version check for duplicated-branches warning
The GHashTable code ignores the duplicated-branches GCC warning, but we
need to do a compiler and version check, as either non-GCC compatible
compilers, or older versions of GCC will warn about the unknown pragma
or diagnostic.

If we don't do this while turning warnings into error, we're going to
fail the build unnecessarily.
2019-04-30 14:49:00 +01:00
Philip Withnall
38de3e9dc3 docs: Use ‘look up’ as a verb, rather than the noun ‘lookup’
Another niggle fixed.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2019-04-26 12:12:31 +01:00
Emmanuel Fleury
e9f57495c6 Fix various signedness warnings in glib/ghash.c
To conform to a better signedness schema, this patch change
GHashTable.size field from gint to gsize (and change accordingly the
tests with it).
2019-03-17 19:05:35 +01:00
Tapasweni Pathak
58bbdcf6c0 gmacros: Add G_ALIGNOF superseding _g_alignof macro 2018-12-18 13:59:23 +05:30
Hans Petter Jansson
d3074a748f ghash: Fix out-of-range use of signed integer
We were mistakenly shifting a signed int literal by up to 31 places.
Specify unsigned int instead.

Closes #1570
2018-10-12 13:09:39 +02:00
Hans Petter Jansson
9986395638 ghash: Use realloc in place of alloc for key/value
Minor simplification resulting in the removal of redundant alloc wrappers.
2018-10-03 22:14:38 +02:00
Hans Petter Jansson
194eef5f17 ghash: Be less eager to opportunistically grow the table on cleanup
When g_hash_table_resize() gets called, we clear out tombstones and grow
the table at the same time if needed. However, the threshold was set too
low, so we'd grow if the load was greater than .5 after subtracting
tombstones. Increase this threshold to ~.75.
2018-10-03 22:14:38 +02:00
Hans Petter Jansson
7eaf018b29 ghash: Significantly reduce peak memory use
When resizing, we were keeping both the old and new hash, key and value
arrays around while we reinserted entries, resulting in a peak memory
overhead of 50%. Using a temporary bookkeeping array with one bit per
entry we can now grow and shrink the main arrays using realloc() and an
eviction scheme, reducing the overhead to .625% (assuming 64-bit keys and
values). Tests show the CPU overhead is negligible.
2018-10-03 22:14:32 +02:00
Hans Petter Jansson
dc983d74cc ghash: Use less memory when storing ints on 64-bit platforms
If int is smaller than void * on our arch, we start out with
int-sized keys and values and resize to pointer-sized entries as
needed. This saves a good amount of memory when the HT is being
used with e.g. GUINT_TO_POINTER().
2018-10-03 22:11:07 +02:00
Hans Petter Jansson
171f698ead ghash: Simplify g_hash_table_set_shift()
Even if we're using a prime modulo for the initial probe, our table is
power-of-two-sized, meaning we can set the mask simply by subtracting one
from the size.
2018-09-17 16:17:10 +02:00