d377bc2ee5
* Changelog from Linbit:

  9.1.22 (api:genl2/proto:86-121/transport:18)
  --------
   * Upgrade from partial resync to a full resync if necessary when the user manually resolves a split-brain situation
   * Fix a potential NULL deref when a disk fails while doing a forget-peer operation.
   * Fix an rcu_read_lock()/rcu_read_unlock() imbalance
   * Restart the open() syscall when a process auto promoting a drbd device gets interrupted by a signal
   * Remove a deadlock that caused DRBD to sometimes connect exceptionally slowly
   * Make detach operations interruptible
   * Added dev_is_open to events2 status information
   * Improve log readability for 2PC state changes and drbd-threads
   * Updated compatibility code for Linux 6.9

  9.1.21 (api:genl2/proto:86-121/transport:18)
  --------
   * fix a deadlock that can trigger when deleting a connection and another connection going down in parallel. This is a regression of 9.1.20
   * Fix an out-of-bounds access when scanning the bitmap. It leads to a crash when the bitmap ends on a page boundary, and this is also a regression in 9.1.20.

  9.1.20 (api:genl2/proto:86-121/transport:18)
  --------
   * Fix a kernel crash that is sometimes triggered when downing drbd resources in a specific, unusual order (was triggered by the Kubernetes CSI driver)
   * Fix a rarely triggering kernel crash upon adding paths to a connection by rehauling the path lists' locking
   * Fix the continuation of an interrupted initial resync
   * Fix the state engine so that an incapable primary does not outdate indirectly reachable secondary nodes
   * Fix a logic bug that caused drbd to pretend that a peer's disk is outdated when doing a manual disconnect on a down connection; with that cured impact on fencing and quorum.
   * Fix forceful demotion of suspended devices
   * Rehaul of the build system to apply compatibility patches out of place that allows one to build for different target kernels from a single drbd source tree
   * Updated compatibility code for Linux 6.8

  9.1.19 (api:genl2/proto:86-121/transport:18)
  --------
   * Fix a resync decision case where drbd wrongly decided to do a full resync, where a partial resync was sufficient; that happened in a specific connect order when all nodes were on the same data generation (UUID)
   * Fix the online resize code to obey cached size information about temporarily unreachable nodes
   * Fix a rare corner case in which DRBD on a diskless primary node failed to re-issue a read request to another node with a backing disk upon connection loss on the connection where it shipped the read request initially
   * Make timeout during promotion attempts interruptible
   * No longer write activity-log updates on the secondary node in a cluster with precisely two nodes with backing disk; this is a performance optimization
   * Reduce CPU usage of acknowledgment processing

  9.1.18 (api:genl2/proto:86-121/transport:18)
  --------
   * Fixed connecting nodes with differently sized backing disks, specifically when the smaller node is primary, before establishing the connections
   * Fixed thawing a device that has I/O frozen after loss of quorum when a configuration change eases its quorum requirements
   * Properly fail TLS if requested (only available in drbd-9.2)
   * Fixed a race condition that can cause auto-demote to trigger right after an explicit promote
   * Fixed a rare race condition that could mess up the handshake result before it is committed to the replication state.
   * Preserve "tiebreaker quorum" over a reboot of the last node (3-node clusters only)
   * Update compatibility code for Linux 6.6

  9.1.17 (api:genl2/proto:86-121/transport:18)
  --------
   * fix a potential crash when configuring drbd to bind to a non-existent local IP address (this is a regression of drbd-9.1.8)
   * Cure a very seldom triggering race condition bug during establishing connections; when you triggered it, you got an OOPS hinting to list corruption
   * fix a race condition regarding operations on the bitmap while forgetting a bitmap slot and a pointless warning
   * Fix handling of unexpected (on a resource in secondary role) write requests
   * Fix a corner case that can cause a process to hang when closing the DRBD device, while a connection gets re-established
   * Correctly block signal delivery during auto-demote
   * Improve the reliability of establishing connections
   * Do not clear the transport with `net-options --set-defaults`. This fix avoids unexpected disconnect/connect cycles upon an `adjust` when using the 'lb-tcp' or 'rdma' transports in drbd-9.2.
   * New netlink packet to report path status to drbdsetup
   * Improvements to the content and rate-limiting of many log messages
   * Update compatibility code and follow Linux upstream development until Linux 6.5

* remove patches which are already included in the new version:
  0001-drbd-allow-transports-to-take-additional-krefs-on-a-.patch
  0002-drbd-improve-decision-about-marking-a-failed-disk-Ou.patch
  0003-drbd-fix-error-path-in-drbd_get_listener.patch
  0004-drbd-build-fix-spurious-re-build-attempt-of-compat.p.patch
  0005-drbd-log-error-code-when-thread-fails-to-start.patch
  0006-drbd-log-numeric-value-of-drbd_state_rv-as-well-as-s.patch
  0007-drbd-stop-defining-__KERNEL_SYSCALLS__.patch
  0008-compat-block-introduce-holder-ops.patch
  0009-drbd-reduce-net_ee-not-empty-info-to-a-dynamic-debug.patch
  0010-drbd-do-not-send-P_CURRENT_UUID-to-DRBD-8-peer-when-.patch
  0011-compat-block-pass-a-gendisk-to-open.patch
  0012-drbd-Restore-DATA_CORKED-and-CONTROL_CORKED-bits.patch
  0013-drbd-remove-unused-extern-for-conn_try_outdate_peer.patch
  0014-drbd-include-source-of-state-change-in-log.patch
  0015-compat-block-use-the-holder-as-indication-for-exclus.patch
  0016-drbd-Fix-net-options-set-defaults-to-not-clear-the-t.patch
  0017-drbd-propagate-exposed-UUIDs-only-into-established-c.patch
  0018-drbd-rework-autopromote.patch
  0019-compat-block-remove-the-unused-mode-argument-to-rele.patch
  0020-drbd-do-not-allow-auto-demote-to-be-interrupted-by-s.patch
  0021-compat-sock-Remove-sendpage-in-favour-of-sendmsg-MSG.patch
  0022-compat-block-replace-fmode_t-with-a-block-specific-t.patch
  0023-compat-genetlink-remove-userhdr-from-struct-genl_inf.patch
  0024-compat-fixup-FMODE_READ-FMODE_WRITE-usage.patch
  0025-compat-drdb-Convert-to-use-bdev_open_by_path.patch
  0026-compat-gate-blkdev_-patches-behind-bdev_open_by_path.patch
  boo1230635_01-compat-fix-nla_nest_start_noflag-test.patch
  boo1230635_02-drbd-port-block-device-access-to-file.patch

* removed patches which are not needed anymore:
  boo1229062-re-enable-blk_queue_max_hw_sectors.patch
  bsc1226510-fix-build-err-against-6.9.3.patch

* update:
  drbd_git_revision
  suse-coccinelle.patch
  drbd.spec

* add upstream patches to align with commit 13ada1be201e:
  0001-drbd-properly-rate-limit-resync-progress-reports.patch
  0002-drbd-inherit-history-UUIDs-from-sync-source-when-res.patch
  0003-build-compat-fix-line-offset-in-annotation-pragmas-p.patch
  0004-drbd-fix-exposed_uuid-going-backward.patch
  0005-drbd-Proper-locking-around-new_current_uuid-on-a-dis.patch
  0006-build-CycloneDX-fix-bom-ref-add-purl.patch
  0007-build-Another-update-to-the-spdx-files.patch
  0008-build-generate-spdx.json-not-tag-value-format.patch
  0009-compat-fix-gen_patch_names-for-bdev_file_open_by_pat.patch
  0010-compat-fix-nla_nest_start_noflag-test.patch
  0011-compat-fix-blk_alloc_disk-rule.patch
  0012-drbd-remove-const-from-function-return-type.patch
  0013-drbd-don-t-set-max_write_zeroes_sectors-in-decide_on.patch
  0014-drbd-split-out-a-drbd_discard_supported-helper.patch
  0015-drbd-atomically-update-queue-limits-in-drbd_reconsid.patch
  0016-compat-test-and-patch-for-queue_limits_start_update.patch
  0017-compat-specify-which-essential-change-was-not-made.patch
  0018-gen_patch_names-reorder-blk_mode_t.patch
  0019-compat-fix-blk_queue_update_readahead-patch.patch
  0020-compat-test-and-patch-for-que_limits-max_hw_discard_.patch
  0021-compat-fixup-write_zeroes__no_capable.patch
  0022-compat-fixup-queue_flag_discard__yes_present.patch
  0023-drbd-move-flags-to-queue_limits.patch
  0024-compat-test-and-patch-for-queue_limits.features.patch
  0025-drbd-Annotate-struct-fifo_buffer-with-__counted_by.patch
  0026-compat-test-and-patch-for-__counted_by.patch
  0027-drbd-fix-function-cast-warnings-in-state-machine.patch
  0028-Add-missing-documentation-of-peer_device-parameter-t.patch
  0030-drbd-kref_put-path-when-kernel_accept-fails.patch
  0031-build-fix-typo-in-Makefile.spatch.patch
  0032-drbd-open-do-not-delay-open-if-already-Primary.patch

* add patch to fix kernel incompatibility issue (boo#1231290):
  boo1231290_fix_drbd_build_error_against_kernel_v6.11.0.patch

OBS-URL: https://build.opensuse.org/package/show/network:ha-clustering:Factory/drbd?expand=0&rev=153
From aab03bfc73a62f95011316545a5c0fbb4817741b Mon Sep 17 00:00:00 2001
From: Lars Ellenberg <lars.ellenberg@linbit.com>
Date: Wed, 14 Aug 2024 11:49:42 +0200
Subject: [PATCH 01/32] drbd: properly rate-limit resync progress reports

A peer_device in "paused" sync would have flooded the "drbd events2"
generic netlink broadcast with "resync progress reports",
if it cleared significant out-of-sync bits,
as is the case with application writes,
or several peers syncing from the same sync source
and having a "paused sync" replication state between themselves.

If you have "many" such resources, this storm may even overflow receive buffers.
At most one progress report every three seconds should be enough,
and is what was intended.

Use a new "last progress report time stamp" to throttle
advancing resync progress marks and progress report broadcasts.
---
 drbd/drbd_actlog.c   | 35 +++++++++++++++++++++++------------
 drbd/drbd_int.h      |  1 +
 drbd/drbd_receiver.c |  1 +
 drbd/drbd_state.c    |  2 ++
 4 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drbd/drbd_actlog.c b/drbd/drbd_actlog.c
index b96560843878..646dcb29e1d9 100644
--- a/drbd/drbd_actlog.c
+++ b/drbd/drbd_actlog.c
@@ -1020,19 +1020,30 @@ static bool update_rs_extent(struct drbd_peer_device *peer_device,
 
 void drbd_advance_rs_marks(struct drbd_peer_device *peer_device, unsigned long still_to_go)
 {
-	unsigned long now = jiffies;
-	unsigned long last = peer_device->rs_mark_time[peer_device->rs_last_mark];
-	int next = (peer_device->rs_last_mark + 1) % DRBD_SYNC_MARKS;
-	if (time_after_eq(now, last + DRBD_SYNC_MARK_STEP)) {
-		if (peer_device->rs_mark_left[peer_device->rs_last_mark] != still_to_go &&
-		    peer_device->repl_state[NOW] != L_PAUSED_SYNC_T &&
-		    peer_device->repl_state[NOW] != L_PAUSED_SYNC_S) {
-			peer_device->rs_mark_time[next] = now;
-			peer_device->rs_mark_left[next] = still_to_go;
-			peer_device->rs_last_mark = next;
-		}
-		drbd_peer_device_post_work(peer_device, RS_PROGRESS);
+	unsigned long now;
+	int next;
+
+	/* report progress and advance marks only if we made progress */
+	if (peer_device->rs_mark_left[peer_device->rs_last_mark] == still_to_go)
+		return;
+
+	/* report progress and advance marks at most once every DRBD_SYNC_MARK_STEP (3 seconds) */
+	now = jiffies;
+	if (!time_after_eq(now, peer_device->rs_last_progress_report_ts + DRBD_SYNC_MARK_STEP))
+		return;
+
+	/* Do not advance marks if we are "paused" */
+	if (peer_device->repl_state[NOW] != L_PAUSED_SYNC_T &&
+	    peer_device->repl_state[NOW] != L_PAUSED_SYNC_S) {
+		next = (peer_device->rs_last_mark + 1) % DRBD_SYNC_MARKS;
+		peer_device->rs_mark_time[next] = now;
+		peer_device->rs_mark_left[next] = still_to_go;
+		peer_device->rs_last_mark = next;
 	}
+
+	/* But still report progress even if paused. */
+	peer_device->rs_last_progress_report_ts = now;
+	drbd_peer_device_post_work(peer_device, RS_PROGRESS);
 }
 
 /* It is called lazy update, so don't do write-out too often. */
diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
index 49bd7b0c407c..c18407899f59 100644
--- a/drbd/drbd_int.h
+++ b/drbd/drbd_int.h
@@ -1285,6 +1285,7 @@ struct drbd_peer_device {
 	unsigned long rs_paused;
 	/* skipped because csum was equal [unit BM_BLOCK_SIZE] */
 	unsigned long rs_same_csum;
+	unsigned long rs_last_progress_report_ts;
 #define DRBD_SYNC_MARKS 8
 #define DRBD_SYNC_MARK_STEP (3*HZ)
 	/* block not up-to-date at mark [unit BM_BLOCK_SIZE] */
diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 19634f6423bd..ee54cf3ac116 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -3409,6 +3409,7 @@ static int receive_DataRequest(struct drbd_connection *connection, struct packet
 			peer_device->ov_skipped = 0;
 			peer_device->rs_total = ov_left;
 			peer_device->rs_last_writeout = now;
+			peer_device->rs_last_progress_report_ts = now;
 			for (i = 0; i < DRBD_SYNC_MARKS; i++) {
 				peer_device->rs_mark_left[i] = ov_left;
 				peer_device->rs_mark_time[i] = now;
diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
index be1de8f0653b..44f55ee5c939 100644
--- a/drbd/drbd_state.c
+++ b/drbd/drbd_state.c
@@ -2483,6 +2483,7 @@ static void initialize_resync_progress_marks(struct drbd_peer_device *peer_devic
 	unsigned long now = jiffies;
 	int i;
 
+	peer_device->rs_last_progress_report_ts = now;
 	for (i = 0; i < DRBD_SYNC_MARKS; i++) {
 		peer_device->rs_mark_left[i] = tw;
 		peer_device->rs_mark_time[i] = now;
@@ -2730,6 +2731,7 @@ static void finish_state_change(struct drbd_resource *resource, const char *tag)
 				peer_device->ov_last_skipped_size = 0;
 				peer_device->ov_last_skipped_start = 0;
 				peer_device->rs_last_writeout = now;
+				peer_device->rs_last_progress_report_ts = now;
 				for (i = 0; i < DRBD_SYNC_MARKS; i++) {
 					peer_device->rs_mark_left[i] = peer_device->rs_total;
 					peer_device->rs_mark_time[i] = now;
--
2.35.3

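The throttling that the patch introduces in drbd_advance_rs_marks() can be tried outside the kernel with a small userspace model. The following is only a sketch under stated assumptions, not the DRBD implementation: `struct peer_sketch`, `peer_sketch_init()`, `advance_marks()` and `ticks_after_eq()` are hypothetical stand-ins, the tick rate `HZ` is fixed at 250 purely for illustration, the current time is passed in as a parameter instead of being read from `jiffies`, and the posted RS_PROGRESS work is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed tick rate for this sketch only; the kernel's HZ depends on its configuration. */
#define HZ 250UL
#define SYNC_MARKS 8
#define SYNC_MARK_STEP (3 * HZ)	/* three seconds, as in the patch */

/*
 * Wraparound-tolerant "a is at or after b" for a free-running tick counter,
 * written in the spirit of the kernel's time_after_eq(). It relies on the
 * usual two's complement conversion from unsigned long to long.
 */
static bool ticks_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Hypothetical, reduced stand-in for the resync fields of struct drbd_peer_device. */
struct peer_sketch {
	unsigned long mark_time[SYNC_MARKS];
	unsigned long mark_left[SYNC_MARKS];
	int last_mark;
	unsigned long last_report_ts;	/* models rs_last_progress_report_ts */
	bool paused;			/* models the two "paused sync" states */
	unsigned int reports_sent;	/* counts what would be RS_PROGRESS work */
};

/* Mirrors the initialization sites in the patch: all marks and the report time stamp start at "now". */
static void peer_sketch_init(struct peer_sketch *p, unsigned long now, unsigned long total)
{
	int i;

	assert(p != NULL);
	p->last_mark = 0;
	p->last_report_ts = now;
	p->paused = false;
	p->reports_sent = 0;
	for (i = 0; i < SYNC_MARKS; i++) {
		p->mark_left[i] = total;
		p->mark_time[i] = now;
	}
}

/* Returns true if a progress report would have been posted. */
static bool advance_marks(struct peer_sketch *p, unsigned long now, unsigned long still_to_go)
{
	int next;

	assert(p != NULL);

	/* nothing to report if no progress was made since the last mark */
	if (p->mark_left[p->last_mark] == still_to_go)
		return false;

	/* at most one report per SYNC_MARK_STEP, paused or not */
	if (!ticks_after_eq(now, p->last_report_ts + SYNC_MARK_STEP))
		return false;

	/* marks only advance while the resync is actually running */
	if (!p->paused) {
		next = (p->last_mark + 1) % SYNC_MARKS;
		p->mark_time[next] = now;
		p->mark_left[next] = still_to_go;
		p->last_mark = next;
	}

	/* the time stamp moves even when paused; this is what stops the flood */
	p->last_report_ts = now;
	p->reports_sent++;
	return true;
}
```

In this model a paused peer never advances `last_mark`, so the first check keeps passing while out-of-sync bits are being cleared. Calling `advance_marks()` repeatedly on a paused peer with a changing `still_to_go` still yields at most one report per `SYNC_MARK_STEP` ticks, because the separate time stamp bounds the report rate.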