dfaggioli/qemu - qemu - openSUSE Gitea

Author	SHA1	Message	Date
João Silva	86700a687a	[openSUSE] block: Add a thread-pool version of fstat (bsc#1211000) The fstat call can take a long time to finish when running over NFS. Add a version of it that runs in the thread pool. Adapt one of its users, raw_co_get_allocated_file size to use the new version. That function is called via QMP under the qemu_global_mutex so it has a large chance of blocking VCPU threads in case it takes too long to finish. Link: https://lore.kernel.org/r/20240409145917.6780-1-farosas@suse.de Reviewed-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: João Silva <jsilva@suse.de> References: bsc#1211000 Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>	2025-04-01 17:58:58 +02:00
Lin Ma	2fd74df480	[openSUSE] block: Convert qmp_query_block and qmp_query_named_block_nodes to coroutine (bsc#1211000) Convert the remaining functions to make the QMP commands query-block and query-named-block-nodes run in their entirety in a coroutine. With this, any yield from those commands will return all the way back to the main loop. This releases the BQL and the main loop and avoids having the QMP command block another more important task from running. Both commands need to be converted at once because hmp_info_block calls both and it needs to be moved to a coroutine as well. Now the wrapper for bdrv_co_get_allocated_file_size() can be made not mixed and the wrapper for bdrv_co_block_device_info() can be removed. Link: https://lore.kernel.org/r/20240409145917.6780-1-farosas@suse.de Signed-off-by: Lin Ma <lma@suse.com> References: bsc#1211000 Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>	2025-04-01 17:58:58 +02:00
Fabiano Rosas	7f7862cf7a	[openSUSE] block: Convert bdrv_block_device_info into co_wrapper (bsc#1211000) We're converting callers of bdrv_co_get_allocated_file_size() to run in coroutines because that function will be made asynchronous when called (indirectly) from the QMP dispatcher. This function is a candidate because it calls bdrv_query_image_info() -> bdrv_co_do_query_node_info() -> bdrv_co_get_allocated_file_size(). It is safe to turn this is a coroutine because the code it calls is made up of either simple accessors and string manipulation functions [1] or it has already been determined to be safe [2]. 1) bdrv_refresh_filename(), bdrv_is_read_only(), blk_enable_write_cache(), bdrv_cow_bs(), blk_get_public(), throttle_group_get_name(), bdrv_write_threshold_get(), bdrv_query_dirty_bitmaps(), throttle_group_get_config(), bdrv_filter_or_cow_bs(), bdrv_skip_implicit_filters() 2) bdrv_co_do_query_node_info() (see previous commits); This was the only caller of bdrv_query_image_info(), so we can remove the wrapper for that function now. Link: https://lore.kernel.org/r/20240409145917.6780-1-farosas@suse.de References: bsc#1211000 Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>	2025-04-01 17:58:57 +02:00
Fabiano Rosas	809f42450c	[openSUSE] block: Convert bdrv_query_image_info to coroutine (bsc#1211000) This function is a caller of bdrv_do_query_node_info(), which have been converted to a coroutine. Convert this function as well so we're closer from having the whole qmp_query_block as a single coroutine. Also remove the wrapper for bdrv_co_do_query_node_info() now that all its callers are converted. Link: https://lore.kernel.org/r/20240409145917.6780-1-farosas@suse.de References: bsc#1211000 Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>	2025-04-01 17:58:57 +02:00
Fabiano Rosas	ea5943a913	[openSUSE] block: Convert bdrv_query_block_graph_info to coroutine (bsc#1211000) We're converting callers of bdrv_co_get_allocated_file_size() to run in coroutines because that function will be made asynchronous when called (indirectly) from the QMP dispatcher. This function is a candidate because it calls bdrv_do_query_node_info(), which in turn calls bdrv_co_get_allocated_file_size(). All the functions called from bdrv_do_query_node_info() onwards are coroutine-safe, either have a coroutine version themselves[1] or are mostly simple code/string manipulation[2]. 1) bdrv_co_getlength(), bdrv_co_get_allocated_file_size(), bdrv_co_get_info(); 2) bdrv_refresh_filename(), bdrv_get_format_name(), bdrv_get_full_backing_filename(), bdrv_query_snapshot_info_list(), bdrv_get_specific_info(); Link: https://lore.kernel.org/r/20240409145917.6780-1-farosas@suse.de Reviewed-by: Hanna Czenczek <hreitz@redhat.com> References: bsc#1211000 Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>	2025-04-01 17:58:57 +02:00
Fabiano Rosas	efd51e6bd1	[openSUSE] block: Run bdrv_do_query_node_info in a coroutine (bsc#1211000) Move this function into a coroutine so we can convert the whole qmp_query_block command into a coroutine in the next patches. Placing the entire command in a coroutine allow us to yield all the way back to the main loop, releasing the BQL and unblocking the main loop. When the whole conversion is completed, we'll be able to avoid a priority inversion that happens when a QMP command calls a slow (buggy) system call and blocks the vcpu thread from doing mmio due to contention on the BQL. About coroutine safety: Most callees have coroutine versions themselves and thus are safe to call in a coroutine. The remaining ones: - bdrv_refresh_filename, bdrv_get_full_backing_filename: String manipulation, nothing that would be unsafe for use in coroutines; - bdrv_get_format_name: Just accesses a field; - bdrv_get_specific_info, bdrv_query_snapshot_info_list: No locks or anything that would poll or block. (using a mixed wrapper for now, but after all callers are converted, this can become a coroutine exclusively) Link: https://lore.kernel.org/r/20240409145917.6780-1-farosas@suse.de References: bsc#1211000 Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>	2025-04-01 17:58:57 +02:00
Fabiano Rosas	da08f77a1a	[openSUSE] block: Reschedule query-block during qcow2 invalidation (bsc#1221812) There is a small window at the end of block device migration when devices are being re-activated. This includes a resetting of some fields of BDRVQcow2State at qcow2_co_invalidate_cache(). A concurrent QMP query-block command can call qcow2_get_specific_info() during this window and see the cleared values, which leads to an assert: qcow2_get_specific_info: Assertion `false' failed This is the same issue as Gitlab #1933, which has already been resolved[1], but there the fix applied only to non-coroutine commands. Once we move query-block to a coroutine the problem will manifest again. Add an operation blocker to the invalidation function to block the query info path during this window. Instead of failing query-block, which would be disruptive to users, use the blocker to know when to reschedule the coroutine back into the iohandler so it doesn't run while the BDRVQcow2State is inconsistent. To avoid failing query-block when all block operations are blocked, unblock the INFO operation at various places. This preserves the prior situations where query-block used to work. 1 - https://gitlab.com/qemu-project/qemu/-/issues/1933 Link: https://lore.kernel.org/all/87bk6trl9i.fsf@suse.de/ Link: https://lore.kernel.org/r/20240409145917.6780-1-farosas@suse.de References: bsc#1221812 Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>	2025-04-01 17:58:57 +02:00
Fabiano Rosas	fb1d246c6e	[openSUSE] block: Temporarily mark bdrv_co_get_allocated_file_size as mixed (bsc#1211000) Some callers of this function are about to be converted to run in coroutines, so allow it to be executed both inside and outside a coroutine while we convert all the callers. This will be reverted once all callers of bdrv_do_query_node_info run in a coroutine. Link: https://lore.kernel.org/r/20240409145917.6780-1-farosas@suse.de Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> References: bsc#1211000 Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>	2025-04-01 17:58:56 +02:00
Lin Ma	0c19f451ce	[openSUSE] scsi-generic: replace logical block count of response of READ CAPACITY (SLE-20965) While using SCSI passthrough, Following scenario makes qemu doesn't realized the capacity change of remote scsi target: 1. online resize the scsi target. 2. issue 'rescan-scsi-bus.sh -s ...' in host. 3. issue 'rescan-scsi-bus.sh -s ...' in vm. In above scenario I used to experienced errors while accessing the additional disk space in vm. I think the reasonable operations should be: 1. online resize the scsi target. 2. issue 'rescan-scsi-bus.sh -s ...' in host. 3. issue 'block_resize' via qmp to notify qemu. 4. issue 'rescan-scsi-bus.sh -s ...' in vm. The errors disappear once I notify qemu by block_resize via qmp. So this patch replaces the number of logical blocks of READ CAPACITY response from scsi target by qemu's bs->total_sectors. If the user in vm wants to access the additional disk space, The administrator of host must notify qemu once resizeing the scsi target. Bonus is that domblkinfo of libvirt can reflect the consistent capacity information between host and vm in case of missing block_resize in qemu. E.g: ... <disk type='block' device='lun'> <driver name='qemu' type='raw'/> <source dev='/dev/sdc' index='1'/> <backingStore/> <target dev='sda' bus='scsi'/> <alias name='scsi0-0-0-0'/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> ... Before: 1. online resize the scsi target. 2. host:~ # rescan-scsi-bus.sh -s /dev/sdc 3. guest:~ # rescan-scsi-bus.sh -s /dev/sda 4 host:~ # virsh domblkinfo --domain $DOMAIN --human --device sda Capacity: 4.000 GiB Allocation: 0.000 B Physical: 8.000 GiB 5. guest:~ # lsblk /dev/sda NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 8G 0 disk └─sda1 8:1 0 2G 0 part After: 1. online resize the scsi target. 2. host:~ # rescan-scsi-bus.sh -s /dev/sdc 3. guest:~ # rescan-scsi-bus.sh -s /dev/sda 4 host:~ # virsh domblkinfo --domain $DOMAIN --human --device sda Capacity: 4.000 GiB Allocation: 0.000 B Physical: 8.000 GiB 5. guest:~ # lsblk /dev/sda NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 4G 0 disk └─sda1 8:1 0 2G 0 part References: [SUSE-JIRA] (SLE-20965) Signed-off-by: Lin Ma <lma@suse.com>	2025-04-01 17:58:54 +02:00
Ayush Mishra	dbaa2936b3	hw/nvme: add NPDAL/NPDGL Add the NPDGL and NPDAL fields to support large alignment and granularities. Signed-off-by: Ayush Mishra <ayush.m55@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Link: https://lore.kernel.org/r/20241001012833.3551820-1-ayush.m55@samsung.com [k.jensen: renamed the enum values] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2024-11-04 19:09:45 +01:00
Arun Kumar	79e490058f	hw/nvme: i/o cmd set independent namespace data structure Add support for the I/O Command Set Independent Namespace Data Structure (CNS 8h and 1fh). Signed-off-by: Arun Kumar <arun.kka@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Link: https://lore.kernel.org/r/20240925004407.3521406-1-arun.kka@samsung.com Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2024-11-04 19:09:45 +01:00
Peter Maydell	51483f6c84	include: Move QemuLockCnt APIs to their own header Currently the QemuLockCnt data structure and associated functions are in the include/qemu/thread.h header. Move them to their own qemu/lockcnt.h. The main reason for doing this is that it means we can autogenerate the documentation comments into the docs/devel documentation. The copyright/author in the new header is drawn from lockcnt.c, since the header changes were added in the same commit as lockcnt.c; since neither thread.h nor lockcnt.c state an explicit license, the standard default of GPL-2-or-later applies. We include the new header (and the .c file, which was accidentally omitted previously) in the "RCU" part of MAINTAINERS, since that is where the lockcnt.rst documentation is categorized. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20240816132212.3602106-7-peter.maydell@linaro.org	2024-10-15 15:16:17 +01:00
Peter Maydell	718780d204	Merge tag 'pull-nvme-20241001' of https://gitlab.com/birkelund/qemu into staging nvme queue # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCgAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmb7nokACgkQTeGvMW1P # Del+2gf/YefiiYSL540C2QeYRwMFd6xFKKWYRRJoaARyLAoqInVdLiBql527Oov8 # rgDQq+D0XXP15CNDvfAZ59a36h1bAW79QCfKEUMSbP8GPeqb5pOSRfvYJSwnG1YX # SC70vKLOrBhzxYiQYSOhLNKdbUM00OUyf2xibu0zk84UpkXtzSR4h/byFnQIHwEV # /uUh4+cxY6eQK1Tfk/f66FEJLuJTchFOMVswYolDMezu2vJmToWHju/kpy2ugvaC # +WEUEti8kL66/B399u1uwAad2OejC1Jf4qMFcFQJ9Cs9RV4HTC9byolceJE+1R0V # CZt1SxvBNdK/ihs1iTjP7fInPqdYKw== # =16tX # -----END PGP SIGNATURE----- # gpg: Signature made Tue 01 Oct 2024 08:02:33 BST # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [full] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [full] # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * tag 'pull-nvme-20241001' of https://gitlab.com/birkelund/qemu: hw/nvme: add atomic write support hw/nvme: add knob for CTRATT.MEM hw/nvme: support CTRATT.MEM hw/nvme: clear masked events from the aer queue hw/nvme: report id controller metadata sgl support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2024-10-01 11:34:07 +01:00
Arun Kumar	a1ab67883d	hw/nvme: support CTRATT.MEM Indicate that 'MDTS and Size Limits Exclude Metadata (MEM)' in the Controller Attributes (CTRATT) I/O Command Set Independent Identify Controller Data Structure. Signed-off-by: Arun Kumar <arun.kka@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> [k.jensen: updated commit message] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2024-09-30 12:45:17 +02:00
Dr. David Alan Gilbert	e84af3eb72	block: Remove unused aio_task_pool_empty aio_task_pool_empty has been unused since it was added in `6e9b225f73` ("block: introduce aio task pool") Remove it. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Message-Id: <20240917002007.330689-1-dave@treblig.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2024-09-30 10:53:18 +03:00
Fiona Ebner	9484ad6c17	copy-before-write: allow specifying minimum cluster size In the context of backup fleecing, discarding the source will not work when the fleecing image has a larger granularity than the one used for block-copy operations (can happen if the backup target has smaller cluster size), because cbw_co_pdiscard_snapshot() will align down the discard requests and thus effectively ignore then. To make @discard-source work in such a scenario, allow specifying the minimum cluster size used for block-copy operations and thus in particular also the granularity for discard requests to the source. The type 'size' (corresponding to uint64_t in C) is used in QAPI to rule out negative inputs and for consistency with already existing @cluster-size parameters. Since block_copy_calculate_cluster_size() uses int64_t for its result, a check that the input is not too large is added in block_copy_state_new() before calling it. The calculation in block_copy_calculate_cluster_size() is done in the target int64_t type. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Acked-by: Markus Armbruster <armbru@redhat.com> (QAPI schema) Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240711120915.310243-2-f.ebner@proxmox.com> [vsementsov: switch version to 9.2 in QAPI doc] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2024-09-30 10:52:41 +03:00
Yoochan Jeong	7c85332a2b	hw/ufs: minor bug fixes related to ufs-test Minor bugs and errors related to ufs-test are resolved. Some permissions and code implementations that are not synchronized with the ufs spec are edited. Signed-off-by: Yoochan Jeong <yc01.jeong@samsung.com> Reviewed-by: Jeuk Kim <jeuk20.kim@samsung.com> Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>	2024-09-06 18:04:16 +09:00
Eric Blake	c8a76dbd90	nbd/server: CVE-2024-7409: Cap default max-connections to 100 Allowing an unlimited number of clients to any web service is a recipe for a rudimentary denial of service attack: the client merely needs to open lots of sockets without closing them, until qemu no longer has any more fds available to allocate. For qemu-nbd, we default to allowing only 1 connection unless more are explicitly asked for (-e or --shared); this was historically picked as a nice default (without an explicit -t, a non-persistent qemu-nbd goes away after a client disconnects, without needing any additional follow-up commands), and we are not going to change that interface now (besides, someday we want to point people towards qemu-storage-daemon instead of qemu-nbd). But for qemu proper, and the newer qemu-storage-daemon, the QMP nbd-server-start command has historically had a default of unlimited number of connections, in part because unlike qemu-nbd it is inherently persistent until nbd-server-stop. Allowing multiple client sockets is particularly useful for clients that can take advantage of MULTI_CONN (creating parallel sockets to increase throughput), although known clients that do so (such as libnbd's nbdcopy) typically use only 8 or 16 connections (the benefits of scaling diminish once more sockets are competing for kernel attention). Picking a number large enough for typical use cases, but not unlimited, makes it slightly harder for a malicious client to perform a denial of service merely by opening lots of connections withot progressing through the handshake. This change does not eliminate CVE-2024-7409 on its own, but reduces the chance for fd exhaustion or unlimited memory usage as an attack surface. On the other hand, by itself, it makes it more obvious that with a finite limit, we have the problem of an unauthenticated client holding 100 fds opened as a way to block out a legitimate client from being able to connect; thus, later patches will further add timeouts to reject clients that are not making progress. This is an INTENTIONAL change in behavior, and will break any client of nbd-server-start that was not passing an explicit max-connections parameter, yet expects more than 100 simultaneous connections. We are not aware of any such client (as stated above, most clients aware of MULTI_CONN get by just fine on 8 or 16 connections, and probably cope with later connections failing by relying on the earlier connections; libvirt has not yet been passing max-connections, but generally creates NBD servers with the intent for a single client for the sake of live storage migration; meanwhile, the KubeSAN project anticipates a large cluster sharing multiple clients [up to 8 per node, and up to 100 nodes in a cluster], but it currently uses qemu-nbd with an explicit --shared=0 rather than qemu-storage-daemon with nbd-server-start). We considered using a deprecation period (declare that omitting max-parameters is deprecated, and make it mandatory in 3 releases - then we don't need to pick an arbitrary default); that has zero risk of breaking any apps that accidentally depended on more than 100 connections, and where such breakage might not be noticed under unit testing but only under the larger loads of production usage. But it does not close the denial-of-service hole until far into the future, and requires all apps to change to add the parameter even if 100 was good enough. It also has a drawback that any app (like libvirt) that is accidentally relying on an unlimited default should seriously consider their own CVE now, at which point they are going to change to pass explicit max-connections sooner than waiting for 3 qemu releases. Finally, if our changed default breaks an app, that app can always pass in an explicit max-parameters with a larger value. It is also intentional that the HMP interface to nbd-server-start is not changed to expose max-connections (any client needing to fine-tune things should be using QMP). Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20240807174943.771624-12-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [ericb: Expand commit message to summarize Dan's argument for why we break corner-case back-compat behavior without a deprecation period] Signed-off-by: Eric Blake <eblake@redhat.com>	2024-08-08 16:02:23 -05:00
Eric Blake	fb1c2aaa98	nbd/server: Plumb in new args to nbd_client_add() Upcoming patches to fix a CVE need to track an opaque pointer passed in by the owner of a client object, as well as request for a time limit on how fast negotiation must complete. Prepare for that by changing the signature of nbd_client_new() and adding an accessor to get at the opaque pointer, although for now the two servers (qemu-nbd.c and blockdev-nbd.c) do not change behavior even though they pass in a new default timeout value. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20240807174943.771624-11-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [eblake: s/LIMIT/MAX_SECS/ as suggested by Dan] Signed-off-by: Eric Blake <eblake@redhat.com>	2024-08-08 15:05:27 -05:00
Kevin Wolf	7e17111646	block/graph-lock: Make WITH_GRAPH_RDLOCK_GUARD() fully checked Upstream clang 18 (and backports to clang 17 in Fedora and RHEL) implemented support for __attribute__((cleanup())) in its Thread Safety Analysis, so we can now actually have a proper implementation of WITH_GRAPH_RDLOCK_GUARD() that understands when we acquire and when we release the lock. -Wthread-safety is now only enabled if the compiler is new enough to understand this pattern. In theory, we could have used some #ifdefs to keep the existing basic checks on old compilers, but as long as someone runs a newer compiler (and our CI does), we will catch locking problems, so it's probably not worth keeping multiple implementations for this. The implementation can't use g_autoptr any more because the glib macros define wrapper functions that don't have the right TSA attributes, so the compiler would complain about them. Just use the cleanup attribute directly instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20240627181245.281403-3-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2024-08-06 20:12:39 +02:00
Arun Kumar	d522aef88d	hw/nvme: add cross namespace copy support Extend copy command to copy user data across different namespaces via support for specifying a namespace for each source range Signed-off-by: Arun Kumar <arun.kka@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2024-07-22 14:36:15 +02:00
Minwoo Im	6471556500	hw/nvme: add Identify Endurance Group List Commit `73064edfb8` ("hw/nvme: flexible data placement emulation") intorudced NVMe FDP feature to nvme-subsys and nvme-ctrl with a single endurance group #1 supported. This means that controller should return proper identify data to host with Identify Endurance Group List (CNS 19h). But, yes, only just for the endurance group #1. This patch allows host applications to ask for which endurance group is available and utilize FDP through that endurance group. Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2024-07-11 17:05:37 +02:00
Paolo Bonzini	44b424dc4a	block: remove separate bdrv_file_open callback bdrv_file_open and bdrv_open are completely equivalent, they are never checked except to see which one to invoke. So merge them into a single one. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-06-28 14:44:51 +02:00
Prasad Pandit	24687abf23	linux-aio: add IO_CMD_FDSYNC command support Libaio defines IO_CMD_FDSYNC command to sync all outstanding asynchronous I/O operations, by flushing out file data to the disk storage. Enable linux-aio to submit such aio request. When using aio=native without fdsync() support, QEMU creates pthreads, and destroying these pthreads results in TLB flushes. In a real-time guest environment, TLB flushes cause a latency spike. This patch helps to avoid such spikes. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Message-ID: <20240425070412.37248-1-ppandit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2024-06-10 11:05:43 +02:00
Stefan Hajnoczi	e669e800fc	aio: warn about iohandler_ctx special casing The main loop has two AioContexts: qemu_aio_context and iohandler_ctx. The main loop runs them both, but nested aio_poll() calls on qemu_aio_context exclude iohandler_ctx. Which one should qemu_get_current_aio_context() return when called from the main loop? Document that it's always qemu_aio_context. This has subtle effects on functions that use qemu_get_current_aio_context(). For example, aio_co_reschedule_self() does not work when moving from iohandler_ctx to qemu_aio_context because qemu_get_current_aio_context() does not differentiate these two AioContexts. Document this in order to reduce the chance of future bugs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240506190622.56095-3-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2024-06-10 11:05:43 +02:00
Minwoo Im	5c079578d2	hw/ufs: Add support MCQ of UFSHCI 4.0 This patch adds support for MCQ defined in UFSHCI 4.0. This patch utilized the legacy I/O codes as much as possible to support MCQ. MCQ operation & runtime register is placed at 0x1000 offset of UFSHCI register statically with no spare space among four registers (48B): UfsMcqSqReg, UfsMcqSqIntReg, UfsMcqCqReg, UfsMcqCqIntReg The maxinum number of queue is 32 as per spec, and the default MAC(Multiple Active Commands) are 32 in the device. Example: -device ufs,serial=foo,id=ufs0,mcq=true,mcq-maxq=8 Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Jeuk Kim <jeuk20.kim@samsung.com> Message-Id: <20240528023106.856777-3-minwoo.im@samsung.com> Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>	2024-06-03 16:20:42 +09:00
Minwoo Im	cdba3b901a	hw/ufs: Update MCQ-related fields to block/ufs.h This patch is a prep patch for the following MCQ support patch for hw/ufs. This patch updated minimal mandatory fields to support MCQ based on UFSHCI 4.0. Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Jeuk Kim <jeuk20.kim@samsung.com> Message-Id: <20240528023106.856777-2-minwoo.im@samsung.com> Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>	2024-06-03 16:20:42 +09:00
Vladimir Sementsov-Ogievskiy	0fd05c8d80	qapi: blockdev-backup: add discard-source parameter Add a parameter that enables discard-after-copy. That is mostly useful in "push backup with fleecing" scheme, when source is snapshot-access format driver node, based on copy-before-write filter snapshot-access API: [guest] [snapshot-access] ~~ blockdev-backup ~~> [backup target] \| \| \| root \| file v v [copy-before-write] \| \| \| file \| target v v [active disk] [temp.img] In this case discard-after-copy does two things: - discard data in temp.img to save disk space - avoid further copy-before-write operation in discarded area Note that we have to declare WRITE permission on source in copy-before-write filter, for discard to work. Still we can't take it unconditionally, as it will break normal backup from RO source. So, we have to add a parameter and pass it thorough bdrv_open flags. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240313152822.626493-5-vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2024-05-28 15:52:15 +03:00
Vladimir Sementsov-Ogievskiy	006e845b5a	block/copy-before-write: create block_copy bitmap in filter node Currently block_copy creates copy_bitmap in source node. But that is in bad relation with .independent_close=true of copy-before-write filter: source node may be detached and removed before .bdrv_close() handler called, which should call block_copy_state_free(), which in turn should remove copy_bitmap. That's all not ideal: it would be better if internal bitmap of block-copy object is not attached to any node. But that is not possible now. The simplest solution is just create copy_bitmap in filter node, where anyway two other bitmaps are created. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240313152822.626493-4-vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2024-05-28 15:52:15 +03:00
Hanna Czenczek	5bdbaebcce	virtio: Re-enable notifications after drain During drain, we do not care about virtqueue notifications, which is why we remove the handlers on it. When removing those handlers, whether vq notifications are enabled or not depends on whether we were in polling mode or not; if not, they are enabled (by default); if so, they have been disabled by the io_poll_start callback. Because we do not care about those notifications after removing the handlers, this is fine. However, we have to explicitly ensure they are enabled when re-attaching the handlers, so we will resume receiving notifications. We do this in virtio_queue_aio_attach_host_notifier*(). If such a function is called while we are in a polling section, attaching the notifiers will then invoke the io_poll_start callback, re-disabling notifications. Because we will always miss virtqueue updates in the drained section, we also need to poll the virtqueue once after attaching the notifiers. Buglink: https://issues.redhat.com/browse/RHEL-3934 Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240202153158.788922-3-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2024-02-07 21:51:03 +01:00
Peter Krempa	72098a3aba	stream: Allow users to request only format driver names in backing file format Introduce a new flag 'backing-mask-protocol' for the block-stream QMP command which instructs the internals to use 'raw' instead of the protocol driver in case when a image is used without a dummy 'raw' wrapper. The flag is designed such that it can be always asserted by management tools even when there isn't any update to backing files. The flag will be used by libvirt so that the backing images still reference the proper format even when libvirt will stop using the dummy raw driver (raw driver with no other config). Libvirt needs this so that the images stay compatible with older libvirt versions which didn't expect that a protocol driver name can appear in the backing file format field. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <bbee9a0a59748a8893289bf8249f568f0d587e62.1701796348.git.pkrempa@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2024-01-26 11:16:58 +01:00
Peter Krempa	4b028cbe75	commit: Allow users to request only format driver names in backing file format Introduce a new flag 'backing-mask-protocol' for the block-commit QMP command which instructs the internals to use 'raw' instead of the protocol driver in case when a image is used without a dummy 'raw' wrapper. The flag is designed such that it can be always asserted by management tools even when there isn't any update to backing files. The flag will be used by libvirt so that the backing images still reference the proper format even when libvirt will stop using the dummy raw driver (raw driver with no other config). Libvirt needs this so that the images stay compatible with older libvirt versions which didn't expect that a protocol driver name can appear in the backing file format field. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <2cb46e37093ce793ea1604abc8bbb90f4c8e434b.1701796348.git.pkrempa@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2024-01-26 11:16:58 +01:00
Paolo Bonzini	3cbc17ee92	io_uring: move LuringState typedef to block/aio.h The LuringState typedef is defined twice, in include/block/raw-aio.h and block/io_uring.c. Move it in include/block/aio.h, which is included everywhere the typedef is needed, since include/block/aio.h already has to define the forward reference to the struct. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-01-18 10:43:14 +01:00
Stefan Hajnoczi	0b2675c473	Rename "QEMU global mutex" to "BQL" in comments and docs The term "QEMU global mutex" is identical to the more widely used Big QEMU Lock ("BQL"). Update the code comments and documentation to use "BQL" instead of "QEMU global mutex". Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Message-id: 20240102153529.486531-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2024-01-08 10:45:43 -05:00
Stefan Hajnoczi	195801d700	system/cpus: rename qemu_mutex_lock_iothread() to bql_lock() The Big QEMU Lock (BQL) has many names and they are confusing. The actual QemuMutex variable is called qemu_global_mutex but it's commonly referred to as the BQL in discussions and some code comments. The locking APIs, however, are called qemu_mutex_lock_iothread() and qemu_mutex_unlock_iothread(). The "iothread" name is historic and comes from when the main thread was split into into KVM vcpu threads and the "iothread" (now called the main loop thread). I have contributed to the confusion myself by introducing a separate --object iothread, a separate concept unrelated to the BQL. The "iothread" name is no longer appropriate for the BQL. Rename the locking APIs to: - void bql_lock(void) - void bql_unlock(void) - bool bql_locked(void) There are more APIs with "iothread" in their names. Subsequent patches will rename them. There are also comments and documentation that will be updated in later patches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Fabiano Rosas <farosas@suse.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20240102153529.486531-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2024-01-08 10:45:43 -05:00
Philippe Mathieu-Daudé	897a06c6d7	iothread: Remove unused Error argument in aio_context_set_aio_params aio_context_set_aio_params() doesn't use its undocumented Error argument. Remove it to simplify. Note this removes a use of "unchecked Error**" in iothread_set_aio_context_params(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20231120171806.19361-1-philmd@linaro.org>	2024-01-08 10:45:34 -05:00
Stefan Hajnoczi	23c983c8f6	block: remove outdated AioContext locking comments The AioContext lock no longer exists. There is one noteworthy change: - * More specifically, these functions use BDRV_POLL_WHILE(bs), which - * requires the caller to be either in the main thread and hold - * the BlockdriverState (bs) AioContext lock, or directly in the - * home thread that runs the bs AioContext. Calling them from - * another thread in another AioContext would cause deadlocks. + * More specifically, these functions use BDRV_POLL_WHILE(bs), which requires + * the caller to be either in the main thread or directly in the home thread + * that runs the bs AioContext. Calling them from another thread in another + * AioContext would cause deadlocks. I am not sure whether deadlocks are still possible. Maybe they have just moved to the fine-grained locks that have replaced the AioContext. Since I am not sure if the deadlocks are gone, I have kept the substance unchanged and just removed mention of the AioContext. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20231205182011.1976568-15-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 22:49:27 +01:00
Stefan Hajnoczi	9f8d2fdcce	aio: remove aio_context_acquire()/aio_context_release() API Delete these functions because nothing calls these functions anymore. I introduced these APIs in commit `98563fc3ec` ("aio: add aio_context_acquire() and aio_context_release()") in 2014. It's with a sigh of relief that I delete these APIs almost 10 years later. Thanks to Paolo Bonzini's vision for multi-queue QEMU, we got an understanding of where the code needed to go in order to remove the limitations that the original dataplane and the IOThread/AioContext approach that followed it. Emanuele Giuseppe Esposito had the splendid determination to convert large parts of the codebase so that they no longer needed the AioContext lock. This was a painstaking process, both in the actual code changes required and the iterations of code review that Emanuele eked out of Kevin and me over many months. Kevin Wolf tackled multitudes of graph locking conversions to protect in-flight I/O from run-time changes to the block graph as well as the clang Thread Safety Analysis annotations that allow the compiler to check whether the graph lock is being used correctly. And me, well, I'm just here to add some pizzazz to the QEMU multi-queue block layer :). Thank you to everyone who helped with this effort, including Eric Blake, code reviewer extraordinaire, and others who I've forgotten to mention. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20231205182011.1976568-11-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 22:49:27 +01:00
Stefan Hajnoczi	95bbddf9ad	aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED() Now that the AioContext lock no longer exists, AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED() are equivalent. A future patch will get rid of AIO_WAIT_WHILE_UNLOCKED(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20231205182011.1976568-10-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 22:49:27 +01:00
Stefan Hajnoczi	c43d5bc858	block: remove bdrv_co_lock() The bdrv_co_lock() and bdrv_co_unlock() functions are already no-ops. Remove them. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20231205182011.1976568-8-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 22:49:27 +01:00
Stefan Hajnoczi	b49f4755c7	block: remove AioContext locking This is the big patch that removes aio_context_acquire()/aio_context_release() from the block layer and affected block layer users. There isn't a clean way to split this patch and the reviewers are likely the same group of people, so I decided to do it in one patch. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-ID: <20231205182011.1976568-7-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 22:49:27 +01:00
Stefan Hajnoczi	6bc30f1949	graph-lock: remove AioContext locking Stop acquiring/releasing the AioContext lock in bdrv_graph_wrlock()/bdrv_graph_unlock() since the lock no longer has any effect. The distinction between bdrv_graph_wrunlock() and bdrv_graph_wrunlock_ctx() becomes meaningless and they can be collapsed into one function. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231205182011.1976568-6-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 22:49:27 +01:00
Kevin Wolf	6bc0bcc89f	block: Fix deadlocks in bdrv_graph_wrunlock() bdrv_graph_wrunlock() calls aio_poll(), which may run callbacks that have a nested event loop. Nested event loops can depend on other iothreads making progress, so in order to allow them to make progress it must not hold the AioContext lock of another thread while calling aio_poll(). This introduces a @bs parameter to bdrv_graph_wrunlock() whose AioContext is temporarily dropped (which matches bdrv_graph_wrlock()), and a bdrv_graph_wrunlock_ctx() that can be used if the BlockDriverState doesn't necessarily exist any more when unlocking. This also requires a change to bdrv_schedule_unref(), which was relying on the incorrectly taken lock. It needs to take the lock itself now. While this is a separate bug, it can't be fixed a separate patch because otherwise the intermediate state would either deadlock or try to release a lock that we don't even hold. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231115172012.112727-3-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [kwolf: Fixed up bdrv_schedule_unref()] Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-11-21 12:45:21 +01:00
Michael Tokarev	a4dbf3fecb	include/block/ufs.h: spelling fix: setted Fixes: `bc4e68d362` "hw/ufs: Initial commit for emulated Universal-Flash-Storage" Reviewed-by: Jeuk Kim <jeuk20.kim@samsung.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-11-15 12:06:04 +03:00
Kevin Wolf	1f051dcbdf	block: Protect bs->file with graph_lock Almost all functions that access bs->file already take the graph lock now. Add locking to the remaining users and finally annotate the struct field itself as protected by the graph lock. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231027155333.420094-25-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-11-08 17:56:18 +01:00
Kevin Wolf	79a5586648	block: Add missing GRAPH_RDLOCK annotations This adds GRAPH_RDLOCK to some driver callbacks that are already called with the graph lock held, and which will need the annotation because they access bs->file, but don't have it yet. This also covers a few callbacks that were not marked GRAPH_RDLOCK before, but where updating BlockDriver is trivially possible. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231027155333.420094-21-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-11-08 17:56:17 +01:00
Kevin Wolf	e2dd273754	block: Introduce bdrv_co_change_backing_file() bdrv_change_backing_file() is called both inside and outside coroutine context. This makes it difficult for it to take the graph lock internally. It also means that driver implementations need to be able to run outside of coroutines, too. Switch it to the usual model with a coroutine based implementation and a co_wrapper instead. The new function is marked GRAPH_RDLOCK. As the co_wrapper now runs the function in the AioContext of the node (as it should always have done), this is not GLOBAL_STATE_CODE() any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231027155333.420094-20-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-11-08 17:56:17 +01:00
Kevin Wolf	004915a96a	block: Protect bs->backing with graph_lock Almost all functions that access bs->backing already take the graph lock now. Add locking to the remaining users and finally annotate the struct field itself as protected by the graph lock. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231027155333.420094-18-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-11-08 17:56:17 +01:00
Kevin Wolf	ccd6a37947	block: Mark bdrv_replace_node() GRAPH_WRLOCK Instead of taking the writer lock internally, require callers to already hold it when calling bdrv_replace_node(). Its callers may already want to hold the graph lock and so wouldn't be able to call functions that take it internally. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231027155333.420094-17-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-11-07 19:14:20 +01:00
Kevin Wolf	d0f9fd94d9	block: Mark bdrv_set_backing_hd_drained() GRAPH_WRLOCK Instead of taking the writer lock internally, require callers to already hold it when calling bdrv_set_backing_hd_drained(). Basically everthing in the function needs the lock and its callers may already want to hold the graph lock and so wouldn't be able to call functions that take it internally. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231027155333.420094-14-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-11-07 19:14:20 +01:00

1 2 3 4 5 ...

1610 Commits