tests/migration/guestperf: Add --iter argument to batch runs

Allow batch runs to be executed multiple times. This is useful for collecting averages.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
tests/migration/guestperf: Print completion statistics
2023-05-31 13:40:47 -03:00
34 changed files with 1726 additions and 180 deletions
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -39,6 +39,8 @@ over any transport.
 - exec migration: do the migration using the stdin/stdout through a process.
 - fd migration: do the migration using a file descriptor that is
  passed to QEMU.  QEMU doesn't care how this file descriptor is opened.
+- file migration: do the migration using a file that is passed by name
+  to QEMU.

 In addition, support is included for migration using RDMA, which
 transports the page data using ``RDMA``, where the hardware takes care of
@@ -566,6 +568,20 @@ Others (especially either older devices or system devices which for
 some reason don't have a bus concept) make use of the ``instance id``
 for otherwise identically named devices.

+Fixed-ram format
+----------------
+
+When the ``fixed-ram`` capability is enabled, a slightly different
+stream format is used for the RAM section. Instead of a sequential
+stream of pages that follows the RAMBlock headers, the dirty pages of
+each RAMBlock are written right after that block's header. This ensures
+that each RAM page has a fixed offset in the resulting migration stream.
+
+The ``fixed-ram`` capability can be enabled on both the source and the
+destination with:
+
+    ``migrate_set_capability fixed-ram on``
+
 Return path
 -----------

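The fixed-offset layout described in the documentation above can be illustrated with a minimal C sketch. It assumes that each RAMBlock's pages are stored back to back starting at a per-block file offset (the `pages_offset` field this series adds to `struct RAMBlock`) and that all pages in a block share one page size. `struct fixed_ram_block` and `fixed_ram_page_offset()` are hypothetical names used only for illustration, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one RAMBlock in a fixed-ram file. */
struct fixed_ram_block {
    uint64_t pages_offset; /* file offset where this block's pages start */
    size_t page_size;      /* size of one page in bytes */
    uint64_t used_length;  /* bytes of guest RAM covered by this block */
};

/*
 * Return the file offset of the page containing byte @ram_offset of the
 * block. The result depends only on the page's position inside the
 * block, never on when (or how often) the page was sent.
 */
static uint64_t fixed_ram_page_offset(const struct fixed_ram_block *b,
                                      uint64_t ram_offset)
{
    assert(ram_offset < b->used_length);
    uint64_t page_index = ram_offset / b->page_size;
    return b->pages_offset + page_index * b->page_size;
}
```

In this sketch, re-sending a dirty page overwrites its earlier copy at the same offset instead of appending a new copy to the stream.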
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -44,6 +44,14 @@ struct RAMBlock {
    size_t page_size;
    /* dirty bitmap used during migration */
    unsigned long *bmap;
+    /* shadow dirty bitmap used when migrating to a file */
+    unsigned long *shadow_bmap;
+    /*
+     * offsets in the file where the bitmap and the pages belonging to
+     * this ramblock are saved; used only during migration to a file.
+     */
+    off_t bitmap_offset;
+    uint64_t pages_offset;
    /* bitmap of already received pages in postcopy */
    unsigned long *receivedmap;

--- a/include/io/channel-file.h
+++ b/include/io/channel-file.h
@@ -22,6 +22,7 @@
 #define QIO_CHANNEL_FILE_H

 #include "io/channel.h"
+#include "io/task.h"
 #include "qom/object.h"

 #define TYPE_QIO_CHANNEL_FILE "qio-channel-file"
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -33,8 +33,10 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 #define QIO_CHANNEL_ERR_BLOCK -2

 #define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
+#define QIO_CHANNEL_WRITE_FLAG_WITH_OFFSET 0x2

 #define QIO_CHANNEL_READ_FLAG_MSG_PEEK 0x1
+#define QIO_CHANNEL_READ_FLAG_WITH_OFFSET 0x2

 typedef enum QIOChannelFeature QIOChannelFeature;

@@ -44,6 +46,7 @@ enum QIOChannelFeature {
    QIO_CHANNEL_FEATURE_LISTEN,
    QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
    QIO_CHANNEL_FEATURE_READ_MSG_PEEK,
+    QIO_CHANNEL_FEATURE_SEEKABLE,
 };


@@ -128,6 +131,16 @@ struct QIOChannelClass {
                           Error **errp);

    /* Optional callbacks */
+    ssize_t (*io_pwritev)(QIOChannel *ioc,
+                          const struct iovec *iov,
+                          size_t niov,
+                          off_t offset,
+                          Error **errp);
+    ssize_t (*io_preadv)(QIOChannel *ioc,
+                         const struct iovec *iov,
+                         size_t niov,
+                         off_t offset,
+                         Error **errp);
    int (*io_shutdown)(QIOChannel *ioc,
                       QIOChannelShutdown how,
                       Error **errp);
@@ -510,6 +523,126 @@ int qio_channel_set_blocking(QIOChannel *ioc,
 int qio_channel_close(QIOChannel *ioc,
                      Error **errp);

+/**
+ * qio_channel_pwritev_full
+ * @ioc: the channel object
+ * @iov: the array of memory regions to write data from
+ * @niov: the length of the @iov array
+ * @offset: offset in the channel where writes should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error. To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ * Behaves as qio_channel_writev_full, except that it does not support
+ * sending file handles and it begins the write at the passed
+ * @offset.
+ *
+ */
+ssize_t qio_channel_pwritev_full(QIOChannel *ioc, const struct iovec *iov,
+                                 size_t niov, off_t offset, Error **errp);
+
+/**
+ * qio_channel_write_full_all:
+ * @ioc: the channel object
+ * @iov: the array of memory regions to write data from
+ * @niov: the length of the @iov array
+ * @offset: base added to each iov_base to compute its file offset
+ * @fds: an array of file handles to send
+ * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
+ * @errp: pointer to a NULL-initialized error object
+ *
+ *
+ * Selects between a writev or pwritev channel writer function.
+ *
+ * If QIO_CHANNEL_WRITE_FLAG_WITH_OFFSET is passed in @flags, pwritev is
+ * used, @offset is expected to be a meaningful value and @fds and
+ * @nfds are ignored; otherwise writev is used and @offset is ignored.
+ *
+ * Returns: 0 if all bytes were written, or -1 on error
+ */
+int qio_channel_write_full_all(QIOChannel *ioc, const struct iovec *iov,
+                               size_t niov, off_t offset, int *fds, size_t nfds,
+                               int flags, Error **errp);
+
+/**
+ * qio_channel_pwritev
+ * @ioc: the channel object
+ * @buf: the memory region to write data from
+ * @buflen: the number of bytes to write from @buf
+ * @offset: offset in the channel where writes should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error. To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ */
+ssize_t qio_channel_pwritev(QIOChannel *ioc, char *buf, size_t buflen,
+                            off_t offset, Error **errp);
+
+/**
+ * qio_channel_preadv_full
+ * @ioc: the channel object
+ * @iov: the array of memory regions to read data into
+ * @niov: the length of the @iov array
+ * @offset: offset in the channel where reads should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error.  To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ * Behaves as qio_channel_readv_full, except that it does not support
+ * receiving file handles and it begins the read at the passed
+ * @offset.
+ *
+ */
+ssize_t qio_channel_preadv_full(QIOChannel *ioc, const struct iovec *iov,
+                                size_t niov, off_t offset, Error **errp);
+
+/**
+ * qio_channel_read_full_all:
+ * @ioc: the channel object
+ * @iov: the array of memory regions to read data to
+ * @niov: the length of the @iov array
+ * @offset: base added to each iov_base to compute the file offset
+ *          from which that element is read (only used with
+ *          QIO_CHANNEL_READ_FLAG_WITH_OFFSET)
+ * @flags: read flags (QIO_CHANNEL_READ_FLAG_*)
+ * @errp: pointer to a NULL-initialized error object
+ *
+ *
+ * Selects between a readv or preadv channel reader function.
+ *
+ * If QIO_CHANNEL_READ_FLAG_WITH_OFFSET is passed in @flags, preadv is
+ * used and @offset is expected to be a meaningful value; otherwise
+ * readv is used and @offset is ignored.
+ *
+ * Returns: 0 if all bytes were read, or -1 on error
+ */
+int qio_channel_read_full_all(QIOChannel *ioc, const struct iovec *iov,
+                              size_t niov, off_t offset,
+                              int flags, Error **errp);
+
+/**
+ * qio_channel_preadv
+ * @ioc: the channel object
+ * @buf: the memory region to read data into
+ * @buflen: the number of bytes to read into @buf
+ * @offset: offset in the channel where reads should begin
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Not all implementations will support this facility, so may report
+ * an error.  To avoid errors, the caller may check for the feature
+ * flag QIO_CHANNEL_FEATURE_SEEKABLE prior to calling this method.
+ *
+ */
+ssize_t qio_channel_preadv(QIOChannel *ioc, char *buf, size_t buflen,
+                           off_t offset, Error **errp);
+
 /**
 * qio_channel_shutdown:
 * @ioc: the channel object
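For the file channel, the positioned helpers declared above end up in the POSIX `pwritev()` and `preadv()` calls (see `qio_channel_file_pwritev()` and `qio_channel_file_preadv()` in io/channel-file.c). The standalone sketch below, which assumes a Linux/glibc system and uses a hypothetical helper `positioned_roundtrip()`, shows the property the migration code relies on: a scatter/gather transfer at an explicit offset that leaves the file position untouched.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write two buffers back to back at @offset with one pwritev() call,
 * then read them back with one preadv() call. Returns 0 if the data
 * round-trips and the file position did not move, -1 otherwise.
 */
static int positioned_roundtrip(int fd, off_t offset)
{
    char a[] = "hello, ", b[] = "world";
    char ra[sizeof(a) - 1], rb[sizeof(b) - 1];
    struct iovec wiov[2] = {
        { .iov_base = a, .iov_len = sizeof(a) - 1 },
        { .iov_base = b, .iov_len = sizeof(b) - 1 },
    };
    struct iovec riov[2] = {
        { .iov_base = ra, .iov_len = sizeof(ra) },
        { .iov_base = rb, .iov_len = sizeof(rb) },
    };
    off_t before = lseek(fd, 0, SEEK_CUR);
    size_t total = sizeof(ra) + sizeof(rb);

    if (before == (off_t)-1) {
        return -1; /* not seekable: positioned I/O is unavailable */
    }
    /* A robust caller would loop on short transfers; this sketch does not. */
    if (pwritev(fd, wiov, 2, offset) != (ssize_t)total ||
        preadv(fd, riov, 2, offset) != (ssize_t)total) {
        return -1;
    }
    if (memcmp(ra, a, sizeof(ra)) != 0 || memcmp(rb, b, sizeof(rb)) != 0) {
        return -1;
    }
    /* pwritev()/preadv() must not have moved the file position. */
    return lseek(fd, 0, SEEK_CUR) == before ? 0 : -1;
}
```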
--- a/include/migration/qemu-file-types.h
+++ b/include/migration/qemu-file-types.h
@@ -50,6 +50,8 @@ unsigned int qemu_get_be16(QEMUFile *f);
 unsigned int qemu_get_be32(QEMUFile *f);
 uint64_t qemu_get_be64(QEMUFile *f);

+bool qemu_file_is_seekable(QEMUFile *f);
+
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
    qemu_put_be64(f, *pv);
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -570,6 +570,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
 bool qemu_has_ofd_lock(void);
 #endif

+bool qemu_has_direct_io(void);
+
 #if defined(__HAIKU__) && defined(__i386__)
 #define FMT_pid "%ld"
 #elif defined(WIN64)
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -35,6 +35,10 @@ qio_channel_file_new_fd(int fd)

    ioc->fd = fd;

+    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1) {
+        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
+    }
+
    trace_qio_channel_file_new_fd(ioc, fd);

    return ioc;
@@ -59,6 +63,10 @@ qio_channel_file_new_path(const char *path,
        return NULL;
    }

+    if (lseek(ioc->fd, 0, SEEK_CUR) != (off_t)-1) {
+        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
+    }
+
    trace_qio_channel_file_new_path(ioc, path, flags, mode, ioc->fd);

    return ioc;
@@ -137,6 +145,56 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
    return ret;
 }

+static ssize_t qio_channel_file_preadv(QIOChannel *ioc,
+                                       const struct iovec *iov,
+                                       size_t niov,
+                                       off_t offset,
+                                       Error **errp)
+{
+    QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+    ssize_t ret;
+
+ retry:
+    ret = preadv(fioc->fd, iov, niov, offset);
+    if (ret < 0) {
+        if (errno == EAGAIN) {
+            return QIO_CHANNEL_ERR_BLOCK;
+        }
+        if (errno == EINTR) {
+            goto retry;
+        }
+
+        error_setg_errno(errp, errno, "Unable to read from file");
+        return -1;
+    }
+
+    return ret;
+}
+
+static ssize_t qio_channel_file_pwritev(QIOChannel *ioc,
+                                        const struct iovec *iov,
+                                        size_t niov,
+                                        off_t offset,
+                                        Error **errp)
+{
+    QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+    ssize_t ret;
+
+ retry:
+    ret = pwritev(fioc->fd, iov, niov, offset);
+    if (ret < 0) {
+        if (errno == EAGAIN) {
+            return QIO_CHANNEL_ERR_BLOCK;
+        }
+        if (errno == EINTR) {
+            goto retry;
+        }
+        error_setg_errno(errp, errno, "Unable to write to file");
+        return -1;
+    }
+    return ret;
+}
+
 static int qio_channel_file_set_blocking(QIOChannel *ioc,
                                         bool enabled,
                                         Error **errp)
@@ -218,6 +276,8 @@ static void qio_channel_file_class_init(ObjectClass *klass,
    ioc_klass->io_writev = qio_channel_file_writev;
    ioc_klass->io_readv = qio_channel_file_readv;
    ioc_klass->io_set_blocking = qio_channel_file_set_blocking;
+    ioc_klass->io_pwritev = qio_channel_file_pwritev;
+    ioc_klass->io_preadv = qio_channel_file_preadv;
    ioc_klass->io_seek = qio_channel_file_seek;
    ioc_klass->io_close = qio_channel_file_close;
    ioc_klass->io_create_watch = qio_channel_file_create_watch;
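The `lseek(fd, 0, SEEK_CUR)` probe added to `qio_channel_file_new_fd()` and `qio_channel_file_new_path()` decides whether a file channel advertises `QIO_CHANNEL_FEATURE_SEEKABLE`. A minimal standalone sketch of the same test, assuming a POSIX system (`fd_is_seekable()` is a hypothetical name): regular files report a position, while pipes fail with `ESPIPE`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Same probe as in io/channel-file.c: querying the current position
 * succeeds on regular files and fails on pipes, FIFOs and sockets.
 */
static bool fd_is_seekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}
```

Because the probe only reads the position, it has no side effects on a regular file.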
--- a/io/channel.c
+++ b/io/channel.c
@@ -446,6 +446,146 @@ GSource *qio_channel_add_watch_source(QIOChannel *ioc,
 }


+ssize_t qio_channel_pwritev_full(QIOChannel *ioc, const struct iovec *iov,
+                                 size_t niov, off_t offset, Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_pwritev) {
+        error_setg(errp, "Channel does not support pwritev");
+        return -1;
+    }
+
+    if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SEEKABLE)) {
+        error_setg_errno(errp, EINVAL, "Requested channel is not seekable");
+        return -1;
+    }
+
+    return klass->io_pwritev(ioc, iov, niov, offset, errp);
+}
+
+static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
+                                                 const struct iovec *iov,
+                                                 size_t niov, off_t offset,
+                                                 bool is_write, Error **errp)
+{
+    ssize_t ret = 0; /* returned as success when niov == 0 */
+    int i, slice_idx, slice_num;
+    uint64_t base, next, file_offset;
+    size_t len;
+
+    slice_idx = 0;
+    slice_num = 1;
+
+    /*
+     * If the iov array doesn't have contiguous elements, we need to
+     * split it into slices because we only have one (file) 'offset' for
+     * the whole iov. Do this here so callers don't need to break the
+     * iov array themselves.
+     */
+    for (i = 0; i < niov; i++, slice_num++) {
+        base = (uint64_t) iov[i].iov_base;
+
+        if (i != niov - 1) {
+            len = iov[i].iov_len;
+            next = (uint64_t) iov[i + 1].iov_base;
+
+            if (base + len == next) {
+                continue;
+            }
+        }
+
+        /*
+         * Use the offset of the first element of the segment that
+         * we're sending.
+         */
+        file_offset = offset + (uint64_t) iov[slice_idx].iov_base;
+
+        if (is_write) {
+            ret = qio_channel_pwritev_full(ioc, &iov[slice_idx], slice_num,
+                                           file_offset, errp);
+        } else {
+            ret = qio_channel_preadv_full(ioc, &iov[slice_idx], slice_num,
+                                          file_offset, errp);
+        }
+
+        if (ret < 0) {
+            break;
+        }
+
+        slice_idx += slice_num;
+        slice_num = 0;
+    }
+
+    return (ret < 0) ? -1 : 0;
+}
+
+int qio_channel_write_full_all(QIOChannel *ioc,
+                               const struct iovec *iov,
+                               size_t niov, off_t offset,
+                               int *fds, size_t nfds,
+                               int flags, Error **errp)
+{
+    if (flags & QIO_CHANNEL_WRITE_FLAG_WITH_OFFSET) {
+        return qio_channel_preadv_pwritev_contiguous(ioc, iov, niov,
+                                                     offset, true, errp);
+    }
+
+    return qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, flags, errp);
+}
+
+ssize_t qio_channel_pwritev(QIOChannel *ioc, char *buf, size_t buflen,
+                            off_t offset, Error **errp)
+{
+    struct iovec iov = {
+        .iov_base = buf,
+        .iov_len = buflen
+    };
+
+    return qio_channel_pwritev_full(ioc, &iov, 1, offset, errp);
+}
+
+ssize_t qio_channel_preadv_full(QIOChannel *ioc, const struct iovec *iov,
+                                size_t niov, off_t offset, Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_preadv) {
+        error_setg(errp, "Channel does not support preadv");
+        return -1;
+    }
+
+    if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SEEKABLE)) {
+        error_setg_errno(errp, EINVAL, "Requested channel is not seekable");
+        return -1;
+    }
+
+    return klass->io_preadv(ioc, iov, niov, offset, errp);
+}
+
+int qio_channel_read_full_all(QIOChannel *ioc, const struct iovec *iov,
+                              size_t niov, off_t offset,
+                              int flags, Error **errp)
+{
+    if (flags & QIO_CHANNEL_READ_FLAG_WITH_OFFSET) {
+        return qio_channel_preadv_pwritev_contiguous(ioc, iov, niov,
+                                                     offset, false, errp);
+    }
+
+    return qio_channel_readv_full_all(ioc, iov, niov, NULL, NULL, errp);
+}
+
+ssize_t qio_channel_preadv(QIOChannel *ioc, char *buf, size_t buflen,
+                           off_t offset, Error **errp)
+{
+    struct iovec iov = {
+        .iov_base = buf,
+        .iov_len = buflen
+    };
+
+    return qio_channel_preadv_full(ioc, &iov, 1, offset, errp);
+}
+
 int qio_channel_shutdown(QIOChannel *ioc,
                         QIOChannelShutdown how,
                         Error **errp)
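`qio_channel_preadv_pwritev_contiguous()` groups adjacent iovec elements so that each run needs a single positioned call, and derives each run's file offset as `offset + iov_base` of its first element. The sketch below isolates that grouping step so it can be checked without a QIOChannel. `struct iov_slice` and `split_contiguous()` are hypothetical names, and the sketch uses `uintptr_t` for the pointer arithmetic instead of the `uint64_t` casts in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* One run of memory-adjacent iovec elements. */
struct iov_slice {
    size_t idx;           /* first element of the run */
    size_t num;           /* number of elements in the run */
    uint64_t file_offset; /* where the run starts in the file */
};

/*
 * Split @iov into runs whose buffers are adjacent in memory and compute
 * each run's file offset as @base + address of its first buffer.
 * @out must have room for @niov entries. Returns the number of runs.
 */
static size_t split_contiguous(const struct iovec *iov, size_t niov,
                               uint64_t base, struct iov_slice *out)
{
    size_t nslices = 0, start = 0;

    for (size_t i = 0; i < niov; i++) {
        uintptr_t cur = (uintptr_t)iov[i].iov_base;

        /* Extend the run while the next buffer starts where this ends. */
        if (i + 1 < niov &&
            cur + iov[i].iov_len == (uintptr_t)iov[i + 1].iov_base) {
            continue;
        }
        out[nslices].idx = start;
        out[nslices].num = i - start + 1;
        out[nslices].file_offset = base + (uintptr_t)iov[start].iov_base;
        nslices++;
        start = i + 1;
    }
    return nslices;
}
```

With the base chosen by the multifd callers, `pages_offset - host` (see `write_base` and `read_base` in migration/multifd.c), the unsigned subtraction may wrap, but adding the host address back wraps again and yields `pages_offset` plus the page's offset within the block.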
--- a/migration/file.c
+++ b/migration/file.c
@@ -0,0 +1,131 @@
+#include "qemu/osdep.h"
+#include "io/channel-file.h"
+#include "file.h"
+#include "qemu/error-report.h"
+#include "migration.h"
+#include "options.h"
+
+static struct FileOutgoingArgs {
+    char *fname;
+    int flags;
+    int mode;
+} outgoing_args;
+
+static void qio_channel_file_connect_worker(QIOTask *task, gpointer opaque)
+{
+    /* noop */
+}
+
+static void file_migration_cancel(Error *errp)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_FAILED);
+    migration_cancel(errp);
+}
+
+int file_send_channel_destroy(QIOChannel *ioc)
+{
+    if (ioc) {
+        qio_channel_close(ioc, NULL);
+        object_unref(OBJECT(ioc));
+    }
+    g_free(outgoing_args.fname);
+    outgoing_args.fname = NULL;
+
+    return 0;
+}
+
+void file_send_channel_create(QIOTaskFunc f, void *data)
+{
+    QIOChannelFile *ioc;
+    QIOTask *task;
+    Error *errp = NULL;
+    int flags = outgoing_args.flags;
+
+    if (migrate_direct_io() && qemu_has_direct_io()) {
+        /*
+         * Enable O_DIRECT for the secondary channels. These are used
+         * for sending ram pages and writes should be guaranteed to be
+         * aligned to at least page size.
+         */
+        flags |= O_DIRECT;
+    }
+
+    ioc = qio_channel_file_new_path(outgoing_args.fname, flags,
+                                    outgoing_args.mode, &errp);
+    if (!ioc) {
+        file_migration_cancel(errp);
+        return;
+    }
+
+    task = qio_task_new(OBJECT(ioc), f, (gpointer)data, NULL);
+    qio_task_run_in_thread(task, qio_channel_file_connect_worker,
+                           (gpointer)data, NULL, NULL);
+}
+
+void file_start_outgoing_migration(MigrationState *s, const char *fname, Error **errp)
+{
+    QIOChannelFile *ioc;
+    int flags = O_CREAT | O_TRUNC | O_WRONLY;
+    mode_t mode = 0660;
+
+    ioc = qio_channel_file_new_path(fname, flags, mode, errp);
+    if (!ioc) {
+        error_report("Error creating migration outgoing channel");
+        return;
+    }
+
+    outgoing_args.fname = g_strdup(fname);
+    outgoing_args.flags = flags;
+    outgoing_args.mode = mode;
+
+    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-outgoing");
+    migration_channel_connect(s, QIO_CHANNEL(ioc), NULL, NULL);
+    object_unref(OBJECT(ioc));
+}
+
+static void file_process_migration_incoming(QIOTask *task, gpointer opaque)
+{
+    QIOChannelFile *ioc = opaque;
+
+    migration_channel_process_incoming(QIO_CHANNEL(ioc));
+    object_unref(OBJECT(ioc));
+}
+
+void file_start_incoming_migration(const char *fname, Error **errp)
+{
+    QIOChannelFile *ioc;
+    QIOTask *task;
+    int channels = 1;
+    int i = 0, fd;
+
+    ioc = qio_channel_file_new_path(fname, O_RDONLY, 0, errp);
+    if (!ioc) {
+        goto out;
+    }
+
+    if (migrate_multifd()) {
+        channels += migrate_multifd_channels();
+    }
+
+    fd = ioc->fd;
+
+    do {
+        qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
+        task = qio_task_new(OBJECT(ioc), file_process_migration_incoming,
+                            (gpointer)ioc, NULL);
+
+        qio_task_run_in_thread(task, qio_channel_file_connect_worker,
+                               (gpointer)ioc, NULL, NULL);
+    } while (++i < channels && (ioc = qio_channel_file_new_fd(fd)));
+
+out:
+    if (!ioc) {
+        error_report("Error creating migration incoming channel");
+        return;
+    }
+}
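`file_send_channel_create()` opens the multifd channels with `O_DIRECT` on the assumption that every RAM page write is aligned. `O_DIRECT` generally requires the buffer address, the file offset and the transfer length to be multiples of the device's logical block size, which a page-aligned write satisfies for common block sizes. A hedged sketch of that check follows; `direct_io_aligned()` is a hypothetical helper, and the alignment is passed in rather than queried from the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Check that a write meets a power-of-two alignment @align in buffer
 * address, file offset and length. The real requirement depends on
 * the filesystem and device; @align is an assumption of the caller.
 */
static bool direct_io_aligned(const void *buf, uint64_t offset, size_t len,
                              size_t align)
{
    uint64_t mask = (uint64_t)align - 1;

    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)buf & mask) == 0 && (offset & mask) == 0 &&
           ((uint64_t)len & mask) == 0;
}
```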
--- a/migration/file.h
+++ b/migration/file.h
@@ -0,0 +1,14 @@
+#ifndef QEMU_MIGRATION_FILE_H
+#define QEMU_MIGRATION_FILE_H
+
+#include "io/task.h"
+#include "channel.h"
+
+void file_start_outgoing_migration(MigrationState *s,
+                                   const char *filename,
+                                   Error **errp);
+
+void file_start_incoming_migration(const char *fname, Error **errp);
+void file_send_channel_create(QIOTaskFunc f, void *data);
+int file_send_channel_destroy(QIOChannel *ioc);
+#endif
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -16,6 +16,7 @@ softmmu_ss.add(files(
  'dirtyrate.c',
  'exec.c',
  'fd.c',
+  'file.c',
  'global_state.c',
  'migration-hmp-cmds.c',
  'migration.c',
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -364,6 +364,11 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
                }
            }
        }
+        if (params->has_direct_io) {
+            monitor_printf(mon, "%s: %s\n",
+                           MigrationParameter_str(MIGRATION_PARAMETER_DIRECT_IO),
+                           params->direct_io ? "on" : "off");
+        }
    }

    qapi_free_MigrationParameters(params);
@@ -620,6 +625,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
        error_setg(&err, "The block-bitmap-mapping parameter can only be set "
                   "through QMP");
        break;
+    case MIGRATION_PARAMETER_DIRECT_IO:
+        p->has_direct_io = true;
+        visit_type_bool(v, param, &p->direct_io, &err);
+        break;
    default:
        assert(0);
    }
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -20,6 +20,7 @@
 #include "migration/blocker.h"
 #include "exec.h"
 #include "fd.h"
+#include "file.h"
 #include "socket.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
@@ -105,7 +106,7 @@ static bool migration_needs_multiple_sockets(void)
 static bool uri_supports_multi_channels(const char *uri)
 {
    return strstart(uri, "tcp:", NULL) || strstart(uri, "unix:", NULL) ||
-           strstart(uri, "vsock:", NULL);
+           strstart(uri, "vsock:", NULL) || strstart(uri, "file:", NULL);
 }

 static bool
@@ -442,6 +443,8 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
        exec_start_incoming_migration(p, errp);
    } else if (strstart(uri, "fd:", &p)) {
        fd_start_incoming_migration(p, errp);
+    } else if (strstart(uri, "file:", &p)) {
+        file_start_incoming_migration(p, errp);
    } else {
        error_setg(errp, "unknown migration protocol: %s", uri);
    }
@@ -491,6 +494,10 @@ static void process_incoming_migration_bh(void *opaque)
    } else if (migration_incoming_colo_enabled()) {
        migration_incoming_disable_colo();
        vm_start();
+    } else if (global_state_received() &&
+               global_state_get_runstate() == RUN_STATE_PAUSED &&
+               (autostart || migrate_suspend())) {
+        vm_start();
    } else {
        runstate_set(global_state_get_runstate());
    }
@@ -695,6 +702,8 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
        }

        default_channel = (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC));
+    } else if (migrate_multifd() && migrate_fixed_ram()) {
+        default_channel = multifd_recv_first_channel();
    } else {
        default_channel = !mis->from_src_file;
    }
@@ -1397,6 +1406,7 @@ void migrate_init(MigrationState *s)
    error_free(s->error);
    s->error = NULL;
    s->hostname = NULL;
+    s->vmdesc = NULL;

    migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);

@@ -1662,6 +1672,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
        exec_start_outgoing_migration(s, p, &local_err);
    } else if (strstart(uri, "fd:", &p)) {
        fd_start_outgoing_migration(s, p, &local_err);
+    } else if (strstart(uri, "file:", &p)) {
+        file_start_outgoing_migration(s, p, &local_err);
    } else {
        if (!(has_resume && resume)) {
            yank_unregister_instance(MIGRATION_YANK_INSTANCE);
@@ -2625,7 +2637,7 @@ static MigThrError migration_detect_error(MigrationState *s)
    }
 }

-static void migration_calculate_complete(MigrationState *s)
+void migration_calculate_complete(MigrationState *s)
 {
    uint64_t bytes = migration_transferred_bytes(s->to_dst_file);
    int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -2657,8 +2669,7 @@ static void update_iteration_initial_status(MigrationState *s)
    s->iteration_initial_pages = ram_get_total_transferred_pages();
 }

-static void migration_update_counters(MigrationState *s,
-                                      int64_t current_time)
+void migration_update_counters(MigrationState *s, int64_t current_time)
 {
    uint64_t transferred, transferred_pages, time_spent;
    uint64_t current_bytes; /* bytes transferred since the beginning */
@@ -2757,6 +2768,7 @@ static void migration_iteration_finish(MigrationState *s)
    case MIGRATION_STATUS_COMPLETED:
        migration_calculate_complete(s);
        runstate_set(RUN_STATE_POSTMIGRATE);
+        trace_migration_status((int)s->mbps / 8, (int)s->pages_per_second, s->total_time);
        break;
    case MIGRATION_STATUS_COLO:
        assert(migrate_colo());
@@ -3165,6 +3177,33 @@ fail:
    return NULL;
 }

+static int fixed_ram_save_setup(MigrationState *s, Error **errp)
+{
+    if (!migrate_fixed_ram()) {
+        return 0;
+    }
+
+    if (!qemu_file_is_seekable(s->to_dst_file)) {
+        error_setg(errp, "Directly mapped memory requires a seekable transport");
+        return -1;
+    }
+
+    /*
+     * Fixed-ram migration is currently only used to address "vm
+     * suspend" scenarios, so the VM would always be stopped at the
+     * end of migration. Check if we can stop it now and use the
+     * knowledge that the VM is stopped to implement optimizations
+     * down the line.
+     */
+    if (migrate_suspend() &&
+        vm_stop_force_state(RUN_STATE_PAUSED)) {
+        error_setg(errp, "Failed to stop the VM");
+        return -1;
+    }
+
+    return 0;
+}
+
 void migrate_fd_connect(MigrationState *s, Error *error_in)
 {
    Error *local_err = NULL;
@@ -3247,6 +3286,12 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
        return;
    }

+    if (fixed_ram_save_setup(s, &local_err) < 0) {
+        migrate_fd_cleanup(s);
+        migrate_fd_error(s, local_err);
+        return;
+    }
+
    if (multifd_save_setup(&local_err) != 0) {
        error_report_err(local_err);
        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -440,7 +440,9 @@ struct MigrationState {
 };

 void migrate_set_state(int *state, int old_state, int new_state);
-
+void migration_calculate_complete(MigrationState *s);
+void migration_update_counters(MigrationState *s,
+                               int64_t current_time);
 void migration_fd_process_incoming(QEMUFile *f, Error **errp);
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
 void migration_incoming_process(void);
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -17,6 +17,7 @@
 #include "exec/ramblock.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "file.h"
 #include "ram.h"
 #include "migration.h"
 #include "migration-stats.h"
@@ -28,6 +29,7 @@
 #include "threadinfo.h"
 #include "options.h"
 #include "qemu/yank.h"
+#include "io/channel-file.h"
 #include "io/channel-socket.h"
 #include "yank_functions.h"

@@ -140,6 +142,7 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
 static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
    uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
+    uint64_t read_base = 0;

    if (flags != MULTIFD_FLAG_NOCOMP) {
        error_setg(errp, "multifd %u: flags received %x flags expected %x",
@@ -150,7 +153,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
        p->iov[i].iov_base = p->host + p->normal[i];
        p->iov[i].iov_len = p->page_size;
    }
-    return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
+
+    if (migrate_fixed_ram()) {
+        read_base = p->pages->block->pages_offset - (uint64_t) p->host;
+    }
+
+    return qio_channel_read_full_all(p->c, p->iov, p->normal_num, read_base,
+                                     p->read_flags, errp);
 }

 static MultiFDMethods multifd_nocomp_ops = {
@@ -258,6 +267,19 @@ static void multifd_pages_clear(MultiFDPages_t *pages)
    g_free(pages);
 }

+static void multifd_set_file_bitmap(MultiFDSendParams *p, bool set)
+{
+    MultiFDPages_t *pages = p->pages;
+
+    if (!pages->block) {
+        return;
+    }
+
+    for (int i = 0; i < p->normal_num; i++) {
+        ramblock_set_shadow_bmap(pages->block, pages->offset[i], set);
+    }
+}
+
 static void multifd_send_fill_packet(MultiFDSendParams *p)
 {
    MultiFDPacket_t *packet = p->packet;
@@ -510,6 +532,15 @@ static void multifd_send_terminate_threads(Error *err)
    }
 }

+static int multifd_send_channel_destroy(QIOChannel *send)
+{
+    if (migrate_to_file()) {
+        return file_send_channel_destroy(send);
+    } else {
+        return socket_send_channel_destroy(send);
+    }
+}
+
 void multifd_save_cleanup(void)
 {
    int i;
@@ -532,7 +563,7 @@ void multifd_save_cleanup(void)
        if (p->registered_yank) {
            migration_ioc_unregister_yank(p->c);
        }
-        socket_send_channel_destroy(p->c);
+        multifd_send_channel_destroy(p->c);
        p->c = NULL;
        qemu_mutex_destroy(&p->mutex);
        qemu_sem_destroy(&p->sem);
@@ -595,6 +626,10 @@ int multifd_send_sync_main(QEMUFile *f)
        }
    }

+    if (!migrate_multifd_packets()) {
+        return 0;
+    }
+
    /*
     * When using zero-copy, it's necessary to flush the pages before any of
     * the pages can be sent again, so we'll make sure the new version of the
@@ -650,18 +685,22 @@ static void *multifd_send_thread(void *opaque)
    Error *local_err = NULL;
    int ret = 0;
    bool use_zero_copy_send = migrate_zero_copy_send();
+    bool use_packets = migrate_multifd_packets();

    thread = MigrationThreadAdd(p->name, qemu_get_thread_id());

    trace_multifd_send_thread_start(p->id);
    rcu_register_thread();

-    if (multifd_send_initial_packet(p, &local_err) < 0) {
-        ret = -1;
-        goto out;
+    if (use_packets) {
+        if (multifd_send_initial_packet(p, &local_err) < 0) {
+            ret = -1;
+            goto out;
+        }
+
+        /* initial packet */
+        p->num_packets = 1;
    }
-    /* initial packet */
-    p->num_packets = 1;

    while (true) {
        qemu_sem_post(&multifd_send_state->channels_ready);
@@ -673,11 +712,12 @@ static void *multifd_send_thread(void *opaque)
        qemu_mutex_lock(&p->mutex);

        if (p->pending_job) {
-            uint64_t packet_num = p->packet_num;
            uint32_t flags;
+            uint64_t write_base;
+
            p->normal_num = 0;

-            if (use_zero_copy_send) {
+            if (!use_packets || use_zero_copy_send) {
                p->iovs_num = 0;
            } else {
                p->iovs_num = 1;
@@ -695,16 +735,30 @@ static void *multifd_send_thread(void *opaque)
                    break;
                }
            }
-            multifd_send_fill_packet(p);
+
+            if (use_packets) {
+                multifd_send_fill_packet(p);
+                p->num_packets++;
+                write_base = 0;
+            } else {
+                multifd_set_file_bitmap(p, true);
+
+                /*
+                 * If we subtract the host address now, we don't need to
+                 * pass it into qio_channel_write_full_all() below.
+                 */
+                write_base = p->pages->block->pages_offset -
+                    (uint64_t)p->pages->block->host;
+            }
+
            flags = p->flags;
            p->flags = 0;
-            p->num_packets++;
            p->total_normal_pages += p->normal_num;
            p->pages->num = 0;
            p->pages->block = NULL;
            qemu_mutex_unlock(&p->mutex);

-            trace_multifd_send(p->id, packet_num, p->normal_num, flags,
+            trace_multifd_send(p->id, p->packet_num, p->normal_num, flags,
                               p->next_packet_size);

            if (use_zero_copy_send) {
@@ -716,14 +770,15 @@ static void *multifd_send_thread(void *opaque)
                }
                stat64_add(&mig_stats.multifd_bytes, p->packet_len);
                stat64_add(&mig_stats.transferred, p->packet_len);
-            } else {
+            } else if (use_packets) {
                /* Send header using the same writev call */
                p->iov[0].iov_len = p->packet_len;
                p->iov[0].iov_base = p->packet;
            }

-            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
-                                              0, p->write_flags, &local_err);
+            ret = qio_channel_write_full_all(p->c, p->iov, p->iovs_num,
+                                             write_base, NULL, 0,
+                                             p->write_flags, &local_err);
            if (ret != 0) {
                break;
            }
@@ -740,6 +795,13 @@ static void *multifd_send_thread(void *opaque)
        } else if (p->quit) {
            qemu_mutex_unlock(&p->mutex);
            break;
+        } else if (!use_packets) {
+            /*
+             * When migrating to a file there's no need for a SYNC
+             * packet; the channels are ready right away.
+             */
+            qemu_sem_post(&multifd_send_state->channels_ready);
+            qemu_mutex_unlock(&p->mutex);
        } else {
            qemu_mutex_unlock(&p->mutex);
            /* sometimes there are spurious wakeups */
@@ -749,6 +811,7 @@ static void *multifd_send_thread(void *opaque)
 out:
    if (local_err) {
        trace_multifd_send_error(p->id);
+        multifd_set_file_bitmap(p, false);
        multifd_send_terminate_threads(local_err);
        error_free(local_err);
    }
@@ -889,26 +952,36 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
 {
    MultiFDSendParams *p = opaque;
-    QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
+    QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
    Error *local_err = NULL;

    trace_multifd_new_send_channel_async(p->id);
    if (!qio_task_propagate_error(task, &local_err)) {
-        p->c = QIO_CHANNEL(sioc);
+        p->c = QIO_CHANNEL(ioc);
        qio_channel_set_delay(p->c, false);
        p->running = true;
-        if (multifd_channel_connect(p, sioc, local_err)) {
+        if (multifd_channel_connect(p, ioc, local_err)) {
            return;
        }
    }

-    multifd_new_send_channel_cleanup(p, sioc, local_err);
+    multifd_new_send_channel_cleanup(p, ioc, local_err);
+}
+
+static void multifd_new_send_channel_create(gpointer opaque)
+{
+    if (migrate_to_file()) {
+        file_send_channel_create(multifd_new_send_channel_async, opaque);
+    } else {
+        socket_send_channel_create(multifd_new_send_channel_async, opaque);
+    }
 }

 int multifd_save_setup(Error **errp)
 {
    int thread_count;
    uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+    bool use_packets = migrate_multifd_packets();
    uint8_t i;

    if (!migrate_multifd()) {
@@ -933,25 +1006,33 @@ int multifd_save_setup(Error **errp)
        p->pending_job = 0;
        p->id = i;
        p->pages = multifd_pages_init(page_count);
-        p->packet_len = sizeof(MultiFDPacket_t)
-                      + sizeof(uint64_t) * page_count;
-        p->packet = g_malloc0(p->packet_len);
-        p->packet->magic = cpu_to_be32(MULTIFD_MAGIC);
-        p->packet->version = cpu_to_be32(MULTIFD_VERSION);
+
+        if (use_packets) {
+            p->packet_len = sizeof(MultiFDPacket_t)
+                          + sizeof(uint64_t) * page_count;
+            p->packet = g_malloc0(p->packet_len);
+            p->packet->magic = cpu_to_be32(MULTIFD_MAGIC);
+            p->packet->version = cpu_to_be32(MULTIFD_VERSION);
+
+            /* We need one extra place for the packet header */
+            p->iov = g_new0(struct iovec, page_count + 1);
+        } else {
+            p->iov = g_new0(struct iovec, page_count);
+        }
        p->name = g_strdup_printf("multifdsend_%d", i);
-        /* We need one extra place for the packet header */
-        p->iov = g_new0(struct iovec, page_count + 1);
        p->normal = g_new0(ram_addr_t, page_count);
        p->page_size = qemu_target_page_size();
        p->page_count = page_count;

        if (migrate_zero_copy_send()) {
            p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
+        } else if (!use_packets) {
+            p->write_flags |= QIO_CHANNEL_WRITE_FLAG_WITH_OFFSET;
        } else {
            p->write_flags = 0;
        }

-        socket_send_channel_create(multifd_new_send_channel_async, p);
+        multifd_new_send_channel_create(p);
    }

    for (i = 0; i < thread_count; i++) {
@@ -970,6 +1051,8 @@ int multifd_save_setup(Error **errp)

 struct {
    MultiFDRecvParams *params;
+    /* array of pages to receive */
+    MultiFDPages_t *pages;
    /* number of created threads */
    int count;
    /* syncs main thread and channels */
@@ -980,6 +1063,66 @@ struct {
    MultiFDMethods *ops;
 } *multifd_recv_state;

+static int multifd_recv_pages(QEMUFile *f)
+{
+    int i;
+    static int next_recv_channel;
+    MultiFDRecvParams *p = NULL;
+    MultiFDPages_t *pages = multifd_recv_state->pages;
+
+    /*
+     * next_recv_channel can remain from a previous migration that was
+     * using more channels, so ensure it doesn't overflow if the
+     * limit is lower now.
+     */
+    next_recv_channel %= migrate_multifd_channels();
+    for (i = next_recv_channel;; i = (i + 1) % migrate_multifd_channels()) {
+        p = &multifd_recv_state->params[i];
+
+        qemu_mutex_lock(&p->mutex);
+        if (p->quit) {
+            error_report("%s: channel %d has already quit!", __func__, i);
+            qemu_mutex_unlock(&p->mutex);
+            return -1;
+        }
+        if (!p->pending_job) {
+            p->pending_job++;
+            next_recv_channel = (i + 1) % migrate_multifd_channels();
+            break;
+        }
+        qemu_mutex_unlock(&p->mutex);
+    }
+
+    multifd_recv_state->pages = p->pages;
+    p->pages = pages;
+    qemu_mutex_unlock(&p->mutex);
+    qemu_sem_post(&p->sem);
+
+    return 1;
+}
+
+int multifd_recv_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+{
+    MultiFDPages_t *pages = multifd_recv_state->pages;
+
+    if (!pages->block) {
+        pages->block = block;
+    }
+
+    pages->offset[pages->num] = offset;
+    pages->num++;
+
+    if (pages->num < pages->allocated) {
+        return 1;
+    }
+
+    if (multifd_recv_pages(f) < 0) {
+        return -1;
+    }
+
+    return 1;
+}
+
 static void multifd_recv_terminate_threads(Error *err)
 {
    int i;
@@ -1001,6 +1144,7 @@ static void multifd_recv_terminate_threads(Error *err)

        qemu_mutex_lock(&p->mutex);
        p->quit = true;
+        qemu_sem_post(&p->sem);
        /*
         * We could arrive here for two reasons:
         *  - normal quit, i.e. everything went fine, just finished
@@ -1049,9 +1193,12 @@ void multifd_load_cleanup(void)
        object_unref(OBJECT(p->c));
        p->c = NULL;
        qemu_mutex_destroy(&p->mutex);
+        qemu_sem_destroy(&p->sem);
        qemu_sem_destroy(&p->sem_sync);
        g_free(p->name);
        p->name = NULL;
+        multifd_pages_clear(p->pages);
+        p->pages = NULL;
        p->packet_len = 0;
        g_free(p->packet);
        p->packet = NULL;
@@ -1064,6 +1211,8 @@ void multifd_load_cleanup(void)
    qemu_sem_destroy(&multifd_recv_state->sem_sync);
    g_free(multifd_recv_state->params);
    multifd_recv_state->params = NULL;
+    multifd_pages_clear(multifd_recv_state->pages);
+    multifd_recv_state->pages = NULL;
    g_free(multifd_recv_state);
    multifd_recv_state = NULL;
 }
@@ -1072,9 +1221,10 @@ void multifd_recv_sync_main(void)
 {
    int i;

-    if (!migrate_multifd()) {
+    if (!migrate_multifd() || !migrate_multifd_packets()) {
        return;
    }
+
    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDRecvParams *p = &multifd_recv_state->params[i];

@@ -1099,6 +1249,7 @@ static void *multifd_recv_thread(void *opaque)
 {
    MultiFDRecvParams *p = opaque;
    Error *local_err = NULL;
+    bool use_packets = migrate_multifd_packets();
    int ret;

    trace_multifd_recv_thread_start(p->id);
@@ -1106,22 +1257,45 @@ static void *multifd_recv_thread(void *opaque)

    while (true) {
        uint32_t flags;
+        p->normal_num = 0;

        if (p->quit) {
            break;
        }

-        ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
-                                       p->packet_len, &local_err);
-        if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
-            break;
-        }
+        if (use_packets) {
+            ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
+                                           p->packet_len, &local_err);
+            if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
+                break;
+            }

-        qemu_mutex_lock(&p->mutex);
-        ret = multifd_recv_unfill_packet(p, &local_err);
-        if (ret) {
-            qemu_mutex_unlock(&p->mutex);
-            break;
+            qemu_mutex_lock(&p->mutex);
+            ret = multifd_recv_unfill_packet(p, &local_err);
+            if (ret) {
+                qemu_mutex_unlock(&p->mutex);
+                break;
+            }
+            p->num_packets++;
+        } else {
+            /*
+             * No packets, so we need to wait for the vmstate code to
+             * queue pages.
+             */
+            qemu_sem_wait(&p->sem);
+            qemu_mutex_lock(&p->mutex);
+            if (!p->pending_job) {
+                qemu_mutex_unlock(&p->mutex);
+                break;
+            }
+
+            for (int i = 0; i < p->pages->num; i++) {
+                p->normal[p->normal_num] = p->pages->offset[i];
+                p->normal_num++;
+            }
+
+            p->pages->num = 0;
+            p->host = p->pages->block->host;
        }

        flags = p->flags;
@@ -1129,7 +1303,7 @@ static void *multifd_recv_thread(void *opaque)
        p->flags &= ~MULTIFD_FLAG_SYNC;
        trace_multifd_recv(p->id, p->packet_num, p->normal_num, flags,
                           p->next_packet_size);
-        p->num_packets++;
+
        p->total_normal_pages += p->normal_num;
        qemu_mutex_unlock(&p->mutex);

@@ -1144,6 +1318,13 @@ static void *multifd_recv_thread(void *opaque)
            qemu_sem_post(&multifd_recv_state->sem_sync);
            qemu_sem_wait(&p->sem_sync);
        }
+
+        if (!use_packets) {
+            qemu_mutex_lock(&p->mutex);
+            p->pending_job--;
+            p->pages->block = NULL;
+            qemu_mutex_unlock(&p->mutex);
+        }
    }

    if (local_err) {
@@ -1164,6 +1345,7 @@ int multifd_load_setup(Error **errp)
 {
    int thread_count;
    uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+    bool use_packets = migrate_multifd_packets();
    uint8_t i;

    /*
@@ -1177,6 +1359,7 @@ int multifd_load_setup(Error **errp)
    thread_count = migrate_multifd_channels();
    multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state));
    multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
+    multifd_recv_state->pages = multifd_pages_init(page_count);
    qatomic_set(&multifd_recv_state->count, 0);
    qemu_sem_init(&multifd_recv_state->sem_sync, 0);
    multifd_recv_state->ops = multifd_ops[migrate_multifd_compression()];
@@ -1185,12 +1368,20 @@ int multifd_load_setup(Error **errp)
        MultiFDRecvParams *p = &multifd_recv_state->params[i];

        qemu_mutex_init(&p->mutex);
+        qemu_sem_init(&p->sem, 0);
        qemu_sem_init(&p->sem_sync, 0);
        p->quit = false;
+        p->pending_job = 0;
        p->id = i;
-        p->packet_len = sizeof(MultiFDPacket_t)
-                      + sizeof(uint64_t) * page_count;
-        p->packet = g_malloc0(p->packet_len);
+        p->pages = multifd_pages_init(page_count);
+
+        if (use_packets) {
+            p->packet_len = sizeof(MultiFDPacket_t)
+                + sizeof(uint64_t) * page_count;
+            p->packet = g_malloc0(p->packet_len);
+        } else {
+            p->read_flags |= QIO_CHANNEL_READ_FLAG_WITH_OFFSET;
+        }
        p->name = g_strdup_printf("multifdrecv_%d", i);
        p->iov = g_new0(struct iovec, page_count);
        p->normal = g_new0(ram_addr_t, page_count);
@@ -1212,6 +1403,11 @@ int multifd_load_setup(Error **errp)
    return 0;
 }

+bool multifd_recv_first_channel(void)
+{
+    return !multifd_recv_state;
+}
+
 bool multifd_recv_all_channels_created(void)
 {
    int thread_count = migrate_multifd_channels();
@@ -1236,18 +1432,26 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 {
    MultiFDRecvParams *p;
    Error *local_err = NULL;
-    int id;
+    bool use_packets = migrate_multifd_packets();
+    int id, num_packets = 0;

-    id = multifd_recv_initial_packet(ioc, &local_err);
-    if (id < 0) {
-        multifd_recv_terminate_threads(local_err);
-        error_propagate_prepend(errp, local_err,
-                                "failed to receive packet"
-                                " via multifd channel %d: ",
-                                qatomic_read(&multifd_recv_state->count));
-        return;
+    if (use_packets) {
+        id = multifd_recv_initial_packet(ioc, &local_err);
+        if (id < 0) {
+            multifd_recv_terminate_threads(local_err);
+            error_propagate_prepend(errp, local_err,
+                                    "failed to receive packet"
+                                    " via multifd channel %d: ",
+                                    qatomic_read(&multifd_recv_state->count));
+            return;
+        }
+        trace_multifd_recv_new_channel(id);
+
+        /* initial packet */
+        num_packets = 1;
+    } else {
+        id = qatomic_read(&multifd_recv_state->count);
    }
-    trace_multifd_recv_new_channel(id);

    p = &multifd_recv_state->params[id];
    if (p->c != NULL) {
@@ -1258,9 +1462,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
        return;
    }
    p->c = ioc;
+    p->num_packets = num_packets;
    object_ref(OBJECT(ioc));
-    /* initial packet */
-    p->num_packets = 1;

    p->running = true;
    qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -18,11 +18,13 @@ void multifd_save_cleanup(void);
 int multifd_load_setup(Error **errp);
 void multifd_load_cleanup(void);
 void multifd_load_shutdown(void);
+bool multifd_recv_first_channel(void);
 bool multifd_recv_all_channels_created(void);
 void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
 int multifd_send_sync_main(QEMUFile *f);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
+int multifd_recv_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);

 /* Multifd Compression flags */
 #define MULTIFD_FLAG_SYNC (1 << 0)
@@ -152,7 +154,11 @@ typedef struct {
    uint32_t page_size;
    /* number of pages in a full packet */
    uint32_t page_count;
+    /* multifd flags for receiving ram */
+    int read_flags;

+    /* semaphore to wait on for more work */
+    QemuSemaphore sem;
    /* syncs main thread and channels */
    QemuSemaphore sem_sync;

@@ -166,6 +172,13 @@ typedef struct {
    uint32_t flags;
    /* global number of generated multifd packets */
    uint64_t packet_num;
+    int pending_job;
+    /* array of pages to receive.
+     * The owner of 'pages' depends on the 'pending_job' value:
+     * pending_job == 0 -> the vmstate loading code can use it.
+     * pending_job != 0 -> multifd_channel can use it.
+     */
+    MultiFDPages_t *pages;

    /* thread local variables. No locking required */

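On the receive side, `multifd_recv_pages()` hands a full `MultiFDPages_t` to an idle channel by swapping pointers, so the ownership rule in the `pending_job` comment always holds. The sketch below models only the round-robin selection and the swap. It is single-threaded and has no mutex or semaphore. `Pages`, `Channel` and `hand_off_pages()` are hypothetical names. The patch keeps looping until a channel frees up, whereas this sketch returns -1 when every channel is busy:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int num;            /* number of queued pages */
} Pages;

typedef struct {
    bool pending_job;   /* true while the channel owns 'pages' */
    Pages *pages;
} Channel;

/* Give the caller's filled array to the next idle channel; returns its index. */
static int hand_off_pages(Channel *chans, int nchans, int *next,
                          Pages **main_pages)
{
    assert(nchans > 0);
    *next %= nchans;    /* stale index from a run with more channels */
    for (int n = 0; n < nchans; n++) {
        int i = (*next + n) % nchans;
        if (!chans[i].pending_job) {
            Pages *filled = *main_pages;
            chans[i].pending_job = true;
            *main_pages = chans[i].pages;   /* caller takes the empty array */
            chans[i].pages = filled;        /* channel owns the filled one */
            *next = (i + 1) % nchans;
            return i;
        }
    }
    return -1;          /* all channels busy */
}
```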
--- a/migration/options.c
+++ b/migration/options.c
@@ -185,6 +185,8 @@ Property migration_properties[] = {
    DEFINE_PROP_MIG_CAP("x-zero-copy-send",
            MIGRATION_CAPABILITY_ZERO_COPY_SEND),
 #endif
+    DEFINE_PROP_MIG_CAP("x-suspend", MIGRATION_CAPABILITY_SUSPEND),
+    DEFINE_PROP_MIG_CAP("x-fixed-ram", MIGRATION_CAPABILITY_FIXED_RAM),

    DEFINE_PROP_END_OF_LIST(),
 };
@@ -238,6 +240,13 @@ bool migrate_events(void)
    return s->capabilities[MIGRATION_CAPABILITY_EVENTS];
 }

+bool migrate_fixed_ram(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->capabilities[MIGRATION_CAPABILITY_FIXED_RAM];
+}
+
 bool migrate_ignore_shared(void)
 {
    MigrationState *s = migrate_get_current();
@@ -308,6 +317,13 @@ bool migrate_return_path(void)
    return s->capabilities[MIGRATION_CAPABILITY_RETURN_PATH];
 }

+bool migrate_suspend(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->capabilities[MIGRATION_CAPABILITY_SUSPEND];
+}
+
 bool migrate_validate_uuid(void)
 {
    MigrationState *s = migrate_get_current();
@@ -345,6 +361,16 @@ bool migrate_multifd_flush_after_each_section(void)
    return s->multifd_flush_after_each_section;
 }

+bool migrate_live(void)
+{
+    return !migrate_suspend();
+}
+
+bool migrate_multifd_packets(void)
+{
+    return !migrate_fixed_ram();
+}
+
 bool migrate_postcopy(void)
 {
    return migrate_postcopy_ram() || migrate_dirty_bitmaps();
@@ -357,6 +383,13 @@ bool migrate_tls(void)
    return s->parameters.tls_creds && *s->parameters.tls_creds;
 }

+bool migrate_to_file(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return qemu_file_is_seekable(s->to_dst_file);
+}
+
 typedef enum WriteTrackingSupport {
    WT_SUPPORT_UNKNOWN = 0,
    WT_SUPPORT_ABSENT,
@@ -413,7 +446,9 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
    MIGRATION_CAPABILITY_XBZRLE,
    MIGRATION_CAPABILITY_X_COLO,
    MIGRATION_CAPABILITY_VALIDATE_UUID,
-    MIGRATION_CAPABILITY_ZERO_COPY_SEND);
+    MIGRATION_CAPABILITY_ZERO_COPY_SEND,
+    MIGRATION_CAPABILITY_SUSPEND,
+    MIGRATION_CAPABILITY_FIXED_RAM);

 /**
 * @migration_caps_check - check capability compatibility
@@ -547,6 +582,26 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
        }
    }

+    if (new_caps[MIGRATION_CAPABILITY_FIXED_RAM]) {
+        if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
+            error_setg(errp,
+                       "Fixed-ram migration is incompatible with xbzrle");
+            return false;
+        }
+
+        if (new_caps[MIGRATION_CAPABILITY_COMPRESS]) {
+            error_setg(errp,
+                       "Fixed-ram migration is incompatible with compression");
+            return false;
+        }
+
+        if (new_caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+            error_setg(errp,
+                       "Fixed-ram migration is incompatible with postcopy ram");
+            return false;
+        }
+    }
+
    return true;
 }

@@ -697,6 +752,22 @@ int migrate_decompress_threads(void)
    return s->parameters.decompress_threads;
 }

+bool migrate_direct_io(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    /* For now O_DIRECT is only supported with fixed-ram */
+    if (!s->capabilities[MIGRATION_CAPABILITY_FIXED_RAM]) {
+        return false;
+    }
+
+    if (s->parameters.has_direct_io) {
+        return s->parameters.direct_io;
+    }
+
+    return false;
+}
+
 uint64_t migrate_downtime_limit(void)
 {
    MigrationState *s = migrate_get_current();
@@ -891,6 +962,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
                       s->parameters.block_bitmap_mapping);
    }

+    if (s->parameters.has_direct_io) {
+        params->has_direct_io = true;
+        params->direct_io = s->parameters.direct_io;
+    }
+
    return params;
 }

@@ -923,6 +999,7 @@ void migrate_params_init(MigrationParameters *params)
    params->has_announce_max = true;
    params->has_announce_rounds = true;
    params->has_announce_step = true;
+    params->has_direct_io = qemu_has_direct_io();
 }

 /*
@@ -1182,6 +1259,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
        dest->has_block_bitmap_mapping = true;
        dest->block_bitmap_mapping = params->block_bitmap_mapping;
    }
+
+    if (params->has_direct_io) {
+        dest->direct_io = params->direct_io;
+    }
 }

 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1300,6 +1381,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
            QAPI_CLONE(BitmapMigrationNodeAliasList,
                       params->block_bitmap_mapping);
    }
+
+    if (params->has_direct_io) {
+        s->parameters.direct_io = params->direct_io;
+    }
 }

 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
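`migrate_caps_check()` uses three near-identical blocks to reject fixed-ram when it is combined with xbzrle, compression or postcopy-ram. The table-driven variant below only illustrates that rule. The `enum Cap` values and `fixed_ram_caps_ok()` are hypothetical stand-ins for `MIGRATION_CAPABILITY_*` and `error_setg()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of the capability enum. */
enum Cap { CAP_XBZRLE, CAP_COMPRESS, CAP_POSTCOPY_RAM, CAP_FIXED_RAM, CAP__MAX };

/* Returns false and fills 'err' if fixed-ram is paired with a conflicting cap. */
static bool fixed_ram_caps_ok(const bool caps[CAP__MAX], char *err,
                              size_t errlen)
{
    static const struct {
        enum Cap cap;
        const char *name;
    } conflicts[] = {
        { CAP_XBZRLE, "xbzrle" },
        { CAP_COMPRESS, "compression" },
        { CAP_POSTCOPY_RAM, "postcopy ram" },
    };

    assert(err != NULL && errlen > 0);
    if (!caps[CAP_FIXED_RAM]) {
        return true;    /* nothing to check */
    }
    for (size_t i = 0; i < sizeof(conflicts) / sizeof(conflicts[0]); i++) {
        if (caps[conflicts[i].cap]) {
            snprintf(err, errlen,
                     "Fixed-ram migration is incompatible with %s",
                     conflicts[i].name);
            return false;
        }
    }
    return true;
}
```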
--- a/migration/options.h
+++ b/migration/options.h
@@ -30,6 +30,7 @@ bool migrate_colo(void);
 bool migrate_compress(void);
 bool migrate_dirty_bitmaps(void);
 bool migrate_events(void);
+bool migrate_fixed_ram(void);
 bool migrate_ignore_shared(void);
 bool migrate_late_block_activate(void);
 bool migrate_multifd(void);
@@ -40,6 +41,7 @@ bool migrate_postcopy_ram(void);
 bool migrate_rdma_pin_all(void);
 bool migrate_release_ram(void);
 bool migrate_return_path(void);
+bool migrate_suspend(void);
 bool migrate_validate_uuid(void);
 bool migrate_xbzrle(void);
 bool migrate_zero_blocks(void);
@@ -53,8 +55,11 @@ bool migrate_zero_copy_send(void);
 */

 bool migrate_multifd_flush_after_each_section(void);
+bool migrate_live(void);
+bool migrate_multifd_packets(void);
 bool migrate_postcopy(void);
 bool migrate_tls(void);
+bool migrate_to_file(void);

 /* capabilities helpers */

@@ -75,6 +80,7 @@ uint8_t migrate_cpu_throttle_increment(void);
 uint8_t migrate_cpu_throttle_initial(void);
 bool migrate_cpu_throttle_tailslow(void);
 int migrate_decompress_threads(void);
+bool migrate_direct_io(void);
 uint64_t migrate_downtime_limit(void);
 uint8_t migrate_max_cpu_throttle(void);
 uint64_t migrate_max_bandwidth(void);
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -32,6 +32,7 @@
 #include "trace.h"
 #include "options.h"
 #include "qapi/error.h"
+#include "io/channel-file.h"

 #define IO_BUF_SIZE 32768
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
@@ -268,6 +269,10 @@ static void qemu_iovec_release_ram(QEMUFile *f)
    memset(f->may_free, 0, sizeof(f->may_free));
 }

+bool qemu_file_is_seekable(QEMUFile *f)
+{
+    return qio_channel_has_feature(f->ioc, QIO_CHANNEL_FEATURE_SEEKABLE);
+}

 /**
 * Flushes QEMUFile buffer
@@ -532,6 +537,81 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
    }
 }

+void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos)
+{
+    Error *err = NULL;
+
+    if (f->last_error) {
+        return;
+    }
+
+    qemu_fflush(f);
+    qio_channel_pwritev(f->ioc, (char *)buf, buflen, pos, &err);
+
+    if (err) {
+        qemu_file_set_error_obj(f, -EIO, err);
+    } else {
+        f->total_transferred += buflen;
+    }
+
+    return;
+}
+
+
+size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos)
+{
+    Error *err = NULL;
+    ssize_t ret;
+
+    if (f->last_error) {
+        return 0;
+    }
+
+    ret = qio_channel_preadv(f->ioc, (char *)buf, buflen, pos, &err);
+    if (ret == -1 || err) {
+        goto error;
+    }
+
+    return (size_t)ret;
+
+ error:
+    qemu_file_set_error_obj(f, -EIO, err);
+    return 0;
+}
+
+void qemu_set_offset(QEMUFile *f, off_t off, int whence)
+{
+    Error *err = NULL;
+    off_t ret;
+
+    qemu_fflush(f);
+
+    if (!qemu_file_is_writable(f)) {
+        f->buf_index = 0;
+        f->buf_size = 0;
+    }
+
+    ret = qio_channel_io_seek(f->ioc, off, whence, &err);
+    if (ret == (off_t)-1) {
+        qemu_file_set_error_obj(f, -EIO, err);
+    }
+}
+
+off_t qemu_get_offset(QEMUFile *f)
+{
+    Error *err = NULL;
+    off_t ret;
+
+    qemu_fflush(f);
+
+    ret = qio_channel_io_seek(f->ioc, 0, SEEK_CUR, &err);
+    if (ret == (off_t)-1) {
+        qemu_file_set_error_obj(f, -EIO, err);
+    }
+    return ret;
+}
+
+
 void qemu_put_byte(QEMUFile *f, int v)
 {
    if (f->last_error) {
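`qemu_put_buffer_at()` flushes the QEMUFile buffer and then writes at an absolute position through `qio_channel_pwritev()`. `qemu_get_buffer_at()` reads at one through `qio_channel_preadv()`. This sketch assumes these map onto positional I/O that leaves the channel offset untouched, which is what POSIX `pwrite()`/`pread()` provide. `put_buffer_at()` and `get_buffer_at()` are hypothetical names. The write side also retries short writes, because the patch's `qemu_put_buffer_at()` does not inspect the byte count that `qio_channel_pwritev()` returns:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of 'buf' at 'pos' without moving the fd's file offset. */
static int put_buffer_at(int fd, const void *buf, size_t len, off_t pos)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, pos);
        if (n < 0 && errno == EINTR) {
            continue;           /* interrupted before writing: retry */
        }
        if (n <= 0) {
            return -1;          /* real error or no progress */
        }
        p += n;
        len -= (size_t)n;
        pos += n;
    }
    return 0;
}

/* Read up to 'len' bytes at 'pos'; may return less at end of file. */
static ssize_t get_buffer_at(int fd, void *buf, size_t len, off_t pos)
{
    ssize_t n;

    do {
        n = pread(fd, buf, len, pos);
    } while (n < 0 && errno == EINTR);
    return n;
}
```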
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -139,6 +139,10 @@ QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
+void qemu_set_offset(QEMUFile *f, off_t off, int whence);
+off_t qemu_get_offset(QEMUFile *f);
+void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos);
+size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos);

 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
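The ram.c changes place each RAMBlock's header, dirty bitmap and pages at fixed file offsets. The arithmetic in `fixed_ram_insert_header()` can be sketched as follows. The sketch assumes the packed 28-byte `FixedRamHeader` (a 4-byte version plus three 8-byte fields) and the 1 MiB (`0x100000`) alignment used in the patch. `FixedRamLayout` and `fixed_ram_layout()` are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_ALIGN 0x100000ULL    /* 1 MiB, as in the patch */

typedef struct {
    uint64_t bitmap_offset;     /* right after the header */
    uint64_t pages_offset;      /* aligned start of the page data */
    uint64_t next_offset;       /* where the next block's header goes */
} FixedRamLayout;

/* Same rounding as QEMU's ROUND_UP(): 'align' must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

static FixedRamLayout fixed_ram_layout(uint64_t header_start,
                                       uint64_t header_size,
                                       uint64_t used_length,
                                       unsigned page_bits)
{
    FixedRamLayout l;
    uint64_t num_pages = used_length >> page_bits;
    /* one bit per page, stored as whole unsigned longs */
    uint64_t bitmap_size = (num_pages + SKETCH_BITS_PER_LONG - 1) /
                           SKETCH_BITS_PER_LONG * sizeof(unsigned long);

    l.bitmap_offset = header_start + header_size;
    l.pages_offset = round_up_pow2(l.bitmap_offset + bitmap_size,
                                   SKETCH_ALIGN);
    l.next_offset = l.pages_offset + used_length;
    return l;
}
```

Take a 64 MiB block with 4 KiB pages whose header starts at offset 100. Its bitmap is 2048 bytes at offset 128, and its pages start at 1 MiB. The next header then starts at `pages_offset + used_length`, which matches the `qemu_set_offset()` call in `ram_save_setup()`.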
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1104,9 +1104,15 @@ static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
    int len = 0;

    if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
-        len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
-        qemu_put_byte(file, 0);
-        len += 1;
+        if (migrate_fixed_ram()) {
+            /* for zero pages we don't need to do anything */
+            len = 1;
+        } else {
+            len += save_page_header(pss, file, block,
+                                    offset | RAM_SAVE_FLAG_ZERO);
+            qemu_put_byte(file, 0);
+            len += 1;
+        }
        ram_release_page(block->idstr, offset);
    }
    return len;
@@ -1188,14 +1194,20 @@ static int save_normal_page(PageSearchStatus *pss, RAMBlock *block,
 {
    QEMUFile *file = pss->pss_channel;

-    ram_transferred_add(save_page_header(pss, pss->pss_channel, block,
-                                         offset | RAM_SAVE_FLAG_PAGE));
-    if (async) {
-        qemu_put_buffer_async(file, buf, TARGET_PAGE_SIZE,
-                              migrate_release_ram() &&
-                              migration_in_postcopy());
+    if (migrate_fixed_ram()) {
+        qemu_put_buffer_at(file, buf, TARGET_PAGE_SIZE,
+                           block->pages_offset + offset);
+        set_bit(offset >> TARGET_PAGE_BITS, block->shadow_bmap);
    } else {
-        qemu_put_buffer(file, buf, TARGET_PAGE_SIZE);
+        ram_transferred_add(save_page_header(pss, pss->pss_channel, block,
+                                             offset | RAM_SAVE_FLAG_PAGE));
+        if (async) {
+            qemu_put_buffer_async(file, buf, TARGET_PAGE_SIZE,
+                                  migrate_release_ram() &&
+                                  migration_in_postcopy());
+        } else {
+            qemu_put_buffer(file, buf, TARGET_PAGE_SIZE);
+        }
    }
    ram_transferred_add(TARGET_PAGE_SIZE);
    stat64_add(&mig_stats.normal_pages, 1);
@@ -1357,14 +1369,16 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
        pss->page = 0;
        pss->block = QLIST_NEXT_RCU(pss->block, next);
        if (!pss->block) {
-            if (!migrate_multifd_flush_after_each_section()) {
-                QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
-                int ret = multifd_send_sync_main(f);
-                if (ret < 0) {
-                    return ret;
+            if (!migrate_fixed_ram()) {
+                if (!migrate_multifd_flush_after_each_section()) {
+                    QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
+                    int ret = multifd_send_sync_main(f);
+                    if (ret < 0) {
+                        return ret;
+                    }
+                    qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
+                    qemu_fflush(f);
                }
-                qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
-                qemu_fflush(f);
            }
            /*
             * If memory migration starts over, we will meet a dirtied page
@@ -2465,6 +2479,8 @@ static void ram_save_cleanup(void *opaque)
        block->clear_bmap = NULL;
        g_free(block->bmap);
        block->bmap = NULL;
+        g_free(block->shadow_bmap);
+        block->shadow_bmap = NULL;
    }

    xbzrle_cleanup();
@@ -2832,6 +2848,7 @@ static void ram_list_init_bitmaps(void)
             */
            block->bmap = bitmap_new(pages);
            bitmap_set(block->bmap, 0, pages);
+            block->shadow_bmap = bitmap_new(block->used_length >> TARGET_PAGE_BITS);
            block->clear_bmap_shift = shift;
            block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
        }
@@ -2972,6 +2989,70 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
    }
 }

+#define FIXED_RAM_HDR_VERSION 1
+struct FixedRamHeader {
+    uint32_t version;
+    uint64_t page_size;
+    uint64_t bitmap_offset;
+    uint64_t pages_offset;
+    /* end of v1 */
+} QEMU_PACKED;
+
+static void fixed_ram_insert_header(QEMUFile *file, RAMBlock *block)
+{
+    g_autofree struct FixedRamHeader *header = NULL;
+    size_t header_size, bitmap_size;
+    long num_pages;
+
+    header = g_new0(struct FixedRamHeader, 1);
+    header_size = sizeof(struct FixedRamHeader);
+
+    num_pages = block->used_length >> TARGET_PAGE_BITS;
+    bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
+
+    /*
+     * Record the file offsets where the bitmap and the pages go.
+     * The pages are written during the iterative phase and the
+     * bitmap only at the end of migration, so room for both is
+     * reserved now.
+    block->bitmap_offset = qemu_get_offset(file) + header_size;
+    block->pages_offset = ROUND_UP(block->bitmap_offset +
+                                   bitmap_size, 0x100000);
+
+    header->version = cpu_to_be32(FIXED_RAM_HDR_VERSION);
+    header->page_size = cpu_to_be64(TARGET_PAGE_SIZE);
+    header->bitmap_offset = cpu_to_be64(block->bitmap_offset);
+    header->pages_offset = cpu_to_be64(block->pages_offset);
+
+    qemu_put_buffer(file, (uint8_t *) header, header_size);
+}
+
+static int fixed_ram_read_header(QEMUFile *file, struct FixedRamHeader *header)
+{
+    size_t ret, header_size = sizeof(struct FixedRamHeader);
+
+    ret = qemu_get_buffer(file, (uint8_t *)header, header_size);
+    if (ret != header_size) {
+        return -1;
+    }
+
+    /* migration stream is big-endian */
+    be32_to_cpus(&header->version);
+
+    if (header->version > FIXED_RAM_HDR_VERSION) {
+        error_report("Migration fixed-ram capability version mismatch (expected %d, got %u)",
+                     FIXED_RAM_HDR_VERSION, header->version);
+        return -1;
+    }
+
+    be64_to_cpus(&header->page_size);
+    be64_to_cpus(&header->bitmap_offset);
+    be64_to_cpus(&header->pages_offset);
+
+
+    return 0;
+}
+
 /*
 * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
 * long-running RCU critical section.  When rcu-reclaims in the code
@@ -3021,6 +3102,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
            if (migrate_ignore_shared()) {
                qemu_put_be64(f, block->mr->addr);
            }
+
+            if (migrate_fixed_ram()) {
+                fixed_ram_insert_header(f, block);
+                /* prepare offset for next ramblock */
+                qemu_set_offset(f, block->pages_offset + block->used_length, SEEK_SET);
+            }
        }
    }

@@ -3044,6 +3131,27 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
    return 0;
 }

+static void ram_save_shadow_bmap(QEMUFile *f)
+{
+    RAMBlock *block;
+
+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
+        long num_pages = block->used_length >> TARGET_PAGE_BITS;
+        long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
+        qemu_put_buffer_at(f, (uint8_t *)block->shadow_bmap, bitmap_size,
+                           block->bitmap_offset);
+    }
+}
+
+void ramblock_set_shadow_bmap(RAMBlock *block, ram_addr_t offset, bool set)
+{
+    if (set) {
+        set_bit(offset >> TARGET_PAGE_BITS, block->shadow_bmap);
+    } else {
+        clear_bit(offset >> TARGET_PAGE_BITS, block->shadow_bmap);
+    }
+}
+
 /**
 * ram_save_iterate: iterative stage for migration
 *
@@ -3156,7 +3264,6 @@ out:
        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
        qemu_fflush(f);
        ram_transferred_add(8);
-
        ret = qemu_file_get_error(f);
    }
    if (ret < 0) {
@@ -3201,6 +3308,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
            pages = ram_find_and_save_block(rs);
            /* no more blocks to sent */
            if (pages == 0) {
+                if (migrate_fixed_ram()) {
+                    ram_save_shadow_bmap(f);
+                }
                break;
            }
            if (pages < 0) {
@@ -3227,6 +3337,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
        qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
    }
    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+
    qemu_fflush(f);

    return 0;
@@ -3825,6 +3936,154 @@ void colo_flush_ram_cache(void)
    trace_colo_flush_ram_cache_end();
 }

+static void read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
+                                    long num_pages, unsigned long *bitmap)
+{
+    unsigned long set_bit_idx, clear_bit_idx;
+    unsigned long len;
+    ram_addr_t offset;
+    void *host;
+    size_t read, completed, read_len;
+
+    for (set_bit_idx = find_first_bit(bitmap, num_pages);
+         set_bit_idx < num_pages;
+         set_bit_idx = find_next_bit(bitmap, num_pages, clear_bit_idx + 1)) {
+
+        clear_bit_idx = find_next_zero_bit(bitmap, num_pages, set_bit_idx + 1);
+
+        len = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx);
+        offset = set_bit_idx << TARGET_PAGE_BITS;
+
+        for (read = 0, completed = 0; completed < len; offset += read) {
+            host = host_from_ram_block_offset(block, offset);
+            read_len = MIN(len, TARGET_PAGE_SIZE);
+
+            if (migrate_multifd()) {
+                multifd_recv_queue_page(f, block, offset);
+                read = read_len;
+            } else {
+                read = qemu_get_buffer_at(f, host, read_len,
+                                          block->pages_offset + offset);
+            }
+            completed += read;
+        }
+    }
+}
+
+static int parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block, ram_addr_t length)
+{
+    g_autofree unsigned long *dirty_bitmap = NULL;
+    struct FixedRamHeader header;
+    size_t bitmap_size;
+    long num_pages;
+    int ret = 0;
+
+    ret = fixed_ram_read_header(f, &header);
+    if (ret < 0) {
+        error_report("Error reading fixed-ram header");
+        return -EINVAL;
+    }
+
+    block->pages_offset = header.pages_offset;
+    num_pages = length / header.page_size;
+    bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
+
+    dirty_bitmap = g_malloc0(bitmap_size);
+    if (qemu_get_buffer_at(f, (uint8_t *)dirty_bitmap, bitmap_size,
+                           header.bitmap_offset) != bitmap_size) {
+        error_report("Error parsing dirty bitmap");
+        return -EINVAL;
+    }
+
+    read_ramblock_fixed_ram(f, block, num_pages, dirty_bitmap);
+
+    /* Skip pages array */
+    qemu_set_offset(f, block->pages_offset + length, SEEK_SET);
+
+    return ret;
+}
+
+static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
+{
+    int ret = 0;
+    /* ADVISE is earlier, it shows the source has the postcopy capability on */
+    bool postcopy_advised = migration_incoming_postcopy_advised();
+
+    assert(block);
+
+    if (migrate_fixed_ram()) {
+        return parse_ramblock_fixed_ram(f, block, length);
+    }
+
+    if (!qemu_ram_is_migratable(block)) {
+        error_report("block %s should not be migrated !", block->idstr);
+        ret = -EINVAL;
+    }
+
+    if (length != block->used_length) {
+        Error *local_err = NULL;
+
+        ret = qemu_ram_resize(block, length, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+        }
+    }
+    /* For postcopy we need to check hugepage sizes match */
+    if (postcopy_advised && migrate_postcopy_ram() &&
+        block->page_size != qemu_host_page_size) {
+        uint64_t remote_page_size = qemu_get_be64(f);
+        if (remote_page_size != block->page_size) {
+            error_report("Mismatched RAM page size %s "
+                         "(local) %zd != %" PRId64, block->idstr,
+                         block->page_size, remote_page_size);
+            ret = -EINVAL;
+        }
+    }
+    if (migrate_ignore_shared()) {
+        hwaddr addr = qemu_get_be64(f);
+        if (ramblock_is_ignored(block) &&
+            block->mr->addr != addr) {
+            error_report("Mismatched GPAs for block %s "
+                         "%" PRId64 "!= %" PRId64, block->idstr,
+                         (uint64_t)addr,
+                         (uint64_t)block->mr->addr);
+            ret = -EINVAL;
+        }
+    }
+    ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG, block->idstr);
+
+    return ret;
+}
+
+static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+    int ret = 0;
+
+    /* Synchronize RAM block list */
+    while (!ret && total_ram_bytes) {
+        char id[256];
+        RAMBlock *block;
+        ram_addr_t length;
+        int len = qemu_get_byte(f);
+
+        qemu_get_buffer(f, (uint8_t *)id, len);
+        id[len] = 0;
+        length = qemu_get_be64(f);
+
+        block = qemu_ram_block_by_name(id);
+        if (block) {
+            ret = parse_ramblock(f, block, length);
+        } else {
+            error_report("Unknown ramblock \"%s\", cannot accept "
+                         "migration", id);
+            ret = -EINVAL;
+        }
+        total_ram_bytes -= length;
+    }
+
+    return ret;
+}
+
 /**
 * ram_load_precopy: load pages in precopy case
 *
@@ -3839,14 +4098,13 @@ static int ram_load_precopy(QEMUFile *f)
 {
    MigrationIncomingState *mis = migration_incoming_get_current();
    int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
-    /* ADVISE is earlier, it shows the source has the postcopy capability on */
-    bool postcopy_advised = migration_incoming_postcopy_advised();
+
    if (!migrate_compress()) {
        invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
    }

    while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
-        ram_addr_t addr, total_ram_bytes;
+        ram_addr_t addr;
        void *host = NULL, *host_bak = NULL;
        uint8_t ch;

@@ -3917,65 +4175,7 @@ static int ram_load_precopy(QEMUFile *f)

        switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
        case RAM_SAVE_FLAG_MEM_SIZE:
-            /* Synchronize RAM block list */
-            total_ram_bytes = addr;
-            while (!ret && total_ram_bytes) {
-                RAMBlock *block;
-                char id[256];
-                ram_addr_t length;
-
-                len = qemu_get_byte(f);
-                qemu_get_buffer(f, (uint8_t *)id, len);
-                id[len] = 0;
-                length = qemu_get_be64(f);
-
-                block = qemu_ram_block_by_name(id);
-                if (block && !qemu_ram_is_migratable(block)) {
-                    error_report("block %s should not be migrated !", id);
-                    ret = -EINVAL;
-                } else if (block) {
-                    if (length != block->used_length) {
-                        Error *local_err = NULL;
-
-                        ret = qemu_ram_resize(block, length,
-                                              &local_err);
-                        if (local_err) {
-                            error_report_err(local_err);
-                        }
-                    }
-                    /* For postcopy we need to check hugepage sizes match */
-                    if (postcopy_advised && migrate_postcopy_ram() &&
-                        block->page_size != qemu_host_page_size) {
-                        uint64_t remote_page_size = qemu_get_be64(f);
-                        if (remote_page_size != block->page_size) {
-                            error_report("Mismatched RAM page size %s "
-                                         "(local) %zd != %" PRId64,
-                                         id, block->page_size,
-                                         remote_page_size);
-                            ret = -EINVAL;
-                        }
-                    }
-                    if (migrate_ignore_shared()) {
-                        hwaddr addr = qemu_get_be64(f);
-                        if (ramblock_is_ignored(block) &&
-                            block->mr->addr != addr) {
-                            error_report("Mismatched GPAs for block %s "
-                                         "%" PRId64 "!= %" PRId64,
-                                         id, (uint64_t)addr,
-                                         (uint64_t)block->mr->addr);
-                            ret = -EINVAL;
-                        }
-                    }
-                    ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
-                                          block->idstr);
-                } else {
-                    error_report("Unknown ramblock \"%s\", cannot "
-                                 "accept migration", id);
-                    ret = -EINVAL;
-                }
-
-                total_ram_bytes -= length;
-            }
+            ret = parse_ramblocks(f, addr);
            break;

        case RAM_SAVE_FLAG_ZERO:
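The layout that fixed_ram_insert_header() writes and parse_ramblock_fixed_ram() reads back can be modelled in a few lines of Python. This is a sketch, not QEMU code. It assumes 4 KiB target pages and an 8-byte `unsigned long`, and it takes the header size as a parameter because the definition of `struct FixedRamHeader` (and any padding in it) is not part of this excerpt. The dirty bitmap is modelled as a Python int. `dirty_runs()` yields the same contiguous runs that read_ramblock_fixed_ram() finds with find_first_bit()/find_next_zero_bit().

```python
# Model of the fixed-ram per-RAMBlock layout and of the dirty-run walk.
# Assumed values (not taken from a real build): 4 KiB target pages and
# 8-byte unsigned longs. The header size is a parameter because the
# definition of struct FixedRamHeader is not part of this excerpt.

PAGE_BITS = 12                 # assumed TARGET_PAGE_BITS
PAGE_SIZE = 1 << PAGE_BITS     # assumed TARGET_PAGE_SIZE
LONG_BYTES = 8                 # assumed sizeof(unsigned long)
PAGES_ALIGN = 0x100000         # 1 MiB, as used by fixed_ram_insert_header()


def round_up(value, align):
    """Round value up to a multiple of align (a power of two)."""
    return (value + align - 1) & ~(align - 1)


def block_layout(header_start, header_size, used_length):
    """Return (bitmap_offset, pages_offset, next_offset) for one RAMBlock."""
    num_pages = used_length >> PAGE_BITS
    bits_per_long = LONG_BYTES * 8
    bitmap_size = -(-num_pages // bits_per_long) * LONG_BYTES  # BITS_TO_LONGS
    bitmap_offset = header_start + header_size
    pages_offset = round_up(bitmap_offset + bitmap_size, PAGES_ALIGN)
    # ram_save_setup() seeks here before emitting the next block's header
    next_offset = pages_offset + used_length
    return bitmap_offset, pages_offset, next_offset


def dirty_runs(bitmap, num_pages):
    """Yield (offset_in_block, length) for each run of set bits.

    bitmap is a Python int in which bit i set means page i is in the
    file; the data for a run starts at pages_offset + offset_in_block.
    """
    page = 0
    while page < num_pages:
        if not (bitmap >> page) & 1:
            page += 1
            continue
        start = page
        while page < num_pages and (bitmap >> page) & 1:
            page += 1
        yield start << PAGE_BITS, (page - start) * PAGE_SIZE


if __name__ == "__main__":
    print(block_layout(0, 28, 1 << 30))
    print(list(dirty_runs(0b1100111, 8)))
```

With a hypothetical 28-byte header at offset 0 and a 1 GiB block, the 32 KiB bitmap starts at byte 28, the pages start at the next 1 MiB boundary (1048576), and the next block's header is written at 1074790400. The 1 MiB alignment trades a little file space for page data that starts on a large, aligned boundary.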
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -76,6 +76,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
+void ramblock_set_shadow_bmap(RAMBlock *block, ram_addr_t offset, bool set);

 /* ram cache */
 int colo_init_ram_cache(void);
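ramblock_set_shadow_bmap() maps a byte offset inside a RAMBlock to one bit per target page, and ram_save_shadow_bmap() later writes that bitmap at the block's bitmap_offset. Its callers are outside this excerpt, so the Python sketch below only models the offset-to-bit mapping. It assumes 4 KiB target pages. The bounds check is an addition of the sketch; the C helper does not check.

```python
# Model of ramblock_set_shadow_bmap(): one bit per target page, indexed by
# the byte offset within the RAMBlock shifted right by TARGET_PAGE_BITS.
# PAGE_BITS = 12 is an assumption (4 KiB target pages).

PAGE_BITS = 12


class ShadowBitmap:
    def __init__(self, used_length):
        self.num_pages = used_length >> PAGE_BITS
        self.bits = 0  # Python int used as an unbounded bit array

    def _page(self, offset):
        page = offset >> PAGE_BITS
        # bounds check added by this sketch; the C code does not check
        if not 0 <= page < self.num_pages:
            raise IndexError("offset 0x%x outside the RAMBlock" % offset)
        return page

    def set(self, offset, value):
        """Mirror of ramblock_set_shadow_bmap(block, offset, set)."""
        page = self._page(offset)
        if value:
            self.bits |= 1 << page       # set_bit()
        else:
            self.bits &= ~(1 << page)    # clear_bit()

    def test(self, offset):
        return bool((self.bits >> self._page(offset)) & 1)
```

Any offset inside a page addresses the same bit, so `set(0x5064, True)` and `test(0x5000)` refer to page 5.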
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -243,6 +243,7 @@ static bool should_validate_capability(int capability)
    /* Validate only new capabilities to keep compatibility. */
    switch (capability) {
    case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
+    case MIGRATION_CAPABILITY_FIXED_RAM:
        return true;
    default:
        return false;
@@ -1209,13 +1210,25 @@ void qemu_savevm_non_migratable_list(strList **reasons)

 void qemu_savevm_state_header(QEMUFile *f)
 {
+    MigrationState *s = migrate_get_current();
+
+    s->vmdesc = json_writer_new(false);
+
    trace_savevm_state_header();
    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
    qemu_put_be32(f, QEMU_VM_FILE_VERSION);

+    /*
+     * Start the main json object, paired with the json_writer_end_object
+     * in qemu_savevm_state_complete_precopy_non_iterable().
+     */
+    json_writer_start_object(s->vmdesc, NULL);
+
-    if (migrate_get_current()->send_configuration) {
+    if (s->send_configuration) {
        qemu_put_byte(f, QEMU_VM_CONFIGURATION);
-        vmstate_save_state(f, &vmstate_configuration, &savevm_state, 0);
+        json_writer_start_object(s->vmdesc, "configuration");
+        vmstate_save_state(f, &vmstate_configuration, &savevm_state, s->vmdesc);
+        json_writer_end_object(s->vmdesc);
    }
 }

@@ -1240,8 +1253,6 @@ void qemu_savevm_state_setup(QEMUFile *f)
    Error *local_err = NULL;
    int ret;

-    ms->vmdesc = json_writer_new(false);
-    json_writer_start_object(ms->vmdesc, NULL);
    json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
    json_writer_start_array(ms->vmdesc, "devices");

@@ -1630,6 +1641,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
    qemu_mutex_lock_iothread();

    while (qemu_file_get_error(f) == 0) {
+        migration_update_counters(ms, qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
        if (qemu_savevm_state_iterate(f, false) > 0) {
            break;
        }
@@ -1652,6 +1664,9 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
    }
    migrate_set_state(&ms->state, MIGRATION_STATUS_SETUP, status);

+    migration_calculate_complete(ms);
+    trace_migration_status((int)ms->mbps / 8, (int)ms->pages_per_second, ms->total_time);
+
    /* f is outer parameter, it should not stay in global migration state after
     * this function finished */
    ms->to_dst_file = NULL;
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -165,6 +165,7 @@ migration_return_path_end_after(int rp_error) "%d"
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
+migration_status(int mb_per_sec, int pages_per_second, int64_t total_time) "%d MB/s, %d pages/s, %" PRId64 " ms"
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -487,6 +487,12 @@
 #     and should not affect the correctness of postcopy migration.
 #     (since 7.1)
 #
+# @suspend: If enabled, a non-live migration will be performed,
+#           i.e. the VM will be stopped before migration. (since 8.1)
+#
+# @fixed-ram: Migrate using fixed offsets for each RAM page. Requires
+#             a seekable transport such as a file.  (since 8.1)
+#
 # Features:
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -502,7 +508,7 @@
           'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
           { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
           'validate-uuid', 'background-snapshot',
-           'zero-copy-send', 'postcopy-preempt'] }
+           'zero-copy-send', 'postcopy-preempt', 'suspend', 'fixed-ram'] }

 ##
 # @MigrationCapabilityStatus:
@@ -779,6 +785,9 @@
 #     Nodes are mapped to their block device name if there is one, and
 #     to their node name otherwise.  (Since 5.2)
 #
+# @direct-io: Open migration files with O_DIRECT when possible. Not
+#             all migration transports support this. (since 8.1)
+#
 # Features:
 #
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -800,7 +809,7 @@
           'xbzrle-cache-size', 'max-postcopy-bandwidth',
           'max-cpu-throttle', 'multifd-compression',
           'multifd-zlib-level' ,'multifd-zstd-level',
-           'block-bitmap-mapping' ] }
+           'block-bitmap-mapping', 'direct-io' ] }

 ##
 # @MigrateSetParameters:
@@ -935,6 +944,9 @@
 #     Nodes are mapped to their block device name if there is one, and
 #     to their node name otherwise.  (Since 5.2)
 #
+# @direct-io: Open migration files with O_DIRECT when possible. Not
+#             all migration transports support this. (since 8.1)
+#
 # Features:
 #
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -972,7 +984,8 @@
            '*multifd-compression': 'MultiFDCompression',
            '*multifd-zlib-level': 'uint8',
            '*multifd-zstd-level': 'uint8',
-            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+            '*direct-io': 'bool' } }

 ##
 # @migrate-set-parameters:
@@ -1127,6 +1140,9 @@
 #     Nodes are mapped to their block device name if there is one, and
 #     to their node name otherwise.  (Since 5.2)
 #
+# @direct-io: Open migration files with O_DIRECT when possible. Not
+#             all migration transports support this. (since 8.1)
+#
 # Features:
 #
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -1161,7 +1177,8 @@
            '*multifd-compression': 'MultiFDCompression',
            '*multifd-zlib-level': 'uint8',
            '*multifd-zstd-level': 'uint8',
-            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+            '*direct-io': 'bool' } }

 ##
 # @query-migrate-parameters:
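The new `fixed-ram` and `suspend` capabilities and the `direct-io` parameter are driven over QMP. The Python sketch below builds the command objects in the order the guestperf engine issues them: capabilities on both sides, `direct-io` on the source only, a `file:` URI, and `migrate-incoming` on a destination started with `-incoming defer`. The helper names and the path are illustrative assumptions, not part of QEMU.

```python
import json


def qmp(command, **arguments):
    """Build a QMP command object as sent on the monitor socket."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return msg


def file_migration_commands(path, direct_io=False, suspend=True):
    """Return (source_cmds, destination_cmds) for a fixed-ram file migration."""
    caps = [{"capability": "fixed-ram", "state": True}]
    if suspend:
        caps.append({"capability": "suspend", "state": True})
    set_caps = qmp("migrate-set-capabilities", capabilities=caps)
    uri = "file:" + path

    src = [set_caps]
    if direct_io:
        # QMP member names use dashes, hence the dict unpacking
        src.append(qmp("migrate-set-parameters", **{"direct-io": True}))
    src.append(qmp("migrate", uri=uri))

    # The destination is assumed to run with "-incoming defer"; guestperf
    # sends migrate-incoming only after the source reports completion.
    dst = [set_caps, qmp("migrate-incoming", uri=uri)]
    return src, dst


if __name__ == "__main__":
    src, dst = file_migration_commands("/tmp/migfile", direct_io=True)
    for cmd in src + dst:
        print(json.dumps(cmd))
```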
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -23,7 +23,7 @@ import argparse
 import collections
 import struct
 import sys
-
+import math

 def mkdir_p(path):
    try:
@@ -119,11 +119,16 @@ class RamSection(object):
        self.file = file
        self.section_key = section_key
        self.TARGET_PAGE_SIZE = ramargs['page_size']
+        self.TARGET_PAGE_BITS = math.log2(self.TARGET_PAGE_SIZE)
        self.dump_memory = ramargs['dump_memory']
        self.write_memory = ramargs['write_memory']
+        self.fixed_ram = ramargs['fixed-ram']
        self.sizeinfo = collections.OrderedDict()
+        self.bitmap_offset = collections.OrderedDict()
+        self.pages_offset = collections.OrderedDict()
        self.data = collections.OrderedDict()
        self.data['section sizes'] = self.sizeinfo
+        self.ram_read = False
        self.name = ''
        if self.write_memory:
            self.files = { }
@@ -140,7 +145,13 @@ class RamSection(object):
    def getDict(self):
        return self.data

+    def write_or_dump_fixed_ram(self):
+        pass
+
    def read(self):
+        if self.fixed_ram and self.ram_read:
+            return
+
        # Read all RAM sections
        while True:
            addr = self.file.read64()
@@ -167,7 +178,26 @@ class RamSection(object):
                        f.truncate(0)
                        f.truncate(len)
                        self.files[self.name] = f
+
+                    if self.fixed_ram:
+                        # assumes a packed FixedRamHeader: be32 version,
+                        # then be64 page_size, bitmap_offset, pages_offset
+                        self.file.read32()  # version
+                        self.file.read64()  # page_size
+                        self.bitmap_offset[self.name] = self.file.read64()
+                        self.pages_offset[self.name] = self.file.read64()
+                        # skip the bitmap and pages to reach the next block
+                        self.file.file.seek(self.pages_offset[self.name] + len)
+
                flags &= ~self.RAM_SAVE_FLAG_MEM_SIZE
+                if self.fixed_ram:
+                    self.ram_read = True
+                # now we should rewind to the ram page offset of the first
+                # ram section
+                if self.fixed_ram:
+                    if self.write_memory or self.dump_memory:
+                        self.write_or_dump_fixed_ram()
+                        return

            if flags & self.RAM_SAVE_FLAG_COMPRESS:
                if flags & self.RAM_SAVE_FLAG_CONTINUE:
@@ -208,7 +238,7 @@ class RamSection(object):

            # End of RAM section
            if flags & self.RAM_SAVE_FLAG_EOS:
-                break
+                return

            if flags != 0:
                raise Exception("Unknown RAM flags: %x" % flags)
@@ -521,6 +551,7 @@ class MigrationDump(object):
        ramargs['page_size'] = self.vmsd_desc['page_size']
        ramargs['dump_memory'] = dump_memory
        ramargs['write_memory'] = write_memory
+        ramargs['fixed-ram'] = False
        self.section_classes[('ram',0)][1] = ramargs

        while True:
@@ -528,8 +559,20 @@ class MigrationDump(object):
            if section_type == self.QEMU_VM_EOF:
                break
            elif section_type == self.QEMU_VM_CONFIGURATION:
-                section = ConfigurationSection(file)
-                section.read()
+                config_desc = self.vmsd_desc.get('configuration')
+                if config_desc is not None:
+                    config = VMSDSection(file, 1, config_desc, 'configuration')
+                    config.read()
+                    caps = config.data.get("configuration/capabilities")
+                    if caps is not None:
+                        caps = caps.data["capabilities"]
+                        caps = caps if isinstance(caps, list) else [caps]
+                        for i in caps:
+                            # chomp out the string length byte
+                            if i.data[1:].decode("utf8") == "fixed-ram":
+                                ramargs['fixed-ram'] = True
+                else:
+                    ConfigurationSection(file).read()
            elif section_type == self.QEMU_VM_SECTION_START or section_type == self.QEMU_VM_SECTION_FULL:
                section_id = file.read32()
                name = file.readstr()
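analyze-migration.py has to decode the per-RAMBlock header that fixed_ram_insert_header() emits. The C code fixes the field order and converts each field to big-endian, but the struct definition is not in this excerpt. The Python sketch below therefore assumes a packed layout and accepts the padded alternative as a format string. `pack_header()` and `unpack_header()` are hypothetical helpers built on the standard `struct` module.

```python
import struct

# Assumed layout of struct FixedRamHeader on disk: big-endian u32 version
# followed by u64 page_size, bitmap_offset and pages_offset, with no
# padding (a packed struct). If the C struct is not packed, the compiler
# inserts 4 bytes after version and the format becomes ">I4xQQQ".
HEADER_FMT = ">IQQQ"
FIELDS = ("version", "page_size", "bitmap_offset", "pages_offset")


def pack_header(version, page_size, bitmap_offset, pages_offset,
                fmt=HEADER_FMT):
    return struct.pack(fmt, version, page_size, bitmap_offset, pages_offset)


def unpack_header(buf, fmt=HEADER_FMT):
    """Decode one header from the start of buf; return (fields, size)."""
    size = struct.calcsize(fmt)  # 28 for ">IQQQ", 32 for ">I4xQQQ"
    values = struct.unpack(fmt, buf[:size])
    return dict(zip(FIELDS, values)), size
```

A reader would seek to `pages_offset + used_length` after decoding, which is where ram_save_setup() placed the next block's header.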
--- a/tests/migration/guestperf/comparison.py
+++ b/tests/migration/guestperf/comparison.py
@@ -135,4 +135,39 @@ COMPARISONS = [
        Scenario("compr-multifd-channels-64",
                 multifd=True, multifd_channels=64),
    ]),
+
+    # Looking at the effect of fixed-ram + multifd with varying
+    # numbers of channels
+    Comparison("fixed-ram", scenarios = [
+        Scenario("multifd-channels-4",
+                 multifd=True, multifd_channels=4,
+                 fixed_ram=True, bandwidth=0, suspend=True),
+        Scenario("multifd-channels-8",
+                 multifd=True, multifd_channels=8,
+                 fixed_ram=True, bandwidth=0, suspend=True),
+        Scenario("multifd-channels-32",
+                 multifd=True, multifd_channels=32,
+                 fixed_ram=True, bandwidth=0, suspend=True),
+        Scenario("multifd-channels-64",
+                 multifd=True, multifd_channels=64,
+                 fixed_ram=True, bandwidth=0, suspend=True),
+
+        # with O_DIRECT
+        Scenario("dio-multifd-channels-4",
+                 multifd=True, multifd_channels=4,
+                 fixed_ram=True, bandwidth=0, suspend=True,
+                 direct_io=True),
+        Scenario("dio-multifd-channels-8",
+                 multifd=True, multifd_channels=8,
+                 fixed_ram=True, bandwidth=0, suspend=True,
+                 direct_io=True),
+        Scenario("dio-multifd-channels-32",
+                 multifd=True, multifd_channels=32,
+                 fixed_ram=True, bandwidth=0, suspend=True,
+                 direct_io=True),
+        Scenario("dio-multifd-channels-64",
+                 multifd=True, multifd_channels=64,
+                 fixed_ram=True, bandwidth=0, suspend=True,
+                 direct_io=True),
+    ]),
 ]
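The eight fixed-ram scenarios differ only in channel count and in whether O_DIRECT is used, so the list could also be generated. The Python sketch below uses a stand-in `Scenario` class, a hypothetical replacement for guestperf's, and produces the same names and keyword arguments as the hand-written entries in comparison.py.

```python
# Hypothetical stand-in for guestperf's Scenario that only records its
# keyword arguments, so this example runs without the guestperf package.
class Scenario:
    def __init__(self, name, **kwargs):
        self._name = name
        self.kwargs = kwargs


def fixed_ram_scenarios(channels=(4, 8, 32, 64)):
    """Build the plain and O_DIRECT fixed-ram + multifd scenarios."""
    scenarios = []
    for direct_io in (False, True):
        prefix = "dio-" if direct_io else ""
        for n in channels:
            kwargs = dict(multifd=True, multifd_channels=n,
                          fixed_ram=True, bandwidth=0, suspend=True)
            if direct_io:
                kwargs["direct_io"] = True
            scenarios.append(
                Scenario("%smultifd-channels-%d" % (prefix, n), **kwargs))
    return scenarios
```

With guestperf's real `Scenario` in place of the stand-in, the returned list could serve directly as the `scenarios` argument of the "fixed-ram" `Comparison`.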
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -35,10 +35,11 @@ from qemu.machine import QEMUMachine
 class Engine(object):

    def __init__(self, binary, dst_host, kernel, initrd, transport="tcp",
-                 sleep=15, verbose=False, debug=False):
+                 sleep=15, verbose=False, debug=False, dst_file="/tmp/migfile"):

        self._binary = binary # Path to QEMU binary
        self._dst_host = dst_host # Hostname of target host
+        self._dst_file = dst_file # Path to file (for file transport)
        self._kernel = kernel # Path to kernel image
        self._initrd = initrd # Path to stress initrd
        self._transport = transport # 'unix' or 'tcp' or 'rdma'
@@ -203,6 +204,34 @@ class Engine(object):
            resp = dst.command("migrate-set-parameters",
                               multifd_channels=scenario._multifd_channels)

+        if scenario._fixed_ram:
+            resp = src.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "fixed-ram",
+                                     "state": True }
+                               ])
+            resp = dst.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "fixed-ram",
+                                     "state": True }
+                               ])
+
+        if scenario._direct_io:
+            resp = src.command("migrate-set-parameters",
+                               direct_io=scenario._direct_io)
+
+        if scenario._suspend:
+            resp = src.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "suspend",
+                                     "state": True }
+                               ])
+            resp = dst.command("migrate-set-capabilities",
+                               capabilities = [
+                                   { "capability": "suspend",
+                                     "state": True }
+                               ])
+
        resp = src.command("migrate", uri=connect_uri)

        post_copy = False
@@ -233,14 +262,31 @@ class Engine(object):
                    progress_history.append(progress)

                if progress._status == "completed":
-                    if self._verbose:
-                        print("Sleeping %d seconds for final guest workload run" % self._sleep)
-                    sleep_secs = self._sleep
-                    while sleep_secs > 1:
-                        time.sleep(1)
-                        src_qemu_time.append(self._cpu_timing(src_pid))
-                        src_vcpu_time.extend(self._vcpu_timing(src_pid, src_threads))
-                        sleep_secs -= 1
+                    print("Transferred %5dMB of non-zero pages (total %5dMB @%5dMB/s in %4dms)" % (
+                        progress._ram._normal_bytes / (1024 * 1024),
+                        progress._ram._total_bytes / (1024 * 1024),
+                        progress._ram._transfer_rate_mbs / 8,
+                        progress._duration,
+                    ))
+
+                    if connect_uri.startswith("file:"):
+                        if self._verbose:
+                            print("Migrating incoming")
+                        dst.command("migrate-incoming", uri=connect_uri)
+
+                        progress = self._migrate_progress(dst)
+                        while progress._status not in ("completed", "failed", "cancelled"):
+                            time.sleep(0.1)
+                            progress = self._migrate_progress(dst)
+                    else:
+                        if self._verbose:
+                            print("Sleeping %d seconds for final guest workload run" % self._sleep)
+                        sleep_secs = self._sleep
+                        while sleep_secs > 1:
+                            time.sleep(1)
+                            src_qemu_time.append(self._cpu_timing(src_pid))
+                            src_vcpu_time.extend(self._vcpu_timing(src_pid, src_threads))
+                            sleep_secs -= 1

                return [progress_history, src_qemu_time, src_vcpu_time]

@@ -357,7 +403,11 @@ class Engine(object):
        if self._dst_host != "localhost":
            tunnelled = True
        argv = self._get_common_args(hardware, tunnelled)
-        return argv + ["-incoming", uri]
+
+        incoming = ["-incoming", uri]
+        if uri.startswith("file:"):
+            incoming = ["-incoming", "defer"]
+        return argv + incoming

    @staticmethod
    def _get_common_wrapper(cpu_bind, mem_bind):
@@ -417,6 +467,10 @@ class Engine(object):
                os.remove(monaddr)
            except:
                pass
+        elif self._transport == "file":
+            if self._dst_host != "localhost":
+                raise Exception("Use unix migration transport for non-local host")
+            uri = "file:%s" % self._dst_file

        if self._dst_host != "localhost":
            dstmonaddr = ("localhost", 9001)
@@ -453,6 +507,9 @@ class Engine(object):
            if self._dst_host == "localhost" and os.path.exists(dstmonaddr):
                os.remove(dstmonaddr)

+            if uri.startswith("file:") and os.path.exists(uri[5:]):
+                os.remove(uri[5:])
+
            if self._verbose:
                print("Finished migration")

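For the file transport the engine now waits for the destination's incoming migration to reach a terminal state. The Python sketch below shows that wait as a standalone helper with a poll interval and a timeout. The helper and its parameters are hypothetical and not part of guestperf. `clock` and `sleep` are injectable so the logic can be tested without real delays.

```python
import time

TERMINAL_STATES = ("completed", "failed", "cancelled")


def wait_for_migration(query_status, interval=0.1, timeout=60.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll query_status() until it reports a terminal state.

    query_status is any callable returning the current status string,
    e.g. the "status" member of a query-migrate reply. Raises
    TimeoutError if no terminal state is seen within timeout seconds.
    """
    deadline = clock() + timeout
    while True:
        status = query_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("migration still %r after %.1fs"
                               % (status, timeout))
        sleep(interval)  # avoid hammering the monitor
```

A possible call, assuming a QEMUMachine-style `dst` object, would be `wait_for_migration(lambda: dst.command("query-migrate").get("status"))`.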
--- a/tests/migration/guestperf/scenario.py
+++ b/tests/migration/guestperf/scenario.py
@@ -30,7 +30,8 @@ class Scenario(object):
                 auto_converge=False, auto_converge_step=10,
                 compression_mt=False, compression_mt_threads=1,
                 compression_xbzrle=False, compression_xbzrle_cache=10,
-                 multifd=False, multifd_channels=2):
+                 multifd=False, multifd_channels=2,
+                 fixed_ram=False, direct_io=False, suspend=False):

        self._name = name

@@ -60,6 +61,12 @@ class Scenario(object):
        self._multifd = multifd
        self._multifd_channels = multifd_channels

+        self._fixed_ram = fixed_ram
+
+        self._direct_io = direct_io
+
+        self._suspend = suspend
+
    def serialize(self):
        return {
            "name": self._name,
@@ -79,6 +86,9 @@ class Scenario(object):
            "compression_xbzrle_cache": self._compression_xbzrle_cache,
            "multifd": self._multifd,
            "multifd_channels": self._multifd_channels,
+            "fixed_ram": self._fixed_ram,
+            "direct_io": self._direct_io,
+            "suspend": self._suspend,
        }

    @classmethod
@@ -100,4 +110,7 @@ class Scenario(object):
            data["compression_xbzrle"],
            data["compression_xbzrle_cache"],
            data["multifd"],
-            data["multifd_channels"])
+            data["multifd_channels"],
+            data["fixed_ram"],
+            data["direct_io"],
+            data["suspend"])
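Scenario.deserialize() passes the stored values positionally, so each new field must be appended in exactly the constructor's order, as `fixed_ram`, `direct_io` and `suspend` are here. The Python sketch below shows a keyword-based alternative on a cut-down, hypothetical `Scenario`. It illustrates the design choice and is not the guestperf code. A side effect is that reports written before a field existed load with that field's default.

```python
class Scenario:
    """Cut-down, hypothetical Scenario with a few of the real fields."""

    def __init__(self, name, multifd=False, multifd_channels=2,
                 fixed_ram=False, direct_io=False, suspend=False):
        self._name = name
        self._multifd = multifd
        self._multifd_channels = multifd_channels
        self._fixed_ram = fixed_ram
        self._direct_io = direct_io
        self._suspend = suspend

    def serialize(self):
        return {
            "name": self._name,
            "multifd": self._multifd,
            "multifd_channels": self._multifd_channels,
            "fixed_ram": self._fixed_ram,
            "direct_io": self._direct_io,
            "suspend": self._suspend,
        }

    @classmethod
    def deserialize(cls, data):
        # Keys match the constructor's keyword names, so their order does
        # not matter; keys missing from older reports fall back to the
        # constructor defaults.
        return cls(data["name"],
                   **{k: v for k, v in data.items() if k != "name"})
```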
--- a/tests/migration/guestperf/shell.py
+++ b/tests/migration/guestperf/shell.py
@@ -48,6 +48,7 @@ class BaseShell(object):
        parser.add_argument("--kernel", dest="kernel", default="/boot/vmlinuz-%s" % platform.release())
        parser.add_argument("--initrd", dest="initrd", default="tests/migration/initrd-stress.img")
        parser.add_argument("--transport", dest="transport", default="unix")
+        parser.add_argument("--dst-file", dest="dst_file", default="/tmp/migfile")


        # Hardware args
@@ -71,7 +72,8 @@ class BaseShell(object):
                      transport=args.transport,
                      sleep=args.sleep,
                      debug=args.debug,
-                      verbose=args.verbose)
+                      verbose=args.verbose,
+                      dst_file=args.dst_file)

    def get_hardware(self, args):
        def split_map(value):
@@ -127,6 +129,15 @@ class Shell(BaseShell):
        parser.add_argument("--multifd-channels", dest="multifd_channels",
                            default=2, type=int)

+        parser.add_argument("--fixed-ram", dest="fixed_ram", default=False,
+                            action="store_true")
+
+        parser.add_argument("--direct-io", dest="direct_io", default=False,
+                            action="store_true")
+
+        parser.add_argument("--suspend", dest="suspend", default=False,
+                            action="store_true")
+
    def get_scenario(self, args):
        return Scenario(name="perfreport",
                        downtime=args.downtime,
@@ -150,7 +161,14 @@ class Shell(BaseShell):
                        compression_xbzrle_cache=args.compression_xbzrle_cache,

                        multifd=args.multifd,
-                        multifd_channels=args.multifd_channels)
+                        multifd_channels=args.multifd_channels,
+
+                        fixed_ram=args.fixed_ram,
+
+                        direct_io=args.direct_io,
+
+                        suspend=args.suspend)
+

    def run(self, argv):
        args = self._parser.parse_args(argv)
@@ -187,6 +205,7 @@ class BatchShell(BaseShell):

        parser.add_argument("--filter", dest="filter", default="*")
        parser.add_argument("--output", dest="output", default=os.getcwd())
+        parser.add_argument("--iters", dest="iters", default=1, type=int)

    def run(self, argv):
        args = self._parser.parse_args(argv)
@@ -202,22 +221,23 @@ class BatchShell(BaseShell):
            for comparison in COMPARISONS:
                compdir = os.path.join(args.output, comparison._name)
                for scenario in comparison._scenarios:
-                    name = os.path.join(comparison._name, scenario._name)
-                    if not fnmatch.fnmatch(name, args.filter):
+                    for n in range(args.iters):
+                        name = os.path.join(comparison._name, scenario._name)
+                        if not fnmatch.fnmatch(name, args.filter):
+                            if args.verbose:
+                                print("Skipping %s" % name)
+                            continue
+
                        if args.verbose:
-                            print("Skipping %s" % name)
-                        continue
+                            print("Running %s i=%d" % (name,n))

-                    if args.verbose:
-                        print("Running %s" % name)
-
-                    dirname = os.path.join(args.output, comparison._name)
-                    filename = os.path.join(dirname, scenario._name + ".json")
-                    if not os.path.exists(dirname):
-                        os.makedirs(dirname)
-                    report = engine.run(hardware, scenario)
-                    with open(filename, "w") as fh:
-                        print(report.to_json(), file=fh)
+                        dirname = os.path.join(args.output, comparison._name)
+                        filename = os.path.join(dirname, scenario._name + ".json")
+                        if not os.path.exists(dirname):
+                            os.makedirs(dirname)
+                        report = engine.run(hardware, scenario)
+                        with open(filename, "w") as fh:
+                            print(report.to_json(), file=fh)
        except Exception as e:
            print("Error: %s" % str(e), file=sys.stderr)
            if args.debug:
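With `--iters N`, every matching scenario runs N times. However, `filename` does not depend on `n`, so each iteration reopens the same `<scenario>.json` in mode `"w"` and only the last run's report stays on disk. For the earlier iterations, what remains are the per-run completion statistics printed by commit 97b72019ae. Below is a minimal sketch of how per-iteration numbers could be averaged once they have been collected, plus one possible per-iteration file name. `iteration_report_path` and `summarize` are hypothetical helpers, not part of guestperf, and the sample values are illustrative only.

```python
import os
import statistics


def iteration_report_path(output_dir, comparison_name, scenario_name, n):
    """Hypothetical per-iteration report name: <output>/<comparison>/<scenario>-<n>.json.

    The patch above reuses <scenario>.json for every iteration, so each run
    overwrites the previous report; a per-iteration suffix would keep them all.
    """
    dirname = os.path.join(output_dir, comparison_name)
    return os.path.join(dirname, "%s-%d.json" % (scenario_name, n))


def summarize(samples):
    """Return (mean, sample standard deviation) of per-iteration values.

    With a single iteration (--iters 1) the deviation is reported as 0.0,
    because statistics.stdev() needs at least two data points.
    """
    if not samples:
        raise ValueError("need at least one sample")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__":
    # Illustrative total migration times (ms) from three iterations.
    print(summarize([8698, 8702, 8700]))
    print(iteration_report_path("out", "fixed-ram", "live", 2))
```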
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -130,6 +130,25 @@ void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...)
    qobject_unref(rsp);
 }

+
+void migrate_incoming_qmp(QTestState *who, const char *uri, const char *fmt, ...)
+{
+    va_list ap;
+    QDict *args, *rsp;
+
+    va_start(ap, fmt);
+    args = qdict_from_vjsonf_nofail(fmt, ap);
+    va_end(ap);
+
+    g_assert(!qdict_haskey(args, "uri"));
+    qdict_put_str(args, "uri", uri);
+
+    rsp = qtest_qmp(who, "{ 'execute': 'migrate-incoming', 'arguments': %p}", args);
+
+    g_assert(qdict_haskey(rsp, "return"));
+    qobject_unref(rsp);
+}
+
 /*
 * Note: caller is responsible to free the returned object via
 * qobject_unref() after use
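`migrate_incoming_qmp()` mirrors `migrate_qmp()`. Extra arguments come from the printf-style JSON format, `uri` must not already be among them, and the URI is added before `migrate-incoming` is issued. Below is a minimal Python sketch of the same command construction, for readers who drive QMP from a script. `build_migrate_incoming` is a hypothetical name, and the sketch shows only the JSON shape of the command, with no QMP connection handling. Where the C helper aborts through `g_assert`, the sketch raises `ValueError`.

```python
import json


def build_migrate_incoming(uri, extra_args=None):
    """Build a QMP 'migrate-incoming' command as a JSON string.

    Like the C helper, refuse a 'uri' key in the extra arguments so the
    URI can only come from the dedicated parameter.
    """
    args = dict(extra_args or {})
    if "uri" in args:
        raise ValueError("pass 'uri' via the uri parameter, not extra_args")
    args["uri"] = uri
    return json.dumps({"execute": "migrate-incoming", "arguments": args})


if __name__ == "__main__":
    print(build_migrate_incoming("file:/tmp/migfile"))
```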
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -31,6 +31,10 @@ QDict *qmp_command(QTestState *who, const char *command, ...);
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);

+G_GNUC_PRINTF(3, 4)
+void migrate_incoming_qmp(QTestState *who, const char *uri,
+                          const char *fmt, ...);
+
 QDict *migrate_query(QTestState *who);
 QDict *migrate_query_not_failed(QTestState *who);

--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -780,6 +780,7 @@ static void test_migrate_end(QTestState *from, QTestState *to, bool test_dest)
    cleanup("migsocket");
    cleanup("src_serial");
    cleanup("dest_serial");
+    cleanup("migfile");
 }

 #ifdef CONFIG_GNUTLS
@@ -1452,6 +1453,14 @@ static void test_precopy_common(MigrateCommon *args)
         * hanging forever if migration didn't converge */
        wait_for_migration_complete(from);

+        /*
+         * For file based migration the target must begin its migration after
+         * the source has finished
+         */
+        if (args->connect_uri && g_str_has_prefix(args->connect_uri, "file:")) {
+            migrate_incoming_qmp(to, args->connect_uri, "{}");
+        }
+
        if (!got_stop) {
            qtest_qmp_eventwait(from, "STOP");
        }
@@ -1639,6 +1648,137 @@ static void test_precopy_unix_compress_nowait(void)
    test_precopy_common(&args);
 }

+static void test_precopy_file_stream_ram(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/migfile", tmpfs);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+    };
+
+    test_precopy_common(&args);
+}
+
+static void *migrate_fixed_ram_start_live(QTestState *from, QTestState *to)
+{
+    migrate_set_capability(from, "fixed-ram", true);
+    migrate_set_capability(to, "fixed-ram", true);
+
+    return NULL;
+}
+
+static void *migrate_fixed_ram_start_nonlive(QTestState *from, QTestState *to)
+{
+    migrate_fixed_ram_start_live(from, to);
+
+    migrate_set_capability(from, "suspend", true);
+    migrate_set_capability(to, "suspend", true);
+
+    return NULL;
+}
+
+static void test_precopy_file_fixed_ram_live(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/migfile", tmpfs);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_fixed_ram_start_live,
+    };
+
+    test_precopy_common(&args);
+}
+
+static void test_precopy_file_fixed_ram_nonlive(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/migfile", tmpfs);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_fixed_ram_start_nonlive,
+    };
+
+    test_precopy_common(&args);
+}
+
+static void *migrate_multifd_fixed_ram_start_live(QTestState *from, QTestState *to)
+{
+    migrate_fixed_ram_start_live(from, to);
+
+    migrate_set_parameter_int(from, "multifd-channels", 4);
+    migrate_set_parameter_int(to, "multifd-channels", 4);
+
+    migrate_set_capability(from, "multifd", true);
+    migrate_set_capability(to, "multifd", true);
+
+    return NULL;
+}
+
+static void *migrate_multifd_fixed_ram_start_nonlive(QTestState *from, QTestState *to)
+{
+    migrate_fixed_ram_start_nonlive(from, to);
+
+    migrate_set_parameter_int(from, "multifd-channels", 4);
+    migrate_set_parameter_int(to, "multifd-channels", 4);
+
+    migrate_set_capability(from, "multifd", true);
+    migrate_set_capability(to, "multifd", true);
+
+    return NULL;
+}
+
+static void *migrate_multifd_fixed_ram_dio_start_nonlive(QTestState *from, QTestState *to)
+{
+    migrate_fixed_ram_start_nonlive(from, to);
+
+    migrate_set_parameter_int(from, "multifd-channels", 4);
+    migrate_set_parameter_int(to, "multifd-channels", 4);
+
+    migrate_set_capability(from, "multifd", true);
+    migrate_set_capability(to, "multifd", true);
+
+    migrate_set_parameter_bool(from, "direct-io", true);
+    migrate_set_parameter_bool(to, "direct-io", true);
+
+    return NULL;
+}
+
+static void test_multifd_file_fixed_ram_live(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/migfile", tmpfs);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_multifd_fixed_ram_start_live,
+    };
+
+    test_precopy_common(&args);
+}
+
+static void test_multifd_file_fixed_ram_nonlive(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/migfile", tmpfs);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_multifd_fixed_ram_start_nonlive,
+    };
+
+    test_precopy_common(&args);
+}
+
+static void test_multifd_file_fixed_ram_dio_nonlive(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/migfile", tmpfs);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_multifd_fixed_ram_dio_start_nonlive,
+    };
+
+    test_precopy_common(&args);
+}
+
 static void test_precopy_tcp_plain(void)
 {
    MigrateCommon args = {
@@ -2668,6 +2808,23 @@ int main(int argc, char **argv)
        qtest_add_func("/migration/precopy/unix/compress/nowait",
                       test_precopy_unix_compress_nowait);
    }
+
+    qtest_add_func("/migration/precopy/file/stream-ram",
+                   test_precopy_file_stream_ram);
+
+    qtest_add_func("/migration/precopy/file/fixed-ram/nonlive",
+                   test_precopy_file_fixed_ram_nonlive);
+    qtest_add_func("/migration/precopy/file/fixed-ram/live",
+                   test_precopy_file_fixed_ram_live);
+
+    qtest_add_func("/migration/multifd/file/fixed-ram/nonlive",
+                   test_multifd_file_fixed_ram_nonlive);
+    qtest_add_func("/migration/multifd/file/fixed-ram/live",
+                   test_multifd_file_fixed_ram_live);
+
+    qtest_add_func("/migration/multifd/file/fixed-ram/dio/nonlive",
+                   test_multifd_file_fixed_ram_dio_nonlive);
+
 #ifdef CONFIG_GNUTLS
    qtest_add_func("/migration/precopy/unix/tls/psk",
                   test_precopy_unix_tls_psk);
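All of the file-based tests above use `.listen_uri = "defer"`, so the destination only learns the URI when `migrate_incoming_qmp()` is called. `test_precopy_common()` makes that call only after `wait_for_migration_complete(from)`, because the file must be fully written before it is read back. The sketch below lists, as plain data, the QMP commands that the start hooks and this ordering amount to on each side. It is an illustration under stated assumptions:

- The capability and parameter names (`fixed-ram`, `suspend`, `multifd`, `multifd-channels`, `direct-io`) are the ones used in this series.
- The qtest helpers send one capability or parameter per command, while the sketch batches them.
- `migration_plan` is a hypothetical helper, not QEMU code.

```python
def migration_plan(uri, capabilities, parameters):
    """Return (source_cmds, destination_cmds) for a file: migration.

    Each command is a (qmp_command, arguments) tuple. Both sides get the
    same capabilities and parameters. The destination's migrate-incoming is
    last and must only be sent once the source reports completion.
    """
    caps = [{"capability": name, "state": True} for name in capabilities]
    setup = [("migrate-set-capabilities", {"capabilities": caps})]
    if parameters:
        setup.append(("migrate-set-parameters", dict(parameters)))
    source = setup + [("migrate", {"uri": uri})]
    destination = setup + [("migrate-incoming", {"uri": uri})]
    return source, destination


if __name__ == "__main__":
    # Roughly what migrate_multifd_fixed_ram_dio_start_nonlive() enables.
    src, dst = migration_plan(
        "file:/tmp/migfile",
        ["fixed-ram", "suspend", "multifd"],
        {"multifd-channels": 4, "direct-io": True},
    )
    for side, cmds in (("source", src), ("destination", dst)):
        for command, arguments in cmds:
            print(side, command, arguments)
```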
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -277,6 +277,15 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive)
 }
 #endif

+bool qemu_has_direct_io(void)
+{
+#ifdef O_DIRECT
+    return true;
+#else
+    return false;
+#endif
+}
+
 static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
 {
    int ret;
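`qemu_has_direct_io()` only reports whether `O_DIRECT` is defined on the build platform. The filesystem holding the migration file can still reject the flag, in which case open(2) fails with `EINVAL`. Below is a minimal Python sketch of that distinction. It assumes a policy of falling back to buffered I/O when the open is refused, which is an assumption made for illustration and not necessarily what the migration code does. `has_direct_io` and `open_maybe_direct` are hypothetical names.

```python
import errno
import os
import tempfile


def has_direct_io():
    """Python analogue of qemu_has_direct_io(): is O_DIRECT defined here?"""
    return hasattr(os, "O_DIRECT")


def open_maybe_direct(path, flags, want_direct, mode=0o600):
    """Open path, trying O_DIRECT first when requested and available.

    Returns (fd, used_direct). Falls back to a buffered open (an assumed
    policy) if the filesystem rejects O_DIRECT with EINVAL; other errors
    propagate to the caller.
    """
    if want_direct and has_direct_io():
        try:
            return os.open(path, flags | os.O_DIRECT, mode), True
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise
    return os.open(path, flags, mode), False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fd, direct = open_maybe_direct(os.path.join(d, "migfile"),
                                       os.O_WRONLY | os.O_CREAT, True)
        print("O_DIRECT defined:", has_direct_io(), "used:", direct)
        os.close(fd)
```

Even when the open succeeds, `O_DIRECT` I/O generally requires suitably aligned buffers, lengths and file offsets. This is why, per commit f30740dfa7, the parameter currently applies only to the fixed-ram secondary channels, which can guarantee page-aligned writes.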
Author	SHA1	Message	Date
Fabiano Rosas	55e0704225	tests/migration/guestperf: Add --iter argument to batch runs Allow batch runs to be executed multiple times. This is useful for collecting averages. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	97b72019ae	tests/migration/guestperf: Print completion statistics Print completion statistics at the end of a scenario run. Also take the "Running" message from under verbose to match the completion message. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	973b9f33f1	tests/migration/guestperf: Add fixed-ram Comparisons Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	c25cc087e0	tests/migration/guestperf: Add file, fixed-ram, direct-io and suspend support Add support for the new migration features: - 'file' transport; - 'fixed-ram' stream format capability; - 'direct-io' parameter; - 'suspend' capability. Usage: $ ./guestperf.py --binary <path/to/qemu> --initrd <path/to/initrd-stress.img> --transport file --dst-file migfile --fixed-ram --suspend --direct-io --multifd --multifd-channels 4 --output fixed-ram.json --verbose Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	01792c9d70	tests/qtest: Add a test for migration with direct-io and multifd Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	f30740dfa7	migration: Add direct-io parameter Add the direct-io migration parameter that tells the migration code to use O_DIRECT when opening the migration stream file whenever possible. This is currently only used for the secondary channels of fixed-ram migration, which can guarantee that writes are page aligned. However the parameter could be made to affect other types of file-based migrations in the future. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	7f26896b3a	tests/qtest: Add a multifd + fixed-ram migration test Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	62f83eefd8	migration/multifd: Support incoming fixed-ram stream format For the incoming fixed-ram migration we need to read the ramblock headers, get the pages bitmap and send the host address of each non-zero page to the multifd channel thread for writing. To read from the migration file we need a preadv function that can read into the iovs in segments of contiguous pages because (as in the writing case) the file offset applies to the entire iovec. Usage on HMP is: (qemu) migrate_set_capability multifd on (qemu) migrate_set_capability fixed-ram on (qemu) migrate_set_parameter max-bandwidth 0 (qemu) migrate_set_parameter multifd-channels 8 (qemu) migrate_incoming file:migfile (qemu) info status (qemu) c Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	907760146b	migration/multifd: Support outgoing fixed-ram stream format The new fixed-ram stream format uses a file transport and puts RAM pages in the migration file at their respective offsets, which can be done in parallel by using the pwritev system call, since it takes iovecs and an offset. Add support for enabling the new format along with multifd to make use of the threading and page handling already in place. This requires multifd to stop sending headers and to leave the stream format to the fixed-ram code. When it comes time to write the data, we need to call a version of qio_channel_write that can take an offset. Usage on HMP is: (qemu) stop (qemu) migrate_set_capability multifd on (qemu) migrate_set_capability fixed-ram on (qemu) migrate_set_parameter max-bandwidth 0 (qemu) migrate_set_parameter multifd-channels 8 (qemu) migrate file:migfile Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	cc6f04f1fb	migration/ram: Ignore multifd flush when doing fixed-ram migration Some multifd features are incompatible with the 'fixed-ram' migration format. The MULTIFD_FLUSH flag in particular is not used because in fixed-ram there is no synchronization between migration source and destination, so there is no need for a sync packet. In fact, fixed-ram disables packets in multifd as a whole. Make sure RAM_SAVE_FLAG_MULTIFD_FLUSH is never emitted when fixed-ram is enabled. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	84a425b3ad	migration/ram: Add a wrapper for fixed-ram shadow bitmap We'll need to set the shadow_bmap bits from outside ram.c soon and TARGET_PAGE_BITS is poisoned, so add a wrapper to it. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	7d8fe86bce	io: Add a pwritev/preadv version that takes a discontiguous iovec For the upcoming fixed-ram migration support with multifd, we need to be able to accept an iovec array with non-contiguous data. Add a pwritev and preadv version that splits the array into contiguous segments before writing. With that we can have the ram code continue to add pages in any order and the multifd code continue to send large arrays for reading and writing. Signed-off-by: Fabiano Rosas <farosas@suse.de> --- Since iovs can be non-contiguous, we'd need a separate array on the side to carry an extra file offset for each of them, so I'm relying on the fact that iovs are all within the same host page and passing in an encoded offset that takes the host page into account.	2023-05-31 13:40:47 -03:00
Fabiano Rosas	bd3ab48639	migration/multifd: Add pages to the receiving side Currently multifd does not need to have knowledge of pages on the receiving side because all the information needed is within the packets that come in the stream. We're about to add support to fixed-ram migration, which cannot use packets because it expects the ramblock section in the migration file to contain only the guest pages data. Add a pointer to MultiFDPages in the multifd_recv_state and use the pages similarly to what we already do on the sending side. The pages are used to transfer data between the ram migration code in the main migration thread and the multifd receiving threads. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	0f3db9233e	migration/multifd: Add incoming QIOChannelFile support On the receiving side we don't need to differentiate between main channel and threads, so whichever channel is defined first gets to be the main one. And since there are no packets, use the atomic channel count to index into the params array. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	a8eceefaca	migration/multifd: Add outgoing QIOChannelFile support Allow multifd to open file-backed channels. This will be used when enabling the fixed-ram migration stream format which expects a seekable transport. The QIOChannel read and write methods will use the preadv/pwritev versions which don't update the file offset at each call so we can reuse the fd without re-opening for every channel. Note that this is just setup code and multifd cannot yet make use of the file channels. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	5ad76d6887	migration/multifd: Allow multifd without packets For the upcoming support for the new 'fixed-ram' migration stream format, we cannot use multifd packets because each write into the ramblock section in the migration file is expected to contain only the guest pages. They are written at their respective offsets relative to the ramblock section header. There is no space for the packet information, and the expected gains from the new approach come partly from being able to write the pages sequentially without extraneous data in between. The new format also doesn't need the packets and all necessary information can be taken from the standard migration headers with some (future) changes to multifd code. Use the presence of the fixed-ram capability to decide whether to send packets. For now this has no effect as fixed-ram cannot yet be enabled with multifd. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	d8d45fd80a	migration/multifd: Remove direct "socket" references We're about to enable support for other transports in multifd, so remove direct references to sockets. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	5ee018c957	migration: Add completion tracepoint Add a completion tracepoint that provides basic stats for debug. Displays throughput (MB/s and pages/s) and total time (ms). Usage: $QEMU ... -trace migration_status Output: migration_status 1506 MB/s, 436725 pages/s, 8698 ms Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	0e02f381ce	migration/ram: Fixup end of ram migration detection It seems we'd be better off leaving the EOS flags in the stream. Otherwise any new flag that is added in the code will need to be excluded for fixed-ram because we cannot have a solitary flag without an EOS after it.	2023-05-31 13:40:47 -03:00
Nikolay Borisov	2d362f3616	tests/qtest: migration-test: Add tests for fixed-ram file-based migration Add basic tests for 'fixed-ram' migration. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Nikolay Borisov	4c19a8d4f6	migration/ram: Add support for 'fixed-ram' migration restore Add the necessary code to parse the format changes for the 'fixed-ram' capability. One of the more notable changes in behavior is that in the 'fixed-ram' case ram pages are restored in one go rather than constantly looping through the migration stream. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> --- (farosas) reused more of the common code by making the fixed-ram function take only one ramblock and calling it from inside parse_ramblock.	2023-05-31 13:40:47 -03:00
Nikolay Borisov	48a1cf2ee9	migration/ram: Add support for 'fixed-ram' outgoing migration Implement the outgoing migration side for the 'fixed-ram' capability. A bitmap is introduced to track which pages have been written in the migration file. Pages are written at a fixed location for every ramblock. Zero pages are ignored as they'd be zero in the destination migration as well. The migration stream is altered to put the dirty pages for a ramblock after its header instead of having a sequential stream of pages that follow the ramblock headers. Since all pages have a fixed location, RAM_SAVE_FLAG_EOS is no longer generated on every migration iteration. Without fixed-ram (current): ramblock 1 header | ramblock 2 header | ... | RAM_SAVE_FLAG_EOS | stream of pages (iter 1) | RAM_SAVE_FLAG_EOS | stream of pages (iter 2) | ... With fixed-ram (new): ramblock 1 header | ramblock 1 fixed-ram header | ramblock 1 pages (fixed offsets) | ramblock 2 header | ramblock 2 fixed-ram header | ramblock 2 pages (fixed offsets) | ... | RAM_SAVE_FLAG_EOS where: - ramblock header: the generic information for a ramblock, such as idstr, used_len, etc. - ramblock fixed-ram header: the new information added by this feature: bitmap of pages written, bitmap size and offset of pages in the migration file. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Nikolay Borisov	e0da9dd208	migration/ram: Refactor precopy ram loading code To facilitate the implementation of the 'fixed-ram' migration restore, factor out the code responsible for parsing the ramblocks headers. This also makes ram_load_precopy easier to comprehend. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	6c89f7f052	migration/ram: Introduce 'fixed-ram' migration capability Add a new migration capability 'fixed-ram'. The core of the feature is to ensure that each ram page has a specific offset in the resulting migration stream. The reasons we'd want such behavior are twofold: - When doing a 'fixed-ram' migration the resulting file will have a bounded size, since pages which are dirtied multiple times will always go to a fixed location in the file, rather than constantly being added to a sequential stream. This eliminates cases where a VM with, say, 1G of RAM can result in a migration file that's tens of GBs, provided that the workload constantly redirties memory. - It paves the way to implement DIRECT_IO-enabled save/restore of the migration stream as the pages are ensured to be written at aligned offsets. For now, enabling the capability has no effect. The next couple of patches implement the core functionality. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:47 -03:00
Fabiano Rosas	fa8c3900cc	migration: Introduce a suspend capability Add a new capability to indicate whether the VM should be made to stop prior to migration. This allows QEMU to be more aggressive with dirty tracking optimizations and allows the management layer to be explicit on whether it expects the migration to be performed live. Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:46 -03:00
Nikolay Borisov	bb3e4bc43d	migration/qemu-file: add utility methods for working with seekable channels Add utility methods that will be needed when implementing 'fixed-ram' migration capability. qemu_file_is_seekable qemu_put_buffer_at qemu_get_buffer_at qemu_set_offset qemu_get_offset Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> --- fixed total_transferred accounting restructured to use qio_channel_file_preadv instead of the _full variant	2023-05-31 13:40:46 -03:00
Nikolay Borisov	56651ee748	io: implement io_pwritev/preadv for QIOChannelFile The upcoming 'fixed-ram' feature will require qemu to write data to (and restore from) specific offsets of the migration file. Add a minimal implementation of pwritev/preadv and expose them via the io_pwritev and io_preadv interfaces. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2023-05-31 13:40:46 -03:00
Nikolay Borisov	3d37ee9440	io: Add generic pwritev/preadv interface Introduce basic pwritev/preadv support in the generic channel layer. Specific implementation will follow for the file channel as this is required in order to support migration streams with fixed location of each ram page. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:46 -03:00
Nikolay Borisov	a67a096988	io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Add a generic QIOChannel feature SEEKABLE which would be used by the qemu_file* apis. For the time being this will be only implemented for file channels. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2023-05-31 13:40:46 -03:00
Nikolay Borisov	7b7e8ad9f6	migration: Initial support of fixed-ram feature for analyze-migration.py In order to allow the analyze-migration.py script to work with migration streams that have the 'fixed-ram' capability, it needs access to the stream's configuration object. This commit enables this by making the migration JSON writer part of the MigrationState struct, allowing the configuration object to be serialized to JSON. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:46 -03:00
Nikolay Borisov	0c66e9da30	tests/qtest: migration-test: Add tests for file-based migration Add basic tests for file-based migration. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> --- (farosas) fix segfault when connect_uri is not set	2023-05-31 13:40:46 -03:00
Nikolay Borisov	daf2b307a8	tests/qtest: migration: Add migrate_incoming_qmp helper file-based migration requires the target to initiate its migration after the source has finished writing out the data in the file. Currently there's no easy way to initiate 'migrate-incoming', allow this by introducing migrate_incoming_qmp helper, similarly to migrate_qmp. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2023-05-31 13:40:46 -03:00
Nikolay Borisov	df9f67cd74	migration: Add support for 'file:' uri for incoming migration This is a counterpart to the 'file:' uri support for source migration, now a file can also serve as the source of an incoming migration. Unlike other migration protocol backends, the 'file' protocol cannot honour non-blocking mode. POSIX file/block storage will always report ready to read/write, regardless of how slow the underlying storage will be at servicing the request. For incoming migration this limitation may result in the main event loop not being fully responsive while loading the VM state. This won't impact the VM since it is not running at this phase, however, it may impact management applications. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:46 -03:00
Nikolay Borisov	13db248e8b	migration: Add support for 'file:' uri for source migration Implement support for a "file:" uri so that a migration can be initiated directly to a file from QEMU. Unlike other migration protocol backends, the 'file' protocol cannot honour non-blocking mode. POSIX file/block storage will always report ready to read/write, regardless of how slow the underlying storage will be at servicing the request. For outgoing migration this limitation is not a serious problem as the migration data transfer always happens in a dedicated thread. It may, however, result in delays in honouring a request to cancel the migration operation. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2023-05-31 13:40:46 -03:00
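
The fixed-ram commits above (6c89f7f052, 48a1cf2ee9) rest on one idea. Every page of a RAMBlock has a fixed slot at `pages offset + page index * page size` in the file, and a bitmap records which slots hold data. Below is a toy Python model of that idea. It assumes 4 KiB pages, a single RAMBlock and a caller-chosen pages offset. It does not reproduce the on-disk format (no headers, no alignment rules), and `FixedRamRegion` is a hypothetical name. It shows why redirtying a page does not grow the file.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumption: 4 KiB target pages


class FixedRamRegion:
    """Toy model of one RAMBlock's page area in a fixed-ram migration file."""

    def __init__(self, fd, pages_offset, num_pages):
        self.fd = fd
        self.pages_offset = pages_offset
        self.num_pages = num_pages
        self.bitmap = bytearray((num_pages + 7) // 8)  # 1 bit per page

    def _slot(self, index):
        """File offset of a page's fixed slot (bounds-checked)."""
        if not 0 <= index < self.num_pages:
            raise IndexError(index)
        return self.pages_offset + index * PAGE_SIZE

    def save_page(self, index, data):
        """Write a dirty page into its fixed slot, overwriting older copies."""
        if len(data) != PAGE_SIZE:
            raise ValueError("page must be exactly PAGE_SIZE bytes")
        offset = self._slot(index)
        if not any(data):
            # Zero page: skip the write. Clearing the bit (a choice made in
            # this sketch) makes the loader treat the page as zero even if an
            # older, non-zero copy is still present in the file.
            self.bitmap[index // 8] &= ~(1 << (index % 8))
            return
        os.pwrite(self.fd, data, offset)
        self.bitmap[index // 8] |= 1 << (index % 8)

    def is_present(self, index):
        self._slot(index)  # bounds check only
        return bool(self.bitmap[index // 8] & (1 << (index % 8)))

    def load_page(self, index):
        """Return the contents a destination would restore for this page."""
        if not self.is_present(index):
            return bytes(PAGE_SIZE)
        return os.pread(self.fd, PAGE_SIZE, self._slot(index))


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        region = FixedRamRegion(f.fileno(), pages_offset=PAGE_SIZE, num_pages=4)
        for round_ in range(10):  # redirty page 3 ten times
            region.save_page(3, bytes([round_ + 1]) * PAGE_SIZE)
        print(os.fstat(f.fileno()).st_size)  # 20480: pages offset + 4 pages
```

A sequential stream would have appended ten copies of page 3. Here every copy lands in the same slot, so the file size is bounded by the RAMBlock size, which is the property commit 6c89f7f052 describes.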