Commit Graph

482 Commits

Author SHA1 Message Date
zhanghailiang
479125d53e COLO: fix setting checkpoint-delay not working properly
If we set checkpoint-delay through command 'migrate-set-parameters',
It will not take effect until we finish last sleep chekpoint-delay,
That's will be offensive espeically when we want to change its value
from an extreme big one to a proper value.

Fix it by using timer to realize checkpoint-delay.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-Id: <1484657864-21708-2-git-send-email-zhang.zhanghailiang@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-02-13 17:27:13 +00:00
Pavel Butsykin
ced1c6166e migration: discard non-dirty ram pages after the start of postcopy
After the start of postcopy migration there are some non-dirty pages which have
already been migrated. These pages are no longer needed on the source vm so that
we can free them and it doen't hurt to complete the migration.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Message-Id: <20170203152321.19739-4-pbutsykin@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-02-13 17:27:13 +00:00
Pavel Butsykin
53f09a1076 add 'release-ram' migrate capability
This feature frees the migrated memory on the source during postcopy-ram
migration. In the second step of postcopy-ram migration when the source vm
is put on pause we can free unnecessary memory. It will allow, in particular,
to start relaxing the memory stress on the source host in a load-balancing
scenario.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Message-Id: <20170203152321.19739-3-pbutsykin@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
   Manually merged in Pavel's 'migration: madvise error_report fixup!'
2017-02-13 17:27:13 +00:00
Dr. David Alan Gilbert
ef8d6488d2 postcopy: Recover block devices on early failure
An early postcopy failure can be recovered from as long as we know
we haven't sent the command to run the destination.
We have to undo the bdrv_inactivate_all by calling
bdrv_invalidate_cache_all

Note that I'm not using ms->block_inactive because once we've
sent the postcopy package we dont want anything else to try
and recover the block storage on the source; the destination
might have started writing to it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170202155909.31784-3-dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-02-06 13:36:49 +01:00
Juan Quintela
b4b076daf3 migration: create Migration Incoming State at init time
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1485207141-1941-3-git-send-email-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-02-06 13:36:49 +01:00
Pankaj Gupta
009fad7f4c migration: Change name of live migration thread
Change the name of live migration thread from 'migration'
to 'live_migration' to identify it clearly. 'migration'
is a generic word and kernel also has  tasks for process
migration with the name 'migration/cpu#'.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Message-Id: <1485178976-15225-1-git-send-email-pagupta@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-01-24 18:00:31 +00:00
zhanghailiang
1d2acc3162 migration: re-active images while migration been canceled after inactive them
commit fe904ea824 fixed a case
which migration aborted QEMU because it didn't regain the control
of images while some errors happened.

Actually, there are another two cases can trigger the same error reports:
" bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed",

Case 1, codes path:
migration_thread()
    migration_completion()
        bdrv_inactivate_all() ----------------> inactivate images
        qemu_savevm_state_complete_precopy()
            socket_writev_buffer() --------> error because destination fails
                qemu_fflush() ----------------> set error on migration stream
-> qmp_migrate_cancel() ----------------> user cancelled migration concurrently
    -> migrate_set_state() ------------------> set migrate CANCELLIN
    migration_completion() -----------------> go on to fail_invalidate
	if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch

Case 2, codes path:
migration_thread()
    migration_completion()
        bdrv_inactivate_all() ----------------> inactivate images
    migreation_completion() finished
-> qmp_migrate_cancel() ---------------> user cancelled migration concurrently
    qemu_mutex_lock_iothread();
    qemu_bh_schedule (s->cleanup_bh);

As we can see from above, qmp_migrate_cancel can slip in whenever
migration_thread does not hold the global lock. If this happens after
bdrv_inactive_all() been called, the above error reports will appear.

To prevent this, we can call bdrv_invalidate_cache_all() in qmp_migrate_cancel()
directly if we find images become inactive.

Besides, bdrv_invalidate_cache_all() in migration_completion() doesn't have the
protection of big lock, fix it by add the missing qemu_mutex_lock_iothread();

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-Id: <1485244792-11248-1-git-send-email-zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-01-24 18:00:31 +00:00
Ashijeet Acharya
b67b8c3a9d migration: Fail migration blocker for --only-migratable
migrate_add_blocker should rightly fail if the '--only-migratable'
option was specified and the device in use should not be able to
perform the action which results in an unmigratable VM.

Make migrate_add_blocker return -EACCES in this case.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Message-Id: <1484566314-3987-6-git-send-email-ashijeetacharya@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-01-24 18:00:31 +00:00
Ashijeet Acharya
fe44dc9180 migration: disallow migrate_add_blocker during migration
If a migration is already in progress and somebody attempts
to add a migration blocker, this should rightly fail.

Add an errp parameter and a retcode return value to migrate_add_blocker.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Message-Id: <1484566314-3987-5-git-send-email-ashijeetacharya@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Merged with recent 'Allow invtsc migration' change
2017-01-24 18:00:30 +00:00
zhanghailiang
fe39a4d440 migration: fix missing assignment for has_x_checkpoint_delay
We forgot to assign true to params->has_x_checkpoint_delay parameter
in qmp_query_migrate_parameters.

Without this, qmp command 'query-migrate-parameters' doesn't show the
default value for x-checkpoint-delay option.

This also fixes the fact that HMP was relying on unspecified behavior by
reading x_checkpoint_delay without checking has_x_checkpoint_delay.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2016-11-14 14:50:56 +01:00
Peter Maydell
eab9e9629c Merge remote-tracking branch 'remotes/amit-migration/tags/migration-for-2.8' into staging
Migration bits from the COLO project

# gpg: Signature made Sun 30 Oct 2016 10:39:55 GMT
# gpg:                using RSA key 0xEB0B4DFC657EF670
# gpg: Good signature from "Amit Shah <amit@amitshah.net>"
# gpg:                 aka "Amit Shah <amit@kernel.org>"
# gpg:                 aka "Amit Shah <amitshah@gmx.net>"
# Primary key fingerprint: 48CA 3722 5FE7 F4A8 B337  2735 1E9A 3B5F 8540 83B6
#      Subkey fingerprint: CC63 D332 AB8F 4617 4529  6534 EB0B 4DFC 657E F670

* remotes/amit-migration/tags/migration-for-2.8:
  MAINTAINERS: Add maintainer for COLO framework related files
  configure: Support enable/disable COLO feature
  docs: Add documentation for COLO feature
  COLO: Implement failover work for secondary VM
  COLO: Implement the process of failover for primary VM
  COLO: Introduce state to record failover process
  COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
  COLO: Synchronize PVM's state to SVM periodically
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: Load VMState into QIOChannelBuffer before restore it
  COLO: Send PVM state to secondary side when do checkpoint
  COLO: Add a new RunState RUN_STATE_COLO
  COLO: Introduce checkpointing protocol
  COLO: Establish a new communicating path for COLO
  migration: Switch to COLO process after finishing loadvm
  migration: Enter into COLO mode after migration if COLO is enabled
  COLO: migrate COLO related info to secondary node
  migration: Introduce capability 'x-colo' to migration

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-31 13:06:38 +00:00
Peter Maydell
277d44f5a6 Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-fetch' into staging
trivial patches for 2016-10-28

# gpg: Signature made Fri 28 Oct 2016 16:17:51 BST
# gpg:                using RSA key 0x701B4F6B1A693E59
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
#      Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931  4B22 701B 4F6B 1A69 3E59

* remotes/mjt/tags/trivial-patches-fetch: (23 commits)
  Fix build for less common build directories names
  clean-up: removed duplicate #includes
  scripts/clean-includes: added duplicate #include check
  monitor: deprecate 'default' option
  qemu-ga: Remove stray 'q' in documentation
  Makefile: Fix help text for target 'installer'
  s390: avoid always-true comparison in s390_pci_generate_fid()
  migration: Remove unneeded NULL check from migrate_fd_error()
  scripts/hxtool: fix undefined behavour of echo
  qemu-options.hx: set: fix copy-paste error
  usb: Change *_exitfn return type from int to void
  MAINTAINERS: qemu-trivial information
  colo-compare: remove unused struct CompareChardevProps and 'props' variable
  milkymist-pfpu: fix potential integer overflow
  hw/block/nvme: Simplify if-statements a little bit
  target-lm32: rewrite gen_compare()
  lm32: milkymist-tmu2: fix integer overflow
  target-lm32: disable asm logging via LOG_DIS()
  target-lm32: swap operand of wcsr in LOG_DIS()
  target-lm32: fix LOG_DIS operand order
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-31 11:58:30 +00:00
zhanghailiang
68b5359187 COLO: Add checkpoint-delay parameter for migrate-set-parameters
Add checkpoint-delay parameter for migrate-set-parameters, so that
we can control the checkpoint frequency when COLO is in periodic mode.

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-30 15:17:39 +05:30
zhanghailiang
25d0c16f62 migration: Switch to COLO process after finishing loadvm
Switch from normal migration loadvm process into COLO checkpoint process if
COLO mode is enabled.

We add three new members to struct MigrationIncomingState,
'have_colo_incoming_thread' and 'colo_incoming_thread' record the COLO
related thread for secondary VM, 'migration_incoming_co' records the
original migration incoming coroutine.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-30 15:17:39 +05:30
zhanghailiang
0b827d5e72 migration: Enter into COLO mode after migration if COLO is enabled
Add a new migration state: MIGRATION_STATUS_COLO. Migration source side
enters this state after the first live migration successfully finished
if COLO is enabled by command 'migrate_set_capability x-colo on'.

We reuse migration thread, so the process of checkpointing will be handled
in migration thread.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-30 15:17:39 +05:30
zhanghailiang
35a6ed4f71 migration: Introduce capability 'x-colo' to migration
We add helper function colo_supported() to indicate whether
colo is supported or not, with which we use to control whether or not
showing 'x-colo' string to users, they can use qmp command
'query-migrate-capabilities' or hmp command 'info migrate_capabilities'
to learn if colo is supported.

The default value for COLO (COarse-Grain LOck Stepping) is disabled.

Cc: Juan Quintela <quintela@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>
2016-10-30 15:17:39 +05:30
Peter Maydell
25174055f4 migration: Remove unneeded NULL check from migrate_fd_error()
All the callers of migrate_fd_error() pass a non-NULL
error parameter, and if any did pass NULL then we would
segfault in error_copy(), so remove the unnecessary
NULL check earlier in the function.
(Spotted by Coverity.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-10-28 18:17:23 +03:00
Daniel P. Berrange
6f01f136af migration: set name for all I/O channels created
Ensure that all I/O channels created for migration are given names
to distinguish their respective roles.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-10-27 09:13:10 +02:00
Ashijeet Acharya
2ff3025797 migrate: move max-bandwidth and downtime-limit to migrate_set_parameter
Mark the old commands 'migrate_set_speed' and 'migrate_set_downtime' as
deprecated.
Move max-bandwidth and downtime-limit into migrate-set-parameters for
setting maximum migration speed and expected downtime limit parameters
respectively.
Change downtime units to milliseconds (only for new-command) and set
its upper bound limit to 2000 seconds.
Update the query part in both hmp and qmp qemu control interfaces.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2016-10-13 17:23:53 +02:00
Ashijeet Acharya
091ecc8b69 migrate: Fix bounds check for migration parameters in migration.c
This patch fixes the out-of-bounds check of migration parameters in
qmp_migrate_set_parameters() for cpu-throttle-initial and
cpu-throttle-increment by adding a return statement for both as they
were broken since their introduction in 2.5 via commit 1626fee.
Due to the missing return statements, parameters were getting set to
out-of-bounds values despite the error.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2016-10-13 17:23:53 +02:00
Eric Blake
7f375e0446 migrate: Use boxed qapi for migrate-set-parameters
Now that QAPI makes it easy to pass a struct around, we don't
have to declare as many parameters or local variables.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2016-10-13 17:23:53 +02:00
Eric Blake
de63ab6124 migrate: Share common MigrationParameters struct
It is rather verbose, and slightly error-prone, to repeat
the same set of parameters for input (migrate-set-parameters)
as for output (query-migrate-parameters), where the only
difference is whether the members are optional.  We can just
document that the optional members will always be present
on output, and then share a common struct between both
commands.  The next patch can then reduce the amount of
code needed on input.

Also, we made a mistake in qemu 2.7 of returning an empty
string during 'query-migrate-parameters' when there is no
TLS, rather than omitting TLS details entirely.  Technically,
this change risks breaking any 2.7 client that is hard-coded
to expect the parameter's existence; on the other hand, clients
that are portable to 2.6 already must be prepared for those
members to not be present.

And this gets rid of yet one more place where the QMP output
visitor is silently converting a NULL string into "" (which
is a hack I ultimately want to kill off).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2016-10-13 17:23:53 +02:00
Dr. David Alan Gilbert
42da5550d6 migration: set state to post-migrate on failure
If a migration fails/is cancelled during the postcopy stage we currently
end up with the runstate as finish-migrate, where it should be post-migrate.
There's a small window in precopy where I think the same thing can
happen, but I've never seen it.

It rarely matters; the only postcopy case is if you restart a migration, which
again is a case that rarely matters in postcopy because it's only
safe to restart the migration if you know the destination hasn't
been running (which you might if you started the destination with -S
and hadn't got around to 'c' ing it before the postcopy failed).
Even then it's a small window but potentially you could hit if
there's a problem loading the devices on the destination.

This corresponds to:
https://bugzilla.redhat.com/show_bug.cgi?id=1355683

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1468601086-32117-1-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-07-22 13:23:09 +05:30
Paolo Bonzini
0b8b8753e4 coroutine: move entry argument to qemu_coroutine_create
In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c).  So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.

Mostly done with the following semantic patch:

@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
  ...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);

@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
  ...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);

@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));

@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);

except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Peter Maydell
4d88513157 migration: Don't use *_to_cpup() and cpu_to_*w()
The *_to_cpup() and cpu_to_*w() functions just compose a pointer
dereference with a byteswap. Instead use ld*_p() and st*_p(),
which handle potential pointer misalignment and avoid the need
to cast the pointer.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1465574962-2710-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-06-17 18:23:49 +05:30
Daniel P. Berrange
22724f4921 migration: rename functions to starting migrations
Apply the following renames for starting incoming migration:

 process_incoming_migration -> migration_fd_process_incoming
 migration_set_incoming_channel -> migration_channel_process_incoming
 migration_tls_set_incoming_channel -> migration_tls_channel_process_incoming

and for starting outgoing migration:

 migration_set_outgoing_channel -> migration_channel_connect
 migration_tls_set_outgoing_channel -> migration_tls_channel_connect

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1464776234-9910-3-git-send-email-berrange@redhat.com
Message-Id: <1464776234-9910-3-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-06-16 09:51:37 +05:30
Dr. David Alan Gilbert
096631bd95 Postcopy: Check for support when setting the capability
Knowing whether the destination host supports migration with
postcopy can be tricky.
The destination doesn't need the capability set, however
if we set it then use the opportunity to do the test and
tell the user/management layer early.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 1465816605-29488-7-git-send-email-dgilbert@redhat.com
Message-Id: <1465816605-29488-7-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-06-16 09:50:07 +05:30
Dr. David Alan Gilbert
d3bf5418e2 Postcopy: Add stats on page requests
On the source, add a count of page requests received from the
destination.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-id: 1465816605-29488-4-git-send-email-dgilbert@redhat.com
Message-Id: <1465816605-29488-4-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-06-16 09:50:07 +05:30
Dr. David Alan Gilbert
a22463a5dc Migration: Split out ram part of qmp_query_migrate
The RAM section of qmp_query_migrate is reasonably complex
and repeated 3 times.  Split it out into a helper.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1465816605-29488-3-git-send-email-dgilbert@redhat.com
Reviwed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <1465816605-29488-3-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-06-16 09:50:07 +05:30
Daniel P. Berrange
e122636562 migration: add support for encrypting data with TLS
This extends the migration_set_incoming_channel and
migration_set_outgoing_channel methods so that they
will automatically wrap the QIOChannel in a
QIOChannelTLS instance if TLS credentials are configured
in the migration parameters.

This allows TLS to work for tcp, unix, fd and exec
migration protocols. It does not (currently) work for
RDMA since it does not use these APIs, but it is
unlikely that TLS would be desired with RDMA anyway
since it would degrade the performance to that seen
with TCP defeating the purpose of using RDMA.

On the target host, QEMU would be launched with a set
of TLS credentials for a server endpoint

 $ qemu-system-x86_64 -monitor stdio -incoming defer \
    -object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=server,id=tls0 \
    ...other args...

To enable incoming TLS migration 2 monitor commands are
then used

  (qemu) migrate_set_str_parameter tls-creds tls0
  (qemu) migrate_incoming tcp:myhostname:9000

On the source host, QEMU is launched in a similar
manner but using client endpoint credentials

 $ qemu-system-x86_64 -monitor stdio \
    -object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=client,id=tls0 \
    ...other args...

To enable outgoing TLS migration 2 monitor commands are
then used

  (qemu) migrate_set_str_parameter tls-creds tls0
  (qemu) migrate tcp:otherhostname:9000

Thanks to earlier improvements to error reporting,
TLS errors can be seen 'info migrate' when doing a
detached migration. For example:

  (qemu) info migrate
  capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
  Migration status: failed
  total time: 0 milliseconds
  error description: TLS handshake failed: The TLS connection was non-properly terminated.

Or

  (qemu) info migrate
  capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
  Migration status: failed
  total time: 0 milliseconds
  error description: Certificate does not match the hostname localhost

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-27-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:13 +05:30
Daniel P. Berrange
69ef1f36b0 migration: define 'tls-creds' and 'tls-hostname' migration parameters
Define two new migration parameters to be used with TLS encryption.
The 'tls-creds' parameter provides the ID of an instance of the
'tls-creds' object type, or rather a subclass such as 'tls-creds-x509'.
Providing these credentials will enable use of TLS on the migration
data stream.

If using x509 certificates, together with a migration URI that does
not include a hostname, the 'tls-hostname' parameter provides the
hostname to use when verifying the server's x509 certificate. This
allows TLS to be used in combination with fd: and exec: protocols
where a TCP connection is established by a 3rd party outside of
QEMU.

NB, this requires changing the migrate_set_parameter method in the
HMP to accept a 's' (string) value instead of 'i' (integer). This
is backwards compatible, because the parsing of strings allows the
quotes to be optional, thus any integer is also a valid string.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-26-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:10 +05:30
Daniel P. Berrange
2594f56d4c migration: don't use an array for storing migrate parameters
The MigrateState struct uses an array for storing migration
parameters. This presumes that all future parameters will
be integers too, which is not going to be the case. There
is no functional reason why an array is used, if anything
it makes the code less clear. The QAPI schema already
defines a struct - MigrationParameters - capable of storing
all the individual parameters, so just use that instead of
an array.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-25-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:07 +05:30
Daniel P. Berrange
527792fae6 migration: convert exec socket protocol to use QIOChannel
Convert the exec socket migration protocol driver to use
QIOChannel and QEMUFileChannel, instead of the stdio
popen APIs. It can be unconditionally built because the
QIOChannelCommand class can report suitable error messages
on platforms which can't fork processes.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-17-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:47 +05:30
Daniel P. Berrange
64802ee57f migration: convert fd socket protocol to use QIOChannel
Convert the fd socket migration protocol driver to use
QIOChannel and QEMUFileChannel, instead of plain sockets
APIs. It can be unconditionally built because the
QIOChannel APIs it uses will take care to report suitable
error messages if needed.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-16-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:45 +05:30
Daniel P. Berrange
d984464eb9 migration: convert unix socket protocol to use QIOChannel
Convert the unix socket migration protocol driver to use
QIOChannel and QEMUFileChannel, instead of plain sockets
APIs. It can be unconditionally built, since the socket
impl of QIOChannel will report a suitable error on platforms
where UNIX sockets are unavailable.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-13-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:37 +05:30
Daniel P. Berrange
61b67d473d migration: convert post-copy to use QIOChannelBuffer
The post-copy code does some I/O to/from an intermediate
in-memory buffer rather than direct to the underlying
I/O channel. Switch this code to use QIOChannelBuffer
instead of QEMUSizedBuffer.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-12-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:34 +05:30
Daniel P. Berrange
d59ce6f344 migration: add reporting of errors for outgoing migration
Currently if an application initiates an outgoing migration,
it may or may not, get an error reported back on failure. If
the error occurs synchronously to the 'migrate' command
execution, the client app will see the error message. This
is the case for DNS lookup failures. If the error occurs
asynchronously to the monitor command though, the error
will be thrown away and the client left guessing about
what went wrong. This is the case for failure to connect
to the TCP server (eg due to wrong port, or firewall
rules, or other similar errors).

In the future we'll be adding more scope for errors to
happen asynchronously with the TLS protocol handshake.
TLS errors are hard to diagnose even when they are well
reported, so discarding errors entirely will make it
impossible to debug TLS connection problems.

Management apps which do migration are already using
'query-migrate' / 'info migrate' to check up on progress
of background migration operations and to see their end
status. This is a fine place to also include the error
message when things go wrong.

This patch thus adds an 'error-desc' field to the
MigrationInfo struct, which will be populated when
the 'status' is set to 'failed':

(qemu) migrate -d tcp:localhost:9001
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: failed (Error connecting to socket: Connection refused)
total time: 0 milliseconds

In the HMP, when doing non-detached migration, it is
also possible to display this error message directly
to the app.

(qemu) migrate tcp:localhost:9001
Error connecting to socket: Connection refused

Or with QMP

  {
    "execute": "query-migrate",
    "arguments": {}
  }
  {
    "return": {
      "status": "failed",
      "error-desc": "address resolution failed for myhost:9000: No address associated with hostname"
    }
  }

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-11-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:30 +05:30
Daniel P. Berrange
48f07489ed migration: add helpers for creating QEMUFile from a QIOChannel
Currently creating a QEMUFile instance from a QIOChannel is
quite simple only requiring a single call to
qemu_fopen_channel_input or  qemu_fopen_channel_output
depending on the end of migration connection.

When QEMU gains TLS support, however, there will need to be
a TLS negotiation done inbetween creation of the QIOChannel
and creation of the final QEMUFile. Introduce some helper
methods that will encapsulate this logic, isolating the
migration protocol drivers from knowledge about TLS.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Acked-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-10-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:27 +05:30
Daniel P. Berrange
9e4d2b98ee migration: force QEMUFile to blocking mode for outgoing migration
Instead of relying on the default QEMUFile I/O blocking flag
state, explicitly turn on blocking I/O for outgoing migration
since it takes place in a background thread.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-8-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:21 +05:30
Daniel P. Berrange
06ad513532 migration: introduce set_blocking function in QEMUFileOps
Remove the assumption that every QEMUFile implementation has
a file descriptor available by introducing a new function
in QEMUFileOps to change the blocking state of a QEMUFile.

If not set, it will fallback to the original code using
the get_fd method.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-7-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:19 +05:30
Greg Kurz
fe904ea824 migration: regain control of images when migration fails to complete
We currently have an error path during migration that can cause
the source QEMU to abort:

migration_thread()
  migration_completion()
    runstate_is_running() ----------------> true if guest is running
    bdrv_inactivate_all() ----------------> inactivate images
    qemu_savevm_state_complete_precopy()
     ... qemu_fflush()
           socket_writev_buffer() --------> error because destination fails
         qemu_fflush() -------------------> set error on migration stream
  migration_completion() -----------------> set migrate state to FAILED
migration_thread() -----------------------> break migration loop
  vm_start() -----------------------------> restart guest with inactive
                                            images

and you get:

qemu-system-ppc64: socket_writev_buffer: Got err=104 for (32768/18446744073709551615)
qemu-system-ppc64: /home/greg/Work/qemu/qemu-master/block/io.c:1342:bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
Aborted (core dumped)

If we try postcopy with a similar scenario, we also get the writev error
message but QEMU leaves the guest paused because entered_postcopy is true.

We could possibly do the same with precopy and leave the guest paused.
But since the historical default for migration errors is to restart the
source, this patch adds a call to bdrv_invalidate_cache_all() instead.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Message-Id: <146357896785.6003.11983081732454362715.stgit@bahia.huguette.org>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-23 22:19:36 +05:30
Greg Kurz
24f3902b08 savevm: fail if migration blockers are present
QEMU has currently two ways to prevent migration to occur:
- migration blocker when it depends on runtime state
- VMStateDescription.unmigratable when migration is not supported at all

This patch gathers all the logic into a single function to be called from
both the savevm and the migrate paths.

This fixes a bug with 9p, at least, where savevm would succeed and the
following would happen in the guest after loadvm:

$ ls /host
ls: cannot access /host: Protocol error

With this patch:

(qemu) savevm foo
Migration is disabled when VirtFS export path '/' is mounted in the guest
using mount_tag 'host'

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <146239057139.11271.9011797645454781543.stgit@bahia.huguette.org>

[Update subject according to Paolo's suggestion - Amit]

Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-23 21:44:08 +05:30
Jason J. Herne
d85a31d1f4 migration: Promote improved autoconverge commands out of experimental state
The new autoconverge throttling commands have been tested for a release now. It
is time to move them out of the experimental state.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Message-Id: <1461262038-8197-1-git-send-email-jjherne@linux.vnet.ibm.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-23 16:05:09 +05:30
Stefan Weil
cb8d4c8f54 Fix some typos found by codespell
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:27 +03:00
Veronia Bahaa
f348b6d1a5 util: move declarations out of qemu-common.h
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)

Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Dr. David Alan Gilbert
32c3db5b26 postcopy: Remove the x-
Postcopy seems to have survived a cycle with only a few fixes,
and Jiri has the current libvirt wired up and working
( https://www.redhat.com/archives/libvir-list/2016-March/msg00080.html )
so remove the experimental tag.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1457690016-9070-3-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-11 17:53:59 +05:30
Peter Xu
568b01caf3 migration: fix warning for source_return_path_thread
max_len is not necessary, while it brings a warning during compilation
when specify "-Wstack-usage=1000000". Replacing using sizeof().

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1457503932-31763-1-git-send-email-peterx@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-11 12:58:37 +05:30
Dr. David Alan Gilbert
614e8018ed Postcopy: Fix sync count in info migrate
I'd missed the sync count off in the postcopy case.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-id: 1456394631-18010-1-git-send-email-dgilbert@redhat.com
Message-Id: <1456394631-18010-1-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-08 16:52:27 +05:30
Denis V. Lunev
0aa6aefc9c migration (ordinary): move bdrv_invalidate_cache_all of of coroutine context
There is a possibility to hit an assert in qcow2_get_specific_info that
s->qcow_version is undefined. This happens when VM in starting from
suspended state, i.e. it processes incoming migration, and in the same
time 'info block' is called.

The problem is that qcow2_invalidate_cache() closes the image and
memset()s BDRVQcowState in the middle.

The patch moves processing of bdrv_invalidate_cache_all out of
coroutine context for standard migration to avoid that.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>
Message-Id: <1456304019-10507-2-git-send-email-den@openvz.org>

[Amit: Fix a use-after-free bug]

Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-02-26 20:39:50 +05:30
Dr. David Alan Gilbert
b82fc321bf Postcopy+spice: Pass spice migration data earlier
Spice hooks the migration status changes to figure out when to
transmit information to the new spice server; but the migration
status in postcopy doesn't quite fit - the destination starts
running before the end of the source migration.

It's not a case of hanging off the migration status change to
postcopy-active either, since that happens before we stop the
guest CPU.

Fix it by sending a notify just after sending the device state,
and adding a flag that can be tested by the notify receiver.

Symptom:
   spice handover doesn't work with the error:
   red_worker.c:11540:display_channel_wait_for_migrate_data: timeout

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-id: 1456161452-25318-1-git-send-email-dgilbert@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-02-23 12:05:02 +01:00