input-linux: initialize key state

Query input device keys, initialize state accordingly, so the correct state is reflected in case any key is pressed at initialization time. There is a high chance for this to actually happen for the 'enter' key in case you start qemu with a terminal command (directly or virsh). When finding any pressed keys the input grab is delayed until all keys are lifted, to avoid confusing guest and host with appearently stuck keys. Reported-by: Muted Bytes <mutedbytes@gmail.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-id: 1476277384-30365-1-git-send-email-kraxel@redhat.com
ui: rename vnc_init_state to vnc_start_protocol
2016-10-13 09:25:24 +02:00 · 2016-10-13 09:22:31 +02:00 · 2016-10-13 09:22:30 +02:00 · 2016-10-13 09:22:30 +02:00 · 2016-10-13 09:22:30 +02:00 · 2016-10-13 09:22:20 +02:00
954 changed files with 40157 additions and 20440 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -39,9 +39,7 @@
 /qmp-introspect.[ch]
 /qmp-marshal.c
 /qemu-doc.html
-/qemu-tech.html
 /qemu-doc.info
-/qemu-tech.info
 /qemu-img
 /qemu-nbd
 /qemu-options.def
@@ -53,7 +51,9 @@
 /qemu-bridge-helper
 /qemu-monitor.texi
 /qemu-monitor-info.texi
-/qmp-commands.txt
+/qemu-version.h
+/qemu-version.h.tmp
+/module_block.h
 /vscclient
 /fsdev/virtfs-proxy-helper
 *.[1-9]
--- a/.travis.yml
+++ b/.travis.yml
@@ -9,6 +9,7 @@ cache: ccache
 addons:
  apt:
    packages:
+      # Build dependencies
      - libaio-dev
      - libattr1-dev
      - libbrlapi-dev
@@ -89,6 +90,7 @@ matrix:
    - env: CONFIG=""
      os: osx
      compiler: clang
+    # Plain Trusty Build
    - env: CONFIG=""
      sudo: required
      addons:
@@ -99,3 +101,46 @@ matrix:
        - sudo apt-get build-dep -qq qemu
        - wget -O - http://people.linaro.org/~alex.bennee/qemu-submodule-git-seed.tar.xz | tar -xvJ
        - git submodule update --init --recursive
+    # Using newer GCC with sanitizers
+    - addons:
+        apt:
+          sources:
+            # PPAs for newer toolchains
+            - ubuntu-toolchain-r-test
+          packages:
+            # Extra toolchains
+            - gcc-5
+            - g++-5
+            # Build dependencies
+            - libaio-dev
+            - libattr1-dev
+            - libbrlapi-dev
+            - libcap-ng-dev
+            - libgnutls-dev
+            - libgtk-3-dev
+            - libiscsi-dev
+            - liblttng-ust-dev
+            - libnfs-dev
+            - libncurses5-dev
+            - libnss3-dev
+            - libpixman-1-dev
+            - libpng12-dev
+            - librados-dev
+            - libsdl1.2-dev
+            - libseccomp-dev
+            - libspice-protocol-dev
+            - libspice-server-dev
+            - libssh2-1-dev
+            - liburcu-dev
+            - libusb-1.0-0-dev
+            - libvte-2.90-dev
+            - sparse
+            - uuid-dev
+      language: generic
+      compiler: none
+      env:
+        - COMPILER_NAME=gcc CXX=g++-5 CC=gcc-5
+        - CONFIG="--cc=gcc-5 --cxx=g++-5 --disable-pie --disable-linux-user --with-coroutine=gthread"
+        - TEST_CMD=""
+      before_script:
+        - ./configure ${CONFIG} --extra-cflags="-g3 -O0 -fsanitize=thread -fuse-ld=gold" || cat config.log
--- a/10
+++ b/10
@@ -9,7 +9,7 @@ patches before submitting.
 Of course, the most important aspect in any coding style is whitespace.
 Crusty old coders who have trouble spotting the glasses on their noses
 can tell the difference between a tab and eight spaces from a distance
-of approximately fifteen parsecs.  Many a flamewar have been fought and
+of approximately fifteen parsecs.  Many a flamewar has been fought and
 lost on this issue.

 QEMU indents are four spaces.  Tabs are never used, except in Makefiles
@@ -31,7 +31,11 @@ Do not leave whitespace dangling off the ends of lines.

 2. Line width

-Lines are 80 characters; not longer.
+Lines should be 80 characters; try not to make them longer.
+
+Sometimes it is hard to do, especially when dealing with QEMU subsystems
+that use long function or symbol names.  Even in that case, do not make
+lines much longer than 80 characters.

 Rationale:
 - Some people like to tile their 24" screens with a 6x4 matrix of 80x24
@@ -39,6 +43,8 @@ Rationale:
   let them keep doing it.
 - Code and especially patches is much more readable if limited to a sane
   line length.  Eighty is traditional.
+ - The four-space indentation makes the most common excuse ("But look
+   at all that white space on the left!") moot.
 - It is the QEMU coding style.

 3. Naming
--- a/4
+++ b/4
@@ -158,6 +158,10 @@ painful. These are:
 * you may assume that right shift of a signed integer duplicates
   the sign bit (ie it is an arithmetic shift, not a logical shift)

+In addition, QEMU assumes that the compiler does not use the latitude
+given in C99 and C11 to treat aspects of signed '<<' as undefined, as
+documented in the GNU Compiler Collection manual starting at version 4.0.
+
 7. Error handling and reporting

 7.1 Reporting errors to the human user
--- a/188
+++ b/188
@@ -83,6 +83,7 @@ F: include/exec/cpu*.h
 F: include/exec/exec-all.h
 F: include/exec/helper*.h
 F: include/exec/tb-hash.h
+F: include/sysemu/cpus.h

 FPU emulation
 M: Aurelien Jarno <aurelien@aurel32.net>
@@ -115,6 +116,7 @@ M: Edgar E. Iglesias <edgar.iglesias@gmail.com>
 S: Maintained
 F: target-cris/
 F: hw/cris/
+F: include/hw/cris/
 F: tests/tcg/cris/
 F: disas/cris.c

@@ -144,10 +146,17 @@ F: disas/microblaze.c

 MIPS
 M: Aurelien Jarno <aurelien@aurel32.net>
-M: Leon Alrae <leon.alrae@imgtec.com>
+M: Yongbok Kim <yongbok.kim@imgtec.com>
 S: Maintained
 F: target-mips/
 F: hw/mips/
+F: hw/misc/mips_*
+F: hw/intc/mips_gic.c
+F: hw/timer/mips_gictimer.c
+F: include/hw/mips/
+F: include/hw/misc/mips_*
+F: include/hw/intc/mips_gic.h
+F: include/hw/timer/mips_gictimer.h
 F: tests/tcg/mips/
 F: disas/mips.c

@@ -156,6 +165,8 @@ M: Anthony Green <green@moxielogic.com>
 S: Maintained
 F: target-moxie/
 F: disas/moxie.c
+F: hw/moxie/
+F: default-configs/moxie-softmmu.mak

 OpenRISC
 M: Jia Liu <proljc@gmail.com>
@@ -171,6 +182,7 @@ L: qemu-ppc@nongnu.org
 S: Maintained
 F: target-ppc/
 F: hw/ppc/
+F: include/hw/ppc/
 F: disas/ppc.c

 S390
@@ -187,6 +199,7 @@ S: Odd Fixes
 F: target-sh4/
 F: hw/sh4/
 F: disas/sh4.c
+F: include/hw/sh4/

 SPARC
 M: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
@@ -202,6 +215,7 @@ M: Guan Xuetao <gxt@mprc.pku.edu.cn>
 S: Maintained
 F: target-unicore32/
 F: hw/unicore32/
+F: include/hw/unicore32/

 X86
 M: Paolo Bonzini <pbonzini@redhat.com>
@@ -225,6 +239,7 @@ M: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
 S: Maintained
 F: target-tricore/
 F: hw/tricore/
+F: include/hw/tricore/

 Guest CPU Cores (KVM):
 ----------------------
@@ -314,6 +329,9 @@ L: qemu-devel@nongnu.org
 M: Stefan Weil <sw@weilnetz.de>
 S: Maintained
 F: *win32*
+F: */*win32*
+F: include/*/*win32*
+X: qga/*win32*
 F: qemu.nsi

 ARM Machines
@@ -449,23 +467,22 @@ S: Maintained
 F: hw/*/versatile*

 Xilinx Zynq
+M: Edgar E. Iglesias <edgar.iglesias@gmail.com>
 M: Alistair Francis <alistair.francis@xilinx.com>
-M: Peter Crosthwaite <crosthwaite.peter@gmail.com>
 L: qemu-arm@nongnu.org
 S: Maintained
-F: hw/arm/xilinx_zynq.c
-F: hw/misc/zynq_slcr.c
+F: hw/*/xilinx_*
 F: hw/*/cadence_*
-F: hw/ssi/xilinx_spips.c
+F: hw/misc/zynq_slcr.c
+X: hw/ssi/xilinx_*

 Xilinx ZynqMP
 M: Alistair Francis <alistair.francis@xilinx.com>
-M: Peter Crosthwaite <crosthwaite.peter@gmail.com>
+M: Edgar E. Iglesias <edgar.iglesias@gmail.com>
 L: qemu-arm@nongnu.org
 S: Maintained
-F: hw/arm/xlnx-zynqmp.c
-F: hw/arm/xlnx-ep108.c
-F: include/hw/arm/xlnx-zynqmp.h
+F: hw/*/xlnx*.c
+F: include/hw/*/xlnx*.h

 ARM ACPI Subsystem
 M: Shannon Zhao <zhaoshenglong@huawei.com>
@@ -475,6 +492,21 @@ S: Maintained
 F: hw/arm/virt-acpi-build.c
 F: include/hw/arm/virt-acpi-build.h

+STM32F205
+M: Alistair Francis <alistair@alistair23.me>
+S: Maintained
+F: hw/arm/stm32f205_soc.c
+F: hw/misc/stm32f2xx_syscfg.c
+F: hw/char/stm32f2xx_usart.c
+F: hw/timer/stm32f2xx_timer.c
+F: hw/adc/*
+F: hw/ssi/stm32f2xx_spi.c
+
+Netduino 2
+M: Alistair Francis <alistair@alistair23.me>
+S: Maintained
+F: hw/arm/netduino2.c
+
 CRIS Machines
 -------------
 Axis Dev88
@@ -571,6 +603,9 @@ L: qemu-ppc@nongnu.org
 S: Supported
 F: hw/ppc/e500.[hc]
 F: hw/ppc/e500plat.c
+F: include/hw/ppc/ppc_e500.h
+F: include/hw/pci-host/ppce500.h
+F: pc-bios/u-boot.e500

 mpc8544ds
 M: Alexander Graf <agraf@suse.de>
@@ -588,6 +623,8 @@ F: hw/ppc/mac_newworld.c
 F: hw/pci-host/uninorth.c
 F: hw/pci-bridge/dec.[hc]
 F: hw/misc/macio/
+F: include/hw/ppc/mac_dbdma.h
+F: hw/nvram/mac_nvram.c

 Old World
 M: Alexander Graf <agraf@suse.de>
@@ -596,6 +633,7 @@ S: Maintained
 F: hw/ppc/mac_oldworld.c
 F: hw/pci-host/grackle.c
 F: hw/misc/macio/
+F: hw/intc/heathrow_pic.c

 PReP
 L: qemu-devel@nongnu.org
@@ -604,6 +642,7 @@ S: Odd Fixes
 F: hw/ppc/prep.c
 F: hw/pci-host/prep.[hc]
 F: hw/isa/pc87312.[hc]
+F: pc-bios/ppc_rom.bin

 sPAPR
 M: David Gibson <david@gibson.dropbear.id.au>
@@ -615,6 +654,14 @@ F: include/hw/*/spapr*
 F: hw/*/xics*
 F: include/hw/*/xics*
 F: pc-bios/spapr-rtas/*
+F: pc-bios/spapr-rtas.bin
+F: pc-bios/slof.bin
+F: docs/specs/ppc-spapr-hcalls.txt
+F: docs/specs/ppc-spapr-hotplug.txt
+F: tests/spapr*
+F: tests/libqos/*spapr*
+F: tests/rtas*
+F: tests/libqos/rtas*

 virtex_ml507
 M: Edgar E. Iglesias <edgar.iglesias@gmail.com>
@@ -628,31 +675,38 @@ R2D
 M: Magnus Damm <magnus.damm@gmail.com>
 S: Maintained
 F: hw/sh4/r2d.c
+F: hw/intc/sh_intc.c
+F: hw/timer/sh_timer.c

 Shix
 M: Magnus Damm <magnus.damm@gmail.com>
-S: Orphan
+S: Odd Fixes
 F: hw/sh4/shix.c

 SPARC Machines
 --------------
 Sun4m
-M: Blue Swirl <blauwirbel@gmail.com>
 M: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
 S: Maintained
 F: hw/sparc/sun4m.c
+F: hw/dma/sparc32_dma.c
+F: hw/dma/sun4m_iommu.c
+F: include/hw/sparc/sparc32_dma.h
+F: include/hw/sparc/sun4m.h
+F: pc-bios/openbios-sparc32

 Sun4u
-M: Blue Swirl <blauwirbel@gmail.com>
 M: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
 S: Maintained
 F: hw/sparc64/sun4u.c
+F: pc-bios/openbios-sparc64

 Leon3
 M: Fabien Chouteau <chouteau@adacore.com>
 S: Maintained
 F: hw/sparc/leon3.c
 F: hw/*/grlib*
+F: include/hw/sparc/grlib.h

 S390 Machines
 -------------
@@ -666,6 +720,9 @@ F: hw/s390x/
 F: include/hw/s390x/
 F: pc-bios/s390-ccw/
 F: hw/watchdog/wdt_diag288.c
+F: include/hw/watchdog/wdt_diag288.h
+F: pc-bios/s390-ccw.img
+F: default-configs/s390x-softmmu.mak
 T: git git://github.com/cohuck/qemu.git s390-next
 T: git git://github.com/borntraeger/qemu.git s390-next

@@ -695,7 +752,7 @@ F: hw/i2c/smbus_ich9.c
 F: hw/acpi/piix4.c
 F: hw/acpi/ich9.c
 F: include/hw/acpi/ich9.h
-F: include/hw/acpi/piix.h
+F: include/hw/acpi/piix4.h
 F: hw/misc/sga.c

 PC Chipset
@@ -715,6 +772,10 @@ F: hw/misc/pc-testdev.c
 F: hw/timer/hpet*
 F: hw/timer/i8254*
 F: hw/timer/mc146818rtc*
+F: include/hw/i2c/pm_smbus.h
+F: include/hw/timer/hpet.h
+F: include/hw/timer/i8254*
+F: include/hw/timer/mc146818rtc*

 Machine core
 M: Eduardo Habkost <ehabkost@redhat.com>
@@ -748,6 +809,7 @@ M: John Snow <jsnow@redhat.com>
 L: qemu-block@nongnu.org
 S: Supported
 F: include/hw/ide.h
+F: include/hw/ide/
 F: hw/ide/
 F: hw/block/block.c
 F: hw/block/cdrom.c
@@ -797,16 +859,15 @@ F: hw/mem/*
 F: hw/acpi/*
 F: hw/smbios/*
 F: hw/i386/acpi-build.[hc]
-F: hw/i386/*dsl
 F: hw/arm/virt-acpi-build.c
 F: include/hw/arm/virt-acpi-build.h
-F: scripts/acpi*py

 ppc4xx
 M: Alexander Graf <agraf@suse.de>
 L: qemu-ppc@nongnu.org
 S: Odd Fixes
 F: hw/ppc/ppc4*.c
+F: include/hw/ppc/ppc4xx.h

 ppce500
 M: Alexander Graf <agraf@suse.de>
@@ -826,13 +887,15 @@ Network devices
 M: Jason Wang <jasowang@redhat.com>
 S: Odd Fixes
 F: hw/net/
+F: tests/virtio-net-test.c
 T: git git://github.com/jasowang/qemu.git net

 SCSI
 M: Paolo Bonzini <pbonzini@redhat.com>
 S: Supported
-F: include/hw/scsi*
+F: include/hw/scsi/*
 F: hw/scsi/*
+F: tests/virtio-scsi-test.c
 T: git git://github.com/bonzini/qemu.git scsi-next

 LSI53C895A
@@ -883,8 +946,11 @@ virtio
 M: Michael S. Tsirkin <mst@redhat.com>
 S: Supported
 F: hw/*/virtio*
+F: hw/virtio/Makefile.objs
+F: hw/virtio/trace-events
 F: net/vhost-user.c
 F: include/hw/virtio/
+F: tests/virtio-balloon-test.c

 virtio-9p
 M: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
@@ -902,7 +968,7 @@ L: qemu-block@nongnu.org
 S: Supported
 F: hw/block/virtio-blk.c
 F: hw/block/dataplane/*
-F: hw/virtio/dataplane/*
+F: tests/virtio-blk-test.c
 T: git git://github.com/stefanha/qemu.git block

 virtio-ccw
@@ -925,6 +991,8 @@ S: Supported
 F: hw/char/virtio-serial-bus.c
 F: hw/char/virtio-console.c
 F: include/hw/virtio/virtio-serial.h
+F: tests/virtio-console-test.c
+F: tests/virtio-serial-test.c

 virtio-rng
 M: Amit Shah <amit.shah@redhat.com>
@@ -933,6 +1001,7 @@ F: hw/virtio/virtio-rng.c
 F: include/hw/virtio/virtio-rng.h
 F: include/sysemu/rng*.h
 F: backends/rng*.c
+F: tests/virtio-rng-test.c

 nvme
 M: Keith Busch <keith.busch@intel.com>
@@ -948,14 +1017,6 @@ S: Supported
 F: hw/scsi/megasas.c
 F: hw/scsi/mfi.h

-Xilinx EDK
-M: Edgar E. Iglesias <edgar.iglesias@gmail.com>
-M: Alistair Francis <alistair.francis@xilinx.com>
-M: Peter Crosthwaite <crosthwaite.peter@gmail.com>
-S: Maintained
-F: hw/*/xilinx_*
-F: include/hw/xilinx.h
-
 Network packet abstractions
 M: Dmitry Fleytman <dmitry@daynix.com>
 S: Maintained
@@ -974,6 +1035,8 @@ Rocker
 M: Jiri Pirko <jiri@resnulli.us>
 S: Maintained
 F: hw/net/rocker/
+F: tests/rocker/
+F: docs/specs/rocker.txt

 NVDIMM
 M: Xiao Guangrong <guangrong.xiao@linux.intel.com>
@@ -992,6 +1055,12 @@ M: Dmitry Fleytman <dmitry@daynix.com>
 S: Maintained
 F: hw/net/e1000e*

+Generic Loader
+M: Alistair Francis <alistair.francis@xilinx.com>
+S: Maintained
+F: hw/core/generic-loader.c
+F: include/hw/core/generic-loader.h
+
 Subsystems
 ----------
 Audio
@@ -999,6 +1068,7 @@ M: Gerd Hoffmann <kraxel@redhat.com>
 S: Maintained
 F: audio/
 F: hw/audio/
+F: include/hw/audio/
 F: tests/ac97-test.c
 F: tests/es1370-test.c
 F: tests/intel-hda-test.c
@@ -1072,12 +1142,6 @@ S: Supported
 F: qom/cpu.c
 F: include/qom/cpu.h

-ICC Bus
-M: Igor Mammedov <imammedo@redhat.com>
-S: Supported
-F: include/hw/cpu/icc_bus.h
-F: hw/cpu/icc_bus.c
-
 Device Tree
 M: Peter Crosthwaite <crosthwaite.peter@gmail.com>
 M: Alexander Graf <agraf@suse.de>
@@ -1139,12 +1203,12 @@ F: qemu-timer.c
 F: vl.c

 Human Monitor (HMP)
-M: Luiz Capitulino <lcapitulino@redhat.com>
+M: Dr. David Alan Gilbert <dgilbert@redhat.com>
 S: Maintained
 F: monitor.c
-F: hmp.c
-F: hmp-commands.hx
-T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
+F: hmp.[ch]
+F: hmp-commands*.hx
+F: include/monitor/hmp-target.h

 Network device backends
 M: Jason Wang <jasowang@redhat.com>
@@ -1177,6 +1241,13 @@ F: numa.c
 F: include/sysemu/numa.h
 T: git git://github.com/ehabkost/qemu.git numa

+Host Memory Backends
+M: Eduardo Habkost <ehabkost@redhat.com>
+M: Igor Mammedov <imammedo@redhat.com>
+S: Maintained
+F: backends/hostmem*.c
+F: include/sysemu/hostmem.h
+
 QAPI
 M: Markus Armbruster <armbru@redhat.com>
 M: Michael Roth <mdroth@linux.vnet.ibm.com>
@@ -1202,8 +1273,8 @@ F: qapi/*.json
 T: git git://repo.or.cz/qemu/armbru.git qapi-next

 QObject
-M: Luiz Capitulino <lcapitulino@redhat.com>
-S: Maintained
+M: Markus Armbruster <armbru@redhat.com>
+S: Supported
 F: qobject/
 F: include/qapi/qmp/
 X: include/qapi/qmp/dispatch.h
@@ -1213,7 +1284,7 @@ F: tests/check-qint.c
 F: tests/check-qjson.c
 F: tests/check-qlist.c
 F: tests/check-qstring.c
-T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
+T: git git://repo.or.cz/qemu/armbru.git qapi-next

 QEMU Guest Agent
 M: Michael Roth <mdroth@linux.vnet.ibm.com>
@@ -1238,11 +1309,16 @@ M: Markus Armbruster <armbru@redhat.com>
 S: Supported
 F: qmp.c
 F: monitor.c
-F: qmp-commands.hx
 F: docs/*qmp-*
 F: scripts/qmp/
 T: git git://repo.or.cz/qemu/armbru.git qapi-next

+Register API
+M: Alistair Francis <alistair.francis@xilinx.com>
+S: Maintained
+F: hw/core/register.c
+F: include/hw/register.h
+
 SLIRP
 M: Samuel Thibault <samuel.thibault@ens-lyon.org>
 M: Jan Kiszka <jan.kiszka@siemens.com>
@@ -1252,6 +1328,11 @@ F: net/slirp.c
 F: include/net/slirp.h
 T: git git://git.kiszka.org/qemu.git queues/slirp

+Stubs
+M: Paolo Bonzini <pbonzini@redhat.com>
+S: Maintained
+F: stubs/
+
 Tracing
 M: Stefan Hajnoczi <stefanha@redhat.com>
 S: Maintained
@@ -1325,6 +1406,22 @@ F: include/qemu/throttle.h
 F: util/throttle.c
 L: qemu-block@nongnu.org

+UUID
+M: Fam Zheng <famz@redhat.com>
+S: Supported
+F: util/uuid.c
+F: include/qemu/uuid.h
+F: tests/test-uuid.c
+
+COLO Proxy
+M: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+M: Li Zhijian <lizhijian@cn.fujitsu.com>
+S: Supported
+F: docs/colo-proxy.txt
+F: net/colo*
+F: net/filter-rewriter.c
+F: net/filter-mirror.c
+
 Usermode Emulation
 ------------------
 Overall
@@ -1336,11 +1433,13 @@ F: user-exec.c
 BSD user
 S: Orphan
 F: bsd-user/
+F: default-configs/*-bsd-user.mak

 Linux user
 M: Riku Voipio <riku.voipio@iki.fi>
 S: Maintained
 F: linux-user/
+F: default-configs/*-linux-user.mak

 Tiny Code Generator (TCG)
 -------------------------
@@ -1575,7 +1674,7 @@ M: Kevin Wolf <kwolf@redhat.com>
 L: qemu-block@nongnu.org
 S: Supported
 F: block/linux-aio.c
-F: block/raw-aio.h
+F: include/block/raw-aio.h
 F: block/raw-posix.c
 F: block/raw-win32.c
 F: block/raw_bsd.c
@@ -1619,6 +1718,15 @@ L: qemu-block@nongnu.org
 S: Supported
 F: tests/image-fuzzer/

+Replication
+M: Wen Congyang <wency@cn.fujitsu.com>
+M: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
+S: Supported
+F: replication*
+F: block/replication.c
+F: tests/test-replication.c
+F: docs/block-replication.txt
+
 Build and test automation
 -------------------------
 M: Alex Bennée <alex.bennee@linaro.org>
--- a/138
+++ b/138
@@ -56,9 +56,6 @@ GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c qapi-event.c
 GENERATED_HEADERS += qmp-introspect.h
 GENERATED_SOURCES += qmp-introspect.c

-GENERATED_HEADERS += trace/generated-events.h
-GENERATED_SOURCES += trace/generated-events.c
-
 GENERATED_HEADERS += trace/generated-tracers.h
 ifeq ($(findstring dtrace,$(TRACE_BACKENDS)),dtrace)
 GENERATED_HEADERS += trace/generated-tracers-dtrace.h
@@ -76,6 +73,8 @@ GENERATED_HEADERS += trace/generated-ust-provider.h
 GENERATED_SOURCES += trace/generated-ust.c
 endif

+GENERATED_HEADERS += module_block.h
+
 # Don't try to regenerate Makefile or configure
 # We don't generate any of them
 Makefile: ;
@@ -91,8 +90,7 @@ LIBS+=-lz $(LIBS_TOOLS)
 HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)

 ifdef BUILD_DOCS
-DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 qemu-ga.8
-DOCS+=qmp-commands.txt
+DOCS=qemu-doc.html qemu.1 qemu-img.1 qemu-nbd.8 qemu-ga.8
 ifdef CONFIG_VIRTFS
 DOCS+=fsdev/virtfs-proxy-helper.1
 endif
@@ -106,20 +104,20 @@ SUBDIR_DEVICES_MAK_DEP=$(patsubst %, %-config-devices.mak.d, $(TARGET_DIRS))

 ifeq ($(SUBDIR_DEVICES_MAK),)
 config-all-devices.mak:
-	$(call quiet-command,echo '# no devices' > $@,"  GEN   $@")
+	$(call quiet-command,echo '# no devices' > $@,"GEN","$@")
 else
 config-all-devices.mak: $(SUBDIR_DEVICES_MAK)
 	$(call quiet-command, sed -n \
             's|^\([^=]*\)=\(.*\)$$|\1:=$$(findstring y,$$(\1)\2)|p' \
             $(SUBDIR_DEVICES_MAK) | sort -u > $@, \
-             "  GEN   $@")
+             "GEN","$@")
 endif

 -include $(SUBDIR_DEVICES_MAK_DEP)

 %/config-devices.mak: default-configs/%.mak $(SRC_PATH)/scripts/make_device_config.sh
 	$(call quiet-command, \
-            $(SHELL) $(SRC_PATH)/scripts/make_device_config.sh $< $*-config-devices.mak.d $@ > $@.tmp, "  GEN   $@.tmp")
+            $(SHELL) $(SRC_PATH)/scripts/make_device_config.sh $< $*-config-devices.mak.d $@ > $@.tmp,"GEN","$@.tmp")
 	$(call quiet-command, if test -f $@; then \
 	  if cmp -s $@.old $@; then \
 	    mv $@.tmp $@; \
@@ -136,7 +134,7 @@ endif
 	 else \
 	  mv $@.tmp $@; \
 	  cp -p $@ $@.old; \
-	 fi, "  GEN   $@");
+	 fi,"GEN","$@");

 defconfig:
 	rm -f config-all-devices.mak $(SUBDIR_DEVICES_MAK)
@@ -190,7 +188,7 @@ qemu-version.h: FORCE
 config-host.h: config-host.h-timestamp
 config-host.h-timestamp: config-host.mak
 qemu-options.def: $(SRC_PATH)/qemu-options.hx $(SRC_PATH)/scripts/hxtool
-	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $@")
+	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"GEN","$@")

 SUBDIR_RULES=$(patsubst %,subdir-%, $(TARGET_DIRS))
 SOFTMMU_SUBDIR_RULES=$(filter %-softmmu,$(SUBDIR_RULES))
@@ -225,17 +223,18 @@ dtc/%:
 $(SUBDIR_RULES): libqemuutil.a libqemustub.a $(common-obj-y) $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY))

 ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS))
+# Only keep -O and -g cflags
 romsubdir-%:
-	$(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) -C pc-bios/$* V="$(V)" TARGET_DIR="$*/",)
+	$(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) -C pc-bios/$* V="$(V)" TARGET_DIR="$*/" CFLAGS="$(filter -O% -g%,$(CFLAGS))",)

 ALL_SUBDIRS=$(TARGET_DIRS) $(patsubst %,pc-bios/%, $(ROMS))

 recurse-all: $(SUBDIR_RULES) $(ROMSUBDIR_RULES)

 $(BUILD_DIR)/version.o: $(SRC_PATH)/version.rc config-host.h | $(BUILD_DIR)/version.lo
-	$(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ $<,"  RC    version.o")
+	$(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ $<,"RC","version.o")
 $(BUILD_DIR)/version.lo: $(SRC_PATH)/version.rc config-host.h
-	$(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ $<,"  RC    version.lo")
+	$(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ $<,"RC","version.lo")

 Makefile: $(version-obj-y) $(version-lobj-y)

@@ -245,9 +244,6 @@ Makefile: $(version-obj-y) $(version-lobj-y)
 libqemustub.a: $(stub-obj-y)
 libqemuutil.a: $(util-obj-y)

-block-modules = $(foreach o,$(block-obj-m),"$(basename $(subst /,-,$o))",) NULL
-util/module.o-cflags = -D'CONFIG_BLOCK_MODULES=$(block-modules)'
-
 ######################################################################

 qemu-img.o: qemu-img-cmds.h
@@ -262,7 +258,7 @@ fsdev/virtfs-proxy-helper$(EXESUF): fsdev/virtfs-proxy-helper.o fsdev/9p-marshal
 fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap

 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(SRC_PATH)/scripts/hxtool
-	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $@")
+	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"GEN","$@")

 qemu-ga$(EXESUF): LIBS = $(LIBS_QGA)
 qemu-ga$(EXESUF): QEMU_CFLAGS += -I qga/qapi-generated
@@ -275,17 +271,17 @@ qga/qapi-generated/qga-qapi-types.c qga/qapi-generated/qga-qapi-types.h :\
 $(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
 	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \
 		$(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \
-		"  GEN   $@")
+		"GEN","$@")
 qga/qapi-generated/qga-qapi-visit.c qga/qapi-generated/qga-qapi-visit.h :\
 $(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-visit.py $(qapi-py)
 	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-visit.py \
 		$(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \
-		"  GEN   $@")
+		"GEN","$@")
 qga/qapi-generated/qga-qmp-commands.h qga/qapi-generated/qga-qmp-marshal.c :\
 $(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py)
 	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py \
 		$(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \
-		"  GEN   $@")
+		"GEN","$@")

 qapi-modules = $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/qapi/common.json \
               $(SRC_PATH)/qapi/block.json $(SRC_PATH)/qapi/block-core.json \
@@ -297,27 +293,27 @@ qapi-types.c qapi-types.h :\
 $(qapi-modules) $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
 	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \
 		$(gen-out-type) -o "." -b $<, \
-		"  GEN   $@")
+		"GEN","$@")
 qapi-visit.c qapi-visit.h :\
 $(qapi-modules) $(SRC_PATH)/scripts/qapi-visit.py $(qapi-py)
 	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-visit.py \
 		$(gen-out-type) -o "." -b $<, \
-		"  GEN   $@")
+		"GEN","$@")
 qapi-event.c qapi-event.h :\
 $(qapi-modules) $(SRC_PATH)/scripts/qapi-event.py $(qapi-py)
 	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-event.py \
 		$(gen-out-type) -o "." $<, \
-		"  GEN   $@")
+		"GEN","$@")
 qmp-commands.h qmp-marshal.c :\
 $(qapi-modules) $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py)
 	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py \
-		$(gen-out-type) -o "." -m $<, \
-		"  GEN   $@")
+		$(gen-out-type) -o "." $<, \
+		"GEN","$@")
 qmp-introspect.h qmp-introspect.c :\
 $(qapi-modules) $(SRC_PATH)/scripts/qapi-introspect.py $(qapi-py)
 	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-introspect.py \
 		$(gen-out-type) -o "." $<, \
-		"  GEN   $@")
+		"GEN","$@")

 QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qmp-commands.h)
 $(qga-obj-y) qemu-ga.o: $(QGALIB_GEN)
@@ -336,7 +332,7 @@ $(QEMU_GA_MSI): config-host.mak

 $(QEMU_GA_MSI):  $(SRC_PATH)/qga/installer/qemu-ga.wxs
 	$(call quiet-command,QEMU_GA_VERSION="$(QEMU_GA_VERSION)" QEMU_GA_MANUFACTURER="$(QEMU_GA_MANUFACTURER)" QEMU_GA_DISTRO="$(QEMU_GA_DISTRO)" BUILD_DIR="$(BUILD_DIR)" \
-	wixl -o $@ $(QEMU_GA_MSI_ARCH) $(QEMU_GA_MSI_WITH_VSS) $(QEMU_GA_MSI_MINGW_DLL_PATH) $<, "  WIXL  $@")
+	wixl -o $@ $(QEMU_GA_MSI_ARCH) $(QEMU_GA_MSI_WITH_VSS) $(QEMU_GA_MSI_MINGW_DLL_PATH) $<,"WIXL","$@")
 else
 msi:
 	@echo "MSI build not configured or dependency resolution failed (reconfigure with --enable-guest-agent-msi option)"
@@ -352,6 +348,11 @@ ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) libqemuutil.a libqemustub.a
 ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) libqemuutil.a libqemustub.a
 	$(call LINK, $^)

+module_block.h: $(SRC_PATH)/scripts/modules/module_block.py config-host.mak
+	$(call quiet-command,$(PYTHON) $< $@ \
+	$(addprefix $(SRC_PATH)/,$(patsubst %.mo,%.c,$(block-obj-m))), \
+	"GEN","$@")
+
 clean:
 # avoid old build problems by removing potentially incorrect old files
 	rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h
@@ -394,7 +395,6 @@ distclean: clean
 	rm -f qemu-doc.vr
 	rm -f config.log
 	rm -f linux-headers/asm
-	rm -f qemu-tech.info qemu-tech.aux qemu-tech.cp qemu-tech.dvi qemu-tech.fn qemu-tech.info qemu-tech.ky qemu-tech.log qemu-tech.pdf qemu-tech.pg qemu-tech.toc qemu-tech.tp qemu-tech.vr
 	for d in $(TARGET_DIRS); do \
 	rm -rf $$d || exit 1 ; \
        done
@@ -430,8 +430,8 @@ endif

 install-doc: $(DOCS)
 	$(INSTALL_DIR) "$(DESTDIR)$(qemu_docdir)"
-	$(INSTALL_DATA) qemu-doc.html  qemu-tech.html "$(DESTDIR)$(qemu_docdir)"
-	$(INSTALL_DATA) qmp-commands.txt "$(DESTDIR)$(qemu_docdir)"
+	$(INSTALL_DATA) qemu-doc.html "$(DESTDIR)$(qemu_docdir)"
+	$(INSTALL_DATA) $(SRC_PATH)/docs/qmp-commands.txt "$(DESTDIR)$(qemu_docdir)"
 ifdef CONFIG_POSIX
 	$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man1"
 	$(INSTALL_DATA) qemu.1 "$(DESTDIR)$(mandir)/man1"
@@ -517,13 +517,13 @@ ui/shader/%-vert.h: $(SRC_PATH)/ui/shader/%.vert $(SRC_PATH)/scripts/shaderinclu
 	@mkdir -p $(dir $@)
 	$(call quiet-command,\
 		perl $(SRC_PATH)/scripts/shaderinclude.pl $< > $@,\
-		"  VERT  $@")
+		"VERT","$@")

 ui/shader/%-frag.h: $(SRC_PATH)/ui/shader/%.frag $(SRC_PATH)/scripts/shaderinclude.pl
 	@mkdir -p $(dir $@)
 	$(call quiet-command,\
 		perl $(SRC_PATH)/scripts/shaderinclude.pl $< > $@,\
-		"  FRAG  $@")
+		"FRAG","$@")

 ui/console-gl.o: $(SRC_PATH)/ui/console-gl.c \
 	ui/shader/texture-blit-vert.h ui/shader/texture-blit-frag.h
@@ -533,68 +533,65 @@ MAKEINFO=makeinfo
 MAKEINFOFLAGS=--no-headers --no-split --number-sections
 TEXIFLAG=$(if $(V),,--quiet)
 %.dvi: %.texi
-	$(call quiet-command,texi2dvi $(TEXIFLAG) -I . $<,"  GEN   $@")
+	$(call quiet-command,texi2dvi $(TEXIFLAG) -I . $<,"GEN","$@")

 %.html: %.texi
 	$(call quiet-command,LC_ALL=C $(MAKEINFO) $(MAKEINFOFLAGS) --html $< -o $@, \
-	"  GEN   $@")
+	"GEN","$@")

 %.info: %.texi
-	$(call quiet-command,$(MAKEINFO) $< -o $@,"  GEN   $@")
+	$(call quiet-command,$(MAKEINFO) $< -o $@,"GEN","$@")

 %.pdf: %.texi
-	$(call quiet-command,texi2pdf $(TEXIFLAG) -I . $<,"  GEN   $@")
+	$(call quiet-command,texi2pdf $(TEXIFLAG) -I . $<,"GEN","$@")

 qemu-options.texi: $(SRC_PATH)/qemu-options.hx $(SRC_PATH)/scripts/hxtool
-	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"  GEN   $@")
+	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"GEN","$@")

 qemu-monitor.texi: $(SRC_PATH)/hmp-commands.hx $(SRC_PATH)/scripts/hxtool
-	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"  GEN   $@")
+	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"GEN","$@")

 qemu-monitor-info.texi: $(SRC_PATH)/hmp-commands-info.hx $(SRC_PATH)/scripts/hxtool
-	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"  GEN   $@")
-
-qmp-commands.txt: $(SRC_PATH)/qmp-commands.hx $(SRC_PATH)/scripts/hxtool
-	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -q < $< > $@,"  GEN   $@")
+	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"GEN","$@")

 qemu-img-cmds.texi: $(SRC_PATH)/qemu-img-cmds.hx $(SRC_PATH)/scripts/hxtool
-	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"  GEN   $@")
+	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"GEN","$@")

 qemu.1: qemu-doc.texi qemu-options.texi qemu-monitor.texi qemu-monitor-info.texi
 	$(call quiet-command, \
 	  perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu.pod && \
 	  $(POD2MAN) --section=1 --center=" " --release=" " qemu.pod > $@, \
-	  "  GEN   $@")
+	  "GEN","$@")
 qemu.1: qemu-option-trace.texi

 qemu-img.1: qemu-img.texi qemu-option-trace.texi qemu-img-cmds.texi
 	$(call quiet-command, \
 	  perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu-img.pod && \
 	  $(POD2MAN) --section=1 --center=" " --release=" " qemu-img.pod > $@, \
-	  "  GEN   $@")
+	  "GEN","$@")

 fsdev/virtfs-proxy-helper.1: fsdev/virtfs-proxy-helper.texi
 	$(call quiet-command, \
 	  perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< fsdev/virtfs-proxy-helper.pod && \
 	  $(POD2MAN) --section=1 --center=" " --release=" " fsdev/virtfs-proxy-helper.pod > $@, \
-	  "  GEN   $@")
+	  "GEN","$@")

 qemu-nbd.8: qemu-nbd.texi qemu-option-trace.texi
 	$(call quiet-command, \
 	  perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu-nbd.pod && \
 	  $(POD2MAN) --section=8 --center=" " --release=" " qemu-nbd.pod > $@, \
-	  "  GEN   $@")
+	  "GEN","$@")

 qemu-ga.8: qemu-ga.texi
 	$(call quiet-command, \
 	  perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu-ga.pod && \
 	  $(POD2MAN) --section=8 --center=" " --release=" " qemu-ga.pod > $@, \
-	  "  GEN   $@")
+	  "GEN","$@")

-dvi: qemu-doc.dvi qemu-tech.dvi
-html: qemu-doc.html qemu-tech.html
-info: qemu-doc.info qemu-tech.info
-pdf: qemu-doc.pdf qemu-tech.pdf
+dvi: qemu-doc.dvi
+html: qemu-doc.html
+info: qemu-doc.info
+pdf: qemu-doc.pdf

 qemu-doc.dvi qemu-doc.html qemu-doc.info qemu-doc.pdf: \
 	qemu-img.texi qemu-nbd.texi qemu-options.texi qemu-option-trace.texi \
@@ -668,3 +665,40 @@ endif
 -include $(wildcard *.d tests/*.d)

 include $(SRC_PATH)/tests/docker/Makefile.include
+
+.PHONY: help
+help:
+	@echo  'Generic targets:'
+	@echo  '  all             - Build all'
+	@echo  '  dir/file.o      - Build specified target only'
+	@echo  '  install         - Install QEMU, documentation and tools'
+	@echo  '  ctags/TAGS      - Generate tags file for editors'
+	@echo  '  cscope          - Generate cscope index'
+	@echo  ''
+	@$(if $(TARGET_DIRS), \
+		echo 'Architecture specific targets:'; \
+		$(foreach t, $(TARGET_DIRS), \
+		printf "  %-30s - Build for %s\\n" $(patsubst %,subdir-%,$(t)) $(t);) \
+		echo '')
+	@echo  'Cleaning targets:'
+	@echo  '  clean           - Remove most generated files but keep the config'
+	@echo  '  distclean       - Remove all generated files'
+	@echo  '  dist            - Build a distributable tarball'
+	@echo  ''
+	@echo  'Test targets:'
+	@echo  '  check           - Run all tests (check-help for details)'
+	@echo  '  docker          - Help about targets running tests inside Docker containers'
+	@echo  ''
+	@echo  'Documentation targets:'
+	@echo  '  dvi html info pdf'
+	@echo  '                  - Build documentation in specified format'
+	@echo  ''
+ifdef CONFIG_WIN32
+	@echo  'Windows targets:'
+	@echo  '  installer       - Build NSIS-based installer for qemu-ga'
+ifdef QEMU_GA_MSI_ENABLED
+	@echo  '  msi             - Build MSI-based installer for qemu-ga'
+endif
+	@echo  ''
+endif
+	@echo  '  make V=0|1 [targets] 0 => quiet build (default), 1 => verbose build'
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
 block-obj-y += qemu-io-cmds.o
+block-obj-$(CONFIG_REPLICATION) += replication.o

 block-obj-m = block/

@@ -88,7 +89,7 @@ endif

 #######################################################################
 # Target-independent parts used in system and user emulation
-common-obj-y += tcg-runtime.o
+common-obj-y += tcg-runtime.o cpus-common.o
 common-obj-y += hw/
 common-obj-y += qom/
 common-obj-y += disas/
@@ -141,6 +142,7 @@ trace-events-y += hw/dma/trace-events
 trace-events-y += hw/sparc/trace-events
 trace-events-y += hw/sd/trace-events
 trace-events-y += hw/isa/trace-events
+trace-events-y += hw/mem/trace-events
 trace-events-y += hw/i386/trace-events
 trace-events-y += hw/9pfs/trace-events
 trace-events-y += hw/ppc/trace-events
--- a/Makefile.target
+++ b/Makefile.target
@@ -26,7 +26,7 @@ ifneq (,$(findstring -mwindows,$(libs_softmmu)))
 # Terminate program name with a 'w' because the linker builds a windows executable.
 QEMU_PROGW=qemu-system-$(TARGET_NAME)w$(EXESUF)
 $(QEMU_PROG): $(QEMU_PROGW)
-	$(call quiet-command,$(OBJCOPY) --subsystem console $(QEMU_PROGW) $(QEMU_PROG),"  GEN   $(TARGET_DIR)$(QEMU_PROG)")
+	$(call quiet-command,$(OBJCOPY) --subsystem console $(QEMU_PROGW) $(QEMU_PROG),"GEN","$(TARGET_DIR)$(QEMU_PROG)")
 QEMU_PROG_BUILD = $(QEMU_PROGW)
 else
 QEMU_PROG_BUILD = $(QEMU_PROG)
@@ -55,7 +55,7 @@ $(QEMU_PROG).stp-installed: $(BUILD_DIR)/trace-events-all
 		--binary=$(bindir)/$(QEMU_PROG) \
 		--target-name=$(TARGET_NAME) \
 		--target-type=$(TARGET_TYPE) \
-		< $< > $@,"  GEN   $(TARGET_DIR)$(QEMU_PROG).stp-installed")
+		$< > $@,"GEN","$(TARGET_DIR)$(QEMU_PROG).stp-installed")

 $(QEMU_PROG).stp: $(BUILD_DIR)/trace-events-all
 	$(call quiet-command,$(TRACETOOL) \
@@ -64,14 +64,14 @@ $(QEMU_PROG).stp: $(BUILD_DIR)/trace-events-all
 		--binary=$(realpath .)/$(QEMU_PROG) \
 		--target-name=$(TARGET_NAME) \
 		--target-type=$(TARGET_TYPE) \
-		< $< > $@,"  GEN   $(TARGET_DIR)$(QEMU_PROG).stp")
+		$< > $@,"GEN","$(TARGET_DIR)$(QEMU_PROG).stp")

 $(QEMU_PROG)-simpletrace.stp: $(BUILD_DIR)/trace-events-all
 	$(call quiet-command,$(TRACETOOL) \
 		--format=simpletrace-stap \
 		--backends=$(TRACE_BACKENDS) \
 		--probe-prefix=qemu.$(TARGET_TYPE).$(TARGET_NAME) \
-		< $< > $@,"  GEN   $(TARGET_DIR)$(QEMU_PROG)-simpletrace.stp")
+		$< > $@,"GEN","$(TARGET_DIR)$(QEMU_PROG)-simpletrace.stp")

 else
 stap:
@@ -156,7 +156,7 @@ else
 obj-y += hw/$(TARGET_BASE_ARCH)/
 endif

-GENERATED_HEADERS += hmp-commands.h hmp-commands-info.h qmp-commands-old.h
+GENERATED_HEADERS += hmp-commands.h hmp-commands-info.h

 endif # CONFIG_SOFTMMU

@@ -196,26 +196,23 @@ $(QEMU_PROG_BUILD): config-devices.mak
 $(QEMU_PROG_BUILD): $(all-obj-y) ../libqemuutil.a ../libqemustub.a
 	$(call LINK, $(filter-out %.mak, $^))
 ifdef CONFIG_DARWIN
-	$(call quiet-command,Rez -append $(SRC_PATH)/pc-bios/qemu.rsrc -o $@,"  REZ   $(TARGET_DIR)$@")
-	$(call quiet-command,SetFile -a C $@,"  SETFILE $(TARGET_DIR)$@")
+	$(call quiet-command,Rez -append $(SRC_PATH)/pc-bios/qemu.rsrc -o $@,"REZ","$(TARGET_DIR)$@")
+	$(call quiet-command,SetFile -a C $@,"SETFILE","$(TARGET_DIR)$@")
 endif

 gdbstub-xml.c: $(TARGET_XML_FILES) $(SRC_PATH)/scripts/feature_to_c.sh
-	$(call quiet-command,rm -f $@ && $(SHELL) $(SRC_PATH)/scripts/feature_to_c.sh $@ $(TARGET_XML_FILES),"  GEN   $(TARGET_DIR)$@")
+	$(call quiet-command,rm -f $@ && $(SHELL) $(SRC_PATH)/scripts/feature_to_c.sh $@ $(TARGET_XML_FILES),"GEN","$(TARGET_DIR)$@")

 hmp-commands.h: $(SRC_PATH)/hmp-commands.hx $(SRC_PATH)/scripts/hxtool
-	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $(TARGET_DIR)$@")
+	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"GEN","$(TARGET_DIR)$@")

 hmp-commands-info.h: $(SRC_PATH)/hmp-commands-info.hx $(SRC_PATH)/scripts/hxtool
-	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $(TARGET_DIR)$@")
+	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"GEN","$(TARGET_DIR)$@")

-qmp-commands-old.h: $(SRC_PATH)/qmp-commands.hx $(SRC_PATH)/scripts/hxtool
-	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $(TARGET_DIR)$@")
-
-clean:
+clean: clean-target
 	rm -f *.a *~ $(PROGS)
 	rm -f $(shell find . -name '*.[od]')
-	rm -f hmp-commands.h qmp-commands-old.h gdbstub-xml.c
+	rm -f hmp-commands.h gdbstub-xml.c
 ifdef CONFIG_TRACE_SYSTEMTAP
 	rm -f *.stp
 endif
--- a/2
+++ b/2
@@ -42,8 +42,6 @@ of other UNIX targets. The simple steps to build QEMU are:
  ../configure
  make

-Complete details of the process for building and configuring QEMU for
-all supported host platforms can be found in the qemu-tech.html file.
 Additional information can also be found online via the QEMU website:

  http://qemu-project.org/Hosts/Linux
--- a/2
+++ b/2
@@ -1 +1 @@
-2.6.91
+2.7.50
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -431,11 +431,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
    assert(npfd == 0);

    /* fill pollfds */
-    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-        if (!node->deleted && node->pfd.events
-            && !aio_epoll_enabled(ctx)
-            && aio_node_check(ctx, node->is_external)) {
-            add_pollfd(node);
+
+    if (!aio_epoll_enabled(ctx)) {
+        QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+            if (!node->deleted && node->pfd.events
+                && aio_node_check(ctx, node->is_external)) {
+                add_pollfd(node);
+            }
        }
    }

--- a/arch_init.c
+++ b/arch_init.c
@@ -235,25 +235,6 @@ void audio_init(void)
    }
 }

-int qemu_uuid_parse(const char *str, uint8_t *uuid)
-{
-    int ret;
-
-    if (strlen(str) != 36) {
-        return -1;
-    }
-
-    ret = sscanf(str, UUID_FMT, &uuid[0], &uuid[1], &uuid[2], &uuid[3],
-                 &uuid[4], &uuid[5], &uuid[6], &uuid[7], &uuid[8], &uuid[9],
-                 &uuid[10], &uuid[11], &uuid[12], &uuid[13], &uuid[14],
-                 &uuid[15]);
-
-    if (ret != 16) {
-        return -1;
-    }
-    return 0;
-}
-
 void do_acpitable_option(const QemuOpts *opts)
 {
 #ifdef TARGET_I386
--- a/async.c
+++ b/async.c
@@ -44,6 +44,25 @@ struct QEMUBH {
    bool deleted;
 };

+void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
+{
+    QEMUBH *bh;
+    bh = g_new(QEMUBH, 1);
+    *bh = (QEMUBH){
+        .ctx = ctx,
+        .cb = cb,
+        .opaque = opaque,
+    };
+    qemu_mutex_lock(&ctx->bh_lock);
+    bh->next = ctx->first_bh;
+    bh->scheduled = 1;
+    bh->deleted = 1;
+    /* Make sure that the members are ready before putting bh into list */
+    smp_wmb();
+    ctx->first_bh = bh;
+    qemu_mutex_unlock(&ctx->bh_lock);
+}
+
 QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
 {
    QEMUBH *bh;
@@ -86,7 +105,7 @@ int aio_bh_poll(AioContext *ctx)
         * thread sees the zero before bh->cb has run, and thus will call
         * aio_notify again if necessary.
         */
-        if (!bh->deleted && atomic_xchg(&bh->scheduled, 0)) {
+        if (atomic_xchg(&bh->scheduled, 0)) {
            /* Idle BHs and the notify BH don't count as progress */
            if (!bh->idle && bh != ctx->notify_dummy_bh) {
                ret = 1;
@@ -104,7 +123,7 @@ int aio_bh_poll(AioContext *ctx)
        bhp = &ctx->first_bh;
        while (*bhp) {
            bh = *bhp;
-            if (bh->deleted) {
+            if (bh->deleted && !bh->scheduled) {
                *bhp = bh->next;
                g_free(bh);
            } else {
@@ -168,7 +187,7 @@ aio_compute_timeout(AioContext *ctx)
    QEMUBH *bh;

    for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (!bh->deleted && bh->scheduled) {
+        if (bh->scheduled) {
            if (bh->idle) {
                /* idle bottom halves will be polled at least
                 * every 10ms */
@@ -216,7 +235,7 @@ aio_ctx_check(GSource *source)
    aio_notify_accept(ctx);

    for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (!bh->deleted && bh->scheduled) {
+        if (bh->scheduled) {
            return true;
        }
    }
--- a/audio/audio.c
+++ b/audio/audio.c
@@ -1739,13 +1739,21 @@ static void audio_vm_change_state_handler (void *opaque, int running,
    audio_reset_timer (s);
 }

-static void audio_atexit (void)
+static bool is_cleaning_up;
+
+bool audio_is_cleaning_up(void)
+{
+    return is_cleaning_up;
+}
+
+void audio_cleanup(void)
 {
    AudioState *s = &glob_audio_state;
-    HWVoiceOut *hwo = NULL;
-    HWVoiceIn *hwi = NULL;
+    HWVoiceOut *hwo, *hwon;
+    HWVoiceIn *hwi, *hwin;

-    while ((hwo = audio_pcm_hw_find_any_out (hwo))) {
+    is_cleaning_up = true;
+    QLIST_FOREACH_SAFE(hwo, &glob_audio_state.hw_head_out, entries, hwon) {
        SWVoiceCap *sc;

        if (hwo->enabled) {
@@ -1761,17 +1769,20 @@ static void audio_atexit (void)
                cb->ops.destroy (cb->opaque);
            }
        }
+        QLIST_REMOVE(hwo, entries);
    }

-    while ((hwi = audio_pcm_hw_find_any_in (hwi))) {
+    QLIST_FOREACH_SAFE(hwi, &glob_audio_state.hw_head_in, entries, hwin) {
        if (hwi->enabled) {
            hwi->pcm_ops->ctl_in (hwi, VOICE_DISABLE);
        }
        hwi->pcm_ops->fini_in (hwi);
+        QLIST_REMOVE(hwi, entries);
    }

    if (s->drv) {
        s->drv->fini (s->drv_opaque);
+        s->drv = NULL;
    }
 }

@@ -1799,7 +1810,7 @@ static void audio_init (void)
    QLIST_INIT (&s->hw_head_out);
    QLIST_INIT (&s->hw_head_in);
    QLIST_INIT (&s->cap_head);
-    atexit (audio_atexit);
+    atexit(audio_cleanup);

    s->ts = timer_new_ns(QEMU_CLOCK_VIRTUAL, audio_timer, s);

@@ -1966,8 +1977,7 @@ CaptureVoiceOut *AUD_add_capture (
        QLIST_INSERT_HEAD (&s->cap_head, cap, entries);
        QLIST_INSERT_HEAD (&cap->cb_head, cb, entries);

-        hw = NULL;
-        while ((hw = audio_pcm_hw_find_any_out (hw))) {
+        QLIST_FOREACH(hw, &glob_audio_state.hw_head_out, entries) {
            audio_attach_capture (hw);
        }
        return cap;
--- a/audio/audio.h
+++ b/audio/audio.h
@@ -163,4 +163,7 @@ static inline void *advance (void *p, int incr)
 int wav_start_capture (CaptureState *s, const char *path, int freq,
                       int bits, int nchannels);

+bool audio_is_cleaning_up(void);
+void audio_cleanup(void);
+
 #endif /* QEMU_AUDIO_H */
--- a/audio/coreaudio.c
+++ b/audio/coreaudio.c
@@ -36,8 +36,6 @@
 #define MAC_OS_X_VERSION_10_6 1060
 #endif

-static int isAtexit;
-
 typedef struct {
    int buffer_frames;
    int nbuffers;
@@ -378,11 +376,6 @@ static inline UInt32 isPlaying (AudioDeviceID outputDeviceID)
    return result;
 }

-static void coreaudio_atexit (void)
-{
-    isAtexit = 1;
-}
-
 static int coreaudio_lock (coreaudioVoiceOut *core, const char *fn_name)
 {
    int err;
@@ -630,7 +623,7 @@ static void coreaudio_fini_out (HWVoiceOut *hw)
    int err;
    coreaudioVoiceOut *core = (coreaudioVoiceOut *) hw;

-    if (!isAtexit) {
+    if (!audio_is_cleaning_up()) {
        /* stop playback */
        if (isPlaying(core->outputDeviceID)) {
            status = AudioDeviceStop(core->outputDeviceID, core->ioprocid);
@@ -673,7 +666,7 @@ static int coreaudio_ctl_out (HWVoiceOut *hw, int cmd, ...)

    case VOICE_DISABLE:
        /* stop playback */
-        if (!isAtexit) {
+        if (!audio_is_cleaning_up()) {
            if (isPlaying(core->outputDeviceID)) {
                status = AudioDeviceStop(core->outputDeviceID,
                                         core->ioprocid);
@@ -697,7 +690,6 @@ static void *coreaudio_audio_init (void)
    CoreaudioConf *conf = g_malloc(sizeof(CoreaudioConf));
    *conf = glob_conf;

-    atexit(coreaudio_atexit);
    return conf;
 }

--- a/audio/trace-events
+++ b/audio/trace-events
@@ -1,4 +1,4 @@
-# See docs/trace-events.txt for syntax documentation.
+# See docs/tracing.txt for syntax documentation.

 # audio/alsaaudio.c
 alsa_revents(int revents) "revents = %d"
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -203,6 +203,7 @@ static bool host_memory_backend_get_prealloc(Object *obj, Error **errp)
 static void host_memory_backend_set_prealloc(Object *obj, bool value,
                                             Error **errp)
 {
+    Error *local_err = NULL;
    HostMemoryBackend *backend = MEMORY_BACKEND(obj);

    if (backend->force_prealloc) {
@@ -223,7 +224,11 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
        void *ptr = memory_region_get_ram_ptr(&backend->mr);
        uint64_t sz = memory_region_size(&backend->mr);

-        os_mem_prealloc(fd, ptr, sz);
+        os_mem_prealloc(fd, ptr, sz, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
        backend->prealloc = true;
    }
 }
@@ -286,8 +291,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
    if (bc->alloc) {
        bc->alloc(backend, &local_err);
        if (local_err) {
-            error_propagate(errp, local_err);
-            return;
+            goto out;
        }

        ptr = memory_region_get_ram_ptr(&backend->mr);
@@ -343,9 +347,15 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         * specified NUMA policy in place.
         */
        if (backend->prealloc) {
-            os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz);
+            os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
+                            &local_err);
+            if (local_err) {
+                goto out;
+            }
        }
    }
+out:
+    error_propagate(errp, local_err);
 }

 static bool
--- a/backends/msmouse.c
+++ b/backends/msmouse.c
@@ -139,7 +139,6 @@ static void msmouse_chr_close (struct CharDriverState *chr)

    qemu_input_handler_unregister(mouse->hs);
    g_free(mouse);
-    g_free(chr);
 }

 static QemuInputHandler msmouse_handler = {
@@ -159,6 +158,9 @@ static CharDriverState *qemu_chr_open_msmouse(const char *id,
    CharDriverState *chr;

    chr = qemu_chr_alloc(common, errp);
+    if (!chr) {
+        return NULL;
+    }
    chr->chr_write = msmouse_chr_write;
    chr->chr_close = msmouse_chr_close;
    chr->chr_accept_input = msmouse_chr_accept_input;
--- a/backends/rng-egd.c
+++ b/backends/rng-egd.c
@@ -41,7 +41,9 @@ static void rng_egd_request_entropy(RngBackend *b, RngRequest *req)
        header[0] = 0x02;
        header[1] = len;

-        qemu_chr_fe_write(s->chr, header, sizeof(header));
+        /* XXX this blocks entire thread. Rewrite to use
+         * qemu_chr_fe_write and background I/O callbacks */
+        qemu_chr_fe_write_all(s->chr, header, sizeof(header));

        size -= len;
    }
--- a/block.c
+++ b/block.c
@@ -25,7 +25,9 @@
 #include "trace.h"
 #include "block/block_int.h"
 #include "block/blockjob.h"
+#include "block/nbd.h"
 #include "qemu/error-report.h"
+#include "module_block.h"
 #include "qemu/module.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qbool.h"
@@ -40,6 +42,7 @@
 #include "qapi-event.h"
 #include "qemu/cutils.h"
 #include "qemu/id.h"
+#include "qapi/util.h"

 #ifdef CONFIG_BSD
 #include <sys/ioctl.h>
@@ -241,17 +244,40 @@ BlockDriverState *bdrv_new(void)
    return bs;
 }

-BlockDriver *bdrv_find_format(const char *format_name)
+static BlockDriver *bdrv_do_find_format(const char *format_name)
 {
    BlockDriver *drv1;
+
    QLIST_FOREACH(drv1, &bdrv_drivers, list) {
        if (!strcmp(drv1->format_name, format_name)) {
            return drv1;
        }
    }
+
    return NULL;
 }

+BlockDriver *bdrv_find_format(const char *format_name)
+{
+    BlockDriver *drv1;
+    int i;
+
+    drv1 = bdrv_do_find_format(format_name);
+    if (drv1) {
+        return drv1;
+    }
+
+    /* The driver isn't registered, maybe we need to load a module */
+    for (i = 0; i < (int)ARRAY_SIZE(block_driver_modules); ++i) {
+        if (!strcmp(block_driver_modules[i].format_name, format_name)) {
+            block_module_load_one(block_driver_modules[i].library_name);
+            break;
+        }
+    }
+
+    return bdrv_do_find_format(format_name);
+}
+
 static int bdrv_is_whitelisted(BlockDriver *drv, bool read_only)
 {
    static const char *whitelist_rw[] = {
@@ -460,6 +486,19 @@ static BlockDriver *find_hdev_driver(const char *filename)
    return drv;
 }

+static BlockDriver *bdrv_do_find_protocol(const char *protocol)
+{
+    BlockDriver *drv1;
+
+    QLIST_FOREACH(drv1, &bdrv_drivers, list) {
+        if (drv1->protocol_name && !strcmp(drv1->protocol_name, protocol)) {
+            return drv1;
+        }
+    }
+
+    return NULL;
+}
+
 BlockDriver *bdrv_find_protocol(const char *filename,
                                bool allow_protocol_prefix,
                                Error **errp)
@@ -468,6 +507,7 @@ BlockDriver *bdrv_find_protocol(const char *filename,
    char protocol[128];
    int len;
    const char *p;
+    int i;

    /* TODO Drivers without bdrv_file_open must be specified explicitly */

@@ -494,15 +534,25 @@ BlockDriver *bdrv_find_protocol(const char *filename,
        len = sizeof(protocol) - 1;
    memcpy(protocol, filename, len);
    protocol[len] = '\0';
-    QLIST_FOREACH(drv1, &bdrv_drivers, list) {
-        if (drv1->protocol_name &&
-            !strcmp(drv1->protocol_name, protocol)) {
-            return drv1;
+
+    drv1 = bdrv_do_find_protocol(protocol);
+    if (drv1) {
+        return drv1;
+    }
+
+    for (i = 0; i < (int)ARRAY_SIZE(block_driver_modules); ++i) {
+        if (block_driver_modules[i].protocol_name &&
+            !strcmp(block_driver_modules[i].protocol_name, protocol)) {
+            block_module_load_one(block_driver_modules[i].library_name);
+            break;
        }
    }

-    error_setg(errp, "Unknown protocol '%s'", protocol);
-    return NULL;
+    drv1 = bdrv_do_find_protocol(protocol);
+    if (!drv1) {
+        error_setg(errp, "Unknown protocol '%s'", protocol);
+    }
+    return drv1;
 }

 /*
@@ -684,6 +734,9 @@ static void bdrv_temp_snapshot_options(int *child_flags, QDict *child_options,
    qdict_set_default_str(child_options, BDRV_OPT_CACHE_DIRECT, "off");
    qdict_set_default_str(child_options, BDRV_OPT_CACHE_NO_FLUSH, "on");

+    /* Copy the read-only option from the parent */
+    qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
+
    /* aio=native doesn't work for cache.direct=off, so disable it for the
     * temporary snapshot */
    *child_flags &= ~BDRV_O_NATIVE_AIO;
@@ -706,10 +759,13 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_DIRECT);
    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_NO_FLUSH);

+    /* Inherit the read-only option from the parent if it's not set */
+    qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
+
    /* Our block drivers take care to send flushes and respect unmap policy,
     * so we can default to enable both on lower layers regardless of the
     * corresponding parent options. */
-    flags |= BDRV_O_UNMAP;
+    qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");

    /* Clear flags that only apply to the top layer */
    flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_COPY_ON_READ |
@@ -759,7 +815,8 @@ static void bdrv_backing_options(int *child_flags, QDict *child_options,
    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_NO_FLUSH);

    /* backing files always opened read-only */
-    flags &= ~(BDRV_O_RDWR | BDRV_O_COPY_ON_READ);
+    qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
+    flags &= ~BDRV_O_COPY_ON_READ;

    /* snapshot=on is handled on the top layer */
    flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_TEMPORARY);
@@ -806,6 +863,14 @@ static void update_flags_from_options(int *flags, QemuOpts *opts)
    if (qemu_opt_get_bool(opts, BDRV_OPT_CACHE_DIRECT, false)) {
        *flags |= BDRV_O_NOCACHE;
    }
+
+    *flags &= ~BDRV_O_RDWR;
+
+    assert(qemu_opt_find(opts, BDRV_OPT_READ_ONLY));
+    if (!qemu_opt_get_bool(opts, BDRV_OPT_READ_ONLY, false)) {
+        *flags |= BDRV_O_RDWR;
+    }
+
 }

 static void update_options_from_flags(QDict *options, int flags)
@@ -818,6 +883,10 @@ static void update_options_from_flags(QDict *options, int flags)
        qdict_put(options, BDRV_OPT_CACHE_NO_FLUSH,
                  qbool_from_bool(flags & BDRV_O_NO_FLUSH));
    }
+    if (!qdict_haskey(options, BDRV_OPT_READ_ONLY)) {
+        qdict_put(options, BDRV_OPT_READ_ONLY,
+                  qbool_from_bool(!(flags & BDRV_O_RDWR)));
+    }
 }

 static void bdrv_assign_node_name(BlockDriverState *bs,
@@ -857,7 +926,7 @@ out:
    g_free(gen_node_name);
 }

-static QemuOptsList bdrv_runtime_opts = {
+QemuOptsList bdrv_runtime_opts = {
    .name = "bdrv_common",
    .head = QTAILQ_HEAD_INITIALIZER(bdrv_runtime_opts.head),
    .desc = {
@@ -881,6 +950,21 @@ static QemuOptsList bdrv_runtime_opts = {
            .type = QEMU_OPT_BOOL,
            .help = "Ignore flush requests",
        },
+        {
+            .name = BDRV_OPT_READ_ONLY,
+            .type = QEMU_OPT_BOOL,
+            .help = "Node is opened in read-only mode",
+        },
+        {
+            .name = "detect-zeroes",
+            .type = QEMU_OPT_STRING,
+            .help = "try to optimize zero writes (off, on, unmap)",
+        },
+        {
+            .name = "discard",
+            .type = QEMU_OPT_STRING,
+            .help = "discard operation (ignore/off, unmap/on)",
+        },
        { /* end of list */ }
    },
 };
@@ -897,6 +981,8 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
    const char *filename;
    const char *driver_name = NULL;
    const char *node_name = NULL;
+    const char *discard;
+    const char *detect_zeroes;
    QemuOpts *opts;
    BlockDriver *drv;
    Error *local_err = NULL;
@@ -912,6 +998,8 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
        goto fail_opts;
    }

+    update_flags_from_options(&bs->open_flags, opts);
+
    driver_name = qemu_opt_get(opts, "driver");
    drv = bdrv_find_format(driver_name);
    assert(drv != NULL);
@@ -963,6 +1051,41 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
        }
    }

+    discard = qemu_opt_get(opts, "discard");
+    if (discard != NULL) {
+        if (bdrv_parse_discard_flags(discard, &bs->open_flags) != 0) {
+            error_setg(errp, "Invalid discard option");
+            ret = -EINVAL;
+            goto fail_opts;
+        }
+    }
+
+    detect_zeroes = qemu_opt_get(opts, "detect-zeroes");
+    if (detect_zeroes) {
+        BlockdevDetectZeroesOptions value =
+            qapi_enum_parse(BlockdevDetectZeroesOptions_lookup,
+                            detect_zeroes,
+                            BLOCKDEV_DETECT_ZEROES_OPTIONS__MAX,
+                            BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF,
+                            &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            ret = -EINVAL;
+            goto fail_opts;
+        }
+
+        if (value == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP &&
+            !(bs->open_flags & BDRV_O_UNMAP))
+        {
+            error_setg(errp, "setting detect-zeroes to unmap is not allowed "
+                             "without setting discard operation to unmap");
+            ret = -EINVAL;
+            goto fail_opts;
+        }
+
+        bs->detect_zeroes = value;
+    }
+
    if (filename != NULL) {
        pstrcpy(bs->filename, sizeof(bs->filename), filename);
    } else {
@@ -973,9 +1096,6 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
    bs->drv = drv;
    bs->opaque = g_malloc0(drv->instance_size);

-    /* Apply cache mode options */
-    update_flags_from_options(&bs->open_flags, opts);
-
    /* Open the image, either directly or using a protocol */
    open_flags = bdrv_open_flags(bs, bs->open_flags);
    if (drv->bdrv_file_open) {
@@ -1311,6 +1431,23 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
    /* Otherwise we won't be able to commit due to check in bdrv_commit */
    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                    bs->backing_blocker);
+    /*
+     * We do backup in 3 ways:
+     * 1. drive backup
+     *    The target bs is new opened, and the source is top BDS
+     * 2. blockdev backup
+     *    Both the source and the target are top BDSes.
+     * 3. internal backup(used for block replication)
+     *    Both the source and the target are backing file
+     *
+     * In case 1 and 2, neither the source nor the target is the backing file.
+     * In case 3, we will block the top BDS, so there is only one block job
+     * for the top BDS and its backing chain.
+     */
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    bs->backing_blocker);
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+                    bs->backing_blocker);
 out:
    bdrv_refresh_limits(bs, NULL);
 }
@@ -1609,6 +1746,25 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
        goto fail;
    }

+    /* Set the BDRV_O_RDWR and BDRV_O_ALLOW_RDWR flags.
+     * FIXME: we're parsing the QDict to avoid having to create a
+     * QemuOpts just for this, but neither option is optimal. */
+    if (g_strcmp0(qdict_get_try_str(options, BDRV_OPT_READ_ONLY), "on") &&
+        !qdict_get_try_bool(options, BDRV_OPT_READ_ONLY, false)) {
+        flags |= (BDRV_O_RDWR | BDRV_O_ALLOW_RDWR);
+    } else {
+        flags &= ~BDRV_O_RDWR;
+    }
+
+    if (flags & BDRV_O_SNAPSHOT) {
+        snapshot_options = qdict_new();
+        bdrv_temp_snapshot_options(&snapshot_flags, snapshot_options,
+                                   flags, options);
+        /* Let bdrv_backing_options() override "read-only" */
+        qdict_del(options, BDRV_OPT_READ_ONLY);
+        bdrv_backing_options(&flags, options, flags, options);
+    }
+
    bs->open_flags = flags;
    bs->options = options;
    options = qdict_clone_shallow(options);
@@ -1633,18 +1789,6 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,

    /* Open image file without format layer */
    if ((flags & BDRV_O_PROTOCOL) == 0) {
-        if (flags & BDRV_O_RDWR) {
-            flags |= BDRV_O_ALLOW_RDWR;
-        }
-        if (flags & BDRV_O_SNAPSHOT) {
-            snapshot_options = qdict_new();
-            bdrv_temp_snapshot_options(&snapshot_flags, snapshot_options,
-                                       flags, options);
-            bdrv_backing_options(&flags, options, flags, options);
-        }
-
-        bs->open_flags = flags;
-
        file = bdrv_open_child(filename, options, "file", bs,
                               &child_file, true, &local_err);
        if (local_err) {
@@ -1829,6 +1973,13 @@ static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
        options = qdict_new();
    }

+    /* Check if this BlockDriverState is already in the queue */
+    QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
+        if (bs == bs_entry->state.bs) {
+            break;
+        }
+    }
+
    /*
     * Precedence of options:
     * 1. Explicitly passed in options (highest)
@@ -1849,7 +2000,11 @@ static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
    }

    /* Old explicitly set values (don't overwrite by inherited value) */
-    old_options = qdict_clone_shallow(bs->explicit_options);
+    if (bs_entry) {
+        old_options = qdict_clone_shallow(bs_entry->state.explicit_options);
+    } else {
+        old_options = qdict_clone_shallow(bs->explicit_options);
+    }
    bdrv_join_options(bs, options, old_options);
    QDECREF(old_options);

@@ -1888,8 +2043,13 @@ static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
                                child->role, options, flags);
    }

-    bs_entry = g_new0(BlockReopenQueueEntry, 1);
-    QSIMPLEQ_INSERT_TAIL(bs_queue, bs_entry, entry);
+    if (!bs_entry) {
+        bs_entry = g_new0(BlockReopenQueueEntry, 1);
+        QSIMPLEQ_INSERT_TAIL(bs_queue, bs_entry, entry);
+    } else {
+        QDECREF(bs_entry->state.options);
+        QDECREF(bs_entry->state.explicit_options);
+    }

    bs_entry->state.bs = bs;
    bs_entry->state.options = options;
@@ -2206,6 +2366,7 @@ static void bdrv_close(BlockDriverState *bs)
 void bdrv_close_all(void)
 {
    block_job_cancel_sync_all();
+    nbd_export_close_all();

    /* Drop references from requests still in flight, such as canceled block
     * jobs whose AIO context has not been polled yet */
@@ -2946,11 +3107,6 @@ bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag)
    return false;
 }

-int bdrv_is_snapshot(BlockDriverState *bs)
-{
-    return !!(bs->open_flags & BDRV_O_SNAPSHOT);
-}
-
 /* backing_file can either be relative, or absolute, or a protocol.  If it is
 * relative, it must be relative to the chain.  So, passing in bs->filename
 * from a BDS as backing_file should not be done, as that may be relative to
@@ -3204,17 +3360,10 @@ int bdrv_media_changed(BlockDriverState *bs)
 void bdrv_eject(BlockDriverState *bs, bool eject_flag)
 {
    BlockDriver *drv = bs->drv;
-    const char *device_name;

    if (drv && drv->bdrv_eject) {
        drv->bdrv_eject(bs, eject_flag);
    }
-
-    device_name = bdrv_get_device_name(bs);
-    if (device_name[0] != '\0') {
-        qapi_event_send_device_tray_moved(device_name,
-                                          eject_flag, &error_abort);
-    }
 }

 /**
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,8 +1,8 @@
-block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o
+block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o dmg.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
-block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
+block-obj-y += vhdx.o vhdx-endian.o vhdx-log.o
 block-obj-y += quorum.o
 block-obj-y += parallels.o blkdebug.o blkverify.o blkreplay.o
 block-obj-y += block-backend.o snapshot.o qapi.o
@@ -22,12 +22,14 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o dirty-bitmap.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
+block-obj-$(CONFIG_REPLICATION) += replication.o

 block-obj-y += crypto.o

 common-obj-y += stream.o
-common-obj-y += backup.o

+nfs.o-libs         := $(LIBNFS_LIBS)
 iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
 iscsi.o-libs       := $(LIBISCSI_LIBS)
 curl.o-cflags      := $(CURL_CFLAGS)
@@ -39,7 +41,7 @@ gluster.o-libs     := $(GLUSTERFS_LIBS)
 ssh.o-cflags       := $(LIBSSH2_CFLAGS)
 ssh.o-libs         := $(LIBSSH2_LIBS)
 archipelago.o-libs := $(ARCHIPELAGO_LIBS)
-block-obj-m        += dmg.o
-dmg.o-libs         := $(BZIP2_LIBS)
+block-obj-$(if $(CONFIG_BZIP2),m,n) += dmg-bz2.o
+dmg-bz2.o-libs     := $(BZIP2_LIBS)
 qcow.o-libs        := -lz
 linux-aio.o-libs   := -laio
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -87,7 +87,6 @@ typedef enum {

 typedef struct ArchipelagoAIOCB {
    BlockAIOCB common;
-    QEMUBH *bh;
    struct BDRVArchipelagoState *s;
    QEMUIOVector *qiov;
    ARCHIPCmd cmd;
@@ -154,11 +153,10 @@ static void archipelago_finish_aiocb(AIORequestData *reqdata)
    } else if (reqdata->aio_cb->ret == reqdata->segreq->total) {
        reqdata->aio_cb->ret = 0;
    }
-    reqdata->aio_cb->bh = aio_bh_new(
+    aio_bh_schedule_oneshot(
                        bdrv_get_aio_context(reqdata->aio_cb->common.bs),
                        qemu_archipelago_complete_aio, reqdata
                        );
-    qemu_bh_schedule(reqdata->aio_cb->bh);
 }

 static int wait_reply(struct xseg *xseg, xport srcport, struct xseg_port *port,
@@ -313,7 +311,6 @@ static void qemu_archipelago_complete_aio(void *opaque)
    AIORequestData *reqdata = (AIORequestData *) opaque;
    ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) reqdata->aio_cb;

-    qemu_bh_delete(aio_cb->bh);
    aio_cb->common.cb(aio_cb->common.opaque, aio_cb->ret);
    aio_cb->status = 0;

--- a/block/backup.c
+++ b/block/backup.c
@@ -17,6 +17,7 @@
 #include "block/block.h"
 #include "block/block_int.h"
 #include "block/blockjob.h"
+#include "block/block_backup.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
@@ -27,13 +28,6 @@
 #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
 #define SLICE_TIME 100000000ULL /* ns */

-typedef struct CowRequest {
-    int64_t start;
-    int64_t end;
-    QLIST_ENTRY(CowRequest) list;
-    CoQueue wait_queue; /* coroutines blocked on this request */
-} CowRequest;
-
 typedef struct BackupBlockJob {
    BlockJob common;
    BlockBackend *target;
@@ -47,6 +41,7 @@ typedef struct BackupBlockJob {
    uint64_t sectors_read;
    unsigned long *done_bitmap;
    int64_t cluster_size;
+    bool compress;
    NotifierWithReturn before_write;
    QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;
@@ -154,7 +149,8 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                                       bounce_qiov.size, BDRV_REQ_MAY_UNMAP);
        } else {
            ret = blk_co_pwritev(job->target, start * job->cluster_size,
-                                 bounce_qiov.size, &bounce_qiov, 0);
+                                 bounce_qiov.size, &bounce_qiov,
+                                 job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
        }
        if (ret < 0) {
            trace_backup_do_cow_write_fail(job, start, ret);
@@ -253,6 +249,57 @@ static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
    blk_set_aio_context(s->target, aio_context);
 }

+void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+    int64_t len;
+
+    assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
+
+    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+        error_setg(errp, "The backup job only supports block checkpoint in"
+                   " sync=none mode");
+        return;
+    }
+
+    len = DIV_ROUND_UP(backup_job->common.len, backup_job->cluster_size);
+    bitmap_zero(backup_job->done_bitmap, len);
+}
+
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
+                                          int nb_sectors)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
+    int64_t start, end;
+
+    assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
+
+    start = sector_num / sectors_per_cluster;
+    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+    wait_for_overlapping_requests(backup_job, start, end);
+}
+
+void backup_cow_request_begin(CowRequest *req, BlockJob *job,
+                              int64_t sector_num,
+                              int nb_sectors)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
+    int64_t start, end;
+
+    assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
+
+    start = sector_num / sectors_per_cluster;
+    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+    cow_request_begin(req, backup_job, start, end);
+}
+
+void backup_cow_request_end(CowRequest *req)
+{
+    cow_request_end(req);
+}
+
 static const BlockJobDriver backup_job_driver = {
    .instance_size          = sizeof(BackupBlockJob),
    .job_type               = BLOCK_JOB_TYPE_BACKUP,
@@ -477,6 +524,7 @@ static void coroutine_fn backup_run(void *opaque)
 void backup_start(const char *job_id, BlockDriverState *bs,
                  BlockDriverState *target, int64_t speed,
                  MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
+                  bool compress,
                  BlockdevOnError on_source_error,
                  BlockdevOnError on_target_error,
                  BlockCompletionFunc *cb, void *opaque,
@@ -507,6 +555,12 @@ void backup_start(const char *job_id, BlockDriverState *bs,
        return;
    }

+    if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
+        error_setg(errp, "Compression is not supported for this drive %s",
+                   bdrv_get_device_name(target));
+        return;
+    }
+
    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
        return;
    }
@@ -555,6 +609,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
    job->sync_mode = sync_mode;
    job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
                       sync_bitmap : NULL;
+    job->compress = compress;

    /* If there is no backing file on the target, we cannot rely on COW if our
     * backup cluster size is smaller than the target cluster size. Even for
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -39,6 +39,9 @@ typedef struct BDRVBlkdebugState {
    int new_state;
    int align;

+    /* For blkdebug_refresh_filename() */
+    char *config_file;
+
    QLIST_HEAD(, BlkdebugRule) rules[BLKDBG__MAX];
    QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
    QLIST_HEAD(, BlkdebugSuspendedReq) suspended_reqs;
@@ -46,7 +49,6 @@ typedef struct BDRVBlkdebugState {

 typedef struct BlkdebugAIOCB {
    BlockAIOCB common;
-    QEMUBH *bh;
    int ret;
 } BlkdebugAIOCB;

@@ -351,7 +353,6 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
    BDRVBlkdebugState *s = bs->opaque;
    QemuOpts *opts;
    Error *local_err = NULL;
-    const char *config;
    uint64_t align;
    int ret;

@@ -364,8 +365,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
    }

    /* Read rules from config file or command line options */
-    config = qemu_opt_get(opts, "config");
-    ret = read_config(s, config, options, errp);
+    s->config_file = g_strdup(qemu_opt_get(opts, "config"));
+    ret = read_config(s, s->config_file, options, errp);
    if (ret) {
        goto out;
    }
@@ -398,6 +399,9 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
 fail_unref:
    bdrv_unref_child(bs, bs->file);
 out:
+    if (ret < 0) {
+        g_free(s->config_file);
+    }
    qemu_opts_del(opts);
    return ret;
 }
@@ -405,7 +409,6 @@ out:
 static void error_callback_bh(void *opaque)
 {
    struct BlkdebugAIOCB *acb = opaque;
-    qemu_bh_delete(acb->bh);
    acb->common.cb(acb->common.opaque, acb->ret);
    qemu_aio_unref(acb);
 }
@@ -416,7 +419,6 @@ static BlockAIOCB *inject_error(BlockDriverState *bs,
    BDRVBlkdebugState *s = bs->opaque;
    int error = rule->options.inject.error;
    struct BlkdebugAIOCB *acb;
-    QEMUBH *bh;
    bool immediately = rule->options.inject.immediately;

    if (rule->options.inject.once) {
@@ -431,9 +433,7 @@ static BlockAIOCB *inject_error(BlockDriverState *bs,
    acb = qemu_aio_get(&blkdebug_aiocb_info, bs, cb, opaque);
    acb->ret = -error;

-    bh = aio_bh_new(bdrv_get_aio_context(bs), error_callback_bh, acb);
-    acb->bh = bh;
-    qemu_bh_schedule(bh);
+    aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), error_callback_bh, acb);

    return &acb->common;
 }
@@ -515,6 +515,8 @@ static void blkdebug_close(BlockDriverState *bs)
            remove_rule(rule);
        }
    }
+
+    g_free(s->config_file);
 }

 static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule)
@@ -679,6 +681,7 @@ static int blkdebug_truncate(BlockDriverState *bs, int64_t offset)

 static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
 {
+    BDRVBlkdebugState *s = bs->opaque;
    QDict *opts;
    const QDictEntry *e;
    bool force_json = false;
@@ -700,8 +703,7 @@ static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)

    if (!force_json && bs->file->bs->exact_filename[0]) {
        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "blkdebug:%s:%s",
-                 qdict_get_try_str(options, "config") ?: "",
+                 "blkdebug:%s:%s", s->config_file ?: "",
                 bs->file->bs->exact_filename);
    }

--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -20,11 +20,6 @@ typedef struct Request {
    QEMUBH *bh;
 } Request;

-/* Next request id.
-   This counter is global, because requests from different
-   block devices should not get overlapping ids. */
-static uint64_t request_id;
-
 static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
                          Error **errp)
 {
@@ -84,7 +79,7 @@ static void block_request_create(uint64_t reqid, BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_preadv(BlockDriverState *bs,
    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
-    uint64_t reqid = request_id++;
+    uint64_t reqid = blkreplay_next_id();
    int ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
    block_request_create(reqid, bs, qemu_coroutine_self());
    qemu_coroutine_yield();
@@ -95,7 +90,7 @@ static int coroutine_fn blkreplay_co_preadv(BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_pwritev(BlockDriverState *bs,
    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
-    uint64_t reqid = request_id++;
+    uint64_t reqid = blkreplay_next_id();
    int ret = bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
    block_request_create(reqid, bs, qemu_coroutine_self());
    qemu_coroutine_yield();
@@ -106,7 +101,7 @@ static int coroutine_fn blkreplay_co_pwritev(BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_pwrite_zeroes(BlockDriverState *bs,
    int64_t offset, int count, BdrvRequestFlags flags)
 {
-    uint64_t reqid = request_id++;
+    uint64_t reqid = blkreplay_next_id();
    int ret = bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
    block_request_create(reqid, bs, qemu_coroutine_self());
    qemu_coroutine_yield();
@@ -117,7 +112,7 @@ static int coroutine_fn blkreplay_co_pwrite_zeroes(BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_pdiscard(BlockDriverState *bs,
                                              int64_t offset, int count)
 {
-    uint64_t reqid = request_id++;
+    uint64_t reqid = blkreplay_next_id();
    int ret = bdrv_co_pdiscard(bs->file->bs, offset, count);
    block_request_create(reqid, bs, qemu_coroutine_self());
    qemu_coroutine_yield();
@@ -127,7 +122,7 @@ static int coroutine_fn blkreplay_co_pdiscard(BlockDriverState *bs,

 static int coroutine_fn blkreplay_co_flush(BlockDriverState *bs)
 {
-    uint64_t reqid = request_id++;
+    uint64_t reqid = blkreplay_next_id();
    int ret = bdrv_co_flush(bs->file->bs);
    block_request_create(reqid, bs, qemu_coroutine_self());
    qemu_coroutine_yield();
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -22,7 +22,6 @@ typedef struct {
 typedef struct BlkverifyAIOCB BlkverifyAIOCB;
 struct BlkverifyAIOCB {
    BlockAIOCB common;
-    QEMUBH *bh;

    /* Request metadata */
    bool is_write;
@@ -175,7 +174,6 @@ static BlkverifyAIOCB *blkverify_aio_get(BlockDriverState *bs, bool is_write,
 {
    BlkverifyAIOCB *acb = qemu_aio_get(&blkverify_aiocb_info, bs, cb, opaque);

-    acb->bh = NULL;
    acb->is_write = is_write;
    acb->sector_num = sector_num;
    acb->nb_sectors = nb_sectors;
@@ -191,7 +189,6 @@ static void blkverify_aio_bh(void *opaque)
 {
    BlkverifyAIOCB *acb = opaque;

-    qemu_bh_delete(acb->bh);
    if (acb->buf) {
        qemu_iovec_destroy(&acb->raw_qiov);
        qemu_vfree(acb->buf);
@@ -218,9 +215,8 @@ static void blkverify_aio_cb(void *opaque, int ret)
            acb->verify(acb);
        }

-        acb->bh = aio_bh_new(bdrv_get_aio_context(acb->common.bs),
-                             blkverify_aio_bh, acb);
-        qemu_bh_schedule(acb->bh);
+        aio_bh_schedule_oneshot(bdrv_get_aio_context(acb->common.bs),
+                                blkverify_aio_bh, acb);
        break;
    }
 }
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -38,6 +38,7 @@ struct BlockBackend {
    BlockBackendPublic public;

    void *dev;                  /* attached device model, if any */
+    bool legacy_dev;            /* true if dev is not a DeviceState */
    /* TODO change to DeviceState when all users are qdevified */
    const BlockDevOps *dev_ops;
    void *dev_opaque;
@@ -65,7 +66,6 @@ struct BlockBackend {

 typedef struct BlockBackendAIOCB {
    BlockAIOCB common;
-    QEMUBH *bh;
    BlockBackend *blk;
    int ret;
 } BlockBackendAIOCB;
@@ -409,6 +409,22 @@ bool bdrv_has_blk(BlockDriverState *bs)
    return bdrv_first_blk(bs) != NULL;
 }

+/*
+ * Returns true if @bs has only BlockBackends as parents.
+ */
+bool bdrv_is_root_node(BlockDriverState *bs)
+{
+    BdrvChild *c;
+
+    QLIST_FOREACH(c, &bs->parents, next_parent) {
+        if (c->role != &child_root) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
 /*
 * Return @blk's DriveInfo if any, else null.
 */
@@ -491,32 +507,38 @@ void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
    }
 }

-/*
- * Attach device model @dev to @blk.
- * Return 0 on success, -EBUSY when a device model is attached already.
- */
-int blk_attach_dev(BlockBackend *blk, void *dev)
-/* TODO change to DeviceState *dev when all users are qdevified */
+static int blk_do_attach_dev(BlockBackend *blk, void *dev)
 {
    if (blk->dev) {
        return -EBUSY;
    }
    blk_ref(blk);
    blk->dev = dev;
+    blk->legacy_dev = false;
    blk_iostatus_reset(blk);
    return 0;
 }

+/*
+ * Attach device model @dev to @blk.
+ * Return 0 on success, -EBUSY when a device model is attached already.
+ */
+int blk_attach_dev(BlockBackend *blk, DeviceState *dev)
+{
+    return blk_do_attach_dev(blk, dev);
+}
+
 /*
 * Attach device model @dev to @blk.
 * @blk must not have a device model attached already.
 * TODO qdevified devices don't use this, remove when devices are qdevified
 */
-void blk_attach_dev_nofail(BlockBackend *blk, void *dev)
+void blk_attach_dev_legacy(BlockBackend *blk, void *dev)
 {
-    if (blk_attach_dev(blk, dev) < 0) {
+    if (blk_do_attach_dev(blk, dev) < 0) {
        abort();
    }
+    blk->legacy_dev = true;
 }

 /*
@@ -543,6 +565,42 @@ void *blk_get_attached_dev(BlockBackend *blk)
    return blk->dev;
 }

+/* Return the qdev ID, or if no ID is assigned the QOM path, of the block
+ * device attached to the BlockBackend. */
+static char *blk_get_attached_dev_id(BlockBackend *blk)
+{
+    DeviceState *dev;
+
+    assert(!blk->legacy_dev);
+    dev = blk->dev;
+
+    if (!dev) {
+        return g_strdup("");
+    } else if (dev->id) {
+        return g_strdup(dev->id);
+    }
+    return object_get_canonical_path(OBJECT(dev));
+}
+
+/*
+ * Return the BlockBackend which has the device model @dev attached if it
+ * exists, else null.
+ *
+ * @dev must not be null.
+ */
+BlockBackend *blk_by_dev(void *dev)
+{
+    BlockBackend *blk = NULL;
+
+    assert(dev != NULL);
+    while ((blk = blk_all_next(blk)) != NULL) {
+        if (blk->dev == dev) {
+            return blk;
+        }
+    }
+    return NULL;
+}
+
 /*
 * Set @blk's device model callbacks to @ops.
 * @opaque is the opaque argument to pass to the callbacks.
@@ -551,6 +609,11 @@ void *blk_get_attached_dev(BlockBackend *blk)
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
                     void *opaque)
 {
+    /* All drivers that use blk_set_dev_ops() are qdevified and we want to keep
+     * it that way, so we can assume blk->dev is a DeviceState if blk->dev_ops
+     * is set. */
+    assert(!blk->legacy_dev);
+
    blk->dev_ops = ops;
    blk->dev_opaque = opaque;
 }
@@ -566,13 +629,17 @@ void blk_dev_change_media_cb(BlockBackend *blk, bool load)
    if (blk->dev_ops && blk->dev_ops->change_media_cb) {
        bool tray_was_open, tray_is_open;

+        assert(!blk->legacy_dev);
+
        tray_was_open = blk_dev_is_tray_open(blk);
        blk->dev_ops->change_media_cb(blk->dev_opaque, load);
        tray_is_open = blk_dev_is_tray_open(blk);

        if (tray_was_open != tray_is_open) {
-            qapi_event_send_device_tray_moved(blk_name(blk), tray_is_open,
+            char *id = blk_get_attached_dev_id(blk);
+            qapi_event_send_device_tray_moved(blk_name(blk), id, tray_is_open,
                                              &error_abort);
+            g_free(id);
        }
    }
 }
@@ -727,21 +794,6 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
    return 0;
 }

-static int blk_check_request(BlockBackend *blk, int64_t sector_num,
-                             int nb_sectors)
-{
-    if (sector_num < 0 || sector_num > INT64_MAX / BDRV_SECTOR_SIZE) {
-        return -EIO;
-    }
-
-    if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
-        return -EIO;
-    }
-
-    return blk_check_byte_request(blk, sector_num * BDRV_SECTOR_SIZE,
-                                  nb_sectors * BDRV_SECTOR_SIZE);
-}
-
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
                               unsigned int bytes, QEMUIOVector *qiov,
                               BdrvRequestFlags flags)
@@ -878,7 +930,6 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
 static void error_callback_bh(void *opaque)
 {
    struct BlockBackendAIOCB *acb = opaque;
-    qemu_bh_delete(acb->bh);
    acb->common.cb(acb->common.opaque, acb->ret);
    qemu_aio_unref(acb);
 }
@@ -888,16 +939,12 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
                                  void *opaque, int ret)
 {
    struct BlockBackendAIOCB *acb;
-    QEMUBH *bh;

    acb = blk_aio_get(&block_backend_aiocb_info, blk, cb, opaque);
    acb->blk = blk;
    acb->ret = ret;

-    bh = aio_bh_new(blk_get_aio_context(blk), error_callback_bh, acb);
-    acb->bh = bh;
-    qemu_bh_schedule(bh);
-
+    aio_bh_schedule_oneshot(blk_get_aio_context(blk), error_callback_bh, acb);
    return &acb->common;
 }

@@ -906,7 +953,6 @@ typedef struct BlkAioEmAIOCB {
    BlkRwCo rwco;
    int bytes;
    bool has_returned;
-    QEMUBH* bh;
 } BlkAioEmAIOCB;

 static const AIOCBInfo blk_aio_em_aiocb_info = {
@@ -915,10 +961,6 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {

 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
-    if (acb->bh) {
-        assert(acb->has_returned);
-        qemu_bh_delete(acb->bh);
-    }
    if (acb->has_returned) {
        acb->common.cb(acb->common.opaque, acb->rwco.ret);
        qemu_aio_unref(acb);
@@ -927,7 +969,10 @@ static void blk_aio_complete(BlkAioEmAIOCB *acb)

 static void blk_aio_complete_bh(void *opaque)
 {
-    blk_aio_complete(opaque);
+    BlkAioEmAIOCB *acb = opaque;
+
+    assert(acb->has_returned);
+    blk_aio_complete(acb);
 }

 static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
@@ -947,7 +992,6 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
        .ret    = NOT_DONE,
    };
    acb->bytes = bytes;
-    acb->bh = NULL;
    acb->has_returned = false;

    co = qemu_coroutine_create(co_entry, acb);
@@ -955,8 +999,8 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,

    acb->has_returned = true;
    if (acb->rwco.ret != NOT_DONE) {
-        acb->bh = aio_bh_new(blk_get_aio_context(blk), blk_aio_complete_bh, acb);
-        qemu_bh_schedule(acb->bh);
+        aio_bh_schedule_oneshot(blk_get_aio_context(blk),
+                                blk_aio_complete_bh, acb);
    }

    return &acb->common;
@@ -1186,8 +1230,9 @@ static void send_qmp_error_event(BlockBackend *blk,
    IoOperationType optype;

    optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
-    qapi_event_send_block_io_error(blk_name(blk), optype, action,
-                                   blk_iostatus_is_enabled(blk),
+    qapi_event_send_block_io_error(blk_name(blk),
+                                   bdrv_get_node_name(blk_bs(blk)), optype,
+                                   action, blk_iostatus_is_enabled(blk),
                                   error == ENOSPC, strerror(error),
                                   &error_abort);
 }
@@ -1292,9 +1337,19 @@ void blk_lock_medium(BlockBackend *blk, bool locked)
 void blk_eject(BlockBackend *blk, bool eject_flag)
 {
    BlockDriverState *bs = blk_bs(blk);
+    char *id;
+
+    /* blk_eject is only called by qdevified devices */
+    assert(!blk->legacy_dev);

    if (bs) {
        bdrv_eject(bs, eject_flag);
+
+        id = blk_get_attached_dev_id(blk);
+        qapi_event_send_device_tray_moved(blk_name(blk), id,
+                                          eject_flag, &error_abort);
+        g_free(id);
+
    }
 }

@@ -1484,15 +1539,11 @@ int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                          flags | BDRV_REQ_ZERO_WRITE);
 }

-int blk_write_compressed(BlockBackend *blk, int64_t sector_num,
-                         const uint8_t *buf, int nb_sectors)
+int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf,
+                          int count)
 {
-    int ret = blk_check_request(blk, sector_num, nb_sectors);
-    if (ret < 0) {
-        return ret;
-    }
-
-    return bdrv_write_compressed(blk_bs(blk), sector_num, buf, nb_sectors);
+    return blk_prw(blk, offset, (void *) buf, count, blk_write_entry,
+                   BDRV_REQ_WRITE_COMPRESSED);
 }

 int blk_truncate(BlockBackend *blk, int64_t offset)
@@ -1576,13 +1627,12 @@ void blk_update_root_state(BlockBackend *blk)
 }

 /*
- * Applies the information in the root state to the given BlockDriverState. This
- * does not include the flags which have to be specified for bdrv_open(), use
- * blk_get_open_flags_from_root_state() to inquire them.
+ * Returns the detect-zeroes setting to be used for bdrv_open() of a
+ * BlockDriverState which is supposed to inherit the root state.
 */
-void blk_apply_root_state(BlockBackend *blk, BlockDriverState *bs)
+bool blk_get_detect_zeroes_from_root_state(BlockBackend *blk)
 {
-    bs->detect_zeroes = blk->root_state.detect_zeroes;
+    return blk->root_state.detect_zeroes;
 }

 /*
@@ -1624,28 +1674,6 @@ int blk_commit_all(void)
    return 0;
 }

-int blk_flush_all(void)
-{
-    BlockBackend *blk = NULL;
-    int result = 0;
-
-    while ((blk = blk_all_next(blk)) != NULL) {
-        AioContext *aio_context = blk_get_aio_context(blk);
-        int ret;
-
-        aio_context_acquire(aio_context);
-        if (blk_is_inserted(blk)) {
-            ret = blk_flush(blk);
-            if (ret < 0 && !result) {
-                result = ret;
-            }
-        }
-        aio_context_release(aio_context);
-    }
-
-    return result;
-}
-

 /* throttling disk I/O limits */
 void blk_set_io_limits(BlockBackend *blk, ThrottleConfig *cfg)
--- a/block/commit.c
+++ b/block/commit.c
@@ -83,7 +83,7 @@ static void commit_complete(BlockJob *job, void *opaque)
    BlockDriverState *active = s->active;
    BlockDriverState *top = blk_bs(s->top);
    BlockDriverState *base = blk_bs(s->base);
-    BlockDriverState *overlay_bs;
+    BlockDriverState *overlay_bs = bdrv_find_overlay(active, top);
    int ret = data->ret;

    if (!block_job_is_cancelled(&s->common) && ret == 0) {
@@ -97,7 +97,6 @@ static void commit_complete(BlockJob *job, void *opaque)
    if (s->base_flags != bdrv_get_flags(base)) {
        bdrv_reopen(base, s->base_flags, NULL);
    }
-    overlay_bs = bdrv_find_overlay(active, top);
    if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
        bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
    }
@@ -243,14 +242,14 @@ void commit_start(const char *job_id, BlockDriverState *bs,
    orig_overlay_flags = bdrv_get_flags(overlay_bs);

    /* convert base & overlay_bs to r/w, if necessary */
-    if (!(orig_overlay_flags & BDRV_O_RDWR)) {
-        reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs, NULL,
-                                         orig_overlay_flags | BDRV_O_RDWR);
-    }
    if (!(orig_base_flags & BDRV_O_RDWR)) {
        reopen_queue = bdrv_reopen_queue(reopen_queue, base, NULL,
                                         orig_base_flags | BDRV_O_RDWR);
    }
+    if (!(orig_overlay_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs, NULL,
+                                         orig_overlay_flags | BDRV_O_RDWR);
+    }
    if (reopen_queue) {
        bdrv_reopen_multiple(reopen_queue, &local_err);
        if (local_err != NULL) {
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -33,6 +33,7 @@
 #define BLOCK_CRYPTO_OPT_LUKS_IVGEN_ALG "ivgen-alg"
 #define BLOCK_CRYPTO_OPT_LUKS_IVGEN_HASH_ALG "ivgen-hash-alg"
 #define BLOCK_CRYPTO_OPT_LUKS_HASH_ALG "hash-alg"
+#define BLOCK_CRYPTO_OPT_LUKS_ITER_TIME "iter-time"

 typedef struct BlockCrypto BlockCrypto;

@@ -183,6 +184,11 @@ static QemuOptsList block_crypto_create_opts_luks = {
            .type = QEMU_OPT_STRING,
            .help = "Name of encryption hash algorithm",
        },
+        {
+            .name = BLOCK_CRYPTO_OPT_LUKS_ITER_TIME,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Time to spend in PBKDF in milliseconds",
+        },
        { /* end of list */ }
    },
 };
--- a/block/curl.c
+++ b/block/curl.c
@@ -96,7 +96,6 @@ struct BDRVCURLState;

 typedef struct CURLAIOCB {
    BlockAIOCB common;
-    QEMUBH *bh;
    QEMUIOVector *qiov;

    int64_t sector_num;
@@ -169,7 +168,7 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
    state->sock_fd = fd;
    s = state->s;

-    DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, fd);
+    DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, (int)fd);
    switch (action) {
        case CURL_POLL_IN:
            aio_set_fd_handler(s->aio_context, fd, false,
@@ -675,11 +674,28 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
    curl_easy_setopt(state->curl, CURLOPT_HEADERDATA, s);
    if (curl_easy_perform(state->curl))
        goto out;
-    curl_easy_getinfo(state->curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD, &d);
-    if (d)
-        s->len = (size_t)d;
-    else if(!s->len)
+    if (curl_easy_getinfo(state->curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD, &d)) {
        goto out;
+    }
+    /* Prior CURL 7.19.4 return value of 0 could mean that the file size is not
+     * know or the size is zero. From 7.19.4 CURL returns -1 if size is not
+     * known and zero if it is realy zero-length file. */
+#if LIBCURL_VERSION_NUM >= 0x071304
+    if (d < 0) {
+        pstrcpy(state->errmsg, CURL_ERROR_SIZE,
+                "Server didn't report file size.");
+        goto out;
+    }
+#else
+    if (d <= 0) {
+        pstrcpy(state->errmsg, CURL_ERROR_SIZE,
+                "Unknown file size or zero-length file.");
+        goto out;
+    }
+#endif
+
+    s->len = (size_t)d;
+
    if ((!strncasecmp(s->url, "http://", strlen("http://"))
        || !strncasecmp(s->url, "https://", strlen("https://")))
        && !s->accept_range) {
@@ -722,9 +738,6 @@ static void curl_readv_bh_cb(void *p)
    CURLAIOCB *acb = p;
    BDRVCURLState *s = acb->common.bs->opaque;

-    qemu_bh_delete(acb->bh);
-    acb->bh = NULL;
-
    size_t start = acb->sector_num * SECTOR_SIZE;
    size_t end;

@@ -788,8 +801,7 @@ static BlockAIOCB *curl_aio_readv(BlockDriverState *bs,
    acb->sector_num = sector_num;
    acb->nb_sectors = nb_sectors;

-    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), curl_readv_bh_cb, acb);
-    qemu_bh_schedule(acb->bh);
+    aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), curl_readv_bh_cb, acb);
    return &acb->common;
 }

--- a/block/dmg-bz2.c
+++ b/block/dmg-bz2.c
@@ -0,0 +1,61 @@
+/*
+ * DMG bzip2 uncompression
+ *
+ * Copyright (c) 2004 Johannes E. Schindelin
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "dmg.h"
+#include <bzlib.h>
+
+static int dmg_uncompress_bz2_do(char *next_in, unsigned int avail_in,
+                                 char *next_out, unsigned int avail_out)
+{
+    int ret;
+    uint64_t total_out;
+    bz_stream bzstream = {};
+
+    ret = BZ2_bzDecompressInit(&bzstream, 0, 0);
+    if (ret != BZ_OK) {
+        return -1;
+    }
+    bzstream.next_in = next_in;
+    bzstream.avail_in = avail_in;
+    bzstream.next_out = next_out;
+    bzstream.avail_out = avail_out;
+    ret = BZ2_bzDecompress(&bzstream);
+    total_out = ((uint64_t)bzstream.total_out_hi32 << 32) +
+                bzstream.total_out_lo32;
+    BZ2_bzDecompressEnd(&bzstream);
+    if (ret != BZ_STREAM_END ||
+        total_out != avail_out) {
+        return -1;
+    }
+    return 0;
+}
+
+__attribute__((constructor))
+static void dmg_bz2_init(void)
+{
+    assert(!dmg_uncompress_bz2);
+    dmg_uncompress_bz2 = dmg_uncompress_bz2_do;
+}
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -28,10 +28,10 @@
 #include "qemu/bswap.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
-#include <zlib.h>
-#ifdef CONFIG_BZIP2
-#include <bzlib.h>
-#endif
+#include "dmg.h"
+
+int (*dmg_uncompress_bz2)(char *next_in, unsigned int avail_in,
+                          char *next_out, unsigned int avail_out);

 enum {
    /* Limit chunk sizes to prevent unreasonable amounts of memory being used
@@ -41,31 +41,6 @@ enum {
    DMG_SECTORCOUNTS_MAX = DMG_LENGTHS_MAX / 512,
 };

-typedef struct BDRVDMGState {
-    CoMutex lock;
-    /* each chunk contains a certain number of sectors,
-     * offsets[i] is the offset in the .dmg file,
-     * lengths[i] is the length of the compressed chunk,
-     * sectors[i] is the sector beginning at offsets[i],
-     * sectorcounts[i] is the number of sectors in that chunk,
-     * the sectors array is ordered
-     * 0<=i<n_chunks */
-
-    uint32_t n_chunks;
-    uint32_t* types;
-    uint64_t* offsets;
-    uint64_t* lengths;
-    uint64_t* sectors;
-    uint64_t* sectorcounts;
-    uint32_t current_chunk;
-    uint8_t *compressed_chunk;
-    uint8_t *uncompressed_chunk;
-    z_stream zstream;
-#ifdef CONFIG_BZIP2
-    bz_stream bzstream;
-#endif
-} BDRVDMGState;
-
 static int dmg_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
    int len;
@@ -210,10 +185,9 @@ static bool dmg_is_known_block_type(uint32_t entry_type)
    case 0x00000001:    /* uncompressed */
    case 0x00000002:    /* zeroes */
    case 0x80000005:    /* zlib */
-#ifdef CONFIG_BZIP2
-    case 0x80000006:    /* bzip2 */
-#endif
        return true;
+    case 0x80000006:    /* bzip2 */
+        return !!dmg_uncompress_bz2;
    default:
        return false;
    }
@@ -439,6 +413,7 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
    int64_t offset;
    int ret;

+    block_module_load_one("dmg-bz2");
    bs->read_only = true;

    s->n_chunks = 0;
@@ -587,9 +562,6 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
    if (!is_sector_in_chunk(s, s->current_chunk, sector_num)) {
        int ret;
        uint32_t chunk = search_chunk(s, sector_num);
-#ifdef CONFIG_BZIP2
-        uint64_t total_out;
-#endif

        if (chunk >= s->n_chunks) {
            return -1;
@@ -620,8 +592,10 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
                return -1;
            }
            break; }
-#ifdef CONFIG_BZIP2
        case 0x80000006: /* bzip2 compressed */
+            if (!dmg_uncompress_bz2) {
+                break;
+            }
            /* we need to buffer, because only the chunk as whole can be
             * inflated. */
            ret = bdrv_pread(bs->file, s->offsets[chunk],
@@ -630,24 +604,15 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
                return -1;
            }

-            ret = BZ2_bzDecompressInit(&s->bzstream, 0, 0);
-            if (ret != BZ_OK) {
-                return -1;
-            }
-            s->bzstream.next_in = (char *)s->compressed_chunk;
-            s->bzstream.avail_in = (unsigned int) s->lengths[chunk];
-            s->bzstream.next_out = (char *)s->uncompressed_chunk;
-            s->bzstream.avail_out = (unsigned int) 512 * s->sectorcounts[chunk];
-            ret = BZ2_bzDecompress(&s->bzstream);
-            total_out = ((uint64_t)s->bzstream.total_out_hi32 << 32) +
-                        s->bzstream.total_out_lo32;
-            BZ2_bzDecompressEnd(&s->bzstream);
-            if (ret != BZ_STREAM_END ||
-                total_out != 512 * s->sectorcounts[chunk]) {
-                return -1;
+            ret = dmg_uncompress_bz2((char *)s->compressed_chunk,
+                                     (unsigned int) s->lengths[chunk],
+                                     (char *)s->uncompressed_chunk,
+                                     (unsigned int)
+                                         (512 * s->sectorcounts[chunk]));
+            if (ret < 0) {
+                return ret;
            }
            break;
-#endif /* CONFIG_BZIP2 */
        case 1: /* copy */
            ret = bdrv_pread(bs->file, s->offsets[chunk],
                             s->uncompressed_chunk, s->lengths[chunk]);
--- a/block/dmg.h
+++ b/block/dmg.h
@@ -0,0 +1,59 @@
+/*
+ * Header for DMG driver
+ *
+ * Copyright (c) 2004-2006 Fabrice Bellard
+ * Copyright (c) 2016 Red hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCK_DMG_H
+#define BLOCK_DMG_H
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include <zlib.h>
+
+typedef struct BDRVDMGState {
+    CoMutex lock;
+    /* each chunk contains a certain number of sectors,
+     * offsets[i] is the offset in the .dmg file,
+     * lengths[i] is the length of the compressed chunk,
+     * sectors[i] is the sector beginning at offsets[i],
+     * sectorcounts[i] is the number of sectors in that chunk,
+     * the sectors array is ordered
+     * 0<=i<n_chunks */
+
+    uint32_t n_chunks;
+    uint32_t *types;
+    uint64_t *offsets;
+    uint64_t *lengths;
+    uint64_t *sectors;
+    uint64_t *sectorcounts;
+    uint32_t current_chunk;
+    uint8_t *compressed_chunk;
+    uint8_t *uncompressed_chunk;
+    z_stream zstream;
+} BDRVDMGState;
+
+extern int (*dmg_uncompress_bz2)(char *next_in, unsigned int avail_in,
+                                 char *next_out, unsigned int avail_out);
+
+#endif
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -30,13 +30,14 @@
 #define GLUSTER_DEFAULT_PORT        24007
 #define GLUSTER_DEBUG_DEFAULT       4
 #define GLUSTER_DEBUG_MAX           9
+#define GLUSTER_OPT_LOGFILE         "logfile"
+#define GLUSTER_LOGFILE_DEFAULT     "-" /* handled in libgfapi as /dev/stderr */

 #define GERR_INDEX_HINT "hint: check in 'server' array index '%d'\n"

 typedef struct GlusterAIOCB {
    int64_t size;
    int ret;
-    QEMUBH *bh;
    Coroutine *coroutine;
    AioContext *aio_context;
 } GlusterAIOCB;
@@ -44,6 +45,7 @@ typedef struct GlusterAIOCB {
 typedef struct BDRVGlusterState {
    struct glfs *glfs;
    struct glfs_fd *fd;
+    char *logfile;
    bool supports_seek_data;
    int debug_level;
 } BDRVGlusterState;
@@ -73,6 +75,11 @@ static QemuOptsList qemu_gluster_create_opts = {
            .type = QEMU_OPT_NUMBER,
            .help = "Gluster log level, valid range is 0-9",
        },
+        {
+            .name = GLUSTER_OPT_LOGFILE,
+            .type = QEMU_OPT_STRING,
+            .help = "Logfile path of libgfapi",
+        },
        { /* end of list */ }
    }
 };
@@ -91,6 +98,11 @@ static QemuOptsList runtime_opts = {
            .type = QEMU_OPT_NUMBER,
            .help = "Gluster log level, valid range is 0-9",
        },
+        {
+            .name = GLUSTER_OPT_LOGFILE,
+            .type = QEMU_OPT_STRING,
+            .help = "Logfile path of libgfapi",
+        },
        { /* end of list */ }
    },
 };
@@ -341,7 +353,7 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
        }
    }

-    ret = glfs_set_logging(glfs, "-", gconf->debug_level);
+    ret = glfs_set_logging(glfs, gconf->logfile, gconf->debug_level);
    if (ret < 0) {
        goto out;
    }
@@ -576,7 +588,9 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
        if (ret < 0) {
            error_setg(errp, "invalid URI");
            error_append_hint(errp, "Usage: file=gluster[+transport]://"
-                                    "[host[:port]]/volume/path[?socket=...]\n");
+                                    "[host[:port]]volume/path[?socket=...]"
+                                    "[,file.debug=N]"
+                                    "[,file.logfile=/path/filename.log]\n");
            errno = -ret;
            return NULL;
        }
@@ -586,7 +600,9 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
            error_append_hint(errp, "Usage: "
                             "-drive driver=qcow2,file.driver=gluster,"
                             "file.volume=testvol,file.path=/path/a.qcow2"
-                             "[,file.debug=9],file.server.0.type=tcp,"
+                             "[,file.debug=9]"
+                             "[,file.logfile=/path/filename.log],"
+                             "file.server.0.type=tcp,"
                             "file.server.0.host=1.2.3.4,"
                             "file.server.0.port=24007,"
                             "file.server.1.transport=unix,"
@@ -605,8 +621,6 @@ static void qemu_gluster_complete_aio(void *opaque)
 {
    GlusterAIOCB *acb = (GlusterAIOCB *)opaque;

-    qemu_bh_delete(acb->bh);
-    acb->bh = NULL;
    qemu_coroutine_enter(acb->coroutine);
 }

@@ -625,8 +639,7 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
        acb->ret = -EIO; /* Partial read/write - fail it */
    }

-    acb->bh = aio_bh_new(acb->aio_context, qemu_gluster_complete_aio, acb);
-    qemu_bh_schedule(acb->bh);
+    aio_bh_schedule_oneshot(acb->aio_context, qemu_gluster_complete_aio, acb);
 }

 static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
@@ -677,7 +690,7 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
    BlockdevOptionsGluster *gconf = NULL;
    QemuOpts *opts;
    Error *local_err = NULL;
-    const char *filename;
+    const char *filename, *logfile;

    opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
    qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -700,6 +713,13 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
    gconf = g_new0(BlockdevOptionsGluster, 1);
    gconf->debug_level = s->debug_level;
    gconf->has_debug_level = true;
+
+    logfile = qemu_opt_get(opts, GLUSTER_OPT_LOGFILE);
+    s->logfile = g_strdup(logfile ? logfile : GLUSTER_LOGFILE_DEFAULT);
+
+    gconf->logfile = g_strdup(s->logfile);
+    gconf->has_logfile = true;
+
    s->glfs = qemu_gluster_init(gconf, filename, options, errp);
    if (!s->glfs) {
        ret = -errno;
@@ -738,6 +758,7 @@ out:
    if (!ret) {
        return ret;
    }
+    g_free(s->logfile);
    if (s->fd) {
        glfs_close(s->fd);
    }
@@ -769,6 +790,8 @@ static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
    gconf = g_new0(BlockdevOptionsGluster, 1);
    gconf->debug_level = s->debug_level;
    gconf->has_debug_level = true;
+    gconf->logfile = g_strdup(s->logfile);
+    gconf->has_logfile = true;
    reop_s->glfs = qemu_gluster_init(gconf, state->bs->filename, NULL, errp);
    if (reop_s->glfs == NULL) {
        ret = -errno;
@@ -914,6 +937,12 @@ static int qemu_gluster_create(const char *filename,
    }
    gconf->has_debug_level = true;

+    gconf->logfile = qemu_opt_get_del(opts, GLUSTER_OPT_LOGFILE);
+    if (!gconf->logfile) {
+        gconf->logfile = g_strdup(GLUSTER_LOGFILE_DEFAULT);
+    }
+    gconf->has_logfile = true;
+
    glfs = qemu_gluster_init(gconf, filename, NULL, errp);
    if (!glfs) {
        ret = -errno;
@@ -1025,6 +1054,7 @@ static void qemu_gluster_close(BlockDriverState *bs)
 {
    BDRVGlusterState *s = bs->opaque;

+    g_free(s->logfile);
    if (s->fd) {
        glfs_close(s->fd);
        s->fd = NULL;
--- a/block/io.c
+++ b/block/io.c
@@ -171,7 +171,6 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
 typedef struct {
    Coroutine *co;
    BlockDriverState *bs;
-    QEMUBH *bh;
    bool done;
 } BdrvCoDrainData;

@@ -191,7 +190,6 @@ static void bdrv_co_drain_bh_cb(void *opaque)
    BdrvCoDrainData *data = opaque;
    Coroutine *co = data->co;

-    qemu_bh_delete(data->bh);
    bdrv_drain_poll(data->bs);
    data->done = true;
    qemu_coroutine_enter(co);
@@ -210,9 +208,9 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
        .co = qemu_coroutine_self(),
        .bs = bs,
        .done = false,
-        .bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_bh_cb, &data),
    };
-    qemu_bh_schedule(data.bh);
+    aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
+                            bdrv_co_drain_bh_cb, &data);

    qemu_coroutine_yield();
    /* If we are resumed from some other event (such as an aio completion or a
@@ -540,17 +538,6 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
    return 0;
 }

-static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num,
-                              int nb_sectors)
-{
-    if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-        return -EIO;
-    }
-
-    return bdrv_check_byte_request(bs, sector_num * BDRV_SECTOR_SIZE,
-                                   nb_sectors * BDRV_SECTOR_SIZE);
-}
-
 typedef struct RwCo {
    BdrvChild *child;
    int64_t offset;
@@ -897,6 +884,19 @@ emulate_flags:
    return ret;
 }

+static int coroutine_fn
+bdrv_driver_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
+                               uint64_t bytes, QEMUIOVector *qiov)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (!drv->bdrv_co_pwritev_compressed) {
+        return -ENOTSUP;
+    }
+
+    return drv->bdrv_co_pwritev_compressed(bs, offset, bytes, qiov);
+}
+
 static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
        int64_t offset, unsigned int bytes, QEMUIOVector *qiov)
 {
@@ -1180,10 +1180,11 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
    int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
                        bs->bl.request_alignment);

-    assert(is_power_of_2(alignment));
-    head = offset & (alignment - 1);
-    tail = (offset + count) & (alignment - 1);
-    max_write_zeroes &= ~(alignment - 1);
+    assert(alignment % bs->bl.request_alignment == 0);
+    head = offset % alignment;
+    tail = (offset + count) % alignment;
+    max_write_zeroes = QEMU_ALIGN_DOWN(max_write_zeroes, alignment);
+    assert(max_write_zeroes >= bs->bl.request_alignment);

    while (count > 0 && !ret) {
        int num = count;
@@ -1314,6 +1315,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
    } else if (flags & BDRV_REQ_ZERO_WRITE) {
        bdrv_debug_event(bs, BLKDBG_PWRITEV_ZERO);
        ret = bdrv_co_do_pwrite_zeroes(bs, offset, bytes, flags);
+    } else if (flags & BDRV_REQ_WRITE_COMPRESSED) {
+        ret = bdrv_driver_pwritev_compressed(bs, offset, bytes, qiov);
    } else if (bytes <= max_transfer) {
        bdrv_debug_event(bs, BLKDBG_PWRITEV);
        ret = bdrv_driver_pwritev(bs, offset, bytes, qiov, flags);
@@ -1614,6 +1617,31 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
                           BDRV_REQ_ZERO_WRITE | flags);
 }

+/*
+ * Flush ALL BDSes regardless of if they are reachable via a BlkBackend or not.
+ */
+int bdrv_flush_all(void)
+{
+    BdrvNextIterator it;
+    BlockDriverState *bs = NULL;
+    int result = 0;
+
+    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+        AioContext *aio_context = bdrv_get_aio_context(bs);
+        int ret;
+
+        aio_context_acquire(aio_context);
+        ret = bdrv_flush(bs);
+        if (ret < 0 && !result) {
+            result = ret;
+        }
+        aio_context_release(aio_context);
+    }
+
+    return result;
+}
+
+
 typedef struct BdrvCoGetBlockStatusData {
    BlockDriverState *bs;
    BlockDriverState *base;
@@ -1878,28 +1906,6 @@ int bdrv_is_allocated_above(BlockDriverState *top,
    return 0;
 }

-int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
-                          const uint8_t *buf, int nb_sectors)
-{
-    BlockDriver *drv = bs->drv;
-    int ret;
-
-    if (!drv) {
-        return -ENOMEDIUM;
-    }
-    if (!drv->bdrv_write_compressed) {
-        return -ENOTSUP;
-    }
-    ret = bdrv_check_request(bs, sector_num, nb_sectors);
-    if (ret < 0) {
-        return ret;
-    }
-
-    assert(QLIST_EMPTY(&bs->dirty_bitmaps));
-
-    return drv->bdrv_write_compressed(bs, sector_num, buf, nb_sectors);
-}
-
 typedef struct BdrvVmstateCo {
    BlockDriverState    *bs;
    QEMUIOVector        *qiov;
@@ -2087,7 +2093,6 @@ typedef struct BlockAIOCBCoroutine {
    bool is_write;
    bool need_bh;
    bool *done;
-    QEMUBH* bh;
 } BlockAIOCBCoroutine;

 static const AIOCBInfo bdrv_em_co_aiocb_info = {
@@ -2107,7 +2112,6 @@ static void bdrv_co_em_bh(void *opaque)
    BlockAIOCBCoroutine *acb = opaque;

    assert(!acb->need_bh);
-    qemu_bh_delete(acb->bh);
    bdrv_co_complete(acb);
 }

@@ -2117,8 +2121,7 @@ static void bdrv_co_maybe_schedule_bh(BlockAIOCBCoroutine *acb)
    if (acb->req.error != -EINPROGRESS) {
        BlockDriverState *bs = acb->common.bs;

-        acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
-        qemu_bh_schedule(acb->bh);
+        aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
    }
 }

@@ -2282,11 +2285,11 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
    int current_gen = bs->write_gen;

    /* Wait until any previous flushes are completed */
-    while (bs->flush_started_gen != bs->flushed_gen) {
+    while (bs->active_flush_req != NULL) {
        qemu_co_queue_wait(&bs->flush_queue);
    }

-    bs->flush_started_gen = current_gen;
+    bs->active_flush_req = &req;

    /* Write back all layers by calling one driver function */
    if (bs->drv->bdrv_co_flush) {
@@ -2356,7 +2359,9 @@ flush_parent:
 out:
    /* Notify any pending flushes that we have completed */
    bs->flushed_gen = current_gen;
-    qemu_co_queue_restart_all(&bs->flush_queue);
+    bs->active_flush_req = NULL;
+    /* Return value is ignored - it's ok if wait queue is empty */
+    qemu_co_queue_next(&bs->flush_queue);

    tracked_request_end(&req);
    return ret;
@@ -2429,9 +2434,10 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,

    /* Discard is advisory, so ignore any unaligned head or tail */
    align = MAX(bs->bl.pdiscard_alignment, bs->bl.request_alignment);
-    assert(is_power_of_2(align));
-    head = MIN(count, -offset & (align - 1));
+    assert(align % bs->bl.request_alignment == 0);
+    head = offset % align;
    if (head) {
+        head = MIN(count, align - head);
        count -= head;
        offset += head;
    }
@@ -2449,6 +2455,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,

    max_pdiscard = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pdiscard, INT_MAX),
                                   align);
+    assert(max_pdiscard);

    while (count > 0) {
        int ret;
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -36,7 +36,7 @@
 #include "block/block_int.h"
 #include "block/scsi.h"
 #include "qemu/iov.h"
-#include "sysemu/sysemu.h"
+#include "qemu/uuid.h"
 #include "qmp-commands.h"
 #include "qapi/qmp/qstring.h"
 #include "crypto/secret.h"
@@ -95,7 +95,6 @@ typedef struct IscsiTask {
    int do_retry;
    struct scsi_task *task;
    Coroutine *co;
-    QEMUBH *bh;
    IscsiLun *iscsilun;
    QEMUTimer retry_timer;
    int err_code;
@@ -167,7 +166,6 @@ static void iscsi_co_generic_bh_cb(void *opaque)
 {
    struct IscsiTask *iTask = opaque;
    iTask->complete = 1;
-    qemu_bh_delete(iTask->bh);
    qemu_coroutine_enter(iTask->co);
 }

@@ -299,9 +297,8 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,

 out:
    if (iTask->co) {
-        iTask->bh = aio_bh_new(iTask->iscsilun->aio_context,
-                               iscsi_co_generic_bh_cb, iTask);
-        qemu_bh_schedule(iTask->bh);
+        aio_bh_schedule_oneshot(iTask->iscsilun->aio_context,
+                                 iscsi_co_generic_bh_cb, iTask);
    } else {
        iTask->complete = 1;
    }
@@ -1813,19 +1810,22 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)

    IscsiLun *iscsilun = bs->opaque;
    uint64_t max_xfer_len = iscsilun->use_16_for_rw ? 0xffffffff : 0xffff;
+    unsigned int block_size = MAX(BDRV_SECTOR_SIZE, iscsilun->block_size);

-    bs->bl.request_alignment = iscsilun->block_size;
+    assert(iscsilun->block_size >= BDRV_SECTOR_SIZE || bs->sg);
+
+    bs->bl.request_alignment = block_size;

    if (iscsilun->bl.max_xfer_len) {
        max_xfer_len = MIN(max_xfer_len, iscsilun->bl.max_xfer_len);
    }

-    if (max_xfer_len * iscsilun->block_size < INT_MAX) {
+    if (max_xfer_len * block_size < INT_MAX) {
        bs->bl.max_transfer = max_xfer_len * iscsilun->block_size;
    }

    if (iscsilun->lbp.lbpu) {
-        if (iscsilun->bl.max_unmap < 0xffffffff / iscsilun->block_size) {
+        if (iscsilun->bl.max_unmap < 0xffffffff / block_size) {
            bs->bl.max_pdiscard =
                iscsilun->bl.max_unmap * iscsilun->block_size;
        }
@@ -1835,7 +1835,7 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
        bs->bl.pdiscard_alignment = iscsilun->block_size;
    }

-    if (iscsilun->bl.max_ws_len < 0xffffffff / iscsilun->block_size) {
+    if (iscsilun->bl.max_ws_len < 0xffffffff / block_size) {
        bs->bl.max_pwrite_zeroes =
            iscsilun->bl.max_ws_len * iscsilun->block_size;
    }
@@ -1846,7 +1846,7 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
        bs->bl.pwrite_zeroes_alignment = iscsilun->block_size;
    }
    if (iscsilun->bl.opt_xfer_len &&
-        iscsilun->bl.opt_xfer_len < INT_MAX / iscsilun->block_size) {
+        iscsilun->bl.opt_xfer_len < INT_MAX / block_size) {
        bs->bl.opt_transfer = pow2floor(iscsilun->bl.opt_xfer_len *
                                        iscsilun->block_size);
    }
@@ -2010,45 +2010,9 @@ static BlockDriver bdrv_iscsi = {
    .bdrv_attach_aio_context = iscsi_attach_aio_context,
 };

-static QemuOptsList qemu_iscsi_opts = {
-    .name = "iscsi",
-    .head = QTAILQ_HEAD_INITIALIZER(qemu_iscsi_opts.head),
-    .desc = {
-        {
-            .name = "user",
-            .type = QEMU_OPT_STRING,
-            .help = "username for CHAP authentication to target",
-        },{
-            .name = "password",
-            .type = QEMU_OPT_STRING,
-            .help = "password for CHAP authentication to target",
-        },{
-            .name = "password-secret",
-            .type = QEMU_OPT_STRING,
-            .help = "ID of the secret providing password for CHAP "
-                    "authentication to target",
-        },{
-            .name = "header-digest",
-            .type = QEMU_OPT_STRING,
-            .help = "HeaderDigest setting. "
-                    "{CRC32C|CRC32C-NONE|NONE-CRC32C|NONE}",
-        },{
-            .name = "initiator-name",
-            .type = QEMU_OPT_STRING,
-            .help = "Initiator iqn name to use when connecting",
-        },{
-            .name = "timeout",
-            .type = QEMU_OPT_NUMBER,
-            .help = "Request timeout in seconds (default 0 = no timeout)",
-        },
-        { /* end of list */ }
-    },
-};
-
 static void iscsi_block_init(void)
 {
    bdrv_register(&bdrv_iscsi);
-    qemu_add_opts(&qemu_iscsi_opts);
 }

 block_init(iscsi_block_init);
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -59,7 +59,6 @@ struct LinuxAioState {

    /* I/O completion processing */
    QEMUBH *completion_bh;
-    struct io_event events[MAX_EVENTS];
    int event_idx;
    int event_max;
 };
@@ -95,64 +94,156 @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)

    laiocb->ret = ret;
    if (laiocb->co) {
-        qemu_coroutine_enter(laiocb->co);
+        /* If the coroutine is already entered it must be in ioq_submit() and
+         * will notice laio->ret has been filled in when it eventually runs
+         * later.  Coroutines cannot be entered recursively so avoid doing
+         * that!
+         */
+        if (!qemu_coroutine_entered(laiocb->co)) {
+            qemu_coroutine_enter(laiocb->co);
+        }
    } else {
        laiocb->common.cb(laiocb->common.opaque, ret);
        qemu_aio_unref(laiocb);
    }
 }

-/* The completion BH fetches completed I/O requests and invokes their
- * callbacks.
+/**
+ * aio_ring buffer which is shared between userspace and kernel.
+ *
+ * This copied from linux/fs/aio.c, common header does not exist
+ * but AIO exists for ages so we assume ABI is stable.
+ */
+struct aio_ring {
+    unsigned    id;    /* kernel internal index number */
+    unsigned    nr;    /* number of io_events */
+    unsigned    head;  /* Written to by userland or by kernel. */
+    unsigned    tail;
+
+    unsigned    magic;
+    unsigned    compat_features;
+    unsigned    incompat_features;
+    unsigned    header_length;  /* size of aio_ring */
+
+    struct io_event io_events[0];
+};
+
+/**
+ * io_getevents_peek:
+ * @ctx: AIO context
+ * @events: pointer on events array, output value
+
+ * Returns the number of completed events and sets a pointer
+ * on events array.  This function does not update the internal
+ * ring buffer, only reads head and tail.  When @events has been
+ * processed io_getevents_commit() must be called.
+ */
+static inline unsigned int io_getevents_peek(io_context_t ctx,
+                                             struct io_event **events)
+{
+    struct aio_ring *ring = (struct aio_ring *)ctx;
+    unsigned int head = ring->head, tail = ring->tail;
+    unsigned int nr;
+
+    nr = tail >= head ? tail - head : ring->nr - head;
+    *events = ring->io_events + head;
+    /* To avoid speculative loads of s->events[i] before observing tail.
+       Paired with smp_wmb() inside linux/fs/aio.c: aio_complete(). */
+    smp_rmb();
+
+    return nr;
+}
+
+/**
+ * io_getevents_commit:
+ * @ctx: AIO context
+ * @nr: the number of events on which head should be advanced
+ *
+ * Advances head of a ring buffer.
+ */
+static inline void io_getevents_commit(io_context_t ctx, unsigned int nr)
+{
+    struct aio_ring *ring = (struct aio_ring *)ctx;
+
+    if (nr) {
+        ring->head = (ring->head + nr) % ring->nr;
+    }
+}
+
+/**
+ * io_getevents_advance_and_peek:
+ * @ctx: AIO context
+ * @events: pointer on events array, output value
+ * @nr: the number of events on which head should be advanced
+ *
+ * Advances head of a ring buffer and returns number of elements left.
+ */
+static inline unsigned int
+io_getevents_advance_and_peek(io_context_t ctx,
+                              struct io_event **events,
+                              unsigned int nr)
+{
+    io_getevents_commit(ctx, nr);
+    return io_getevents_peek(ctx, events);
+}
+
+/**
+ * qemu_laio_process_completions:
+ * @s: AIO state
+ *
+ * Fetches completed I/O requests and invokes their callbacks.
 *
 * The function is somewhat tricky because it supports nested event loops, for
 * example when a request callback invokes aio_poll().  In order to do this,
- * the completion events array and index are kept in LinuxAioState.  The BH
- * reschedules itself as long as there are completions pending so it will
- * either be called again in a nested event loop or will be called after all
- * events have been completed.  When there are no events left to complete, the
- * BH returns without rescheduling.
+ * indices are kept in LinuxAioState.  Function schedules BH completion so it
+ * can be called again in a nested event loop.  When there are no events left
+ * to complete the BH is being canceled.
 */
-static void qemu_laio_completion_bh(void *opaque)
+static void qemu_laio_process_completions(LinuxAioState *s)
 {
-    LinuxAioState *s = opaque;
-
-    /* Fetch more completion events when empty */
-    if (s->event_idx == s->event_max) {
-        do {
-            struct timespec ts = { 0 };
-            s->event_max = io_getevents(s->ctx, MAX_EVENTS, MAX_EVENTS,
-                                        s->events, &ts);
-        } while (s->event_max == -EINTR);
-
-        s->event_idx = 0;
-        if (s->event_max <= 0) {
-            s->event_max = 0;
-            return; /* no more events */
-        }
-        s->io_q.in_flight -= s->event_max;
-    }
+    struct io_event *events;

    /* Reschedule so nested event loops see currently pending completions */
    qemu_bh_schedule(s->completion_bh);

-    /* Process completion events */
-    while (s->event_idx < s->event_max) {
-        struct iocb *iocb = s->events[s->event_idx].obj;
-        struct qemu_laiocb *laiocb =
+    while ((s->event_max = io_getevents_advance_and_peek(s->ctx, &events,
+                                                         s->event_idx))) {
+        for (s->event_idx = 0; s->event_idx < s->event_max; ) {
+            struct iocb *iocb = events[s->event_idx].obj;
+            struct qemu_laiocb *laiocb =
                container_of(iocb, struct qemu_laiocb, iocb);

-        laiocb->ret = io_event_ret(&s->events[s->event_idx]);
-        s->event_idx++;
+            laiocb->ret = io_event_ret(&events[s->event_idx]);

-        qemu_laio_process_completion(laiocb);
-    }
-
-    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
-        ioq_submit(s);
+            /* Change counters one-by-one because we can be nested. */
+            s->io_q.in_flight--;
+            s->event_idx++;
+            qemu_laio_process_completion(laiocb);
+        }
    }

    qemu_bh_cancel(s->completion_bh);
+
+    /* If we are nested we have to notify the level above that we are done
+     * by setting event_max to zero, upper level will then jump out of it's
+     * own `for` loop.  If we are the last all counters droped to zero. */
+    s->event_max = 0;
+    s->event_idx = 0;
+}
+
+static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
+{
+    qemu_laio_process_completions(s);
+    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
+        ioq_submit(s);
+    }
+}
+
+static void qemu_laio_completion_bh(void *opaque)
+{
+    LinuxAioState *s = opaque;
+
+    qemu_laio_process_completions_and_submit(s);
 }

 static void qemu_laio_completion_cb(EventNotifier *e)
@@ -160,7 +251,7 @@ static void qemu_laio_completion_cb(EventNotifier *e)
    LinuxAioState *s = container_of(e, LinuxAioState, e);

    if (event_notifier_test_and_clear(&s->e)) {
-        qemu_laio_completion_bh(s);
+        qemu_laio_process_completions_and_submit(s);
    }
 }

@@ -221,7 +312,13 @@ static void ioq_submit(LinuxAioState *s)
            break;
        }
        if (ret < 0) {
-            abort();
+            /* Fail the first request, retry the rest */
+            aiocb = QSIMPLEQ_FIRST(&s->io_q.pending);
+            QSIMPLEQ_REMOVE_HEAD(&s->io_q.pending, next);
+            s->io_q.in_queue--;
+            aiocb->ret = ret;
+            qemu_laio_process_completion(aiocb);
+            continue;
        }

        s->io_q.in_flight += ret;
@@ -230,6 +327,19 @@ static void ioq_submit(LinuxAioState *s)
        QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
    } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
    s->io_q.blocked = (s->io_q.in_queue > 0);
+
+    if (s->io_q.in_flight) {
+        /* We can try to complete something just right away if there are
+         * still requests in-flight. */
+        qemu_laio_process_completions(s);
+        /*
+         * Even we have completed everything (in_flight == 0), the queue can
+         * have still pended requests (in_queue > 0).  We do not attempt to
+         * repeat submission to avoid IO hang.  The reason is simple: s->e is
+         * still set and completion callback will be called shortly and all
+         * pended requests will be submitted from there.
+         */
+    }
 }

 void laio_io_plug(BlockDriverState *bs, LinuxAioState *s)
@@ -287,6 +397,7 @@ int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
        .co         = qemu_coroutine_self(),
        .nbytes     = qiov->size,
        .ctx        = s,
+        .ret        = -EINPROGRESS,
        .is_read    = (type == QEMU_AIO_READ),
        .qiov       = qiov,
    };
@@ -296,7 +407,9 @@ int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
        return ret;
    }

-    qemu_coroutine_yield();
+    if (laiocb.ret == -EINPROGRESS) {
+        qemu_coroutine_yield();
+    }
    return laiocb.ret;
 }

--- a/block/mirror.c
+++ b/block/mirror.c
@@ -419,6 +419,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
            mirror_wait_for_io(s);
        }

+        if (s->ret < 0) {
+            return 0;
+        }
+
        mirror_clip_sectors(s, sector_num, &io_sectors);
        switch (mirror_method) {
        case MIRROR_METHOD_COPY:
@@ -912,7 +916,8 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
                             BlockCompletionFunc *cb,
                             void *opaque, Error **errp,
                             const BlockJobDriver *driver,
-                             bool is_none_mode, BlockDriverState *base)
+                             bool is_none_mode, BlockDriverState *base,
+                             bool auto_complete)
 {
    MirrorBlockJob *s;

@@ -948,6 +953,9 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
    s->granularity = granularity;
    s->buf_size = ROUND_UP(buf_size, granularity);
    s->unmap = unmap;
+    if (auto_complete) {
+        s->should_complete = true;
+    }

    s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
    if (!s->dirty_bitmap) {
@@ -986,14 +994,15 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
    mirror_start_job(job_id, bs, target, replaces,
                     speed, granularity, buf_size, backing_mode,
                     on_source_error, on_target_error, unmap, cb, opaque, errp,
-                     &mirror_job_driver, is_none_mode, base);
+                     &mirror_job_driver, is_none_mode, base, false);
 }

 void commit_active_start(const char *job_id, BlockDriverState *bs,
                         BlockDriverState *base, int64_t speed,
                         BlockdevOnError on_error,
                         BlockCompletionFunc *cb,
-                         void *opaque, Error **errp)
+                         void *opaque, Error **errp,
+                         bool auto_complete)
 {
    int64_t length, base_length;
    int orig_base_flags;
@@ -1034,7 +1043,7 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
    mirror_start_job(job_id, bs, base, NULL, speed, 0, 0,
                     MIRROR_LEAVE_BACKING_CHAIN,
                     on_error, on_error, false, cb, opaque, &local_err,
-                     &commit_active_job_driver, false, base);
+                     &commit_active_job_driver, false, base, auto_complete);
    if (local_err) {
        error_propagate(errp, local_err);
        goto error_restore_flags;
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -20,7 +20,7 @@
 typedef struct NbdClientSession {
    QIOChannelSocket *sioc; /* The master data channel */
    QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
-    uint32_t nbdflags;
+    uint16_t nbdflags;
    off_t size;

    CoMutex send_mutex;
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -42,6 +42,9 @@

 typedef struct BDRVNBDState {
    NbdClientSession client;
+
+    /* For nbd_refresh_filename() */
+    char *path, *host, *port, *export, *tlscredsid;
 } BDRVNBDState;

 static int nbd_parse_uri(const char *filename, QDict *options)
@@ -188,13 +191,15 @@ out:
    g_free(file);
 }

-static SocketAddress *nbd_config(BDRVNBDState *s, QDict *options, char **export,
-                                 Error **errp)
+static SocketAddress *nbd_config(BDRVNBDState *s, QemuOpts *opts, Error **errp)
 {
    SocketAddress *saddr;

-    if (qdict_haskey(options, "path") == qdict_haskey(options, "host")) {
-        if (qdict_haskey(options, "path")) {
+    s->path = g_strdup(qemu_opt_get(opts, "path"));
+    s->host = g_strdup(qemu_opt_get(opts, "host"));
+
+    if (!s->path == !s->host) {
+        if (s->path) {
            error_setg(errp, "path and host may not be used at the same time.");
        } else {
            error_setg(errp, "one of path and host must be specified.");
@@ -204,32 +209,28 @@ static SocketAddress *nbd_config(BDRVNBDState *s, QDict *options, char **export,

    saddr = g_new0(SocketAddress, 1);

-    if (qdict_haskey(options, "path")) {
+    if (s->path) {
        UnixSocketAddress *q_unix;
        saddr->type = SOCKET_ADDRESS_KIND_UNIX;
        q_unix = saddr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
-        q_unix->path = g_strdup(qdict_get_str(options, "path"));
-        qdict_del(options, "path");
+        q_unix->path = g_strdup(s->path);
    } else {
        InetSocketAddress *inet;
+
+        s->port = g_strdup(qemu_opt_get(opts, "port"));
+
        saddr->type = SOCKET_ADDRESS_KIND_INET;
        inet = saddr->u.inet.data = g_new0(InetSocketAddress, 1);
-        inet->host = g_strdup(qdict_get_str(options, "host"));
-        if (!qdict_get_try_str(options, "port")) {
+        inet->host = g_strdup(s->host);
+        inet->port = g_strdup(s->port);
+        if (!inet->port) {
            inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
-        } else {
-            inet->port = g_strdup(qdict_get_str(options, "port"));
        }
-        qdict_del(options, "host");
-        qdict_del(options, "port");
    }

    s->client.is_unix = saddr->type == SOCKET_ADDRESS_KIND_UNIX;

-    *export = g_strdup(qdict_get_try_str(options, "export"));
-    if (*export) {
-        qdict_del(options, "export");
-    }
+    s->export = g_strdup(qemu_opt_get(opts, "export"));

    return saddr;
 }
@@ -292,28 +293,66 @@ static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, Error **errp)
 }


+static QemuOptsList nbd_runtime_opts = {
+    .name = "nbd",
+    .head = QTAILQ_HEAD_INITIALIZER(nbd_runtime_opts.head),
+    .desc = {
+        {
+            .name = "host",
+            .type = QEMU_OPT_STRING,
+            .help = "TCP host to connect to",
+        },
+        {
+            .name = "port",
+            .type = QEMU_OPT_STRING,
+            .help = "TCP port to connect to",
+        },
+        {
+            .name = "path",
+            .type = QEMU_OPT_STRING,
+            .help = "Unix socket path to connect to",
+        },
+        {
+            .name = "export",
+            .type = QEMU_OPT_STRING,
+            .help = "Name of the NBD export to open",
+        },
+        {
+            .name = "tls-creds",
+            .type = QEMU_OPT_STRING,
+            .help = "ID of the TLS credentials to use",
+        },
+    },
+};
+
 static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
                    Error **errp)
 {
    BDRVNBDState *s = bs->opaque;
-    char *export = NULL;
+    QemuOpts *opts = NULL;
+    Error *local_err = NULL;
    QIOChannelSocket *sioc = NULL;
-    SocketAddress *saddr;
-    const char *tlscredsid;
+    SocketAddress *saddr = NULL;
    QCryptoTLSCreds *tlscreds = NULL;
    const char *hostname = NULL;
    int ret = -EINVAL;

+    opts = qemu_opts_create(&nbd_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        goto error;
+    }
+
    /* Pop the config into our state object. Exit if invalid. */
-    saddr = nbd_config(s, options, &export, errp);
+    saddr = nbd_config(s, opts, errp);
    if (!saddr) {
        goto error;
    }

-    tlscredsid = g_strdup(qdict_get_try_str(options, "tls-creds"));
-    if (tlscredsid) {
-        qdict_del(options, "tls-creds");
-        tlscreds = nbd_get_tls_creds(tlscredsid, errp);
+    s->tlscredsid = g_strdup(qemu_opt_get(opts, "tls-creds"));
+    if (s->tlscredsid) {
+        tlscreds = nbd_get_tls_creds(s->tlscredsid, errp);
        if (!tlscreds) {
            goto error;
        }
@@ -335,7 +374,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
    }

    /* NBD handshake */
-    ret = nbd_client_init(bs, sioc, export,
+    ret = nbd_client_init(bs, sioc, s->export,
                          tlscreds, hostname, errp);
 error:
    if (sioc) {
@@ -344,8 +383,15 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
    if (tlscreds) {
        object_unref(OBJECT(tlscreds));
    }
+    if (ret < 0) {
+        g_free(s->path);
+        g_free(s->host);
+        g_free(s->port);
+        g_free(s->export);
+        g_free(s->tlscredsid);
+    }
    qapi_free_SocketAddress(saddr);
-    g_free(export);
+    qemu_opts_del(opts);
    return ret;
 }

@@ -362,7 +408,15 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)

 static void nbd_close(BlockDriverState *bs)
 {
+    BDRVNBDState *s = bs->opaque;
+
    nbd_client_close(bs);
+
+    g_free(s->path);
+    g_free(s->host);
+    g_free(s->port);
+    g_free(s->export);
+    g_free(s->tlscredsid);
 }

 static int64_t nbd_getlength(BlockDriverState *bs)
@@ -385,48 +439,45 @@ static void nbd_attach_aio_context(BlockDriverState *bs,

 static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
 {
+    BDRVNBDState *s = bs->opaque;
    QDict *opts = qdict_new();
-    const char *path   = qdict_get_try_str(options, "path");
-    const char *host   = qdict_get_try_str(options, "host");
-    const char *port   = qdict_get_try_str(options, "port");
-    const char *export = qdict_get_try_str(options, "export");
-    const char *tlscreds = qdict_get_try_str(options, "tls-creds");

    qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("nbd")));

-    if (path && export) {
+    if (s->path && s->export) {
        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "nbd+unix:///%s?socket=%s", export, path);
-    } else if (path && !export) {
+                 "nbd+unix:///%s?socket=%s", s->export, s->path);
+    } else if (s->path && !s->export) {
        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "nbd+unix://?socket=%s", path);
-    } else if (!path && export && port) {
+                 "nbd+unix://?socket=%s", s->path);
+    } else if (!s->path && s->export && s->port) {
        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "nbd://%s:%s/%s", host, port, export);
-    } else if (!path && export && !port) {
+                 "nbd://%s:%s/%s", s->host, s->port, s->export);
+    } else if (!s->path && s->export && !s->port) {
        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "nbd://%s/%s", host, export);
-    } else if (!path && !export && port) {
+                 "nbd://%s/%s", s->host, s->export);
+    } else if (!s->path && !s->export && s->port) {
        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "nbd://%s:%s", host, port);
-    } else if (!path && !export && !port) {
+                 "nbd://%s:%s", s->host, s->port);
+    } else if (!s->path && !s->export && !s->port) {
        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "nbd://%s", host);
+                 "nbd://%s", s->host);
    }

-    if (path) {
-        qdict_put_obj(opts, "path", QOBJECT(qstring_from_str(path)));
-    } else if (port) {
-        qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(host)));
-        qdict_put_obj(opts, "port", QOBJECT(qstring_from_str(port)));
+    if (s->path) {
+        qdict_put_obj(opts, "path", QOBJECT(qstring_from_str(s->path)));
+    } else if (s->port) {
+        qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(s->host)));
+        qdict_put_obj(opts, "port", QOBJECT(qstring_from_str(s->port)));
    } else {
-        qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(host)));
+        qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(s->host)));
    }
-    if (export) {
-        qdict_put_obj(opts, "export", QOBJECT(qstring_from_str(export)));
+    if (s->export) {
+        qdict_put_obj(opts, "export", QOBJECT(qstring_from_str(s->export)));
    }
-    if (tlscreds) {
-        qdict_put_obj(opts, "tls-creds", QOBJECT(qstring_from_str(tlscreds)));
+    if (s->tlscredsid) {
+        qdict_put_obj(opts, "tls-creds",
+                      QOBJECT(qstring_from_str(s->tlscredsid)));
    }

    bs->full_open_options = opts;
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -57,7 +57,6 @@ typedef struct NFSRPC {
    QEMUIOVector *iov;
    struct stat *st;
    Coroutine *co;
-    QEMUBH *bh;
    NFSClient *client;
 } NFSRPC;

@@ -103,7 +102,6 @@ static void nfs_co_generic_bh_cb(void *opaque)
 {
    NFSRPC *task = opaque;
    task->complete = 1;
-    qemu_bh_delete(task->bh);
    qemu_coroutine_enter(task->co);
 }

@@ -127,9 +125,8 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
        error_report("NFS Error: %s", nfs_get_error(nfs));
    }
    if (task->co) {
-        task->bh = aio_bh_new(task->client->aio_context,
-                              nfs_co_generic_bh_cb, task);
-        qemu_bh_schedule(task->bh);
+        aio_bh_schedule_oneshot(task->client->aio_context,
+                                nfs_co_generic_bh_cb, task);
    } else {
        task->complete = 1;
    }
--- a/block/null.c
+++ b/block/null.c
@@ -124,7 +124,6 @@ static coroutine_fn int null_co_flush(BlockDriverState *bs)

 typedef struct {
    BlockAIOCB common;
-    QEMUBH *bh;
    QEMUTimer timer;
 } NullAIOCB;

@@ -136,7 +135,6 @@ static void null_bh_cb(void *opaque)
 {
    NullAIOCB *acb = opaque;
    acb->common.cb(acb->common.opaque, 0);
-    qemu_bh_delete(acb->bh);
    qemu_aio_unref(acb);
 }

@@ -164,8 +162,7 @@ static inline BlockAIOCB *null_aio_common(BlockDriverState *bs,
        timer_mod_ns(&acb->timer,
                     qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + s->latency_ns);
    } else {
-        acb->bh = aio_bh_new(bdrv_get_aio_context(bs), null_bh_cb, acb);
-        qemu_bh_schedule(acb->bh);
+        aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), null_bh_cb, acb);
    }
    return &acb->common;
 }
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -43,6 +43,7 @@
 #define HEADER_MAGIC2 "WithouFreSpacExt"
 #define HEADER_VERSION 2
 #define HEADER_INUSE_MAGIC  (0x746F6E59)
+#define MAX_PARALLELS_IMAGE_FACTOR (1ull << 32)

 #define DEFAULT_CLUSTER_SIZE 1048576        /* 1 MiB */

@@ -475,6 +476,10 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
                          BDRV_SECTOR_SIZE);
    cl_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
                          DEFAULT_CLUSTER_SIZE), BDRV_SECTOR_SIZE);
+    if (total_size >= MAX_PARALLELS_IMAGE_FACTOR * cl_size) {
+        error_propagate(errp, local_err);
+        return -E2BIG;
+    }

    ret = bdrv_create_file(filename, opts, &local_err);
    if (ret < 0) {
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -913,75 +913,32 @@ static int qcow_make_empty(BlockDriverState *bs)
    return 0;
 }

-typedef struct QcowWriteCo {
-    BlockDriverState *bs;
-    int64_t sector_num;
-    const uint8_t *buf;
-    int nb_sectors;
-    int ret;
-} QcowWriteCo;
-
-static void qcow_write_co_entry(void *opaque)
-{
-    QcowWriteCo *co = opaque;
-    QEMUIOVector qiov;
-
-    struct iovec iov = (struct iovec) {
-        .iov_base   = (uint8_t*) co->buf,
-        .iov_len    = co->nb_sectors * BDRV_SECTOR_SIZE,
-    };
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
-    co->ret = qcow_co_writev(co->bs, co->sector_num, co->nb_sectors, &qiov);
-}
-
-/* Wrapper for non-coroutine contexts */
-static int qcow_write(BlockDriverState *bs, int64_t sector_num,
-                      const uint8_t *buf, int nb_sectors)
-{
-    Coroutine *co;
-    AioContext *aio_context = bdrv_get_aio_context(bs);
-    QcowWriteCo data = {
-        .bs         = bs,
-        .sector_num = sector_num,
-        .buf        = buf,
-        .nb_sectors = nb_sectors,
-        .ret        = -EINPROGRESS,
-    };
-    co = qemu_coroutine_create(qcow_write_co_entry, &data);
-    qemu_coroutine_enter(co);
-    while (data.ret == -EINPROGRESS) {
-        aio_poll(aio_context, true);
-    }
-    return data.ret;
-}
-
 /* XXX: put compressed sectors first, then all the cluster aligned
   tables to avoid losing bytes in alignment */
-static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
-                                 const uint8_t *buf, int nb_sectors)
+static coroutine_fn int
+qcow_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
+                           uint64_t bytes, QEMUIOVector *qiov)
 {
    BDRVQcowState *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    struct iovec iov;
    z_stream strm;
    int ret, out_len;
-    uint8_t *out_buf;
+    uint8_t *buf, *out_buf;
    uint64_t cluster_offset;

-    if (nb_sectors != s->cluster_sectors) {
-        ret = -EINVAL;
-
-        /* Zero-pad last write if image size is not cluster aligned */
-        if (sector_num + nb_sectors == bs->total_sectors &&
-            nb_sectors < s->cluster_sectors) {
-            uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
-            memset(pad_buf, 0, s->cluster_size);
-            memcpy(pad_buf, buf, nb_sectors * BDRV_SECTOR_SIZE);
-            ret = qcow_write_compressed(bs, sector_num,
-                                        pad_buf, s->cluster_sectors);
-            qemu_vfree(pad_buf);
+    buf = qemu_blockalign(bs, s->cluster_size);
+    if (bytes != s->cluster_size) {
+        if (bytes > s->cluster_size ||
+            offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
+        {
+            qemu_vfree(buf);
+            return -EINVAL;
        }
-        return ret;
+        /* Zero-pad last write if image size is not cluster aligned */
+        memset(buf + bytes, 0, s->cluster_size - bytes);
    }
+    qemu_iovec_to_buf(qiov, 0, buf, qiov->size);

    out_buf = g_malloc(s->cluster_size);

@@ -1012,27 +969,35 @@ static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,

    if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
        /* could not compress: write normal cluster */
-        ret = qcow_write(bs, sector_num, buf, s->cluster_sectors);
-        if (ret < 0) {
-            goto fail;
-        }
-    } else {
-        cluster_offset = get_cluster_offset(bs, sector_num << 9, 2,
-                                            out_len, 0, 0);
-        if (cluster_offset == 0) {
-            ret = -EIO;
-            goto fail;
-        }
-
-        cluster_offset &= s->cluster_offset_mask;
-        ret = bdrv_pwrite(bs->file, cluster_offset, out_buf, out_len);
+        ret = qcow_co_writev(bs, offset >> BDRV_SECTOR_BITS,
+                             bytes >> BDRV_SECTOR_BITS, qiov);
        if (ret < 0) {
            goto fail;
        }
+        goto success;
    }
+    qemu_co_mutex_lock(&s->lock);
+    cluster_offset = get_cluster_offset(bs, offset, 2, out_len, 0, 0);
+    qemu_co_mutex_unlock(&s->lock);
+    if (cluster_offset == 0) {
+        ret = -EIO;
+        goto fail;
+    }
+    cluster_offset &= s->cluster_offset_mask;

+    iov = (struct iovec) {
+        .iov_base   = out_buf,
+        .iov_len    = out_len,
+    };
+    qemu_iovec_init_external(&hd_qiov, &iov, 1);
+    ret = bdrv_co_pwritev(bs->file, cluster_offset, out_len, &hd_qiov, 0);
+    if (ret < 0) {
+        goto fail;
+    }
+success:
    ret = 0;
 fail:
+    qemu_vfree(buf);
    g_free(out_buf);
    return ret;
 }
@@ -1085,7 +1050,7 @@ static BlockDriver bdrv_qcow = {

    .bdrv_set_key           = qcow_set_key,
    .bdrv_make_empty        = qcow_make_empty,
-    .bdrv_write_compressed  = qcow_write_compressed,
+    .bdrv_co_pwritev_compressed = qcow_co_pwritev_compressed,
    .bdrv_get_info          = qcow_get_info,

    .create_opts            = &qcow_create_opts,
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -83,7 +83,9 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
    }
    memset(new_l1_table, 0, align_offset(new_l1_size2, 512));

-    memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
+    if (s->l1_size) {
+        memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
+    }

    /* write new table (align to cluster) */
    BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
@@ -427,7 +429,7 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,

    if (bs->encrypted) {
        Error *err = NULL;
-        int64_t sector = (cluster_offset + offset_in_cluster)
+        int64_t sector = (src_cluster_offset + offset_in_cluster)
                         >> BDRV_SECTOR_BITS;
        assert(s->cipher);
        assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1804,7 +1804,10 @@ static size_t header_ext_add(char *buf, uint32_t magic, const void *s,
        .magic  = cpu_to_be32(magic),
        .len    = cpu_to_be32(len),
    };
-    memcpy(buf + sizeof(QCowExtension), s, len);
+
+    if (len) {
+        memcpy(buf + sizeof(QCowExtension), s, len);
+    }

    return ext_len;
 }
@@ -2533,84 +2536,39 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
    return 0;
 }

-typedef struct Qcow2WriteCo {
-    BlockDriverState *bs;
-    int64_t sector_num;
-    const uint8_t *buf;
-    int nb_sectors;
-    int ret;
-} Qcow2WriteCo;
-
-static void qcow2_write_co_entry(void *opaque)
-{
-    Qcow2WriteCo *co = opaque;
-    QEMUIOVector qiov;
-    uint64_t offset = co->sector_num * BDRV_SECTOR_SIZE;
-    uint64_t bytes = co->nb_sectors * BDRV_SECTOR_SIZE;
-
-    struct iovec iov = (struct iovec) {
-        .iov_base   = (uint8_t*) co->buf,
-        .iov_len    = bytes,
-    };
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
-    co->ret = qcow2_co_pwritev(co->bs, offset, bytes, &qiov, 0);
-}
-
-/* Wrapper for non-coroutine contexts */
-static int qcow2_write(BlockDriverState *bs, int64_t sector_num,
-                       const uint8_t *buf, int nb_sectors)
-{
-    Coroutine *co;
-    AioContext *aio_context = bdrv_get_aio_context(bs);
-    Qcow2WriteCo data = {
-        .bs         = bs,
-        .sector_num = sector_num,
-        .buf        = buf,
-        .nb_sectors = nb_sectors,
-        .ret        = -EINPROGRESS,
-    };
-    co = qemu_coroutine_create(qcow2_write_co_entry, &data);
-    qemu_coroutine_enter(co);
-    while (data.ret == -EINPROGRESS) {
-        aio_poll(aio_context, true);
-    }
-    return data.ret;
-}
-
 /* XXX: put compressed sectors first, then all the cluster aligned
   tables to avoid losing bytes in alignment */
-static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
-                                  const uint8_t *buf, int nb_sectors)
+static coroutine_fn int
+qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
+                            uint64_t bytes, QEMUIOVector *qiov)
 {
    BDRVQcow2State *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    struct iovec iov;
    z_stream strm;
    int ret, out_len;
-    uint8_t *out_buf;
+    uint8_t *buf, *out_buf;
    uint64_t cluster_offset;

-    if (nb_sectors == 0) {
+    if (bytes == 0) {
        /* align end of file to a sector boundary to ease reading with
           sector based I/Os */
        cluster_offset = bdrv_getlength(bs->file->bs);
        return bdrv_truncate(bs->file->bs, cluster_offset);
    }

-    if (nb_sectors != s->cluster_sectors) {
-        ret = -EINVAL;
-
-        /* Zero-pad last write if image size is not cluster aligned */
-        if (sector_num + nb_sectors == bs->total_sectors &&
-            nb_sectors < s->cluster_sectors) {
-            uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
-            memset(pad_buf, 0, s->cluster_size);
-            memcpy(pad_buf, buf, nb_sectors * BDRV_SECTOR_SIZE);
-            ret = qcow2_write_compressed(bs, sector_num,
-                                         pad_buf, s->cluster_sectors);
-            qemu_vfree(pad_buf);
+    buf = qemu_blockalign(bs, s->cluster_size);
+    if (bytes != s->cluster_size) {
+        if (bytes > s->cluster_size ||
+            offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
+        {
+            qemu_vfree(buf);
+            return -EINVAL;
        }
-        return ret;
+        /* Zero-pad last write if image size is not cluster aligned */
+        memset(buf + bytes, 0, s->cluster_size - bytes);
    }
+    qemu_iovec_to_buf(qiov, 0, buf, bytes);

    out_buf = g_malloc(s->cluster_size);

@@ -2641,33 +2599,44 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,

    if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
        /* could not compress: write normal cluster */
-        ret = qcow2_write(bs, sector_num, buf, s->cluster_sectors);
-        if (ret < 0) {
-            goto fail;
-        }
-    } else {
-        cluster_offset = qcow2_alloc_compressed_cluster_offset(bs,
-            sector_num << 9, out_len);
-        if (!cluster_offset) {
-            ret = -EIO;
-            goto fail;
-        }
-        cluster_offset &= s->cluster_offset_mask;
-
-        ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
-        if (ret < 0) {
-            goto fail;
-        }
-
-        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
-        ret = bdrv_pwrite(bs->file, cluster_offset, out_buf, out_len);
+        ret = qcow2_co_pwritev(bs, offset, bytes, qiov, 0);
        if (ret < 0) {
            goto fail;
        }
+        goto success;
    }

+    qemu_co_mutex_lock(&s->lock);
+    cluster_offset =
+        qcow2_alloc_compressed_cluster_offset(bs, offset, out_len);
+    if (!cluster_offset) {
+        qemu_co_mutex_unlock(&s->lock);
+        ret = -EIO;
+        goto fail;
+    }
+    cluster_offset &= s->cluster_offset_mask;
+
+    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
+    qemu_co_mutex_unlock(&s->lock);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    iov = (struct iovec) {
+        .iov_base   = out_buf,
+        .iov_len    = out_len,
+    };
+    qemu_iovec_init_external(&hd_qiov, &iov, 1);
+
+    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
+    ret = bdrv_co_pwritev(bs->file, cluster_offset, out_len, &hd_qiov, 0);
+    if (ret < 0) {
+        goto fail;
+    }
+success:
    ret = 0;
 fail:
+    qemu_vfree(buf);
    g_free(out_buf);
    return ret;
 }
@@ -3412,7 +3381,7 @@ BlockDriver bdrv_qcow2 = {
    .bdrv_co_pwrite_zeroes  = qcow2_co_pwrite_zeroes,
    .bdrv_co_pdiscard       = qcow2_co_pdiscard,
    .bdrv_truncate          = qcow2_truncate,
-    .bdrv_write_compressed  = qcow2_write_compressed,
+    .bdrv_co_pwritev_compressed = qcow2_co_pwritev_compressed,
    .bdrv_make_empty        = qcow2_make_empty,

    .bdrv_snapshot_create   = qcow2_snapshot_create,
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -530,7 +530,6 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
                        bool exact_size);
 int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
-void qcow2_l2_cache_reset(BlockDriverState *bs);
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
 int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
                          uint8_t *out_buf, const uint8_t *in_buf,
--- a/block/qed.c
+++ b/block/qed.c
@@ -909,7 +909,6 @@ static void qed_aio_complete_bh(void *opaque)
    void *user_opaque = acb->common.opaque;
    int ret = acb->bh_ret;

-    qemu_bh_delete(acb->bh);
    qemu_aio_unref(acb);

    /* Invoke callback */
@@ -934,9 +933,8 @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)

    /* Arrange for a bh to invoke the completion function */
    acb->bh_ret = ret;
-    acb->bh = aio_bh_new(bdrv_get_aio_context(acb->common.bs),
-                         qed_aio_complete_bh, acb);
-    qemu_bh_schedule(acb->bh);
+    aio_bh_schedule_oneshot(bdrv_get_aio_context(acb->common.bs),
+                            qed_aio_complete_bh, acb);

    /* Start next allocating write request waiting behind this one.  Note that
     * requests enqueue themselves when they first hit an unallocated cluster
--- a/block/qed.h
+++ b/block/qed.h
@@ -130,7 +130,6 @@ enum {

 typedef struct QEDAIOCB {
    BlockAIOCB common;
-    QEMUBH *bh;
    int bh_ret;                     /* final return status for completion bh */
    QSIMPLEQ_ENTRY(QEDAIOCB) next;  /* next request */
    int flags;                      /* QED_AIOCB_* bits ORed together */
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -143,6 +143,7 @@ typedef struct BDRVRawState {
    bool has_discard:1;
    bool has_write_zeroes:1;
    bool discard_zeroes:1;
+    bool use_linux_aio:1;
    bool has_fallocate;
    bool needs_alignment;
 } BDRVRawState;
@@ -367,18 +368,6 @@ static void raw_parse_flags(int bdrv_flags, int *open_flags)
    }
 }

-#ifdef CONFIG_LINUX_AIO
-static bool raw_use_aio(int bdrv_flags)
-{
-    /*
-     * Currently Linux do AIO only for files opened with O_DIRECT
-     * specified so check NOCACHE flag too
-     */
-    return (bdrv_flags & (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) ==
-                         (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO);
-}
-#endif
-
 static void raw_parse_filename(const char *filename, QDict *options,
                               Error **errp)
 {
@@ -399,6 +388,11 @@ static QemuOptsList raw_runtime_opts = {
            .type = QEMU_OPT_STRING,
            .help = "File name of the image",
        },
+        {
+            .name = "aio",
+            .type = QEMU_OPT_STRING,
+            .help = "host AIO implementation (threads, native)",
+        },
        { /* end of list */ }
    },
 };
@@ -410,6 +404,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
    QemuOpts *opts;
    Error *local_err = NULL;
    const char *filename = NULL;
+    BlockdevAioOptions aio, aio_default;
    int fd, ret;
    struct stat st;

@@ -429,6 +424,18 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
        goto fail;
    }

+    aio_default = (bdrv_flags & BDRV_O_NATIVE_AIO)
+                  ? BLOCKDEV_AIO_OPTIONS_NATIVE
+                  : BLOCKDEV_AIO_OPTIONS_THREADS;
+    aio = qapi_enum_parse(BlockdevAioOptions_lookup, qemu_opt_get(opts, "aio"),
+                          BLOCKDEV_AIO_OPTIONS__MAX, aio_default, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto fail;
+    }
+    s->use_linux_aio = (aio == BLOCKDEV_AIO_OPTIONS_NATIVE);
+
    s->open_flags = open_flags;
    raw_parse_flags(bdrv_flags, &s->open_flags);

@@ -444,14 +451,15 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
    s->fd = fd;

 #ifdef CONFIG_LINUX_AIO
-    if (!raw_use_aio(bdrv_flags) && (bdrv_flags & BDRV_O_NATIVE_AIO)) {
+     /* Currently Linux does AIO only for files opened with O_DIRECT */
+    if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
        error_setg(errp, "aio=native was specified, but it requires "
                         "cache.direct=on, which was not specified.");
        ret = -EINVAL;
        goto fail;
    }
 #else
-    if (bdrv_flags & BDRV_O_NATIVE_AIO) {
+    if (s->use_linux_aio) {
        error_setg(errp, "aio=native was specified, but is not supported "
                         "in this build.");
        ret = -EINVAL;
@@ -1256,7 +1264,7 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
        if (!bdrv_qiov_is_aligned(bs, qiov)) {
            type |= QEMU_AIO_MISALIGNED;
 #ifdef CONFIG_LINUX_AIO
-        } else if (bs->open_flags & BDRV_O_NATIVE_AIO) {
+        } else if (s->use_linux_aio) {
            LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
            assert(qiov->size == bytes);
            return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
@@ -1285,7 +1293,8 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
 static void raw_aio_plug(BlockDriverState *bs)
 {
 #ifdef CONFIG_LINUX_AIO
-    if (bs->open_flags & BDRV_O_NATIVE_AIO) {
+    BDRVRawState *s = bs->opaque;
+    if (s->use_linux_aio) {
        LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
        laio_io_plug(bs, aio);
    }
@@ -1295,7 +1304,8 @@ static void raw_aio_plug(BlockDriverState *bs)
 static void raw_aio_unplug(BlockDriverState *bs)
 {
 #ifdef CONFIG_LINUX_AIO
-    if (bs->open_flags & BDRV_O_NATIVE_AIO) {
+    BDRVRawState *s = bs->opaque;
+    if (s->use_linux_aio) {
        LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
        laio_io_unplug(bs, aio);
    }
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -32,6 +32,7 @@
 #include "block/thread-pool.h"
 #include "qemu/iov.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/util.h"
 #include <windows.h>
 #include <winioctl.h>

@@ -252,7 +253,8 @@ static void raw_probe_alignment(BlockDriverState *bs, Error **errp)
    }
 }

-static void raw_parse_flags(int flags, int *access_flags, DWORD *overlapped)
+static void raw_parse_flags(int flags, bool use_aio, int *access_flags,
+                            DWORD *overlapped)
 {
    assert(access_flags != NULL);
    assert(overlapped != NULL);
@@ -264,7 +266,7 @@ static void raw_parse_flags(int flags, int *access_flags, DWORD *overlapped)
    }

    *overlapped = FILE_ATTRIBUTE_NORMAL;
-    if (flags & BDRV_O_NATIVE_AIO) {
+    if (use_aio) {
        *overlapped |= FILE_FLAG_OVERLAPPED;
    }
    if (flags & BDRV_O_NOCACHE) {
@@ -292,10 +294,35 @@ static QemuOptsList raw_runtime_opts = {
            .type = QEMU_OPT_STRING,
            .help = "File name of the image",
        },
+        {
+            .name = "aio",
+            .type = QEMU_OPT_STRING,
+            .help = "host AIO implementation (threads, native)",
+        },
        { /* end of list */ }
    },
 };

+static bool get_aio_option(QemuOpts *opts, int flags, Error **errp)
+{
+    BlockdevAioOptions aio, aio_default;
+
+    aio_default = (flags & BDRV_O_NATIVE_AIO) ? BLOCKDEV_AIO_OPTIONS_NATIVE
+                                              : BLOCKDEV_AIO_OPTIONS_THREADS;
+    aio = qapi_enum_parse(BlockdevAioOptions_lookup, qemu_opt_get(opts, "aio"),
+                          BLOCKDEV_AIO_OPTIONS__MAX, aio_default, errp);
+
+    switch (aio) {
+    case BLOCKDEV_AIO_OPTIONS_NATIVE:
+        return true;
+    case BLOCKDEV_AIO_OPTIONS_THREADS:
+        return false;
+    default:
+        error_setg(errp, "Invalid AIO option");
+    }
+    return false;
+}
+
 static int raw_open(BlockDriverState *bs, QDict *options, int flags,
                    Error **errp)
 {
@@ -305,6 +332,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
    QemuOpts *opts;
    Error *local_err = NULL;
    const char *filename;
+    bool use_aio;
    int ret;

    s->type = FTYPE_FILE;
@@ -319,7 +347,14 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,

    filename = qemu_opt_get(opts, "filename");

-    raw_parse_flags(flags, &access_flags, &overlapped);
+    use_aio = get_aio_option(opts, flags, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    raw_parse_flags(flags, use_aio, &access_flags, &overlapped);

    if (filename[0] && filename[1] == ':') {
        snprintf(s->drive_path, sizeof(s->drive_path), "%c:\\", filename[0]);
@@ -346,7 +381,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
        goto fail;
    }

-    if (flags & BDRV_O_NATIVE_AIO) {
+    if (use_aio) {
        s->aio = win32_aio_init();
        if (s->aio == NULL) {
            CloseHandle(s->hfile);
@@ -647,6 +682,7 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,

    Error *local_err = NULL;
    const char *filename;
+    bool use_aio;

    QemuOpts *opts = qemu_opts_create(&raw_runtime_opts, NULL, 0,
                                      &error_abort);
@@ -659,6 +695,16 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,

    filename = qemu_opt_get(opts, "filename");

+    use_aio = get_aio_option(opts, flags, &local_err);
+    if (!local_err && use_aio) {
+        error_setg(&local_err, "AIO is not supported on Windows host devices");
+    }
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto done;
+    }
+
    if (strstart(filename, "/dev/cdrom", NULL)) {
        if (find_cdrom(device_name, sizeof(device_name)) < 0) {
            error_setg(errp, "Could not open CD-ROM drive");
@@ -677,7 +723,7 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
    }
    s->type = find_device_type(bs, filename);

-    raw_parse_flags(flags, &access_flags, &overlapped);
+    raw_parse_flags(flags, use_aio, &access_flags, &overlapped);

    create_flags = OPEN_EXISTING;

--- a/block/rbd.c
+++ b/block/rbd.c
@@ -71,7 +71,6 @@ typedef enum {

 typedef struct RBDAIOCB {
    BlockAIOCB common;
-    QEMUBH *bh;
    int64_t ret;
    QEMUIOVector *qiov;
    char *bounce;
@@ -602,7 +601,6 @@ static const AIOCBInfo rbd_aiocb_info = {
 static void rbd_finish_bh(void *opaque)
 {
    RADOSCB *rcb = opaque;
-    qemu_bh_delete(rcb->acb->bh);
    qemu_rbd_complete_aio(rcb);
 }

@@ -621,9 +619,8 @@ static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB *rcb)
    rcb->ret = rbd_aio_get_return_value(c);
    rbd_aio_release(c);

-    acb->bh = aio_bh_new(bdrv_get_aio_context(acb->common.bs),
-                         rbd_finish_bh, rcb);
-    qemu_bh_schedule(acb->bh);
+    aio_bh_schedule_oneshot(bdrv_get_aio_context(acb->common.bs),
+                            rbd_finish_bh, rcb);
 }

 static int rbd_aio_discard_wrapper(rbd_image_t image,
@@ -679,7 +676,6 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
    acb->ret = 0;
    acb->error = 0;
    acb->s = s;
-    acb->bh = NULL;

    if (cmd == RBD_AIO_WRITE) {
        qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
--- a/block/replication.c
+++ b/block/replication.c
@@ -0,0 +1,659 @@
+/*
+ * Replication Block filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "block/nbd.h"
+#include "block/blockjob.h"
+#include "block/block_int.h"
+#include "block/block_backup.h"
+#include "sysemu/block-backend.h"
+#include "qapi/error.h"
+#include "replication.h"
+
+typedef struct BDRVReplicationState {
+    ReplicationMode mode;
+    int replication_state;
+    BdrvChild *active_disk;
+    BdrvChild *hidden_disk;
+    BdrvChild *secondary_disk;
+    char *top_id;
+    ReplicationState *rs;
+    Error *blocker;
+    int orig_hidden_flags;
+    int orig_secondary_flags;
+    int error;
+} BDRVReplicationState;
+
+enum {
+    BLOCK_REPLICATION_NONE,             /* block replication is not started */
+    BLOCK_REPLICATION_RUNNING,          /* block replication is running */
+    BLOCK_REPLICATION_FAILOVER,         /* failover is running in background */
+    BLOCK_REPLICATION_FAILOVER_FAILED,  /* failover failed */
+    BLOCK_REPLICATION_DONE,             /* block replication is done */
+};
+
+static void replication_start(ReplicationState *rs, ReplicationMode mode,
+                              Error **errp);
+static void replication_do_checkpoint(ReplicationState *rs, Error **errp);
+static void replication_get_error(ReplicationState *rs, Error **errp);
+static void replication_stop(ReplicationState *rs, bool failover,
+                             Error **errp);
+
+#define REPLICATION_MODE        "mode"
+#define REPLICATION_TOP_ID      "top-id"
+static QemuOptsList replication_runtime_opts = {
+    .name = "replication",
+    .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+    .desc = {
+        {
+            .name = REPLICATION_MODE,
+            .type = QEMU_OPT_STRING,
+        },
+        {
+            .name = REPLICATION_TOP_ID,
+            .type = QEMU_OPT_STRING,
+        },
+        { /* end of list */ }
+    },
+};
+
+static ReplicationOps replication_ops = {
+    .start = replication_start,
+    .checkpoint = replication_do_checkpoint,
+    .get_error = replication_get_error,
+    .stop = replication_stop,
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+                            int flags, Error **errp)
+{
+    int ret;
+    BDRVReplicationState *s = bs->opaque;
+    Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    const char *mode;
+    const char *top_id;
+
+    ret = -EINVAL;
+    opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    mode = qemu_opt_get(opts, REPLICATION_MODE);
+    if (!mode) {
+        error_setg(&local_err, "Missing the option mode");
+        goto fail;
+    }
+
+    if (!strcmp(mode, "primary")) {
+        s->mode = REPLICATION_MODE_PRIMARY;
+    } else if (!strcmp(mode, "secondary")) {
+        s->mode = REPLICATION_MODE_SECONDARY;
+        top_id = qemu_opt_get(opts, REPLICATION_TOP_ID);
+        s->top_id = g_strdup(top_id);
+        if (!s->top_id) {
+            error_setg(&local_err, "Missing the option top-id");
+            goto fail;
+        }
+    } else {
+        error_setg(&local_err,
+                   "The option mode's value should be primary or secondary");
+        goto fail;
+    }
+
+    s->rs = replication_new(bs, &replication_ops);
+
+    ret = 0;
+
+fail:
+    qemu_opts_del(opts);
+    error_propagate(errp, local_err);
+
+    return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+        replication_stop(s->rs, false, NULL);
+    }
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        g_free(s->top_id);
+    }
+
+    replication_remove(s->rs);
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+    return bdrv_getlength(bs->file->bs);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+    switch (s->replication_state) {
+    case BLOCK_REPLICATION_NONE:
+        return -EIO;
+    case BLOCK_REPLICATION_RUNNING:
+        return 0;
+    case BLOCK_REPLICATION_FAILOVER:
+        return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 0;
+    case BLOCK_REPLICATION_FAILOVER_FAILED:
+        return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+    case BLOCK_REPLICATION_DONE:
+        /*
+         * active commit job completes, and active disk and secondary_disk
+         * is swapped, so we can operate bs->file directly
+         */
+        return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 0;
+    default:
+        abort();
+    }
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        return ret;
+    }
+
+    if (ret < 0) {
+        s->error = ret;
+        ret = 0;
+    }
+
+    return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             int remaining_sectors,
+                                             QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    BdrvChild *child = s->secondary_disk;
+    BlockJob *job = NULL;
+    CowRequest req;
+    int ret;
+
+    if (s->mode == REPLICATION_MODE_PRIMARY) {
+        /* We only use it to forward primary write requests */
+        return -EIO;
+    }
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (child && child->bs) {
+        job = child->bs->job;
+    }
+
+    if (job) {
+        backup_wait_for_overlapping_requests(child->bs->job, sector_num,
+                                             remaining_sectors);
+        backup_cow_request_begin(&req, child->bs->job, sector_num,
+                                 remaining_sectors);
+        ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors,
+                            qiov);
+        backup_cow_request_end(&req);
+        goto out;
+    }
+
+    ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors, qiov);
+out:
+    return replication_return_value(s, ret);
+}
+
+static coroutine_fn int replication_co_writev(BlockDriverState *bs,
+                                              int64_t sector_num,
+                                              int remaining_sectors,
+                                              QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    uint64_t bytes_done = 0;
+    BdrvChild *top = bs->file;
+    BdrvChild *base = s->secondary_disk;
+    BdrvChild *target;
+    int ret, n;
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        goto out;
+    }
+
+    if (ret == 0) {
+        ret = bdrv_co_writev(top, sector_num,
+                             remaining_sectors, qiov);
+        return replication_return_value(s, ret);
+    }
+
+    /*
+     * Failover failed, only write to active disk if the sectors
+     * have already been allocated in active disk/hidden disk.
+     */
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    while (remaining_sectors > 0) {
+        ret = bdrv_is_allocated_above(top->bs, base->bs, sector_num,
+                                      remaining_sectors, &n);
+        if (ret < 0) {
+            goto out1;
+        }
+
+        qemu_iovec_reset(&hd_qiov);
+        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * BDRV_SECTOR_SIZE);
+
+        target = ret ? top : base;
+        ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
+        if (ret < 0) {
+            goto out1;
+        }
+
+        remaining_sectors -= n;
+        sector_num += n;
+        bytes_done += n * BDRV_SECTOR_SIZE;
+    }
+
+out1:
+    qemu_iovec_destroy(&hd_qiov);
+out:
+    return ret;
+}
+
+static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
+                                                    BlockDriverState *candidate)
+{
+    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
+}
+
+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
+{
+    Error *local_err = NULL;
+    int ret;
+
+    if (!s->secondary_disk->bs->job) {
+        error_setg(errp, "Backup job was cancelled unexpectedly");
+        return;
+    }
+
+    backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make active disk empty");
+        return;
+    }
+
+    ret = s->hidden_disk->bs->drv->bdrv_make_empty(s->hidden_disk->bs);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make hidden disk empty");
+        return;
+    }
+}
+
+static void reopen_backing_file(BDRVReplicationState *s, bool writable,
+                                Error **errp)
+{
+    BlockReopenQueue *reopen_queue = NULL;
+    int orig_hidden_flags, orig_secondary_flags;
+    int new_hidden_flags, new_secondary_flags;
+    Error *local_err = NULL;
+
+    if (writable) {
+        orig_hidden_flags = s->orig_hidden_flags =
+                                bdrv_get_flags(s->hidden_disk->bs);
+        new_hidden_flags = (orig_hidden_flags | BDRV_O_RDWR) &
+                                                    ~BDRV_O_INACTIVE;
+        orig_secondary_flags = s->orig_secondary_flags =
+                                bdrv_get_flags(s->secondary_disk->bs);
+        new_secondary_flags = (orig_secondary_flags | BDRV_O_RDWR) &
+                                                     ~BDRV_O_INACTIVE;
+    } else {
+        orig_hidden_flags = (s->orig_hidden_flags | BDRV_O_RDWR) &
+                                                    ~BDRV_O_INACTIVE;
+        new_hidden_flags = s->orig_hidden_flags;
+        orig_secondary_flags = (s->orig_secondary_flags | BDRV_O_RDWR) &
+                                                    ~BDRV_O_INACTIVE;
+        new_secondary_flags = s->orig_secondary_flags;
+    }
+
+    if (orig_hidden_flags != new_hidden_flags) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk->bs, NULL,
+                                         new_hidden_flags);
+    }
+
+    if (!(orig_secondary_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk->bs,
+                                         NULL, new_secondary_flags);
+    }
+
+    if (reopen_queue) {
+        bdrv_reopen_multiple(reopen_queue, &local_err);
+        error_propagate(errp, local_err);
+    }
+}
+
+static void backup_job_cleanup(BDRVReplicationState *s)
+{
+    BlockDriverState *top_bs;
+
+    top_bs = bdrv_lookup_bs(s->top_id, s->top_id, NULL);
+    if (!top_bs) {
+        return;
+    }
+    bdrv_op_unblock_all(top_bs, s->blocker);
+    error_free(s->blocker);
+    reopen_backing_file(s, false, NULL);
+}
+
+static void backup_job_completed(void *opaque, int ret)
+{
+    BDRVReplicationState *s = opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_FAILOVER) {
+        /* The backup job is cancelled unexpectedly */
+        s->error = -EIO;
+    }
+
+    backup_job_cleanup(s);
+}
+
+static bool check_top_bs(BlockDriverState *top_bs, BlockDriverState *bs)
+{
+    BdrvChild *child;
+
+    /* The bs itself is the top_bs */
+    if (top_bs == bs) {
+        return true;
+    }
+
+    /* Iterate over top_bs's children */
+    QLIST_FOREACH(child, &top_bs->children, next) {
+        if (child->bs == bs || check_top_bs(child->bs, bs)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+static void replication_start(ReplicationState *rs, ReplicationMode mode,
+                              Error **errp)
+{
+    BlockDriverState *bs = rs->opaque;
+    BDRVReplicationState *s;
+    BlockDriverState *top_bs;
+    int64_t active_length, hidden_length, disk_length;
+    AioContext *aio_context;
+    Error *local_err = NULL;
+
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+    s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_NONE) {
+        error_setg(errp, "Block replication is running or done");
+        aio_context_release(aio_context);
+        return;
+    }
+
+    if (s->mode != mode) {
+        error_setg(errp, "The parameter mode's value is invalid, needs %d,"
+                   " but got %d", s->mode, mode);
+        aio_context_release(aio_context);
+        return;
+    }
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        s->active_disk = bs->file;
+        if (!s->active_disk || !s->active_disk->bs ||
+                                    !s->active_disk->bs->backing) {
+            error_setg(errp, "Active disk doesn't have backing file");
+            aio_context_release(aio_context);
+            return;
+        }
+
+        s->hidden_disk = s->active_disk->bs->backing;
+        if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) {
+            error_setg(errp, "Hidden disk doesn't have backing file");
+            aio_context_release(aio_context);
+            return;
+        }
+
+        s->secondary_disk = s->hidden_disk->bs->backing;
+        if (!s->secondary_disk->bs || !bdrv_has_blk(s->secondary_disk->bs)) {
+            error_setg(errp, "The secondary disk doesn't have block backend");
+            aio_context_release(aio_context);
+            return;
+        }
+
+        /* verify the length */
+        active_length = bdrv_getlength(s->active_disk->bs);
+        hidden_length = bdrv_getlength(s->hidden_disk->bs);
+        disk_length = bdrv_getlength(s->secondary_disk->bs);
+        if (active_length < 0 || hidden_length < 0 || disk_length < 0 ||
+            active_length != hidden_length || hidden_length != disk_length) {
+            error_setg(errp, "Active disk, hidden disk, secondary disk's length"
+                       " are not the same");
+            aio_context_release(aio_context);
+            return;
+        }
+
+        if (!s->active_disk->bs->drv->bdrv_make_empty ||
+            !s->hidden_disk->bs->drv->bdrv_make_empty) {
+            error_setg(errp,
+                       "Active disk or hidden disk doesn't support make_empty");
+            aio_context_release(aio_context);
+            return;
+        }
+
+        /* reopen the backing file in r/w mode */
+        reopen_backing_file(s, true, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            aio_context_release(aio_context);
+            return;
+        }
+
+        /* start backup job now */
+        error_setg(&s->blocker,
+                   "Block device is in use by internal backup job");
+
+        top_bs = bdrv_lookup_bs(s->top_id, s->top_id, NULL);
+        if (!top_bs || !bdrv_is_root_node(top_bs) ||
+            !check_top_bs(top_bs, bs)) {
+            error_setg(errp, "No top_bs or it is invalid");
+            reopen_backing_file(s, false, NULL);
+            aio_context_release(aio_context);
+            return;
+        }
+        bdrv_op_block_all(top_bs, s->blocker);
+        bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
+
+        backup_start("replication-backup", s->secondary_disk->bs,
+                     s->hidden_disk->bs, 0, MIRROR_SYNC_MODE_NONE, NULL, false,
+                     BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
+                     backup_job_completed, s, NULL, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            backup_job_cleanup(s);
+            aio_context_release(aio_context);
+            return;
+        }
+        break;
+    default:
+        aio_context_release(aio_context);
+        abort();
+    }
+
+    s->replication_state = BLOCK_REPLICATION_RUNNING;
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        secondary_do_checkpoint(s, errp);
+    }
+
+    s->error = 0;
+    aio_context_release(aio_context);
+}
+
+static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
+{
+    BlockDriverState *bs = rs->opaque;
+    BDRVReplicationState *s;
+    AioContext *aio_context;
+
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+    s = bs->opaque;
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        secondary_do_checkpoint(s, errp);
+    }
+    aio_context_release(aio_context);
+}
+
+static void replication_get_error(ReplicationState *rs, Error **errp)
+{
+    BlockDriverState *bs = rs->opaque;
+    BDRVReplicationState *s;
+    AioContext *aio_context;
+
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+    s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        aio_context_release(aio_context);
+        return;
+    }
+
+    if (s->error) {
+        error_setg(errp, "I/O error occurred");
+        aio_context_release(aio_context);
+        return;
+    }
+    aio_context_release(aio_context);
+}
+
+static void replication_done(void *opaque, int ret)
+{
+    BlockDriverState *bs = opaque;
+    BDRVReplicationState *s = bs->opaque;
+
+    if (ret == 0) {
+        s->replication_state = BLOCK_REPLICATION_DONE;
+
+        /* refresh top bs's filename */
+        bdrv_refresh_filename(bs);
+        s->active_disk = NULL;
+        s->secondary_disk = NULL;
+        s->hidden_disk = NULL;
+        s->error = 0;
+    } else {
+        s->replication_state = BLOCK_REPLICATION_FAILOVER_FAILED;
+        s->error = -EIO;
+    }
+}
+
+static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
+{
+    BlockDriverState *bs = rs->opaque;
+    BDRVReplicationState *s;
+    AioContext *aio_context;
+
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+    s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        aio_context_release(aio_context);
+        return;
+    }
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        s->replication_state = BLOCK_REPLICATION_DONE;
+        s->error = 0;
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        /*
+         * This BDS will be closed, and the job should be completed
+         * before the BDS is closed, because we will access hidden
+         * disk, secondary disk in backup_job_completed().
+         */
+        if (s->secondary_disk->bs->job) {
+            block_job_cancel_sync(s->secondary_disk->bs->job);
+        }
+
+        if (!failover) {
+            secondary_do_checkpoint(s, errp);
+            s->replication_state = BLOCK_REPLICATION_DONE;
+            aio_context_release(aio_context);
+            return;
+        }
+
+        s->replication_state = BLOCK_REPLICATION_FAILOVER;
+        commit_active_start("replication-commit", s->active_disk->bs,
+                            s->secondary_disk->bs, 0, BLOCKDEV_ON_ERROR_REPORT,
+                            replication_done,
+                            bs, errp, true);
+        break;
+    default:
+        aio_context_release(aio_context);
+        abort();
+    }
+    aio_context_release(aio_context);
+}
+
+BlockDriver bdrv_replication = {
+    .format_name                = "replication",
+    .protocol_name              = "replication",
+    .instance_size              = sizeof(BDRVReplicationState),
+
+    .bdrv_open                  = replication_open,
+    .bdrv_close                 = replication_close,
+
+    .bdrv_getlength             = replication_getlength,
+    .bdrv_co_readv              = replication_co_readv,
+    .bdrv_co_writev             = replication_co_writev,
+
+    .is_filter                  = true,
+    .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
+
+    .has_variable_length        = true,
+};
+
+static void bdrv_replication_init(void)
+{
+    bdrv_register(&bdrv_replication);
+}
+
+block_init(bdrv_replication_init);
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1049,7 +1049,7 @@ static int parse_vdiname(BDRVSheepdogState *s, const char *filename,
    const char *host_spec, *vdi_spec;
    int nr_sep, ret;

-    strstart(filename, "sheepdog:", (const char **)&filename);
+    strstart(filename, "sheepdog:", &filename);
    p = q = g_strdup(filename);

    /* count the number of separators */
@@ -2652,7 +2652,7 @@ static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
    req.opcode = SD_OP_READ_VDIS;
    req.data_length = max;

-    ret = do_req(fd, s->aio_context, (SheepdogReq *)&req,
+    ret = do_req(fd, s->aio_context, &req,
                 vdi_inuse, &wlen, &rlen);

    closesocket(fd);
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -508,36 +508,73 @@ static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
    return ret;
 }

+static QemuOptsList ssh_runtime_opts = {
+    .name = "ssh",
+    .head = QTAILQ_HEAD_INITIALIZER(ssh_runtime_opts.head),
+    .desc = {
+        {
+            .name = "host",
+            .type = QEMU_OPT_STRING,
+            .help = "Host to connect to",
+        },
+        {
+            .name = "port",
+            .type = QEMU_OPT_NUMBER,
+            .help = "Port to connect to",
+        },
+        {
+            .name = "path",
+            .type = QEMU_OPT_STRING,
+            .help = "Path of the image on the host",
+        },
+        {
+            .name = "user",
+            .type = QEMU_OPT_STRING,
+            .help = "User as which to connect",
+        },
+        {
+            .name = "host_key_check",
+            .type = QEMU_OPT_STRING,
+            .help = "Defines how and what to check the host key against",
+        },
+    },
+};
+
 static int connect_to_ssh(BDRVSSHState *s, QDict *options,
                          int ssh_flags, int creat_mode, Error **errp)
 {
    int r, ret;
+    QemuOpts *opts = NULL;
+    Error *local_err = NULL;
    const char *host, *user, *path, *host_key_check;
    int port;

-    if (!qdict_haskey(options, "host")) {
+    opts = qemu_opts_create(&ssh_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        ret = -EINVAL;
+        error_propagate(errp, local_err);
+        goto err;
+    }
+
+    host = qemu_opt_get(opts, "host");
+    if (!host) {
        ret = -EINVAL;
        error_setg(errp, "No hostname was specified");
        goto err;
    }
-    host = qdict_get_str(options, "host");

-    if (qdict_haskey(options, "port")) {
-        port = qdict_get_int(options, "port");
-    } else {
-        port = 22;
-    }
+    port = qemu_opt_get_number(opts, "port", 22);

-    if (!qdict_haskey(options, "path")) {
+    path = qemu_opt_get(opts, "path");
+    if (!path) {
        ret = -EINVAL;
        error_setg(errp, "No path was specified");
        goto err;
    }
-    path = qdict_get_str(options, "path");

-    if (qdict_haskey(options, "user")) {
-        user = qdict_get_str(options, "user");
-    } else {
+    user = qemu_opt_get(opts, "user");
+    if (!user) {
        user = g_get_user_name();
        if (!user) {
            error_setg_errno(errp, errno, "Can't get user name");
@@ -546,9 +583,8 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
        }
    }

-    if (qdict_haskey(options, "host_key_check")) {
-        host_key_check = qdict_get_str(options, "host_key_check");
-    } else {
+    host_key_check = qemu_opt_get(opts, "host_key_check");
+    if (!host_key_check) {
        host_key_check = "yes";
    }

@@ -612,21 +648,14 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
        goto err;
    }

+    qemu_opts_del(opts);
+
    r = libssh2_sftp_fstat(s->sftp_handle, &s->attrs);
    if (r < 0) {
        sftp_error_setg(errp, s, "failed to read file attributes");
        return -EINVAL;
    }

-    /* Delete the options we've used; any not deleted will cause the
-     * block layer to give an error about unused options.
-     */
-    qdict_del(options, "host");
-    qdict_del(options, "port");
-    qdict_del(options, "user");
-    qdict_del(options, "path");
-    qdict_del(options, "host_key_check");
-
    return 0;

 err:
@@ -646,6 +675,8 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
    }
    s->session = NULL;

+    qemu_opts_del(opts);
+
    return ret;
 }

--- a/block/trace-events
+++ b/block/trace-events
@@ -1,4 +1,4 @@
-# See docs/trace-events.txt for syntax documentation.
+# See docs/tracing.txt for syntax documentation.

 # block.c
 bdrv_open_common(void *bs, const char *filename, int flags, const char *format_name) "bs %p filename \"%s\" flags %#x format_name \"%s\""
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -58,14 +58,7 @@
 #include "migration/migration.h"
 #include "qemu/coroutine.h"
 #include "qemu/cutils.h"
-
-#if defined(CONFIG_UUID)
-#include <uuid/uuid.h>
-#else
-/* TODO: move uuid emulation to some central place in QEMU. */
-#include "sysemu/sysemu.h"     /* UUID_FMT */
-typedef unsigned char uuid_t[16];
-#endif
+#include "qemu/uuid.h"

 /* Code configuration options. */

@@ -140,28 +133,6 @@ typedef unsigned char uuid_t[16];
 #define VDI_DISK_SIZE_MAX        ((uint64_t)VDI_BLOCKS_IN_IMAGE_MAX * \
                                  (uint64_t)DEFAULT_CLUSTER_SIZE)

-#if !defined(CONFIG_UUID)
-static inline void uuid_generate(uuid_t out)
-{
-    memset(out, 0, sizeof(uuid_t));
-}
-
-static inline int uuid_is_null(const uuid_t uu)
-{
-    uuid_t null_uuid = { 0 };
-    return memcmp(uu, null_uuid, sizeof(uuid_t)) == 0;
-}
-
-# if defined(CONFIG_VDI_DEBUG)
-static inline void uuid_unparse(const uuid_t uu, char *out)
-{
-    snprintf(out, 37, UUID_FMT,
-            uu[0], uu[1], uu[2], uu[3], uu[4], uu[5], uu[6], uu[7],
-            uu[8], uu[9], uu[10], uu[11], uu[12], uu[13], uu[14], uu[15]);
-}
-# endif
-#endif
-
 typedef struct {
    char text[0x40];
    uint32_t signature;
@@ -182,10 +153,10 @@ typedef struct {
    uint32_t block_extra;       /* unused here */
    uint32_t blocks_in_image;
    uint32_t blocks_allocated;
-    uuid_t uuid_image;
-    uuid_t uuid_last_snap;
-    uuid_t uuid_link;
-    uuid_t uuid_parent;
+    QemuUUID uuid_image;
+    QemuUUID uuid_last_snap;
+    QemuUUID uuid_link;
+    QemuUUID uuid_parent;
    uint64_t unused2[7];
 } QEMU_PACKED VdiHeader;

@@ -206,16 +177,6 @@ typedef struct {
    Error *migration_blocker;
 } BDRVVdiState;

-/* Change UUID from little endian (IPRT = VirtualBox format) to big endian
- * format (network byte order, standard, see RFC 4122) and vice versa.
- */
-static void uuid_convert(uuid_t uuid)
-{
-    bswap32s((uint32_t *)&uuid[0]);
-    bswap16s((uint16_t *)&uuid[4]);
-    bswap16s((uint16_t *)&uuid[6]);
-}
-
 static void vdi_header_to_cpu(VdiHeader *header)
 {
    le32_to_cpus(&header->signature);
@@ -234,10 +195,10 @@ static void vdi_header_to_cpu(VdiHeader *header)
    le32_to_cpus(&header->block_extra);
    le32_to_cpus(&header->blocks_in_image);
    le32_to_cpus(&header->blocks_allocated);
-    uuid_convert(header->uuid_image);
-    uuid_convert(header->uuid_last_snap);
-    uuid_convert(header->uuid_link);
-    uuid_convert(header->uuid_parent);
+    qemu_uuid_bswap(&header->uuid_image);
+    qemu_uuid_bswap(&header->uuid_last_snap);
+    qemu_uuid_bswap(&header->uuid_link);
+    qemu_uuid_bswap(&header->uuid_parent);
 }

 static void vdi_header_to_le(VdiHeader *header)
@@ -258,10 +219,10 @@ static void vdi_header_to_le(VdiHeader *header)
    cpu_to_le32s(&header->block_extra);
    cpu_to_le32s(&header->blocks_in_image);
    cpu_to_le32s(&header->blocks_allocated);
-    uuid_convert(header->uuid_image);
-    uuid_convert(header->uuid_last_snap);
-    uuid_convert(header->uuid_link);
-    uuid_convert(header->uuid_parent);
+    qemu_uuid_bswap(&header->uuid_image);
+    qemu_uuid_bswap(&header->uuid_last_snap);
+    qemu_uuid_bswap(&header->uuid_link);
+    qemu_uuid_bswap(&header->uuid_parent);
 }

 #if defined(CONFIG_VDI_DEBUG)
@@ -469,11 +430,11 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
                   (uint64_t)header.blocks_in_image * header.block_size);
        ret = -ENOTSUP;
        goto fail;
-    } else if (!uuid_is_null(header.uuid_link)) {
+    } else if (!qemu_uuid_is_null(&header.uuid_link)) {
        error_setg(errp, "unsupported VDI image (non-NULL link UUID)");
        ret = -ENOTSUP;
        goto fail;
-    } else if (!uuid_is_null(header.uuid_parent)) {
+    } else if (!qemu_uuid_is_null(&header.uuid_parent)) {
        error_setg(errp, "unsupported VDI image (non-NULL parent UUID)");
        ret = -ENOTSUP;
        goto fail;
@@ -821,8 +782,8 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
    if (image_type == VDI_TYPE_STATIC) {
        header.blocks_allocated = blocks;
    }
-    uuid_generate(header.uuid_image);
-    uuid_generate(header.uuid_last_snap);
+    qemu_uuid_generate(&header.uuid_image);
+    qemu_uuid_generate(&header.uuid_last_snap);
    /* There is no need to set header.uuid_link or header.uuid_parent here. */
 #if defined(CONFIG_VDI_DEBUG)
    vdi_header_print(&header);
--- a/block/vhdx-endian.c
+++ b/block/vhdx-endian.c
@@ -21,9 +21,6 @@
 #include "qemu/bswap.h"
 #include "block/vhdx.h"

-#include <uuid/uuid.h>
-
-
 /*
 * All the VHDX formats on disk are little endian - the following
 * are helper import/export functions to correctly convert
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -25,8 +25,7 @@
 #include "qemu/bswap.h"
 #include "block/vhdx.h"
 #include "migration/migration.h"
-
-#include <uuid/uuid.h>
+#include "qemu/uuid.h"

 /* Options for VHDX creation */

@@ -213,11 +212,11 @@ bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset)
 */
 void vhdx_guid_generate(MSGUID *guid)
 {
-    uuid_t uuid;
+    QemuUUID uuid;
    assert(guid != NULL);

-    uuid_generate(uuid);
-    memcpy(guid, uuid, sizeof(MSGUID));
+    qemu_uuid_generate(&uuid);
+    memcpy(guid, &uuid, sizeof(MSGUID));
 }

 /* Check for region overlaps inside the VHDX image */
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1645,56 +1645,11 @@ vmdk_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
    return ret;
 }

-typedef struct VmdkWriteCompressedCo {
-    BlockDriverState *bs;
-    int64_t sector_num;
-    const uint8_t *buf;
-    int nb_sectors;
-    int ret;
-} VmdkWriteCompressedCo;
-
-static void vmdk_co_write_compressed(void *opaque)
+static int coroutine_fn
+vmdk_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
+                           uint64_t bytes, QEMUIOVector *qiov)
 {
-    VmdkWriteCompressedCo *co = opaque;
-    QEMUIOVector local_qiov;
-    uint64_t offset = co->sector_num * BDRV_SECTOR_SIZE;
-    uint64_t bytes = co->nb_sectors * BDRV_SECTOR_SIZE;
-
-    struct iovec iov = (struct iovec) {
-        .iov_base   = (uint8_t*) co->buf,
-        .iov_len    = bytes,
-    };
-    qemu_iovec_init_external(&local_qiov, &iov, 1);
-
-    co->ret = vmdk_pwritev(co->bs, offset, bytes, &local_qiov, false, false);
-}
-
-static int vmdk_write_compressed(BlockDriverState *bs,
-                                 int64_t sector_num,
-                                 const uint8_t *buf,
-                                 int nb_sectors)
-{
-    BDRVVmdkState *s = bs->opaque;
-
-    if (s->num_extents == 1 && s->extents[0].compressed) {
-        Coroutine *co;
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-        VmdkWriteCompressedCo data = {
-            .bs         = bs,
-            .sector_num = sector_num,
-            .buf        = buf,
-            .nb_sectors = nb_sectors,
-            .ret        = -EINPROGRESS,
-        };
-        co = qemu_coroutine_create(vmdk_co_write_compressed, &data);
-        qemu_coroutine_enter(co);
-        while (data.ret == -EINPROGRESS) {
-            aio_poll(aio_context, true);
-        }
-        return data.ret;
-    } else {
-        return -ENOTSUP;
-    }
+    return vmdk_co_pwritev(bs, offset, bytes, qiov, 0);
 }

 static int coroutine_fn vmdk_co_pwrite_zeroes(BlockDriverState *bs,
@@ -2393,7 +2348,7 @@ static BlockDriver bdrv_vmdk = {
    .bdrv_reopen_prepare          = vmdk_reopen_prepare,
    .bdrv_co_preadv               = vmdk_co_preadv,
    .bdrv_co_pwritev              = vmdk_co_pwritev,
-    .bdrv_write_compressed        = vmdk_write_compressed,
+    .bdrv_co_pwritev_compressed   = vmdk_co_pwritev_compressed,
    .bdrv_co_pwrite_zeroes        = vmdk_co_pwrite_zeroes,
    .bdrv_close                   = vmdk_close,
    .bdrv_create                  = vmdk_create,
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -30,9 +30,7 @@
 #include "qemu/module.h"
 #include "migration/migration.h"
 #include "qemu/bswap.h"
-#if defined(CONFIG_UUID)
-#include <uuid/uuid.h>
-#endif
+#include "qemu/uuid.h"

 /**************************************************************/

@@ -89,7 +87,7 @@ typedef struct vhd_footer {
    uint32_t    checksum;

    /* UUID used to identify a parent hard disk (backing file) */
-    uint8_t     uuid[16];
+    QemuUUID    uuid;

    uint8_t     in_saved_state;
 } QEMU_PACKED VHDFooter;
@@ -980,9 +978,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)

    footer->type = cpu_to_be32(disk_type);

-#if defined(CONFIG_UUID)
-    uuid_generate(footer->uuid);
-#endif
+    qemu_uuid_generate(&footer->uuid);

    footer->checksum = cpu_to_be32(vpc_checksum(buf, HEADER_SIZE));

--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2971,7 +2971,8 @@ static BlockDriver vvfat_write_target = {
 static void vvfat_qcow_options(int *child_flags, QDict *child_options,
                               int parent_flags, QDict *parent_options)
 {
-    *child_flags = BDRV_O_RDWR | BDRV_O_NO_FLUSH;
+    qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "off");
+    *child_flags = BDRV_O_NO_FLUSH;
 }

 static const BdrvChildRole child_vvfat_qcow = {
--- a/block/write-threshold.c
+++ b/block/write-threshold.c
@@ -76,8 +76,7 @@ static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
 static void write_threshold_register_notifier(BlockDriverState *bs)
 {
    bs->write_threshold_notifier.notify = before_write_notify;
-    notifier_with_return_list_add(&bs->before_write_notifiers,
-                                  &bs->write_threshold_notifier);
+    bdrv_add_before_write_notifier(bs, &bs->write_threshold_notifier);
 }

 static void write_threshold_update(BlockDriverState *bs,
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -145,7 +145,8 @@ void qmp_nbd_server_start(SocketAddress *addr,
 void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
                        Error **errp)
 {
-    BlockBackend *blk;
+    BlockDriverState *bs = NULL;
+    BlockBackend *on_eject_blk;
    NBDExport *exp;

    if (!nbd_server) {
@@ -158,26 +159,22 @@ void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
        return;
    }

-    blk = blk_by_name(device);
-    if (!blk) {
-        error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
-                  "Device '%s' not found", device);
-        return;
-    }
-    if (!blk_is_inserted(blk)) {
-        error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+    on_eject_blk = blk_by_name(device);
+
+    bs = bdrv_lookup_bs(device, device, errp);
+    if (!bs) {
        return;
    }

    if (!has_writable) {
        writable = false;
    }
-    if (blk_is_read_only(blk)) {
+    if (bdrv_is_read_only(bs)) {
        writable = false;
    }

-    exp = nbd_export_new(blk, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY, NULL,
-                         errp);
+    exp = nbd_export_new(bs, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY,
+                         NULL, false, on_eject_blk, errp);
    if (!exp) {
        return;
    }
--- a/blockdev.c
+++ b/blockdev.c
--- a/blockjob.c
+++ b/blockjob.c
@@ -132,6 +132,10 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,

    if (job_id == NULL) {
        job_id = bdrv_get_device_name(bs);
+        if (!*job_id) {
+            error_setg(errp, "An explicit job ID is required for this node");
+            return NULL;
+        }
    }

    if (!id_wellformed(job_id)) {
@@ -584,7 +588,6 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,

 typedef struct {
    BlockJob *job;
-    QEMUBH *bh;
    AioContext *aio_context;
    BlockJobDeferToMainLoopFn *fn;
    void *opaque;
@@ -595,8 +598,6 @@ static void block_job_defer_to_main_loop_bh(void *opaque)
    BlockJobDeferToMainLoopData *data = opaque;
    AioContext *aio_context;

-    qemu_bh_delete(data->bh);
-
    /* Prevent race with block_job_defer_to_main_loop() */
    aio_context_acquire(data->aio_context);

@@ -620,13 +621,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
 {
    BlockJobDeferToMainLoopData *data = g_malloc(sizeof(*data));
    data->job = job;
-    data->bh = qemu_bh_new(block_job_defer_to_main_loop_bh, data);
    data->aio_context = blk_get_aio_context(job->blk);
    data->fn = fn;
    data->opaque = opaque;
    job->deferred_to_main_loop = true;

-    qemu_bh_schedule(data->bh);
+    aio_bh_schedule_oneshot(qemu_get_aio_context(),
+                            block_job_defer_to_main_loop_bh, data);
 }

 BlockJobTxn *block_job_txn_new(void)
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -17,6 +17,7 @@
 *  along with this program; if not, see <http://www.gnu.org/licenses/>.
 */
 #include "qemu/osdep.h"
+#include "qemu-version.h"
 #include <machine/trap.h>

 #include "qapi/error.h"
@@ -66,23 +67,6 @@ int cpu_get_pic_interrupt(CPUX86State *env)
 }
 #endif

-/* These are no-ops because we are not threadsafe.  */
-static inline void cpu_exec_start(CPUArchState *env)
-{
-}
-
-static inline void cpu_exec_end(CPUArchState *env)
-{
-}
-
-static inline void start_exclusive(void)
-{
-}
-
-static inline void end_exclusive(void)
-{
-}
-
 void fork_start(void)
 {
 }
@@ -94,14 +78,6 @@ void fork_end(int child)
    }
 }

-void cpu_list_lock(void)
-{
-}
-
-void cpu_list_unlock(void)
-{
-}
-
 #ifdef TARGET_I386
 /***********************************************************/
 /* CPUX86 core interface */
@@ -171,7 +147,11 @@ void cpu_loop(CPUX86State *env)
    //target_siginfo_t info;

    for(;;) {
-        trapnr = cpu_x86_exec(cs);
+        cpu_exec_start(cs);
+        trapnr = cpu_exec(cs);
+        cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
        switch(trapnr) {
        case 0x80:
            /* syscall from int $0x80 */
@@ -512,7 +492,10 @@ void cpu_loop(CPUSPARCState *env)
    //target_siginfo_t info;

    while (1) {
-        trapnr = cpu_sparc_exec(cs);
+        cpu_exec_start(cs);
+        trapnr = cpu_exec(cs);
+        cpu_exec_end(cs);
+        process_queued_cpu_work(cs);

        switch (trapnr) {
 #ifndef TARGET_SPARC64
@@ -667,7 +650,8 @@ void cpu_loop(CPUSPARCState *env)

 static void usage(void)
 {
-    printf("qemu-" TARGET_NAME " version " QEMU_VERSION ", Copyright (c) 2003-2008 Fabrice Bellard\n"
+    printf("qemu-" TARGET_NAME " version " QEMU_VERSION QEMU_PKGVERSION
+           ", " QEMU_COPYRIGHT "\n"
           "usage: qemu-" TARGET_NAME " [options] program [arguments...]\n"
           "BSD CPU emulator (compiled for %s emulation)\n"
           "\n"
@@ -711,6 +695,16 @@ static void usage(void)

 THREAD CPUState *thread_cpu;

+bool qemu_cpu_is_self(CPUState *cpu)
+{
+    return thread_cpu == cpu;
+}
+
+void qemu_cpu_kick(CPUState *cpu)
+{
+    cpu_exit(cpu);
+}
+
 /* Assumes contents are already zeroed.  */
 void init_task_state(TaskState *ts)
 {
@@ -746,6 +740,8 @@ int main(int argc, char **argv)
    if (argc <= 1)
        usage();

+    module_call_init(MODULE_INIT_TRACE);
+    qemu_init_cpu_list();
    module_call_init(MODULE_INIT_QOM);

    if ((envlist = envlist_create()) == NULL) {
@@ -1131,7 +1127,6 @@ int main(int argc, char **argv)
        gdbserver_start (gdbstub_port);
        gdb_handlesig(cpu, 0);
    }
-    trace_init_vcpu_events();
    cpu_loop(env);
    /* never exits */
    return 0;
--- a/255
+++ b/255
@@ -212,7 +212,6 @@ sdlabi=""
 virtfs=""
 vnc="yes"
 sparse="no"
-uuid=""
 vde=""
 vnc_sasl=""
 vnc_jpeg=""
@@ -229,6 +228,7 @@ xfs=""

 vhost_net="no"
 vhost_scsi="no"
+vhost_vsock="no"
 kvm="no"
 rdma=""
 gprof="no"
@@ -296,6 +296,7 @@ libiscsi=""
 libnfs=""
 coroutine=""
 coroutine_pool=""
+debug_stack_usage="no"
 seccomp=""
 glusterfs=""
 glusterfs_xlator_opt="no"
@@ -316,10 +317,10 @@ vte=""
 virglrenderer=""
 tpm="yes"
 libssh2=""
-vhdx=""
 numa=""
 tcmalloc="no"
 jemalloc="no"
+replication="yes"

 # parse CC options first
 for opt do
@@ -388,7 +389,11 @@ sdl2_config="${SDL2_CONFIG-${cross_prefix}sdl2-config}"
 ARFLAGS="${ARFLAGS-rv}"

 # default flags for all hosts
-QEMU_CFLAGS="-fno-strict-aliasing -fno-common $QEMU_CFLAGS"
+# We use -fwrapv to tell the compiler that we require a C dialect where
+# left shift of signed integers is well defined and has the expected
+# 2s-complement style results. (Both clang and gcc agree that it
+# provides these semantics.)
+QEMU_CFLAGS="-fno-strict-aliasing -fno-common -fwrapv $QEMU_CFLAGS"
 QEMU_CFLAGS="-Wall -Wundef -Wwrite-strings -Wmissing-prototypes $QEMU_CFLAGS"
 QEMU_CFLAGS="-Wstrict-prototypes -Wredundant-decls $QEMU_CFLAGS"
 QEMU_CFLAGS="-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE $QEMU_CFLAGS"
@@ -505,8 +510,6 @@ elif check_define __arm__ ; then
  cpu="arm"
 elif check_define __aarch64__ ; then
  cpu="aarch64"
-elif check_define __hppa__ ; then
-  cpu="hppa"
 else
  cpu=$(uname -m)
 fi
@@ -674,6 +677,7 @@ Haiku)
  kvm="yes"
  vhost_net="yes"
  vhost_scsi="yes"
+  vhost_vsock="yes"
  QEMU_INCLUDES="-I\$(SRC_PATH)/linux-headers -I$(pwd)/linux-headers $QEMU_INCLUDES"
 ;;
 esac
@@ -882,10 +886,6 @@ for opt do
  ;;
  --disable-slirp) slirp="no"
  ;;
-  --disable-uuid) uuid="no"
-  ;;
-  --enable-uuid) uuid="yes"
-  ;;
  --disable-vde) vde="no"
  ;;
  --enable-vde) vde="yes"
@@ -1005,6 +1005,8 @@ for opt do
  ;;
  --enable-coroutine-pool) coroutine_pool="yes"
  ;;
+  --enable-debug-stack-usage) debug_stack_usage="yes"
+  ;;
  --disable-docs) docs="no"
  ;;
  --enable-docs) docs="yes"
@@ -1017,6 +1019,10 @@ for opt do
  ;;
  --enable-vhost-scsi) vhost_scsi="yes"
  ;;
+  --disable-vhost-vsock) vhost_vsock="no"
+  ;;
+  --enable-vhost-vsock) vhost_vsock="yes"
+  ;;
  --disable-opengl) opengl="no"
  ;;
  --enable-opengl) opengl="yes"
@@ -1094,6 +1100,12 @@ for opt do
  --disable-virtio-blk-data-plane|--enable-virtio-blk-data-plane)
      echo "$0: $opt is obsolete, virtio-blk data-plane is always on" >&2
  ;;
+  --enable-vhdx|--disable-vhdx)
+      echo "$0: $opt is obsolete, VHDX driver is always built" >&2
+  ;;
+  --enable-uuid|--disable-uuid)
+      echo "$0: $opt is obsolete, UUID support is always built" >&2
+  ;;
  --disable-gtk) gtk="no"
  ;;
  --enable-gtk) gtk="yes"
@@ -1134,10 +1146,6 @@ for opt do
  ;;
  --enable-libssh2) libssh2="yes"
  ;;
-  --enable-vhdx) vhdx="yes"
-  ;;
-  --disable-vhdx) vhdx="no"
-  ;;
  --disable-numa) numa="no"
  ;;
  --enable-numa) numa="yes"
@@ -1150,6 +1158,10 @@ for opt do
  ;;
  --enable-jemalloc) jemalloc="yes"
  ;;
+  --disable-replication) replication="no"
+  ;;
+  --enable-replication) replication="yes"
+  ;;
  *)
      echo "ERROR: unknown option $opt"
      echo "Try '$0 --help' for more information"
@@ -1352,7 +1364,6 @@ disabled with --disable-FEATURE, default is enabled if available:
  bluez           bluez stack connectivity
  kvm             KVM acceleration support
  rdma            RDMA-based migration support
-  uuid            uuid support
  vde             support for vde network
  netmap          support for netmap network
  linux-aio       Linux AIO support
@@ -1376,10 +1387,10 @@ disabled with --disable-FEATURE, default is enabled if available:
  archipelago     Archipelago backend
  tpm             TPM support
  libssh2         ssh block device support
-  vhdx            support for the Microsoft VHDX image format
  numa            libnuma support
  tcmalloc        tcmalloc support
  jemalloc        jemalloc support
+  replication     replication support

 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -1452,7 +1463,7 @@ fi
 gcc_flags="-Wold-style-declaration -Wold-style-definition -Wtype-limits"
 gcc_flags="-Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers $gcc_flags"
 gcc_flags="-Wmissing-include-dirs -Wempty-body -Wnested-externs $gcc_flags"
-gcc_flags="-Wendif-labels $gcc_flags"
+gcc_flags="-Wendif-labels -Wno-shift-negative-value $gcc_flags"
 gcc_flags="-Wno-initializer-overrides $gcc_flags"
 gcc_flags="-Wno-string-plus-int $gcc_flags"
 # Note that we do not add -Werror to gcc_flags here, because that would
@@ -1714,6 +1725,19 @@ if test "$cocoa" = "yes"; then
    sdl=no
 fi

+# Some versions of Mac OS X incorrectly define SIZE_MAX
+cat > $TMPC << EOF
+#include <stdint.h>
+#include <stdio.h>
+int main(int argc, char *argv[]) {
+    return printf("%zu", SIZE_MAX);
+}
+EOF
+have_broken_size_max=no
+if ! compile_object -Werror ; then
+    have_broken_size_max=yes
+fi
+
 ##########################################
 # L2TPV3 probe

@@ -1788,28 +1812,19 @@ fi
 ##########################################
 # avx2 optimization requirement check

-
-if test "$static" = "no" ; then
-  cat > $TMPC << EOF
+cat > $TMPC << EOF
 #pragma GCC push_options
 #pragma GCC target("avx2")
 #include <cpuid.h>
 #include <immintrin.h>
-
 static int bar(void *a) {
-    return _mm256_movemask_epi8(_mm256_cmpeq_epi8(*(__m256i *)a, (__m256i){0}));
+    __m256i x = *(__m256i *)a;
+    return _mm256_testz_si256(x, x);
 }
-static void *bar_ifunc(void) {return (void*) bar;}
-int foo(void *a) __attribute__((ifunc("bar_ifunc")));
-int main(int argc, char *argv[]) { return foo(argv[0]);}
+int main(int argc, char *argv[]) { return bar(argv[0]); }
 EOF
-  if compile_object "" ; then
-      if has readelf; then
-          if readelf --syms $TMPO 2>/dev/null |grep -q "IFUNC.*foo"; then
-              avx2_opt="yes"
-          fi
-      fi
-  fi
+if compile_object "" ; then
+  avx2_opt="yes"
 fi

 #########################################
@@ -1953,6 +1968,61 @@ EOF
  # Xen unstable
  elif
      cat > $TMPC <<EOF &&
+/*
+ * If we have stable libs the we don't want the libxc compat
+ * layers, regardless of what CFLAGS we may have been given.
+ *
+ * Also, check if xengnttab_grant_copy_segment_t is defined and
+ * grant copy operation is implemented.
+ */
+#undef XC_WANT_COMPAT_EVTCHN_API
+#undef XC_WANT_COMPAT_GNTTAB_API
+#undef XC_WANT_COMPAT_MAP_FOREIGN_API
+#include <xenctrl.h>
+#include <xenstore.h>
+#include <xenevtchn.h>
+#include <xengnttab.h>
+#include <xenforeignmemory.h>
+#include <stdint.h>
+#include <xen/hvm/hvm_info_table.h>
+#if !defined(HVM_MAX_VCPUS)
+# error HVM_MAX_VCPUS not defined
+#endif
+int main(void) {
+  xc_interface *xc = NULL;
+  xenforeignmemory_handle *xfmem;
+  xenevtchn_handle *xe;
+  xengnttab_handle *xg;
+  xen_domain_handle_t handle;
+  xengnttab_grant_copy_segment_t* seg = NULL;
+
+  xs_daemon_open();
+
+  xc = xc_interface_open(0, 0, 0);
+  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
+  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
+  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
+  xc_hvm_create_ioreq_server(xc, 0, HVM_IOREQSRV_BUFIOREQ_ATOMIC, NULL);
+  xc_domain_create(xc, 0, handle, 0, NULL, NULL);
+
+  xfmem = xenforeignmemory_open(0, 0);
+  xenforeignmemory_map(xfmem, 0, 0, 0, 0, 0);
+
+  xe = xenevtchn_open(0, 0);
+  xenevtchn_fd(xe);
+
+  xg = xengnttab_open(0, 0);
+  xengnttab_grant_copy(xg, 0, seg);
+
+  return 0;
+}
+EOF
+      compile_prog "" "$xen_libs $xen_stable_libs"
+    then
+    xen_ctrl_version=480
+    xen=yes
+  elif
+      cat > $TMPC <<EOF &&
 /*
 * If we have stable libs the we don't want the libxc compat
 * layers, regardless of what CFLAGS we may have been given.
@@ -2656,47 +2726,6 @@ if compile_prog "" "" ; then
   fnmatch="yes"
 fi

-##########################################
-# uuid_generate() probe, used for vdi block driver
-# Note that on some systems (notably MacOSX) no extra library
-# need be linked to get the uuid functions.
-if test "$uuid" != "no" ; then
-  uuid_libs="-luuid"
-  cat > $TMPC << EOF
-#include <uuid/uuid.h>
-int main(void)
-{
-    uuid_t my_uuid;
-    uuid_generate(my_uuid);
-    return 0;
-}
-EOF
-  if compile_prog "" "" ; then
-    uuid="yes"
-  elif compile_prog "" "$uuid_libs" ; then
-    uuid="yes"
-    libs_softmmu="$uuid_libs $libs_softmmu"
-    libs_tools="$uuid_libs $libs_tools"
-  else
-    if test "$uuid" = "yes" ; then
-      feature_not_found "uuid" "Install libuuid devel"
-    fi
-    uuid=no
-  fi
-fi
-
-if test "$vhdx" = "yes" ; then
-    if test "$uuid" = "no" ; then
-        error_exit "uuid required for VHDX support"
-    fi
-elif test "$vhdx" != "no" ; then
-    if test "$uuid" = "yes" ; then
-        vhdx=yes
-    else
-        vhdx=no
-    fi
-fi
-
 ##########################################
 # xfsctl() probe, used for raw-posix
 if test "$xfs" != "no" ; then
@@ -2975,7 +3004,7 @@ for i in $glib_modules; do
    if $pkg_config --atleast-version=$glib_req_ver $i; then
        glib_cflags=$($pkg_config --cflags $i)
        glib_libs=$($pkg_config --libs $i)
-        CFLAGS="$glib_cflags $CFLAGS"
+        QEMU_CFLAGS="$glib_cflags $QEMU_CFLAGS"
        LIBS="$glib_libs $LIBS"
        libs_qga="$glib_libs $libs_qga"
    else
@@ -3009,7 +3038,7 @@ fi

 # g_test_trap_subprocess added in 2.38. Used by some tests.
 glib_subprocess=yes
-if ! $pkg_config --atleast-version=2.38 glib-2.0; then
+if test "$mingw32" = "yes" || ! $pkg_config --atleast-version=2.38 glib-2.0; then
    glib_subprocess=no
 fi

@@ -4070,7 +4099,7 @@ EOF
  if compile_prog "$vss_win32_include" "" ; then
    guest_agent_with_vss="yes"
    QEMU_CFLAGS="$QEMU_CFLAGS $vss_win32_include"
-    libs_qga="-lole32 -loleaut32 -lshlwapi -luuid -lstdc++ -Wl,--enable-stdcall-fixup $libs_qga"
+    libs_qga="-lole32 -loleaut32 -lshlwapi -lstdc++ -Wl,--enable-stdcall-fixup $libs_qga"
    qga_vss_provider="qga/vss-win32/qga-vss.dll qga/vss-win32/qga-vss.tlb"
  else
    if test "$vss_win32_sdk" != "" ; then
@@ -4191,6 +4220,18 @@ if compile_prog "" "" ; then
    posix_madvise=yes
 fi

+##########################################
+# check if we have posix_syslog
+
+posix_syslog=no
+cat > $TMPC << EOF
+#include <syslog.h>
+int main(void) { openlog("qemu", LOG_PID, LOG_DAEMON); syslog(LOG_INFO, "configure"); return 0; }
+EOF
+if compile_prog "" "" ; then
+    posix_syslog=yes
+fi
+
 ##########################################
 # check if trace backend exists

@@ -4306,6 +4347,17 @@ if test "$coroutine" = "gthread" -a "$coroutine_pool" = "yes"; then
  error_exit "'gthread' coroutine backend does not support pool (use --disable-coroutine-pool)"
 fi

+if test "$debug_stack_usage" = "yes"; then
+  if test "$cpu" = "ia64" -o "$cpu" = "hppa"; then
+    error_exit "stack usage debugging is not supported for $cpu"
+  fi
+  if test "$coroutine_pool" = "yes"; then
+    echo "WARN: disabling coroutine pool for stack usage debugging"
+    coroutine_pool=no
+  fi
+fi
+
+
 ##########################################
 # check if we have open_by_handle_at

@@ -4561,7 +4613,6 @@ if test "$libnfs" != "no" ; then
  if $pkg_config --atleast-version=1.9.3 libnfs; then
    libnfs="yes"
    libnfs_libs=$($pkg_config --libs libnfs)
-    LIBS="$LIBS $libnfs_libs"
  else
    if test "$libnfs" = "yes" ; then
      feature_not_found "libnfs" "Install libnfs devel >= 1.9.3"
@@ -4699,7 +4750,16 @@ roms=
 if test \( "$cpu" = "i386" -o "$cpu" = "x86_64" \) -a \
        "$targetos" != "Darwin" -a "$targetos" != "SunOS" -a \
        "$softmmu" = yes ; then
-  roms="optionrom"
+    # Different host OS linkers have different ideas about the name of the ELF
+    # emulation. Linux and OpenBSD use 'elf_i386'; FreeBSD uses the _fbsd
+    # variant; and Windows uses i386pe.
+    for emu in elf_i386 elf_i386_fbsd i386pe; do
+        if "$ld" -verbose 2>&1 | grep -q "^[[:space:]]*$emu[[:space:]]*$"; then
+            ld_i386_emulation="$emu"
+            roms="optionrom"
+            break
+        fi
+    done
 fi
 if test "$cpu" = "ppc64" -a "$targetos" != "Darwin" ; then
  roms="$roms spapr-rtas"
@@ -4858,10 +4918,10 @@ echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
 echo "madvise           $madvise"
 echo "posix_madvise     $posix_madvise"
-echo "uuid support      $uuid"
 echo "libcap-ng support $cap_ng"
 echo "vhost-net support $vhost_net"
 echo "vhost-scsi support $vhost_scsi"
+echo "vhost-vsock support $vhost_vsock"
 echo "Trace backends    $trace_backends"
 if have_backend "simple"; then
 echo "Trace output file $trace_file-<pid>"
@@ -4883,6 +4943,7 @@ echo "QGA MSI support   $guest_agent_msi"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "coroutine pool    $coroutine_pool"
+echo "debug stack usage $debug_stack_usage"
 echo "GlusterFS support $glusterfs"
 echo "Archipelago support $archipelago"
 echo "gcov              $gcov_tool"
@@ -4891,7 +4952,6 @@ echo "TPM support       $tpm"
 echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging     $qom_cast_debug"
-echo "vhdx              $vhdx"
 echo "lzo support       $lzo"
 echo "snappy support    $snappy"
 echo "bzip2 support     $bzip2"
@@ -4899,6 +4959,7 @@ echo "NUMA host support $numa"
 echo "tcmalloc support  $tcmalloc"
 echo "jemalloc support  $jemalloc"
 echo "avx2 optimization $avx2_opt"
+echo "replication support $replication"

 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -5047,9 +5108,6 @@ fi
 if test "$fnmatch" = "yes" ; then
  echo "CONFIG_FNMATCH=y" >> $config_host_mak
 fi
-if test "$uuid" = "yes" ; then
-  echo "CONFIG_UUID=y" >> $config_host_mak
-fi
 if test "$xfs" = "yes" ; then
  echo "CONFIG_XFS=y" >> $config_host_mak
 fi
@@ -5165,7 +5223,6 @@ fi
 if test "$glib_subprocess" = "yes" ; then
  echo "CONFIG_HAS_GLIB_SUBPROCESS_TESTS=y" >> $config_host_mak
 fi
-echo "GLIB_CFLAGS=$glib_cflags" >> $config_host_mak
 if test "$gtk" = "yes" ; then
  echo "CONFIG_GTK=y" >> $config_host_mak
  echo "CONFIG_GTKABI=$gtkabi" >> $config_host_mak
@@ -5201,6 +5258,9 @@ fi
 if test "$have_ifaddrs_h" = "yes" ; then
    echo "HAVE_IFADDRS_H=y" >> $config_host_mak
 fi
+if test "$have_broken_size_max" = "yes" ; then
+    echo "HAVE_BROKEN_SIZE_MAX=y" >> $config_host_mak
+fi

 # Work around a system header bug with some kernel/XFS header
 # versions where they both try to define 'struct fsxattr':
@@ -5243,6 +5303,9 @@ fi
 if test "$vhost_net" = "yes" ; then
  echo "CONFIG_VHOST_NET_USED=y" >> $config_host_mak
 fi
+if test "$vhost_vsock" = "yes" ; then
+  echo "CONFIG_VHOST_VSOCK=y" >> $config_host_mak
+fi
 if test "$blobs" = "yes" ; then
  echo "INSTALL_BLOBS=yes" >> $config_host_mak
 fi
@@ -5320,7 +5383,8 @@ if test "$libiscsi" = "yes" ; then
 fi

 if test "$libnfs" = "yes" ; then
-  echo "CONFIG_LIBNFS=y" >> $config_host_mak
+  echo "CONFIG_LIBNFS=m" >> $config_host_mak
+  echo "LIBNFS_LIBS=$libnfs_libs" >> $config_host_mak
 fi

 if test "$seccomp" = "yes"; then
@@ -5351,6 +5415,10 @@ else
  echo "CONFIG_COROUTINE_POOL=0" >> $config_host_mak
 fi

+if test "$debug_stack_usage" = "yes" ; then
+  echo "CONFIG_DEBUG_STACK_USAGE=y" >> $config_host_mak
+fi
+
 if test "$open_by_handle_at" = "yes" ; then
  echo "CONFIG_OPEN_BY_HANDLE=y" >> $config_host_mak
 fi
@@ -5412,10 +5480,6 @@ if test "$libssh2" = "yes" ; then
  echo "LIBSSH2_LIBS=$libssh2_libs" >> $config_host_mak
 fi

-if test "$vhdx" = "yes" ; then
-  echo "CONFIG_VHDX=y" >> $config_host_mak
-fi
-
 # USB host support
 if test "$libusb" = "yes"; then
  echo "HOST_USB=libusb legacy" >> $config_host_mak
@@ -5459,6 +5523,13 @@ if have_backend "ftrace"; then
    feature_not_found "ftrace(trace backend)" "ftrace requires Linux"
  fi
 fi
+if have_backend "syslog"; then
+  if test "$posix_syslog" = "yes" ; then
+    echo "CONFIG_TRACE_SYSLOG=y" >> $config_host_mak
+  else
+    feature_not_found "syslog(trace backend)" "syslog not available"
+  fi
+fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak

 if test "$rdma" = "yes" ; then
@@ -5469,6 +5540,10 @@ if test "$have_rtnetlink" = "yes" ; then
  echo "CONFIG_RTNETLINK=y" >> $config_host_mak
 fi

+if test "$replication" = "yes" ; then
+  echo "CONFIG_REPLICATION=y" >> $config_host_mak
+fi
+
 # Hold two types of flag:
 #   CONFIG_THREAD_SETNAME_BYTHREAD  - we've got a way of setting the name on
 #                                     a thread we have a handle to
@@ -5539,6 +5614,7 @@ fi
 echo "LDFLAGS=$LDFLAGS" >> $config_host_mak
 echo "LDFLAGS_NOPIE=$LDFLAGS_NOPIE" >> $config_host_mak
 echo "LD_REL_FLAGS=$LD_REL_FLAGS" >> $config_host_mak
+echo "LD_I386_EMULATION=$ld_i386_emulation" >> $config_host_mak
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "LIBS_TOOLS+=$libs_tools" >> $config_host_mak
 echo "PTHREAD_LIB=$PTHREAD_LIB" >> $config_host_mak
@@ -5854,9 +5930,6 @@ for i in $ARCH $TARGET_BASE_ARCH ; do
  cris)
    disas_config "CRIS"
  ;;
-  hppa)
-    disas_config "HPPA"
-  ;;
  i386|x86_64|x32)
    disas_config "I386"
  ;;
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -147,7 +147,8 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
                           itb->tc_ptr, itb->pc, lookup_symbol(itb->pc));

 #if defined(DEBUG_DISAS)
-    if (qemu_loglevel_mask(CPU_LOG_TB_CPU)) {
+    if (qemu_loglevel_mask(CPU_LOG_TB_CPU)
+        && qemu_log_in_addr_range(itb->pc)) {
 #if defined(TARGET_I386)
        log_cpu_state(cpu, CPU_DUMP_CCOP);
 #elif defined(TARGET_M68K)
@@ -191,7 +192,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
        /* We were asked to stop executing TBs (probably a pending
         * interrupt. We've now stopped, so clear the flag.
         */
-        cpu->tcg_exit_req = 0;
+        atomic_set(&cpu->tcg_exit_req, 0);
    }
    return ret;
 }
@@ -203,20 +204,16 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
                             TranslationBlock *orig_tb, bool ignore_icount)
 {
    TranslationBlock *tb;
-    bool old_tb_flushed;

    /* Should never happen.
       We only end up here when an existing TB is too long.  */
    if (max_cycles > CF_COUNT_MASK)
        max_cycles = CF_COUNT_MASK;

-    old_tb_flushed = cpu->tb_flushed;
-    cpu->tb_flushed = false;
    tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                     max_cycles | CF_NOCACHE
                         | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
-    tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
-    cpu->tb_flushed |= old_tb_flushed;
+    tb->orig_tb = orig_tb;
    /* execute the generated code */
    trace_exec_tb_nocache(tb, tb->pc);
    cpu_tb_exec(cpu, tb);
@@ -241,7 +238,8 @@ static bool tb_cmp(const void *p, const void *d)
    if (tb->pc == desc->pc &&
        tb->page_addr[0] == desc->phys_page1 &&
        tb->cs_base == desc->cs_base &&
-        tb->flags == desc->flags) {
+        tb->flags == desc->flags &&
+        !atomic_read(&tb->invalid)) {
        /* check next page if needed */
        if (tb->page_addr[1] == -1) {
            return true;
@@ -259,7 +257,7 @@ static bool tb_cmp(const void *p, const void *d)
    return false;
 }

-static TranslationBlock *tb_find_physical(CPUState *cpu,
+static TranslationBlock *tb_htable_lookup(CPUState *cpu,
                                          target_ulong pc,
                                          target_ulong cs_base,
                                          uint32_t flags)
@@ -278,72 +276,48 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
    return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }

-static TranslationBlock *tb_find_slow(CPUState *cpu,
-                                      target_ulong pc,
-                                      target_ulong cs_base,
-                                      uint32_t flags)
-{
-    TranslationBlock *tb;
-
-    tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        goto found;
-    }
-
-#ifdef CONFIG_USER_ONLY
-    /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-     * taken outside tb_lock.  Since we're momentarily dropping
-     * tb_lock, there's a chance that our desired tb has been
-     * translated.
-     */
-    tb_unlock();
-    mmap_lock();
-    tb_lock();
-    tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        mmap_unlock();
-        goto found;
-    }
-#endif
-
-    /* if no translated code available, then translate it now */
-    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
-
-#ifdef CONFIG_USER_ONLY
-    mmap_unlock();
-#endif
-
-found:
-    /* we add the TB in the virtual pc hash table */
-    cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
-    return tb;
-}
-
-static inline TranslationBlock *tb_find_fast(CPUState *cpu,
-                                             TranslationBlock **last_tb,
-                                             int tb_exit)
+static inline TranslationBlock *tb_find(CPUState *cpu,
+                                        TranslationBlock *last_tb,
+                                        int tb_exit)
 {
    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
    TranslationBlock *tb;
    target_ulong cs_base, pc;
    uint32_t flags;
+    bool have_tb_lock = false;

    /* we record a subset of the CPU state. It will
       always be the same before a given translated block
       is executed. */
    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    tb_lock();
-    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
    if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                 tb->flags != flags)) {
-        tb = tb_find_slow(cpu, pc, cs_base, flags);
-    }
-    if (cpu->tb_flushed) {
-        /* Ensure that no TB jump will be modified as the
-         * translation buffer has been flushed.
-         */
-        *last_tb = NULL;
-        cpu->tb_flushed = false;
+        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+        if (!tb) {
+
+            /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+             * taken outside tb_lock. As system emulation is currently
+             * single threaded the locks are NOPs.
+             */
+            mmap_lock();
+            tb_lock();
+            have_tb_lock = true;
+
+            /* There's a chance that our desired tb has been translated while
+             * taking the locks so we check again inside the lock.
+             */
+            tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+            if (!tb) {
+                /* if no translated code available, then translate it now */
+                tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+            }
+
+            mmap_unlock();
+        }
+
+        /* We add the TB in the virtual pc hash table for the fast lookup */
+        atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
    }
 #ifndef CONFIG_USER_ONLY
    /* We don't take care of direct jumps when address mapping changes in
@@ -351,14 +325,22 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
     * spanning two pages because the mapping for the second page can change.
     */
    if (tb->page_addr[1] != -1) {
-        *last_tb = NULL;
+        last_tb = NULL;
    }
 #endif
    /* See if we can patch the calling TB. */
-    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-        tb_add_jump(*last_tb, tb_exit, tb);
+    if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+        if (!have_tb_lock) {
+            tb_lock();
+            have_tb_lock = true;
+        }
+        if (!tb->invalid) {
+            tb_add_jump(last_tb, tb_exit, tb);
+        }
+    }
+    if (have_tb_lock) {
+        tb_unlock();
    }
-    tb_unlock();
    return tb;
 }

@@ -437,8 +419,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
    } else if (replay_has_exception()
               && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
        /* try to cause an exception pending in the log */
-        TranslationBlock *last_tb = NULL; /* Avoid chaining TBs */
-        cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);
+        cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
        *ret = -1;
        return true;
 #endif
@@ -509,8 +490,8 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
            *last_tb = NULL;
        }
    }
-    if (unlikely(cpu->exit_request || replay_has_interrupt())) {
-        cpu->exit_request = 0;
+    if (unlikely(atomic_read(&cpu->exit_request) || replay_has_interrupt())) {
+        atomic_set(&cpu->exit_request, 0);
        cpu->exception_index = EXCP_INTERRUPT;
        cpu_loop_exit(cpu);
    }
@@ -522,7 +503,7 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
 {
    uintptr_t ret;

-    if (unlikely(cpu->exit_request)) {
+    if (unlikely(atomic_read(&cpu->exit_request))) {
        return;
    }

@@ -618,10 +599,9 @@ int cpu_exec(CPUState *cpu)
                break;
            }

-            cpu->tb_flushed = false; /* reset before first TB lookup */
            for(;;) {
                cpu_handle_interrupt(cpu, &last_tb);
-                tb = tb_find_fast(cpu, &last_tb, tb_exit);
+                tb = tb_find(cpu, last_tb, tb_exit);
                cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
                /* Try to align the host and virtual clocks
                   if the guest is in advance */
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -0,0 +1,352 @@
+/*
+ * CPU thread main loop - common bits for user and system mode emulation
+ *
+ *  Copyright (c) 2003-2005 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "exec/cpu-common.h"
+#include "qom/cpu.h"
+#include "sysemu/cpus.h"
+
+static QemuMutex qemu_cpu_list_lock;
+static QemuCond exclusive_cond;
+static QemuCond exclusive_resume;
+static QemuCond qemu_work_cond;
+
+/* >= 1 if a thread is inside start_exclusive/end_exclusive.  Written
+ * under qemu_cpu_list_lock, read with atomic operations.
+ */
+static int pending_cpus;
+
+void qemu_init_cpu_list(void)
+{
+    /* This is needed because qemu_init_cpu_list is also called by the
+     * child process in a fork.  */
+    pending_cpus = 0;
+
+    qemu_mutex_init(&qemu_cpu_list_lock);
+    qemu_cond_init(&exclusive_cond);
+    qemu_cond_init(&exclusive_resume);
+    qemu_cond_init(&qemu_work_cond);
+}
+
+void cpu_list_lock(void)
+{
+    qemu_mutex_lock(&qemu_cpu_list_lock);
+}
+
+void cpu_list_unlock(void)
+{
+    qemu_mutex_unlock(&qemu_cpu_list_lock);
+}
+
+static bool cpu_index_auto_assigned;
+
+static int cpu_get_free_index(void)
+{
+    CPUState *some_cpu;
+    int cpu_index = 0;
+
+    cpu_index_auto_assigned = true;
+    CPU_FOREACH(some_cpu) {
+        cpu_index++;
+    }
+    return cpu_index;
+}
+
+static void finish_safe_work(CPUState *cpu)
+{
+    cpu_exec_start(cpu);
+    cpu_exec_end(cpu);
+}
+
+void cpu_list_add(CPUState *cpu)
+{
+    qemu_mutex_lock(&qemu_cpu_list_lock);
+    if (cpu->cpu_index == UNASSIGNED_CPU_INDEX) {
+        cpu->cpu_index = cpu_get_free_index();
+        assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
+    } else {
+        assert(!cpu_index_auto_assigned);
+    }
+    QTAILQ_INSERT_TAIL(&cpus, cpu, node);
+    qemu_mutex_unlock(&qemu_cpu_list_lock);
+
+    finish_safe_work(cpu);
+}
+
+void cpu_list_remove(CPUState *cpu)
+{
+    qemu_mutex_lock(&qemu_cpu_list_lock);
+    if (!QTAILQ_IN_USE(cpu, node)) {
+        /* there is nothing to undo since cpu_exec_init() hasn't been called */
+        qemu_mutex_unlock(&qemu_cpu_list_lock);
+        return;
+    }
+
+    assert(!(cpu_index_auto_assigned && cpu != QTAILQ_LAST(&cpus, CPUTailQ)));
+
+    QTAILQ_REMOVE(&cpus, cpu, node);
+    cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+    qemu_mutex_unlock(&qemu_cpu_list_lock);
+}
+
+struct qemu_work_item {
+    struct qemu_work_item *next;
+    run_on_cpu_func func;
+    void *data;
+    bool free, exclusive, done;
+};
+
+static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
+{
+    qemu_mutex_lock(&cpu->work_mutex);
+    if (cpu->queued_work_first == NULL) {
+        cpu->queued_work_first = wi;
+    } else {
+        cpu->queued_work_last->next = wi;
+    }
+    cpu->queued_work_last = wi;
+    wi->next = NULL;
+    wi->done = false;
+    qemu_mutex_unlock(&cpu->work_mutex);
+
+    qemu_cpu_kick(cpu);
+}
+
+void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data,
+                   QemuMutex *mutex)
+{
+    struct qemu_work_item wi;
+
+    if (qemu_cpu_is_self(cpu)) {
+        func(cpu, data);
+        return;
+    }
+
+    wi.func = func;
+    wi.data = data;
+    wi.done = false;
+    wi.free = false;
+    wi.exclusive = false;
+
+    queue_work_on_cpu(cpu, &wi);
+    while (!atomic_mb_read(&wi.done)) {
+        CPUState *self_cpu = current_cpu;
+
+        qemu_cond_wait(&qemu_work_cond, mutex);
+        current_cpu = self_cpu;
+    }
+}
+
+void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
+{
+    struct qemu_work_item *wi;
+
+    wi = g_malloc0(sizeof(struct qemu_work_item));
+    wi->func = func;
+    wi->data = data;
+    wi->free = true;
+
+    queue_work_on_cpu(cpu, wi);
+}
+
+/* Wait for pending exclusive operations to complete.  The CPU list lock
+   must be held.  */
+static inline void exclusive_idle(void)
+{
+    while (pending_cpus) {
+        qemu_cond_wait(&exclusive_resume, &qemu_cpu_list_lock);
+    }
+}
+
+/* Start an exclusive operation.
+   Must only be called from outside cpu_exec.  */
+void start_exclusive(void)
+{
+    CPUState *other_cpu;
+    int running_cpus;
+
+    qemu_mutex_lock(&qemu_cpu_list_lock);
+    exclusive_idle();
+
+    /* Make all other cpus stop executing.  */
+    atomic_set(&pending_cpus, 1);
+
+    /* Write pending_cpus before reading other_cpu->running.  */
+    smp_mb();
+    running_cpus = 0;
+    CPU_FOREACH(other_cpu) {
+        if (atomic_read(&other_cpu->running)) {
+            other_cpu->has_waiter = true;
+            running_cpus++;
+            qemu_cpu_kick(other_cpu);
+        }
+    }
+
+    atomic_set(&pending_cpus, running_cpus + 1);
+    while (pending_cpus > 1) {
+        qemu_cond_wait(&exclusive_cond, &qemu_cpu_list_lock);
+    }
+
+    /* Can release mutex, no one will enter another exclusive
+     * section until end_exclusive resets pending_cpus to 0.
+     */
+    qemu_mutex_unlock(&qemu_cpu_list_lock);
+}
+
+/* Finish an exclusive operation.  */
+void end_exclusive(void)
+{
+    qemu_mutex_lock(&qemu_cpu_list_lock);
+    atomic_set(&pending_cpus, 0);
+    qemu_cond_broadcast(&exclusive_resume);
+    qemu_mutex_unlock(&qemu_cpu_list_lock);
+}
+
+/* Wait for exclusive ops to finish, and begin cpu execution.  */
+void cpu_exec_start(CPUState *cpu)
+{
+    atomic_set(&cpu->running, true);
+
+    /* Write cpu->running before reading pending_cpus.  */
+    smp_mb();
+
+    /* 1. start_exclusive saw cpu->running == true and pending_cpus >= 1.
+     * After taking the lock we'll see cpu->has_waiter == true and run---not
+     * for long because start_exclusive kicked us.  cpu_exec_end will
+     * decrement pending_cpus and signal the waiter.
+     *
+     * 2. start_exclusive saw cpu->running == false but pending_cpus >= 1.
+     * This includes the case when an exclusive item is running now.
+     * Then we'll see cpu->has_waiter == false and wait for the item to
+     * complete.
+     *
+     * 3. pending_cpus == 0.  Then start_exclusive is definitely going to
+     * see cpu->running == true, and it will kick the CPU.
+     */
+    if (unlikely(atomic_read(&pending_cpus))) {
+        qemu_mutex_lock(&qemu_cpu_list_lock);
+        if (!cpu->has_waiter) {
+            /* Not counted in pending_cpus, let the exclusive item
+             * run.  Since we have the lock, just set cpu->running to true
+             * while holding it; no need to check pending_cpus again.
+             */
+            atomic_set(&cpu->running, false);
+            exclusive_idle();
+            /* Now pending_cpus is zero.  */
+            atomic_set(&cpu->running, true);
+        } else {
+            /* Counted in pending_cpus, go ahead and release the
+             * waiter at cpu_exec_end.
+             */
+        }
+        qemu_mutex_unlock(&qemu_cpu_list_lock);
+    }
+}
+
+/* Mark cpu as not executing, and release pending exclusive ops.  */
+void cpu_exec_end(CPUState *cpu)
+{
+    atomic_set(&cpu->running, false);
+
+    /* Write cpu->running before reading pending_cpus.  */
+    smp_mb();
+
+    /* 1. start_exclusive saw cpu->running == true.  Then it will increment
+     * pending_cpus and wait for exclusive_cond.  After taking the lock
+     * we'll see cpu->has_waiter == true.
+     *
+     * 2. start_exclusive saw cpu->running == false but here pending_cpus >= 1.
+     * This includes the case when an exclusive item started after setting
+     * cpu->running to false and before we read pending_cpus.  Then we'll see
+     * cpu->has_waiter == false and not touch pending_cpus.  The next call to
+     * cpu_exec_start will run exclusive_idle if still necessary, thus waiting
+     * for the item to complete.
+     *
+     * 3. pending_cpus == 0.  Then start_exclusive is definitely going to
+     * see cpu->running == false, and it can ignore this CPU until the
+     * next cpu_exec_start.
+     */
+    if (unlikely(atomic_read(&pending_cpus))) {
+        qemu_mutex_lock(&qemu_cpu_list_lock);
+        if (cpu->has_waiter) {
+            cpu->has_waiter = false;
+            atomic_set(&pending_cpus, pending_cpus - 1);
+            if (pending_cpus == 1) {
+                qemu_cond_signal(&exclusive_cond);
+            }
+        }
+        qemu_mutex_unlock(&qemu_cpu_list_lock);
+    }
+}
+
+void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
+{
+    struct qemu_work_item *wi;
+
+    wi = g_malloc0(sizeof(struct qemu_work_item));
+    wi->func = func;
+    wi->data = data;
+    wi->free = true;
+    wi->exclusive = true;
+
+    queue_work_on_cpu(cpu, wi);
+}
+
+void process_queued_cpu_work(CPUState *cpu)
+{
+    struct qemu_work_item *wi;
+
+    if (cpu->queued_work_first == NULL) {
+        return;
+    }
+
+    qemu_mutex_lock(&cpu->work_mutex);
+    while (cpu->queued_work_first != NULL) {
+        wi = cpu->queued_work_first;
+        cpu->queued_work_first = wi->next;
+        if (!cpu->queued_work_first) {
+            cpu->queued_work_last = NULL;
+        }
+        qemu_mutex_unlock(&cpu->work_mutex);
+        if (wi->exclusive) {
+            /* Running work items outside the BQL avoids the following deadlock:
+             * 1) start_exclusive() is called with the BQL taken while another
+             * CPU is running; 2) cpu_exec in the other CPU tries to takes the
+             * BQL, so it goes to sleep; start_exclusive() is sleeping too, so
+             * neither CPU can proceed.
+             */
+            qemu_mutex_unlock_iothread();
+            start_exclusive();
+            wi->func(cpu, wi->data);
+            end_exclusive();
+            qemu_mutex_lock_iothread();
+        } else {
+            wi->func(cpu, wi->data);
+        }
+        qemu_mutex_lock(&cpu->work_mutex);
+        if (wi->free) {
+            g_free(wi);
+        } else {
+            atomic_mb_set(&wi->done, true);
+        }
+    }
+    qemu_mutex_unlock(&cpu->work_mutex);
+    qemu_cond_broadcast(&qemu_work_cond);
+}
--- a/cpus.c
+++ b/cpus.c
@@ -191,8 +191,12 @@ int64_t cpu_icount_to_ns(int64_t icount)
    return icount << icount_time_shift;
 }

-/* return the host CPU cycle counter and handle stop/restart */
-/* Caller must hold the BQL */
+/* return the time elapsed in VM between vm_start and vm_stop.  Unless
+ * icount is active, cpu_get_ticks() uses units of the host CPU cycle
+ * counter.
+ *
+ * Caller must hold the BQL
+ */
 int64_t cpu_get_ticks(void)
 {
    int64_t ticks;
@@ -219,17 +223,19 @@ int64_t cpu_get_ticks(void)

 static int64_t cpu_get_clock_locked(void)
 {
-    int64_t ticks;
+    int64_t time;

-    ticks = timers_state.cpu_clock_offset;
+    time = timers_state.cpu_clock_offset;
    if (timers_state.cpu_ticks_enabled) {
-        ticks += get_clock();
+        time += get_clock();
    }

-    return ticks;
+    return time;
 }

-/* return the host CPU monotonic timer and handle stop/restart */
+/* Return the monotonic time elapsed in VM, i.e.,
+ * the time between vm_start and vm_stop
+ */
 int64_t cpu_get_clock(void)
 {
    int64_t ti;
@@ -244,7 +250,7 @@ int64_t cpu_get_clock(void)
 }

 /* enable cpu_get_ticks()
- * Caller must hold BQL which server as mutex for vm_clock_seqlock.
+ * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
 */
 void cpu_enable_ticks(void)
 {
@@ -260,7 +266,7 @@ void cpu_enable_ticks(void)

 /* disable cpu_get_ticks() : the clock is stopped. You must not call
 * cpu_get_ticks() after that.
- * Caller must hold BQL which server as mutex for vm_clock_seqlock.
+ * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
 */
 void cpu_disable_ticks(void)
 {
@@ -551,9 +557,8 @@ static const VMStateDescription vmstate_timers = {
    }
 };

-static void cpu_throttle_thread(void *opaque)
+static void cpu_throttle_thread(CPUState *cpu, void *opaque)
 {
-    CPUState *cpu = opaque;
    double pct;
    double throttle_ratio;
    long sleeptime_ns;
@@ -583,7 +588,7 @@ static void cpu_throttle_timer_tick(void *opaque)
    }
    CPU_FOREACH(cpu) {
        if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
-            async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
+            async_run_on_cpu(cpu, cpu_throttle_thread, NULL);
        }
    }

@@ -745,7 +750,8 @@ static int do_vm_stop(RunState state)
    }

    bdrv_drain_all();
-    ret = blk_flush_all();
+    replay_disable_events();
+    ret = bdrv_flush_all();

    return ret;
 }
@@ -897,79 +903,21 @@ static QemuThread io_thread;
 static QemuCond qemu_cpu_cond;
 /* system init */
 static QemuCond qemu_pause_cond;
-static QemuCond qemu_work_cond;

 void qemu_init_cpu_loop(void)
 {
    qemu_init_sigbus();
    qemu_cond_init(&qemu_cpu_cond);
    qemu_cond_init(&qemu_pause_cond);
-    qemu_cond_init(&qemu_work_cond);
    qemu_cond_init(&qemu_io_proceeded_cond);
    qemu_mutex_init(&qemu_global_mutex);

    qemu_thread_get_self(&io_thread);
 }

-void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
+void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
-    struct qemu_work_item wi;
-
-    if (qemu_cpu_is_self(cpu)) {
-        func(data);
-        return;
-    }
-
-    wi.func = func;
-    wi.data = data;
-    wi.free = false;
-
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = &wi;
-    } else {
-        cpu->queued_work_last->next = &wi;
-    }
-    cpu->queued_work_last = &wi;
-    wi.next = NULL;
-    wi.done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
-    while (!atomic_mb_read(&wi.done)) {
-        CPUState *self_cpu = current_cpu;
-
-        qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
-        current_cpu = self_cpu;
-    }
-}
-
-void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
-{
-    struct qemu_work_item *wi;
-
-    if (qemu_cpu_is_self(cpu)) {
-        func(data);
-        return;
-    }
-
-    wi = g_malloc0(sizeof(struct qemu_work_item));
-    wi->func = func;
-    wi->data = data;
-    wi->free = true;
-
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = wi;
-    } else {
-        cpu->queued_work_last->next = wi;
-    }
-    cpu->queued_work_last = wi;
-    wi->next = NULL;
-    wi->done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
+    do_run_on_cpu(cpu, func, data, &qemu_global_mutex);
 }

 static void qemu_kvm_destroy_vcpu(CPUState *cpu)
@@ -984,34 +932,6 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 {
 }

-static void flush_queued_work(CPUState *cpu)
-{
-    struct qemu_work_item *wi;
-
-    if (cpu->queued_work_first == NULL) {
-        return;
-    }
-
-    qemu_mutex_lock(&cpu->work_mutex);
-    while (cpu->queued_work_first != NULL) {
-        wi = cpu->queued_work_first;
-        cpu->queued_work_first = wi->next;
-        if (!cpu->queued_work_first) {
-            cpu->queued_work_last = NULL;
-        }
-        qemu_mutex_unlock(&cpu->work_mutex);
-        wi->func(wi->data);
-        qemu_mutex_lock(&cpu->work_mutex);
-        if (wi->free) {
-            g_free(wi);
-        } else {
-            atomic_mb_set(&wi->done, true);
-        }
-    }
-    qemu_mutex_unlock(&cpu->work_mutex);
-    qemu_cond_broadcast(&qemu_work_cond);
-}
-
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
    if (cpu->stop) {
@@ -1019,7 +939,7 @@ static void qemu_wait_io_event_common(CPUState *cpu)
        cpu->stopped = true;
        qemu_cond_broadcast(&qemu_pause_cond);
    }
-    flush_queued_work(cpu);
+    process_queued_cpu_work(cpu);
    cpu->thread_kicked = false;
 }

@@ -1488,7 +1408,7 @@ int vm_stop_force_state(RunState state)
        bdrv_drain_all();
        /* Make sure to return an error if the flush in a previous vm_stop()
         * failed. */
-        return blk_flush_all();
+        return bdrv_flush_all();
    }
 }

@@ -1538,7 +1458,9 @@ static int tcg_cpu_exec(CPUState *cpu)
        cpu->icount_decr.u16.low = decr;
        cpu->icount_extra = count;
    }
+    cpu_exec_start(cpu);
    ret = cpu_exec(cpu);
+    cpu_exec_end(cpu);
 #ifdef CONFIG_PROFILER
    tcg_time += profile_getclock() - ti;
 #endif
--- a/cputlb.c
+++ b/cputlb.c
@@ -543,10 +543,8 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
 #undef MMUSUFFIX

 #define MMUSUFFIX _cmmu
-#undef GETPC_ADJ
-#define GETPC_ADJ 0
-#undef GETRA
-#define GETRA() ((uintptr_t)0)
+#undef GETPC
+#define GETPC() ((uintptr_t)0)
 #define SOFTMMU_CODE_ACCESS

 #define SHIFT 0
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -29,10 +29,7 @@
 #include "crypto/pbkdf.h"
 #include "crypto/secret.h"
 #include "crypto/random.h"
-
-#ifdef CONFIG_UUID
-#include <uuid/uuid.h>
-#endif
+#include "qemu/uuid.h"

 #include "qemu/coroutine.h"

@@ -877,18 +874,12 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 }


-static int
-qcrypto_block_luks_uuid_gen(uint8_t *uuidstr, Error **errp)
+static void
+qcrypto_block_luks_uuid_gen(uint8_t *uuidstr)
 {
-#ifdef CONFIG_UUID
-    uuid_t uuid;
-    uuid_generate(uuid);
-    uuid_unparse(uuid, (char *)uuidstr);
-    return 0;
-#else
-    error_setg(errp, "Unable to generate uuids on this platform");
-    return -1;
-#endif
+    QemuUUID uuid;
+    qemu_uuid_generate(&uuid);
+    qemu_uuid_unparse(&uuid, (char *)uuidstr);
 }

 static int
@@ -917,8 +908,12 @@ qcrypto_block_luks_create(QCryptoBlock *block,
    const char *hash_alg;
    char *cipher_mode_spec = NULL;
    QCryptoCipherAlgorithm ivcipheralg = 0;
+    uint64_t iters;

    memcpy(&luks_opts, &options->u.luks, sizeof(luks_opts));
+    if (!luks_opts.has_iter_time) {
+        luks_opts.iter_time = 2000;
+    }
    if (!luks_opts.has_cipher_alg) {
        luks_opts.cipher_alg = QCRYPTO_CIPHER_ALG_AES_256;
    }
@@ -961,10 +956,7 @@ qcrypto_block_luks_create(QCryptoBlock *block,
     * it out to disk
     */
    luks->header.version = QCRYPTO_BLOCK_LUKS_VERSION;
-    if (qcrypto_block_luks_uuid_gen(luks->header.uuid,
-                                    errp) < 0) {
-        goto error;
-    }
+    qcrypto_block_luks_uuid_gen(luks->header.uuid);

    cipher_alg = qcrypto_block_luks_cipher_alg_lookup(luks_opts.cipher_alg,
                                                      errp);
@@ -1064,26 +1056,40 @@ qcrypto_block_luks_create(QCryptoBlock *block,
    /* Determine how many iterations we need to hash the master
     * key, in order to have 1 second of compute time used
     */
-    luks->header.master_key_iterations =
-        qcrypto_pbkdf2_count_iters(luks_opts.hash_alg,
-                                   masterkey, luks->header.key_bytes,
-                                   luks->header.master_key_salt,
-                                   QCRYPTO_BLOCK_LUKS_SALT_LEN,
-                                   &local_err);
+    iters = qcrypto_pbkdf2_count_iters(luks_opts.hash_alg,
+                                       masterkey, luks->header.key_bytes,
+                                       luks->header.master_key_salt,
+                                       QCRYPTO_BLOCK_LUKS_SALT_LEN,
+                                       QCRYPTO_BLOCK_LUKS_DIGEST_LEN,
+                                       &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        goto error;
    }

+    if (iters > (ULLONG_MAX / luks_opts.iter_time)) {
+        error_setg_errno(errp, ERANGE,
+                         "PBKDF iterations %llu too large to scale",
+                         (unsigned long long)iters);
+        goto error;
+    }
+
+    /* iter_time was in millis, but count_iters reported for secs */
+    iters = iters * luks_opts.iter_time / 1000;
+
    /* Why /= 8 ?  That matches cryptsetup, but there's no
     * explanation why they chose /= 8... Probably so that
     * if all 8 keyslots are active we only spend 1 second
     * in total time to check all keys */
-    luks->header.master_key_iterations /= 8;
-    luks->header.master_key_iterations = MAX(
-        luks->header.master_key_iterations,
-        QCRYPTO_BLOCK_LUKS_MIN_MASTER_KEY_ITERS);
-
+    iters /= 8;
+    if (iters > UINT32_MAX) {
+        error_setg_errno(errp, ERANGE,
+                         "PBKDF iterations %llu larger than %u",
+                         (unsigned long long)iters, UINT32_MAX);
+        goto error;
+    }
+    iters = MAX(iters, QCRYPTO_BLOCK_LUKS_MIN_MASTER_KEY_ITERS);
+    luks->header.master_key_iterations = iters;

    /* Hash the master key, saving the result in the LUKS
     * header. This hash is used when opening the encrypted
@@ -1131,22 +1137,36 @@ qcrypto_block_luks_create(QCryptoBlock *block,
    /* Again we determine how many iterations are required to
     * hash the user password while consuming 1 second of compute
     * time */
-    luks->header.key_slots[0].iterations =
-        qcrypto_pbkdf2_count_iters(luks_opts.hash_alg,
-                                   (uint8_t *)password, strlen(password),
-                                   luks->header.key_slots[0].salt,
-                                   QCRYPTO_BLOCK_LUKS_SALT_LEN,
-                                   &local_err);
+    iters = qcrypto_pbkdf2_count_iters(luks_opts.hash_alg,
+                                       (uint8_t *)password, strlen(password),
+                                       luks->header.key_slots[0].salt,
+                                       QCRYPTO_BLOCK_LUKS_SALT_LEN,
+                                       luks->header.key_bytes,
+                                       &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        goto error;
    }
-    /* Why /= 2 ?  That matches cryptsetup, but there's no
-     * explanation why they chose /= 2... */
-    luks->header.key_slots[0].iterations /= 2;
-    luks->header.key_slots[0].iterations = MAX(
-        luks->header.key_slots[0].iterations,
-        QCRYPTO_BLOCK_LUKS_MIN_SLOT_KEY_ITERS);
+
+    if (iters > (ULLONG_MAX / luks_opts.iter_time)) {
+        error_setg_errno(errp, ERANGE,
+                         "PBKDF iterations %llu too large to scale",
+                         (unsigned long long)iters);
+        goto error;
+    }
+
+    /* iter_time was in millis, but count_iters reported for secs */
+    iters = iters * luks_opts.iter_time / 1000;
+
+    if (iters > UINT32_MAX) {
+        error_setg_errno(errp, ERANGE,
+                         "PBKDF iterations %llu larger than %u",
+                         (unsigned long long)iters, UINT32_MAX);
+        goto error;
+    }
+
+    luks->header.key_slots[0].iterations =
+        MAX(iters, QCRYPTO_BLOCK_LUKS_MIN_SLOT_KEY_ITERS);


    /* Generate a key that we'll use to encrypt the master
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -59,7 +59,8 @@ QCryptoBlock *qcrypto_block_open(QCryptoBlockOpenOptions *options,

    if (options->format >= G_N_ELEMENTS(qcrypto_block_drivers) ||
        !qcrypto_block_drivers[options->format]) {
-        error_setg(errp, "Unsupported block driver %d", options->format);
+        error_setg(errp, "Unsupported block driver %s",
+                   QCryptoBlockFormat_lookup[options->format]);
        g_free(block);
        return NULL;
    }
@@ -88,7 +89,8 @@ QCryptoBlock *qcrypto_block_create(QCryptoBlockCreateOptions *options,

    if (options->format >= G_N_ELEMENTS(qcrypto_block_drivers) ||
        !qcrypto_block_drivers[options->format]) {
-        error_setg(errp, "Unsupported block driver %d", options->format);
+        error_setg(errp, "Unsupported block driver %s",
+                   QCryptoBlockFormat_lookup[options->format]);
        g_free(block);
        return NULL;
    }
--- a/crypto/cipher-builtin.c
+++ b/crypto/cipher-builtin.c
@@ -244,7 +244,8 @@ static int qcrypto_cipher_init_aes(QCryptoCipher *cipher,
    if (cipher->mode != QCRYPTO_CIPHER_MODE_CBC &&
        cipher->mode != QCRYPTO_CIPHER_MODE_ECB &&
        cipher->mode != QCRYPTO_CIPHER_MODE_XTS) {
-        error_setg(errp, "Unsupported cipher mode %d", cipher->mode);
+        error_setg(errp, "Unsupported cipher mode %s",
+                   QCryptoCipherMode_lookup[cipher->mode]);
        return -1;
    }

@@ -376,7 +377,8 @@ static int qcrypto_cipher_init_des_rfb(QCryptoCipher *cipher,
    QCryptoCipherBuiltin *ctxt;

    if (cipher->mode != QCRYPTO_CIPHER_MODE_ECB) {
-        error_setg(errp, "Unsupported cipher mode %d", cipher->mode);
+        error_setg(errp, "Unsupported cipher mode %s",
+                   QCryptoCipherMode_lookup[cipher->mode]);
        return -1;
    }

@@ -442,7 +444,8 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
        break;
    default:
        error_setg(errp,
-                   "Unsupported cipher algorithm %d", cipher->alg);
+                   "Unsupported cipher algorithm %s",
+                   QCryptoCipherAlgorithm_lookup[cipher->alg]);
        goto error;
    }

--- a/crypto/cipher-gcrypt.c
+++ b/crypto/cipher-gcrypt.c
@@ -70,7 +70,8 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
        gcrymode = GCRY_CIPHER_MODE_CBC;
        break;
    default:
-        error_setg(errp, "Unsupported cipher mode %d", mode);
+        error_setg(errp, "Unsupported cipher mode %s",
+                   QCryptoCipherMode_lookup[mode]);
        return NULL;
    }

@@ -120,7 +121,8 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
        break;

    default:
-        error_setg(errp, "Unsupported cipher algorithm %d", alg);
+        error_setg(errp, "Unsupported cipher algorithm %s",
+                   QCryptoCipherAlgorithm_lookup[alg]);
        return NULL;
    }

@@ -192,6 +194,12 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
    }

    if (cipher->mode == QCRYPTO_CIPHER_MODE_XTS) {
+        if (ctx->blocksize != XTS_BLOCK_SIZE) {
+            error_setg(errp,
+                       "Cipher block size %zu must equal XTS block size %d",
+                       ctx->blocksize, XTS_BLOCK_SIZE);
+            goto error;
+        }
        ctx->iv = g_new0(uint8_t, ctx->blocksize);
    }

--- a/crypto/cipher-nettle.c
+++ b/crypto/cipher-nettle.c
@@ -227,7 +227,8 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
    case QCRYPTO_CIPHER_MODE_XTS:
        break;
    default:
-        error_setg(errp, "Unsupported cipher mode %d", mode);
+        error_setg(errp, "Unsupported cipher mode %s",
+                   QCryptoCipherMode_lookup[mode]);
        return NULL;
    }

@@ -357,7 +358,15 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
        break;

    default:
-        error_setg(errp, "Unsupported cipher algorithm %d", alg);
+        error_setg(errp, "Unsupported cipher algorithm %s",
+                   QCryptoCipherAlgorithm_lookup[alg]);
+        goto error;
+    }
+
+    if (mode == QCRYPTO_CIPHER_MODE_XTS &&
+        ctx->blocksize != XTS_BLOCK_SIZE) {
+        error_setg(errp, "Cipher block size %zu must equal XTS block size %d",
+                   ctx->blocksize, XTS_BLOCK_SIZE);
        goto error;
    }

@@ -422,8 +431,8 @@ int qcrypto_cipher_encrypt(QCryptoCipher *cipher,
        break;

    default:
-        error_setg(errp, "Unsupported cipher algorithm %d",
-                   cipher->alg);
+        error_setg(errp, "Unsupported cipher mode %s",
+                   QCryptoCipherMode_lookup[cipher->mode]);
        return -1;
    }
    return 0;
@@ -456,19 +465,14 @@ int qcrypto_cipher_decrypt(QCryptoCipher *cipher,
        break;

    case QCRYPTO_CIPHER_MODE_XTS:
-        if (ctx->blocksize != XTS_BLOCK_SIZE) {
-            error_setg(errp, "Block size must be %d not %zu",
-                       XTS_BLOCK_SIZE, ctx->blocksize);
-            return -1;
-        }
        xts_decrypt(ctx->ctx, ctx->ctx_tweak,
                    ctx->alg_encrypt_wrapper, ctx->alg_decrypt_wrapper,
                    ctx->iv, len, out, in);
        break;

    default:
-        error_setg(errp, "Unsupported cipher algorithm %d",
-                   cipher->alg);
+        error_setg(errp, "Unsupported cipher mode %s",
+                   QCryptoCipherMode_lookup[cipher->mode]);
        return -1;
    }
    return 0;
--- a/crypto/init.c
+++ b/crypto/init.c
@@ -59,8 +59,7 @@

 #if (defined(CONFIG_GCRYPT) &&                  \
     (!defined(CONFIG_GNUTLS) ||                \
-      !defined(GNUTLS_VERSION_NUMBER) ||       \
-      (GNUTLS_VERSION_NUMBER < 0x020c00)) &&    \
+     (LIBGNUTLS_VERSION_NUMBER < 0x020c00)) &&    \
     (!defined(GCRYPT_VERSION_NUMBER) ||        \
      (GCRYPT_VERSION_NUMBER < 0x010600)))
 #define QCRYPTO_INIT_GCRYPT_THREADS
--- a/crypto/pbkdf-gcrypt.c
+++ b/crypto/pbkdf-gcrypt.c
@@ -28,7 +28,11 @@ bool qcrypto_pbkdf2_supports(QCryptoHashAlgorithm hash)
    switch (hash) {
    case QCRYPTO_HASH_ALG_MD5:
    case QCRYPTO_HASH_ALG_SHA1:
+    case QCRYPTO_HASH_ALG_SHA224:
    case QCRYPTO_HASH_ALG_SHA256:
+    case QCRYPTO_HASH_ALG_SHA384:
+    case QCRYPTO_HASH_ALG_SHA512:
+    case QCRYPTO_HASH_ALG_RIPEMD160:
        return true;
    default:
        return false;
@@ -38,20 +42,33 @@ bool qcrypto_pbkdf2_supports(QCryptoHashAlgorithm hash)
 int qcrypto_pbkdf2(QCryptoHashAlgorithm hash,
                   const uint8_t *key, size_t nkey,
                   const uint8_t *salt, size_t nsalt,
-                   unsigned int iterations,
+                   uint64_t iterations,
                   uint8_t *out, size_t nout,
                   Error **errp)
 {
    static const int hash_map[QCRYPTO_HASH_ALG__MAX] = {
        [QCRYPTO_HASH_ALG_MD5] = GCRY_MD_MD5,
        [QCRYPTO_HASH_ALG_SHA1] = GCRY_MD_SHA1,
+        [QCRYPTO_HASH_ALG_SHA224] = GCRY_MD_SHA224,
        [QCRYPTO_HASH_ALG_SHA256] = GCRY_MD_SHA256,
+        [QCRYPTO_HASH_ALG_SHA384] = GCRY_MD_SHA384,
+        [QCRYPTO_HASH_ALG_SHA512] = GCRY_MD_SHA512,
+        [QCRYPTO_HASH_ALG_RIPEMD160] = GCRY_MD_RMD160,
    };
    int ret;

+    if (iterations > ULONG_MAX) {
+        error_setg_errno(errp, ERANGE,
+                         "PBKDF iterations %llu must be less than %lu",
+                         (long long unsigned)iterations, ULONG_MAX);
+        return -1;
+    }
+
    if (hash >= G_N_ELEMENTS(hash_map) ||
        hash_map[hash] == GCRY_MD_NONE) {
-        error_setg(errp, "Unexpected hash algorithm %d", hash);
+        error_setg_errno(errp, ENOSYS,
+                         "PBKDF does not support hash algorithm %s",
+                         QCryptoHashAlgorithm_lookup[hash]);
        return -1;
    }

--- a/crypto/pbkdf-nettle.c
+++ b/crypto/pbkdf-nettle.c
@@ -20,6 +20,7 @@

 #include "qemu/osdep.h"
 #include <nettle/pbkdf2.h>
+#include <nettle/hmac.h>
 #include "qapi/error.h"
 #include "crypto/pbkdf.h"

@@ -28,7 +29,11 @@ bool qcrypto_pbkdf2_supports(QCryptoHashAlgorithm hash)
 {
    switch (hash) {
    case QCRYPTO_HASH_ALG_SHA1:
+    case QCRYPTO_HASH_ALG_SHA224:
    case QCRYPTO_HASH_ALG_SHA256:
+    case QCRYPTO_HASH_ALG_SHA384:
+    case QCRYPTO_HASH_ALG_SHA512:
+    case QCRYPTO_HASH_ALG_RIPEMD160:
        return true;
    default:
        return false;
@@ -38,28 +43,74 @@ bool qcrypto_pbkdf2_supports(QCryptoHashAlgorithm hash)
 int qcrypto_pbkdf2(QCryptoHashAlgorithm hash,
                   const uint8_t *key, size_t nkey,
                   const uint8_t *salt, size_t nsalt,
-                   unsigned int iterations,
+                   uint64_t iterations,
                   uint8_t *out, size_t nout,
                   Error **errp)
 {
+    union {
+        struct hmac_md5_ctx md5;
+        struct hmac_sha1_ctx sha1;
+        struct hmac_sha224_ctx sha224;
+        struct hmac_sha256_ctx sha256;
+        struct hmac_sha384_ctx sha384;
+        struct hmac_sha512_ctx sha512;
+        struct hmac_ripemd160_ctx ripemd160;
+    } ctx;
+
+    if (iterations > UINT_MAX) {
+        error_setg_errno(errp, ERANGE,
+                         "PBKDF iterations %llu must be less than %u",
+                         (long long unsigned)iterations, UINT_MAX);
+        return -1;
+    }
+
    switch (hash) {
+    case QCRYPTO_HASH_ALG_MD5:
+        hmac_md5_set_key(&ctx.md5, nkey, key);
+        PBKDF2(&ctx.md5, hmac_md5_update, hmac_md5_digest,
+               MD5_DIGEST_SIZE, iterations, nsalt, salt, nout, out);
+        break;
+
    case QCRYPTO_HASH_ALG_SHA1:
-        pbkdf2_hmac_sha1(nkey, key,
-                         iterations,
-                         nsalt, salt,
-                         nout, out);
+        hmac_sha1_set_key(&ctx.sha1, nkey, key);
+        PBKDF2(&ctx.sha1, hmac_sha1_update, hmac_sha1_digest,
+               SHA1_DIGEST_SIZE, iterations, nsalt, salt, nout, out);
+        break;
+
+    case QCRYPTO_HASH_ALG_SHA224:
+        hmac_sha224_set_key(&ctx.sha224, nkey, key);
+        PBKDF2(&ctx.sha224, hmac_sha224_update, hmac_sha224_digest,
+               SHA224_DIGEST_SIZE, iterations, nsalt, salt, nout, out);
        break;

    case QCRYPTO_HASH_ALG_SHA256:
-        pbkdf2_hmac_sha256(nkey, key,
-                           iterations,
-                           nsalt, salt,
-                           nout, out);
+        hmac_sha256_set_key(&ctx.sha256, nkey, key);
+        PBKDF2(&ctx.sha256, hmac_sha256_update, hmac_sha256_digest,
+               SHA256_DIGEST_SIZE, iterations, nsalt, salt, nout, out);
+        break;
+
+    case QCRYPTO_HASH_ALG_SHA384:
+        hmac_sha384_set_key(&ctx.sha384, nkey, key);
+        PBKDF2(&ctx.sha384, hmac_sha384_update, hmac_sha384_digest,
+               SHA384_DIGEST_SIZE, iterations, nsalt, salt, nout, out);
+        break;
+
+    case QCRYPTO_HASH_ALG_SHA512:
+        hmac_sha512_set_key(&ctx.sha512, nkey, key);
+        PBKDF2(&ctx.sha512, hmac_sha512_update, hmac_sha512_digest,
+               SHA512_DIGEST_SIZE, iterations, nsalt, salt, nout, out);
+        break;
+
+    case QCRYPTO_HASH_ALG_RIPEMD160:
+        hmac_ripemd160_set_key(&ctx.ripemd160, nkey, key);
+        PBKDF2(&ctx.ripemd160, hmac_ripemd160_update, hmac_ripemd160_digest,
+               RIPEMD160_DIGEST_SIZE, iterations, nsalt, salt, nout, out);
        break;

    default:
        error_setg_errno(errp, ENOSYS,
-                         "PBKDF does not support hash algorithm %d", hash);
+                         "PBKDF does not support hash algorithm %s",
+                         QCryptoHashAlgorithm_lookup[hash]);
        return -1;
    }
    return 0;
--- a/crypto/pbkdf-stub.c
+++ b/crypto/pbkdf-stub.c
@@ -32,7 +32,7 @@ int qcrypto_pbkdf2(QCryptoHashAlgorithm hash G_GNUC_UNUSED,
                   size_t nkey G_GNUC_UNUSED,
                   const uint8_t *salt G_GNUC_UNUSED,
                   size_t nsalt G_GNUC_UNUSED,
-                   unsigned int iterations G_GNUC_UNUSED,
+                   uint64_t iterations G_GNUC_UNUSED,
                   uint8_t *out G_GNUC_UNUSED,
                   size_t nout G_GNUC_UNUSED,
                   Error **errp)
--- a/crypto/pbkdf.c
+++ b/crypto/pbkdf.c
@@ -62,29 +62,33 @@ static int qcrypto_pbkdf2_get_thread_cpu(unsigned long long *val_ms,
 #endif
 }

-int qcrypto_pbkdf2_count_iters(QCryptoHashAlgorithm hash,
-                               const uint8_t *key, size_t nkey,
-                               const uint8_t *salt, size_t nsalt,
-                               Error **errp)
+uint64_t qcrypto_pbkdf2_count_iters(QCryptoHashAlgorithm hash,
+                                    const uint8_t *key, size_t nkey,
+                                    const uint8_t *salt, size_t nsalt,
+                                    size_t nout,
+                                    Error **errp)
 {
-    uint8_t out[32];
-    long long int iterations = (1 << 15);
+    uint64_t ret = -1;
+    uint8_t *out;
+    uint64_t iterations = (1 << 15);
    unsigned long long delta_ms, start_ms, end_ms;

+    out = g_new(uint8_t, nout);
+
    while (1) {
        if (qcrypto_pbkdf2_get_thread_cpu(&start_ms, errp) < 0) {
-            return -1;
+            goto cleanup;
        }
        if (qcrypto_pbkdf2(hash,
                           key, nkey,
                           salt, nsalt,
                           iterations,
-                           out, sizeof(out),
+                           out, nout,
                           errp) < 0) {
-            return -1;
+            goto cleanup;
        }
        if (qcrypto_pbkdf2_get_thread_cpu(&end_ms, errp) < 0) {
-            return -1;
+            goto cleanup;
        }

        delta_ms = end_ms - start_ms;
@@ -100,11 +104,10 @@ int qcrypto_pbkdf2_count_iters(QCryptoHashAlgorithm hash,

    iterations = iterations * 1000 / delta_ms;

-    if (iterations > INT32_MAX) {
-        error_setg(errp, "Iterations %lld too large for a 32-bit int",
-                   iterations);
-        return -1;
-    }
+    ret = iterations;

-    return iterations;
+ cleanup:
+    memset(out, 0, nout);
+    g_free(out);
+    return ret;
 }
--- a/crypto/tlscredsx509.c
+++ b/crypto/tlscredsx509.c
@@ -615,7 +615,7 @@ qcrypto_tls_creds_x509_load(QCryptoTLSCredsX509 *creds,
    }

    if (cert != NULL && key != NULL) {
-#if GNUTLS_VERSION_NUMBER >= 0x030111
+#if LIBGNUTLS_VERSION_NUMBER >= 0x030111
        char *password = NULL;
        if (creds->passwordid) {
            password = qcrypto_secret_lookup_as_utf8(creds->passwordid,
@@ -630,7 +630,7 @@ qcrypto_tls_creds_x509_load(QCryptoTLSCredsX509 *creds,
                                                    password,
                                                    0);
        g_free(password);
-#else /* GNUTLS_VERSION_NUMBER < 0x030111 */
+#else /* LIBGNUTLS_VERSION_NUMBER < 0x030111 */
        if (creds->passwordid) {
            error_setg(errp, "PKCS8 decryption requires GNUTLS >= 3.1.11");
            goto cleanup;
@@ -638,7 +638,7 @@ qcrypto_tls_creds_x509_load(QCryptoTLSCredsX509 *creds,
        ret = gnutls_certificate_set_x509_key_file(creds->data,
                                                   cert, key,
                                                   GNUTLS_X509_FMT_PEM);
-#endif /* GNUTLS_VERSION_NUMBER < 0x030111 */
+#endif
        if (ret < 0) {
            error_setg(errp, "Cannot load certificate '%s' & key '%s': %s",
                       cert, key, gnutls_strerror(ret));
--- a/crypto/tlssession.c
+++ b/crypto/tlssession.c
@@ -351,16 +351,22 @@ qcrypto_tls_session_check_credentials(QCryptoTLSSession *session,
 {
    if (object_dynamic_cast(OBJECT(session->creds),
                            TYPE_QCRYPTO_TLS_CREDS_ANON)) {
+        trace_qcrypto_tls_session_check_creds(session, "nop");
        return 0;
    } else if (object_dynamic_cast(OBJECT(session->creds),
                            TYPE_QCRYPTO_TLS_CREDS_X509)) {
        if (session->creds->verifyPeer) {
-            return qcrypto_tls_session_check_certificate(session,
-                                                         errp);
+            int ret = qcrypto_tls_session_check_certificate(session,
+                                                            errp);
+            trace_qcrypto_tls_session_check_creds(session,
+                                                  ret == 0 ? "pass" : "fail");
+            return ret;
        } else {
+            trace_qcrypto_tls_session_check_creds(session, "skip");
            return 0;
        }
    } else {
+        trace_qcrypto_tls_session_check_creds(session, "error");
        error_setg(errp, "Unexpected credential type %s",
                   object_get_typename(OBJECT(session->creds)));
        return -1;
--- a/crypto/trace-events
+++ b/crypto/trace-events
@@ -1,4 +1,4 @@
-# See docs/trace-events.txt for syntax documentation.
+# See docs/tracing.txt for syntax documentation.

 # crypto/tlscreds.c
 qcrypto_tls_creds_load_dh(void *creds, const char *filename) "TLS creds load DH creds=%p filename=%s"
@@ -17,3 +17,4 @@ qcrypto_tls_creds_x509_load_cert_list(void *creds, const char *file) "TLS creds

 # crypto/tlssession.c
 qcrypto_tls_session_new(void *session, void *creds, const char *hostname, const char *aclname, int endpoint) "TLS session new session=%p creds=%p hostname=%s aclname=%s endpoint=%d"
+qcrypto_tls_session_check_creds(void *session, const char *status) "TLS session check creds session=%p status=%s"
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -3,7 +3,6 @@
 include pci.mak
 include usb.mak
 CONFIG_VGA=y
-CONFIG_ISA_MMIO=y
 CONFIG_NAND=y
 CONFIG_ECC=y
 CONFIG_SERIAL=y
@@ -87,6 +86,8 @@ CONFIG_ZYNQ=y
 CONFIG_STM32F2XX_TIMER=y
 CONFIG_STM32F2XX_USART=y
 CONFIG_STM32F2XX_SYSCFG=y
+CONFIG_STM32F2XX_ADC=y
+CONFIG_STM32F2XX_SPI=y
 CONFIG_STM32F205_SOC=y

 CONFIG_VERSATILE_PCI=y
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -30,14 +30,12 @@ CONFIG_I8257=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
-CONFIG_PIIX_PCI=y
 CONFIG_HPET=y
 CONFIG_APPLESMC=y
 CONFIG_I8259=y
 CONFIG_PFLASH_CFI01=y
 CONFIG_TPM_TIS=$(CONFIG_TPM)
 CONFIG_MC146818RTC=y
-CONFIG_PAM=y
 CONFIG_PCI_PIIX=y
 CONFIG_WDT_IB700=y
 CONFIG_XEN_I386=$(CONFIG_XEN)
--- a/default-configs/ppc-softmmu.mak
+++ b/default-configs/ppc-softmmu.mak
@@ -3,7 +3,6 @@
 include pci.mak
 include sound.mak
 include usb.mak
-CONFIG_ISA_MMIO=y
 CONFIG_ESCC=y
 CONFIG_M48T59=y
 CONFIG_SERIAL=y
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -4,7 +4,6 @@ include pci.mak
 include sound.mak
 include usb.mak
 CONFIG_VIRTIO_VGA=y
-CONFIG_ISA_MMIO=y
 CONFIG_ESCC=y
 CONFIG_M48T59=y
 CONFIG_SERIAL=y
--- a/default-configs/sparc64-softmmu.mak
+++ b/default-configs/sparc64-softmmu.mak
@@ -2,7 +2,6 @@

 include pci.mak
 include usb.mak
-CONFIG_ISA_MMIO=y
 CONFIG_M48T59=y
 CONFIG_PTIMER=y
 CONFIG_SERIAL=y
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -30,14 +30,12 @@ CONFIG_I8257=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
-CONFIG_PIIX_PCI=y
 CONFIG_HPET=y
 CONFIG_APPLESMC=y
 CONFIG_I8259=y
 CONFIG_PFLASH_CFI01=y
 CONFIG_TPM_TIS=$(CONFIG_TPM)
 CONFIG_MC146818RTC=y
-CONFIG_PAM=y
 CONFIG_PCI_PIIX=y
 CONFIG_WDT_IB700=y
 CONFIG_XEN_I386=$(CONFIG_XEN)
--- a/disas.c
+++ b/disas.c
@@ -310,8 +310,6 @@ void disas(FILE *out, void *code, unsigned long size)
    print_insn = print_insn_m68k;
 #elif defined(__s390__)
    print_insn = print_insn_s390;
-#elif defined(__hppa__)
-    print_insn = print_insn_hppa;
 #elif defined(__ia64__)
    print_insn = print_insn_ia64;
 #endif
--- a/disas/Makefile.objs
+++ b/disas/Makefile.objs
@@ -9,7 +9,6 @@ libvixldir = $(SRC_PATH)/disas/libvixl
 # versions do not.
 arm-a64.o-cflags := -I$(libvixldir) -Wno-sign-compare
 common-obj-$(CONFIG_CRIS_DIS) += cris.o
-common-obj-$(CONFIG_HPPA_DIS) += hppa.o
 common-obj-$(CONFIG_I386_DIS) += i386.o
 common-obj-$(CONFIG_IA64_DIS) += ia64.o
 common-obj-$(CONFIG_M68K_DIS) += m68k.o
--- a/disas/arm.c
+++ b/disas/arm.c
@@ -24,7 +24,6 @@

 #include "qemu/osdep.h"
 #include "disas/bfd.h"
-#define ISSPACE(x) ((x) == ' ' || (x) == '\t' || (x) == '\n')

 #define ARM_EXT_V1	 0
 #define ARM_EXT_V2	 0
@@ -73,15 +72,6 @@ static void floatformat_to_double (unsigned char *data, double *dest)

 /* End of qemu specific additions.  */

-/* FIXME: Belongs in global header.  */
-#ifndef strneq
-#define strneq(a,b,n)	(strncmp ((a), (b), (n)) == 0)
-#endif
-
-#ifndef NUM_ELEM
-#define NUM_ELEM(a)     (sizeof (a) / sizeof (a)[0])
-#endif
-
 struct opcode32
 {
  unsigned long arch;		/* Architecture defining this insn.  */
@@ -1528,7 +1518,6 @@ static const char *const iwmmxt_cregnames[] =
 /* Default to GCC register name set.  */
 static unsigned int regname_selected = 1;

-#define NUM_ARM_REGNAMES  NUM_ELEM (regnames)
 #define arm_regnames      regnames[regname_selected].reg_names

 static bfd_boolean force_thumb = false;
--- a/disas/hppa.c
+++ b/disas/hppa.c
--- a/disas/sh4.c
+++ b/disas/sh4.c
@@ -264,12 +264,6 @@ sh_dsp_reg_nums;
   be some confusion between DSP and FPU etc.  */
 #define SH_ARCH_UNKNOWN_ARCH 0xffffffff

-/* These are defined in bfd/cpu-sh.c .  */
-unsigned int sh_get_arch_from_bfd_mach (unsigned long mach);
-unsigned int sh_get_arch_up_from_bfd_mach (unsigned long mach);
-unsigned long sh_get_bfd_mach_from_arch_set (unsigned int arch_set);
-/* bfd_boolean sh_merge_bfd_arch (bfd *ibfd, bfd *obfd); */
-
 /* Below are the 'architecture sets'.
   They describe the following inheritance graph:

--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -0,0 +1,239 @@
+Block replication
+----------------------------------------
+Copyright Fujitsu, Corp. 2016
+Copyright (c) 2016 Intel Corporation
+Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of the Primary and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort during a vmstate checkpoint, the disk modification operations of
+the Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+
+
+    1) Primary write requests will be copied and forwarded to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Secondary disk, the
+       original sector content will be read from Secondary disk and
+       buffered in the Disk buffer, but it will not overwrite the existing
+       sector content (it could be from either "Secondary Write Requests" or
+       previous COW of "Primary Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Secondary disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                             virtio-blk
+                  |                                                                  ^
+                3 NBD  ------->  3 NBD                                               |
+                client    ||     server                                          2 filter
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||           drive-backup sync=none 6
+
+1) The disk on the primary is represented by a block device with two
+children, providing replication between a primary disk and the host that
+runs the secondary VM. The read pattern (fifo) for quorum can be extended
+to make the primary always read from the local disk instead of going through
+NBD.
+
+2) The new block filter (the name is replication) will control the block
+replication.
+
+3) The secondary disk receives writes from the primary VM through QEMU's
+embedded NBD server (speculative write-through).
+
+4) The disk on the secondary is represented by a custom block device
+(called active-disk). It should start as an empty disk, and the format
+should support bdrv_make_empty() and backing file.
+
+5) The hidden-disk is created automatically. It buffers the original content
+that is modified by the primary VM. It should also start as an empty disk,
+and the driver supports bdrv_make_empty() and backing file.
+
+6) The drive-backup job (sync=none) is run to allow hidden-disk to buffer
+any state that would otherwise be lost by the speculative write-through
+of the NBD server into the secondary disk. So before block replication,
+the primary disk and secondary disk should contain the same data.
+
+== Failure Handling ==
+There are 7 internal errors when block replication is running:
+1. I/O error on primary disk
+2. Forwarding primary write requests failed
+3. Backup failed
+4. I/O error on secondary disk
+5. I/O error on active disk
+6. Making active disk or hidden disk empty failed
+7. Doing failover failed
+In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
+4 and 6, we just report block replication's error to FT/HA manager (which
+decides when to do a new checkpoint, when to do failover).
+In case 7, if active commit failed, we use replication failover failed state
+in Secondary's write operation (what decides which target to write).
+
+== New block driver interface ==
+We add four block driver interfaces to control block replication:
+a. replication_start_all()
+   Start block replication, called in migration/checkpoint thread.
+   We must call block_replication_start_all() in secondary QEMU before
+   calling block_replication_start_all() in primary QEMU. The caller
+   must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+b. replication_do_checkpoint_all()
+   This interface is called after all VM state is transferred to
+   Secondary QEMU. The Disk buffer will be dropped in this interface.
+   The caller must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+c. replication_get_error_all()
+   This interface is called to check if error happened in replication.
+   The caller must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+d. replication_stop_all()
+   It is called on failover. We will flush the Disk buffer into
+   Secondary Disk and stop block replication. The vm should be stopped
+   before calling it if you use this API to shutdown the guest, or other
+   things except failover. The caller must hold the I/O mutex lock if it is
+   in migration/checkpoint thread.
+
+== Usage ==
+Primary:
+  -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
+         children.0.file.filename=1.raw,\
+         children.0.driver=raw
+
+  Run qmp command in primary qemu:
+    { 'execute': 'human-monitor-command',
+      'arguments': {
+          'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xxxx,file.port=xxxx,file.export=colo1,node-name=nbd_client1'
+      }
+    }
+    { 'execute': 'x-blockdev-change',
+      'arguments': {
+          'parent': 'colo1',
+          'node': 'nbd_client1'
+      }
+    }
+  Note:
+  1. There should be only one NBD Client for each primary disk.
+  2. host is the secondary physical machine's hostname or IP
+  3. Each disk must have its own export name.
+  4. It is all a single argument to -drive and you should ignore the
+     leading whitespace.
+  5. The qmp command line must be run after running qmp command line in
+     secondary qemu.
+  6. After failover we need remove children.1 (replication driver).
+
+Secondary:
+  -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
+  -drive if=xxx,id=topxxx,driver=replication,mode=secondary,top-id=topxxx\
+         file.file.filename=active_disk.qcow2,\
+         file.driver=qcow2,\
+         file.backing.file.filename=hidden_disk.qcow2,\
+         file.backing.driver=qcow2,\
+         file.backing.backing=colo1
+
+  Then run qmp command in secondary qemu:
+    { 'execute': 'nbd-server-start',
+      'arguments': {
+          'addr': {
+              'type': 'inet',
+              'data': {
+                  'host': 'xxx',
+                  'port': 'xxx'
+              }
+          }
+      }
+    }
+    { 'execute': 'nbd-server-add',
+      'arguments': {
+          'device': 'colo1',
+          'writable': true
+      }
+    }
+
+  Note:
+  1. The export name in secondary QEMU command line is the secondary
+     disk's id.
+  2. The export name for the same disk must be the same
+  3. The qmp command nbd-server-start and nbd-server-add must be run
+     before running the qmp command migrate on primary QEMU
+  4. Active disk, hidden disk and nbd target's length should be the
+     same.
+  5. It is better to put active disk and hidden disk in ramdisk.
+  6. It is all a single argument to -drive, and you should ignore
+     the leading whitespace.
+
+After Failover:
+Primary:
+  The secondary host is down, so we should run the following qmp command
+  to remove the nbd child from the quorum:
+  { 'execute': 'x-blockdev-change',
+    'arguments': {
+        'parent': 'colo1',
+        'child': 'children.1'
+    }
+  }
+  { 'execute': 'human-monitor-command',
+    'arguments': {
+        'command-line': 'drive_del xxxx'
+    }
+  }
+  Note: there is no qmp command to remove the blockdev now
+
+Secondary:
+  The primary host is down, so we should do the following thing:
+  { 'execute': 'nbd-server-stop' }
+
+TODO:
+1. Continuous block replication
+2. Shared disk
--- a/docs/colo-proxy.txt
+++ b/docs/colo-proxy.txt
@@ -0,0 +1,188 @@
+COLO-proxy
+----------
+Copyright (c) 2016 Intel Corporation
+Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+This document gives an overview of COLO proxy's design.
+
+== Background ==
+COLO-proxy is a part of COLO project. It is used
+to compare the network package to help COLO decide
+whether to do checkpoint. With COLO-proxy's help,
+COLO greatly improves the performance.
+
+The filter-redirector, filter-mirror, colo-compare
+and filter-rewriter compose the COLO-proxy.
+
+== Architecture ==
+
+COLO-Proxy is based on qemu netfilter and it's a plugin for qemu netfilter
+(except colo-compare). It keep Secondary VM connect normally to
+client and compare packets sent by PVM with sent by SVM.
+If the packet difference, notify COLO-frame to do checkpoint and send
+all primary packet has queued. Otherwise just send the queued primary
+packet and drop the queued secondary packet.
+
+Below is a COLO proxy ascii figure:
+
+ Primary qemu                                                           Secondary qemu
+--------------------------------------------------------------+       +----------------------------------------------------------------+
+| +----------------------------------------------------------+ |       |  +-----------------------------------------------------------+ |
+| |                                                          | |       |  |                                                           | |
+| |                        guest                             | |       |  |                        guest                              | |
+| |                                                          | |       |  |                                                           | |
+| +-------^--------------------------+-----------------------+ |       |  +---------------------+--------+----------------------------+ |
+|         |                          |                         |       |                        ^        |                              |
+|         |                          |                         |       |                        |        |                              |
+|         |  +------------------------------------------------------+  |                        |        |                              |
+|netfilter|  |                       |                         |    |  |   netfilter            |        |                              |
+| +----------+ +----------------------------+                  |    |  |  +-----------------------------------------------------------+ |
+| |       |  |                       |      |        out       |    |  |  |                     |        |  filter excute order       | |
+| |       |  |          +-----------------------------+        |    |  |  |                     |        | +------------------->      | |
+| |       |  |          |            |      |         |        |    |  |  |                     |        |   TCP                      | |
+| | +-----+--+-+  +-----v----+ +-----v----+ |pri +----+----+sec|    |  |  | +------------+  +---+----+---v+rewriter++  +------------+ | |
+| | |          |  |          | |          | |in  |         |in |    |  |  | |            |  |        |              |  |            | | |
+| | |  filter  |  |  filter  | |  filter  +------>  colo   <------+ +-------->  filter   +--> adjust |   adjust     +-->   filter   | | |
+| | |  mirror  |  |redirector| |redirector| |    | compare |   |  |    |  | | redirector |  | ack    |   seq        |  | redirector | | |
+| | |          |  |          | |          | |    |         |   |  |    |  | |            |  |        |              |  |            | | |
+| | +----^-----+  +----+-----+ +----------+ |    +---------+   |  |    |  | +------------+  +--------+--------------+  +---+--------+ | |
+| |      |   tx        |   rx           rx  |                  |  |    |  |            tx                        all       |  rx      | |
+| |      |             |                    |                  |  |    |  +-----------------------------------------------------------+ |
+| |      |             +--------------+     |                  |  |    |                                                   |            |
+| |      |   filter excute order      |     |                  |  |    |                                                   |            |
+| |      |  +---------------->        |     |                  |  +--------------------------------------------------------+            |
+| +-----------------------------------------+                  |       |                                                                |
+|        |                            |                        |       |                                                                |
+--------------------------------------------------------------+       +----------------------------------------------------------------+
+         |guest receive               | guest send
+         |                            |
+--------+----------------------------v------------------------+
+|                                                              |                          NOTE: filter direction is rx/tx/all
+|                         tap                                  |                          rx:receive packets sent to the netdev
+|                                                              |                          tx:receive packets sent by the netdev
+--------------------------------------------------------------+
+
+1.Guest receive packet route:
+
+Primary:
+
+Tap --> Mirror Client Filter
+Mirror client will send packet to guest,at the
+same time, copy and forward packet to secondary
+mirror server.
+
+Secondary:
+
+Mirror Server Filter --> TCP Rewriter
+If receive packet is TCP packet,we will adjust ack
+and update TCP checksum, then send to secondary
+guest. Otherwise directly send to guest.
+
+2.Guest send packet route:
+
+Primary:
+
+Guest --> Redirect Server Filter
+Redirect server filter receive primary guest packet
+but do nothing, just pass to next filter.
+
+Redirect Server Filter --> COLO-Compare
+COLO-compare receive primary guest packet then
+waiting scondary redirect packet to compare it.
+If packet same,send queued primary packet and clear
+queued secondary packet, Otherwise send primary packet
+and do checkpoint.
+
+COLO-Compare --> Another Redirector Filter
+The redirector get packet from colo-compare by use
+chardev socket.
+
+Redirector Filter --> Tap
+Send the packet.
+
+Secondary:
+
+Guest --> TCP Rewriter Filter
+If the packet is TCP packet,we will adjust seq
+and update TCP checksum. Then send it to
+redirect client filter. Otherwise directly send to
+redirect client filter.
+
+Redirect Client Filter --> Redirect Server Filter
+Forward packet to primary.
+
+== Components introduction ==
+
+Filter-mirror is a netfilter plugin.
+It gives qemu the ability to mirror
+packets to a chardev.
+
+Filter-redirector is a netfilter plugin.
+It gives qemu the ability to redirect net packet.
+Redirector can redirect filter's net packet to outdev,
+and redirect indev's packet to filter.
+
+                    filter
+                      +
+          redirector  |
+             +--------------+
+             |        |     |
+             |        |     |
+             |        |     |
+  indev +---------+   +---------->  outdev
+             |    |         |
+             |    |         |
+             |    |         |
+             +--------------+
+                  |
+                  v
+                filter
+
+COLO-compare, we do packet comparing job.
+Packets coming from the primary char indev will be sent to outdev.
+Packets coming from the secondary char dev will be dropped after comparing.
+COLO-comapre need two input chardev and one output chardev:
+primary_in=chardev1-id (source: primary send packet)
+secondary_in=chardev2-id (source: secondary send packet)
+outdev=chardev3-id
+
+Filter-rewriter will rewrite some of secondary packet to make
+secondary guest's tcp connection established successfully.
+In this module we will rewrite tcp packet's ack to the secondary
+from primary,and rewrite tcp packet's seq to the primary from
+secondary.
+
+== Usage ==
+
+Here, we use demo ip and port discribe more clearly.
+Primary(ip:3.3.3.3):
+-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
+-device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
+-chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
+-chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
+-chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
+-chardev socket,id=compare0-0,host=3.3.3.3,port=9001
+-chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
+-chardev socket,id=compare_out0,host=3.3.3.3,port=9005
+-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0
+-object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out
+-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0
+-object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0
+
+Secondary(ip:3.3.3.8):
+-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,down script=/etc/qemu-ifdown
+-device e1000,netdev=hn0,mac=52:a4:00:12:78:66
+-chardev socket,id=red0,host=3.3.3.3,port=9003
+-chardev socket,id=red1,host=3.3.3.3,port=9004
+-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
+-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
+
+Note:
+  a.COLO-proxy must work with COLO-frame and Block-replication.
+  b.Primary COLO must be started firstly, because COLO-proxy needs
+    chardev socket server running before secondary started.
+  c.Filter-rewriter only rewrite tcp packet.
--- a/docs/generic-loader.txt
+++ b/docs/generic-loader.txt
@@ -0,0 +1,84 @@
+Copyright (c) 2016 Xilinx Inc.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+the COPYING file in the top-level directory.
+
+
+The 'loader' device allows the user to load multiple images or values into
+QEMU at startup.
+
+Loading Data into Memory Values
+---------------------
+The loader device allows memory values to be set from the command line. This
+can be done by following the syntax below:
+
+     -device loader,addr=<addr>,data=<data>,data-len=<data-len>
+                   [,data-be=<data-be>][,cpu-num=<cpu-num>]
+
+    <addr>      - The address to store the data in.
+    <data>      - The value to be written to the address. The maximum size of
+                  the data is 8 bytes.
+    <data-len>  - The length of the data in bytes. This argument must be
+                  included if the data argument is.
+    <data-be>   - Set to true if the data to be stored on the guest should be
+                  written as big endian data. The default is to write little
+                  endian data.
+    <cpu-num>   - The number of the CPU's address space where the data should
+                  be loaded. If not specified the address space of the first
+                  CPU is used.
+
+All values are parsed using the standard QemuOps parsing. This allows the user
+to specify any values in any format supported. By default the values
+will be parsed as decimal. To use hex values the user should prefix the number
+with a '0x'.
+
+An example of loading value 0x8000000e to address 0xfd1a0104 is:
+    -device loader,addr=0xfd1a0104,data=0x8000000e,data-len=4
+
+Setting a CPU's Program Counter
+---------------------
+The loader device allows the CPU's PC to be set from the command line. This
+can be done by following the syntax below:
+
+     -device loader,addr=<addr>,cpu-num=<cpu-num>
+
+    <addr>      - The value to use as the CPU's PC.
+    <cpu-num>   - The number of the CPU whose PC should be set to the
+                  specified value.
+
+All values are parsed using the standard QemuOps parsing. This allows the user
+to specify any values in any format supported. By default the values
+will be parsed as decimal. To use hex values the user should prefix the number
+with a '0x'.
+
+An example of setting CPU 0's PC to 0x8000 is:
+    -device loader,addr=0x8000,cpu-num=0
+
+Loading Files
+---------------------
+The loader device also allows files to be loaded into memory. This can be done
+similarly to setting memory values. The syntax is shown below:
+
+    -device loader,file=<file>[,addr=<addr>][,cpu-num=<cpu-num>][,force-raw=<raw>]
+
+    <file>      - A file to be loaded into memory
+    <addr>      - The addr in memory that the file should be loaded. This is
+                  ignored if you are using an ELF (unless force-raw is true).
+                  This is required if you aren't loading an ELF.
+    <cpu-num>   - This specifies the CPU that should be used. This is an
+                  optional argument and will cause the CPU's PC to be set to
+                  where the image is stored or in the case of an ELF file to
+                  the value in the header. This option should only be used
+                  for the boot image.
+                  This will also cause the image to be written to the specified
+                  CPU's address space. If not specified, the default is CPU 0.
+    <force-raw> - Forces the file to be treated as a raw image. This can be
+                  used to specify the load address of ELF files.
+
+All values are parsed using the standard QemuOps parsing. This allows the user
+to specify any values in any format supported. By default the values
+will be parsed as decimal. To use hex values the user should prefix the number
+with a '0x'.
+
+An example of loading an ELF file which CPU0 will boot is shown below:
+    -device loader,file=./images/boot.elf,cpu-num=0
--- a/Show More
+++ b/Show More
@@ -1 +1 @@
 .6.91
 .7.50