Compare commits

..

126 Commits

Author SHA1 Message Date
Chunyan Liu
3456706297 Don't change total_size when creating fixed vpc
After upstream commit #3f3f20d... vpc: Fix size in fixed image
creation. Total_sectors is rounded up to match the geometry,
total_size is also changed as well (Total_sectors * BDRV_SECTOR_SIZE).
This brings problem for bnc#943446 - qemu-img convert doesn't create
MB aligned VHDs anymore. Add patch to exclude the change.

Signed-off-by: Chunyan Liu <cyliu@suse.com>
[BR: BSC#943446]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-11-04 09:45:55 -07:00
P J P
6bfc70462b e1000: Avoid infinite loop in processing transmit descriptor (CVE-2015-6815)
While processing transmit descriptors, it could lead to an infinite
loop if 'bytes' was to become zero; Add a check to avoid it.

[The guest can force 'bytes' to 0 by setting the hdr_len and mss
descriptor fields to 0.
--Stefan]

Signed-off-by: P J P <pjp@fedoraproject.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1441383666-6590-1-git-send-email-stefanha@redhat.com
(cherry picked from commit b947ac2bf2)
[BR: BSC#944697 CVE-2015-6815]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:56:57 +02:00
Jason Wang
77bd1c30f4 virtio-net: correctly drop truncated packets
When packet is truncated during receiving, we drop the packets but
neither discard the descriptor nor add and signal used
descriptor. This will lead several issues:

- sg mappings are leaked
- rx will be stalled if a lots of packets were truncated

In order to be consistent with vhost, fix by discarding the descriptor
in this case.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit 0cf33fb6b4)
[BR: BSC#947159 CVE-2015-7295]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:56:56 +02:00
Jason Wang
0d9bbf181e virtio: introduce virtqueue_discard()
This patch introduces virtqueue_discard() to discard a descriptor and
unmap the sgs. This will be used by the patch that will discard
descriptor when packet is truncated.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit 29b9f5efd7)
[BR: BSC#947159 CVE-2015-7295]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:56:56 +02:00
Jason Wang
c7f5d47e08 virtio: introduce virtqueue_unmap_sg()
Factor out sg unmapping logic. This will be reused by the patch that
can discard descriptor.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Andrew James <andrew.james@hpe.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit ce31746157)
[BR: BSC#947159 CVE-2015-7295]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:56:56 +02:00
John Snow
aa7fb34246 ide: fix ATAPI command permissions
We're a little too lenient with what we'll let an ATAPI drive handle.
Clamp down on the IDE command execution table to remove CD_OK permissions
from commands that are not and have never been ATAPI commands.

For ATAPI command validity, please see:
- ATA4 Section 6.5 ("PACKET Command feature set")
- ATA8/ACS Section 4.3 ("The PACKET feature set")
- ACS3 Section 4.3 ("The PACKET feature set")

ACS3 has a historical command validity table in Table B.4
("Historical Command Assignments") that can be referenced to find when
a command was introduced, deprecated, obsoleted, etc.

The only reference for ATAPI command validity is by checking that
version's PACKET feature set section.

ATAPI was introduced by T13 into ATA4, all commands retired prior to ATA4
therefore are assumed to have never been ATAPI commands.

Mandatory commands, as listed in ATA8-ACS3, are:

- DEVICE RESET
- EXECUTE DEVICE DIAGNOSTIC
- IDENTIFY DEVICE
- IDENTIFY PACKET DEVICE
- NOP
- PACKET
- READ SECTOR(S)
- SET FEATURES

Optional commands as listed in ATA8-ACS3, are:

- FLUSH CACHE
- READ LOG DMA EXT
- READ LOG EXT
- WRITE LOG DMA EXT
- WRITE LOG EXT

All other commands are illegal to send to an ATAPI device and should
be rejected by the device.

CD_OK removal justifications:

0x06 WIN_DSM              Defined in ACS2. Not valid for ATAPI.
0x21 WIN_READ_ONCE        Retired in ATA5. Not ATAPI in ATA4.
0x94 WIN_STANDBYNOW2      Retired in ATA4. Did not coexist with ATAPI.
0x95 WIN_IDLEIMMEDIATE2   Retired in ATA4. Did not coexist with ATAPI.
0x96 WIN_STANDBY2         Retired in ATA4. Did not coexist with ATAPI.
0x97 WIN_SETIDLE2         Retired in ATA4. Did not coexist with ATAPI.
0x98 WIN_CHECKPOWERMODE2  Retired in ATA4. Did not coexist with ATAPI.
0x99 WIN_SLEEPNOW2        Retired in ATA4. Did not coexist with ATAPI.
0xE0 WIN_STANDBYNOW1      Not part of ATAPI in ATA4, ACS or ACS3.
0xE1 WIN_IDLEIMMDIATE     Not part of ATAPI in ATA4, ACS or ACS3.
0xE2 WIN_STANDBY          Not part of ATAPI in ATA4, ACS or ACS3.
0xE3 WIN_SETIDLE1         Not part of ATAPI in ATA4, ACS or ACS3.
0xE4 WIN_CHECKPOWERMODE1  Not part of ATAPI in ATA4, ACS or ACS3.
0xE5 WIN_SLEEPNOW1        Not part of ATAPI in ATA4, ACS or ACS3.
0xF8 WIN_READ_NATIVE_MAX  Obsoleted in ACS3. Not ATAPI in ATA4 or ACS.

This patch fixes a divide by zero fault that can be caused by sending
the WIN_READ_NATIVE_MAX command to an ATAPI drive, which causes it to
attempt to use zeroed CHS values to perform sector arithmetic.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1441816082-21031-1-git-send-email-jsnow@redhat.com
CC: qemu-stable@nongnu.org
(cherry picked from commit d9033e1d3a)
[BR: CVE-2015-6855 BSC#945404]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:56:56 +02:00
P J P
4c3849714f net: add checks to validate ring buffer pointers(CVE-2015-5279)
Ne2000 NIC uses ring buffer of NE2000_MEM_SIZE(49152)
bytes to process network packets. While receiving packets
via ne2000_receive() routine, a local 'index' variable
could exceed the ring buffer size, which could lead to a
memory buffer overflow. Added other checks at initialisation.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: P J P <pjp@fedoraproject.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 9bbdbc66e5)
[BR: BSC#945987]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:56:56 +02:00
P J P
8970c6a04e net: avoid infinite loop when receiving packets(CVE-2015-5278)
Ne2000 NIC uses ring buffer of NE2000_MEM_SIZE(49152)
bytes to process network packets. While receiving packets
via ne2000_receive() routine, a local 'index' variable
could exceed the ring buffer size, leading to an infinite
loop situation.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: P J P <pjp@fedoraproject.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 737d2b3c41)
[BR: BSC#945989]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:56:56 +02:00
Max Reitz
0bcc356da2 qcow2: Make qcow2_alloc_bytes() more explicit
In case of -EAGAIN returned by update_refcount(), we should discard the
cluster offset we were trying to allocate and request a new one, because
in theory that old offset might now be taken by a refcount block.

In practice, this was not the case due to update_refcount() generally
returning strictly monotonic increasing cluster offsets. However, this
behavior is not set in stone, and it is also not obvious when looking at
qcow2_alloc_bytes() alone, so we should not rely on it.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 2ac01520be)
[BR: BSC#936537]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:56:55 +02:00
Jindřich Makovička
5cc51586d1 qcow2: Handle EAGAIN returned from update_refcount
Fixes a crash during image compression

Signed-off-by: Jindřich Makovička <makovick@gmail.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 3e5feb6202)
[BR: BSC#936537]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:56:55 +02:00
Gerd Hoffmann
7637d9fbfe vnc: fix memory corruption (CVE-2015-5225)
The _cmp_bytes variable added by commit "bea60dd ui/vnc: fix potential
memory corruption issues" can become negative.  Result is (possibly
exploitable) memory corruption.  Reason for that is it uses the stride
instead of bytes per scanline to apply limits.

For the server surface is is actually fine.  vnc creates that itself,
there is never any padding and thus scanline length always equals stride.

For the guest surface scanline length and stride are typically identical
too, but it doesn't has to be that way.  So add and use a new variable
(guest_ll) for the guest scanline length.  Also rename min_stride to
line_bytes to make more clear what it actually is.  Finally sprinkle
in an assert() to make sure we never use a negative _cmp_bytes again.

Reported-by: 范祚至(库特) <zuozhi.fzz@alibaba-inc.com>
Reviewed-by: P J P <ppandit@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit eb8934b041)
[AF: BSC#942845]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:56:55 +02:00
Alexander Graf
589b223ca5 dictzip: Fix on big endian systems
The dictzip code in SLE11 received some treatment over time to support
running on big endian hosts. Somewhere in the transition to SLE12 this
support got lost. Add it back in again from the SLE11 code base.

Furthermore while at it, fix up the debug prints to not emit warnings.

[AG: BSC#937572]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:56:41 +02:00
Michael Tokarev
225bfdb2ed slirp: use less predictable directory name in /tmp for smb config (CVE-2015-4037)
In this version I used mkdtemp(3) which is:

        _BSD_SOURCE
        || /* Since glibc 2.10: */
            (_POSIX_C_SOURCE >= 200809L || _XOPEN_SOURCE >= 700)

(POSIX.1-2008), so should be available on systems we care about.

While at it, reset the resulting directory name within smb structure
on error so cleanup function wont try to remove directory which we
failed to create.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit 8b8f1c7e9d)
[BR: BSC#932267]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:55:48 +02:00
Alexander Graf
fc1b871631 linux-user: Allocate thunk size dynamically
We store all struct types in an array of static size without ever
checking whether we overrun it. Of course some day someone (like me
in another, ancient ALSA enabling patch set) will run into the limit
without realizing it.

So let's make the allocation dynamic. We already know the number of
structs that we want to allocate, so we only need to pass the variable
into the respective piece of code.

Also, to ensure we don't accidently overwrite random memory, add some
asserts to sanity check whether a thunk is actually part of our array.

Signed-off-by: Alexander Graf <agraf@suse.de>
2015-10-08 18:55:48 +02:00
Petr Matousek
a78f57a82b pcnet: force the buffer access to be in bounds during tx
4096 is the maximum length per TMD and it is also currently the size of
the relay buffer pcnet driver uses for sending the packet data to QEMU
for further processing. With packet spanning multiple TMDs it can
happen that the overall packet size will be bigger than sizeof(buffer),
which results in memory corruption.

Fix this by only allowing to queue maximum sizeof(buffer) bytes.

This is CVE-2015-3209.

Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Matt Tait <matttait@google.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[BR: BSC#932770]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:55:48 +02:00
Bruce Rogers
457b3ba055 configure: reduce libseccomp version requirement to 2.1.0
It turns out that although the version of libseccomp used in SLE12
is in fact v2.1.1, the package has a bug where pkg-config incorrectly
reports that the version is v2.1.0. (see bsc#932372)

Adjust the configure script detection to allow for v2.1.0.

Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:55:48 +02:00
Bruce Rogers
bff3afb57d qtest: Increase socket timeout to accomodate unpredictable build service
After encountering make check errors for some time, it appears that the
problem is the occasional socket timeout, which is probably due to a
combination of a slow and overloaded build host in the build service.

Increase the timeout from 5 to 15 seconds and hope we don't see the
problem again.

Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:55:47 +02:00
Alexander Graf
eb70eb33cb s390: Default virtio-blk to ccw
When spawning a drive with -drive if=virtio on s390x, we want to
create a ccw device by default, as that is our default machine.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:55:47 +02:00
Alexander Graf
8526e2d444 s390: Remove legacy s390-virtio machine type
We don't want to confuse users by offering the legacy, broken
machine type -M s390-virtio. Let's just not include the machine
description in the first place..

Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Explicitly declare s390-virtio-ccw the default for v2.0.0-rc0]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:55:47 +02:00
Andreas Färber
99bc193099 acpi_piix4: Fix migration from SLE11 SP2
qemu-kvm 0.15 uses the same GPE format as qemu 1.4, but as version 2
rather than 3.

Addresses part of BNC#812836.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:55:47 +02:00
Andreas Färber
5e454fe423 i8254: Fix migration from SLE11 SP2
qemu-kvm 0.15 had a VMSTATE_UINT32(flags, PITState) field that
qemu 1.4 does not have.

Addresses part of BNC#812836.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:55:47 +02:00
Andreas Färber
41eb27692f vga: Raise VRAM to 16 MiB for pc-0.15 and below
qemu-kvm.git commit a7fe0297840908a4fd65a1cf742481ccd45960eb
(Extend vram size to 16MB) deviated from qemu.git since kvm-61, and only
in commit 9e56edcf8d (vga: raise default
vgamem size) did qemu.git adjust the VRAM size for v1.2.

Add compatibility properties so that up to and including pc-0.15 we
maintain migration compatibility with qemu-kvm rather than QEMU and
from pc-1.0 on with QEMU (last qemu-kvm release was 1.2).

Addresses part of BNC#812836.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:55:47 +02:00
Alexander Graf
c251fa606e vnc: provide fake color map
Our current VNC code does not handle color maps (aka non-true-color) at all
and aborts if a client requests them. There are 2 major issues with this:

 1) A VNC viewer on an 8-bit X11 system may request color maps
 2) RealVNC _always_ starts requesting color maps, then moves on to full color

In order to support these 2 use cases, let's just create a fake color map
that covers exactly our normal true color 8 bit color space. That way we don't
lose anything over a client that wants true color.

Reported-by: Sascha Wehnert <swehnert@suse.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:55:47 +02:00
Bruce Rogers
c923a7775c increase x86_64 physical bits to 42
Allow for guests with higher amounts of ram. The current thought
is that 2TB specified on qemu commandline would be an appropriate
limit. Note that this requires the next higher bit value since
the highest address is actually more than 2TB due to the pci
memory hole.

Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:55:47 +02:00
Andreas Färber
09d66eb5f5 Raise soft address space limit to hard limit
For SLES we want users to be able to use large memory configurations
with KVM without fiddling with ulimit -Sv.

Signed-off-by: Andreas Färber <afaerber@suse.de>
[BR: add include for sys/resource.h]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-10-08 18:55:46 +02:00
Dinar Valeev
b313f6d23d configure: Enable PIE for ppc and ppc64 hosts
Signed-off-by: Dinar Valeev <dvaleev@suse.com>
[AF: Rebased for v1.7]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:55:46 +02:00
Bruce Rogers
319e0e347f virtfs-proxy-helper: Provide __u64 for broken sys/capability.h
Fixes the build on SLE 11 SP2.

[AF: Extend to ppc64]
2015-10-08 18:55:46 +02:00
Alexander Graf
daede1f0f2 linux-user: lseek: explicitly cast non-set offsets to signed
When doing lseek, SEEK_SET indicates that the offset is an unsigned variable.
Other seek types have parameters that can be negative.

When converting from 32bit to 64bit parameters, we need to take this into
account and enable SEEK_END and SEEK_CUR to be negative, while SEEK_SET stays
absolute positioned which we need to maintain as unsigned.

Signed-off-by: Alexander Graf <agraf@suse.de>
2015-10-08 18:55:46 +02:00
Alexander Graf
789adc278e Make char muxer more robust wrt small FIFOs
Virtio-Console can only process one character at a time. Using it on S390
gave me strage "lags" where I got the character I pressed before when
pressing one. So I typed in "abc" and only received "a", then pressed "d"
but the guest received "b" and so on.

While the stdio driver calls a poll function that just processes on its
queue in case virtio-console can't take multiple characters at once, the
muxer does not have such callbacks, so it can't empty its queue.

To work around that limitation, I introduced a new timer that only gets
active when the guest can not receive any more characters. In that case
it polls again after a while to check if the guest is now receiving input.

This patch fixes input when using -nographic on s390 for me.
2015-10-08 18:55:46 +02:00
Alexander Graf
c4c86364b3 console: add question-mark escape operator
Some termcaps (found using SLES11SP1) use [? sequences. According to man
console_codes (http://linux.die.net/man/4/console_codes) the question mark
is a nop and should simply be ignored.

This patch does exactly that, rendering screen output readable when
outputting guest serial consoles to the graphical console emulator.

Signed-off-by: Alexander Graf <agraf@suse.de>
2015-10-08 18:55:46 +02:00
Alexander Graf
b31a018f58 Legacy Patch kvm-qemu-preXX-report-default-mac-used.patch 2015-10-08 18:55:46 +02:00
Alexander Graf
588bf41e1f Legacy Patch kvm-qemu-preXX-dictzip3.patch 2015-10-08 18:55:46 +02:00
Alexander Graf
babe800bcc block: Add tar container format
Tar is a very widely used format to store data in. Sometimes people even put
virtual machine images in there.

So it makes sense for qemu to be able to read from tar files. I implemented a
written from scratch reader that also knows about the GNU sparse format, which
is what pigz creates.

This version checks for filenames that end on well-known extensions. The logic
could be changed to search for filenames given on the command line, but that
would require changes to more parts of qemu.

The tar reader in conjunctiuon with dzip gives us the chance to download
tar'ed up virtual machine images (even via http) and instantly make use of
them.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
[TH: Use bdrv_open options instead of filename]
Signed-off-by: Tim Hardeck <thardeck@suse.de>
[AF: bdrv_file_open got an Error **errp argument, bdrv_delete -> brd_unref]
[AF: qemu_opts_create_nofail() -> qemu_opts_create(),
     bdrv_file_open() -> bdrv_open(), based on work by brogers]
[AF: error_is_set() dropped for v2.1.0-rc0]
[AF: BlockDriverAIOCB -> BlockAIOCB,
     BlockDriverCompletionFunc -> BlockCompletionFunc,
     qemu_aio_release() -> qemu_aio_unref(),
     drop tar_aio_cancel()]
[AF: common-obj-y -> block-obj-y, drop probe hook (bsc#945778)]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:55:12 +02:00
Alexander Graf
17e0a09e25 block: Add support for DictZip enabled gzip files
DictZip is an extension to the gzip format that allows random seeks in gzip
compressed files by cutting the file into pieces and storing the piece offsets
in the "extra" header of the gzip format.

Thanks to that extension, we can use gzip compressed files as block backend,
though only in read mode.

This makes a lot of sense when stacked with tar files that can then be shipped
to VM users. If a VM image is inside a tar file that is inside a DictZip
enabled gzip file, the user can run the tar.gz file as is without having to
extract the image first.

Tar patch follows.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
[TH: Use bdrv_open options instead of filename]
Signed-off-by: Tim Hardeck <thardeck@suse.de>
[AF: Error **errp added for bdrv_file_open, bdrv_delete -> bdrv_unref]
[AF: qemu_opts_create_nofail() -> qemu_opts_create(),
     bdrv_file_open() -> bdrv_open(), based on work by brogers]
[AF: error_is_set() dropped for v2.1.0-rc0]
[AF: BlockDriverAIOCB -> BlockAIOCB,
     BlockDriverCompletionFunc -> BlockCompletionFunc,
     qemu_aio_release() -> qemu_aio_unref(),
     drop dictzip_aio_cancel()]
[AF: common-obj-y -> block-obj-y, drop probe hook (bsc#945778)]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-08 18:53:52 +02:00
Alexander Graf
ed87ee47e0 linux-user: use target_ulong
Linux syscalls pass pointers or data length or other information of that sort
to the kernel. This is all stuff you don't want to have sign extended.
Otherwise a host 64bit variable parameter with a size parameter will extend
it to a negative number, breaking lseek for example.

Pass syscall arguments as ulong always.

Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-12 09:45:57 -06:00
Alexander Graf
621ddf509e linux-user: add more blk ioctls
Implement a few more ioctls that operate on block devices.

Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-12 09:45:56 -06:00
Andreas Färber
ac10d55b82 vnc: password-file= and incoming-connections=
TBD (from SUSE Studio team)
2015-08-12 09:45:56 -06:00
Andreas Färber
8e9b8b2204 slirp: -nooutgoing
TBD (from SUSE Studio team)
2015-08-12 09:45:56 -06:00
Alexander Graf
2ac80007da linux-user: XXX disable fiemap
agraf: fiemap breaks in libarchive. Disable it for now.
2015-08-12 09:45:55 -06:00
Alexander Graf
4168180338 linux-user: implement FS_IOC_SETFLAGS ioctl
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-12 09:45:55 -06:00
Alexander Graf
683051c844 linux-user: implement FS_IOC_GETFLAGS ioctl
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-12 09:45:55 -06:00
Alexander Graf
4a7c93f23e linux-user: Fake /proc/cpuinfo
Fedora 17 for ARM reads /proc/cpuinfo and fails if it doesn't contain
ARM related contents. This patch implements a quick hack to expose real
/proc/cpuinfo data taken from a real world machine.

The real fix would be to generate at least the flags automatically based
on the selected CPU. Please do not submit this patch upstream until this
has happened.

Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased for v1.6 and v1.7]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-08-12 09:45:55 -06:00
Alexander Graf
66e090df95 linux-user: lock tb flushing too
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased onto exec.c/translate-all.c split for 1.4]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-08-12 09:45:55 -06:00
Alexander Graf
acc6309904 linux-user: Run multi-threaded code on a single core
Running multi-threaded code can easily expose some of the fundamental
breakages in QEMU's design. It's just not a well supported scenario.

So if we pin the whole process to a single host CPU, we guarantee that
we will never have concurrent memory access actually happen. We can still
get scheduled away at any time, so it's no complete guarantee, but apparently
it reduces the odds well enough to get my test cases to pass.

This gets Java 1.7 working for me again on my test box.

Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-12 09:45:55 -06:00
Alexander Graf
9bb28247d7 linux-user: lock tcg
The tcg code generator is not thread safe. Lock its generation between
different threads.

Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased onto exec.c/translate-all.c split for 1.4]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-08-12 09:45:55 -06:00
Alexander Graf
1a8b5a066a linux-user: Ignore broken loop ioctl
During invocations of losetup, we run into an ioctl that doesn't
exist. However, because of that we output an error, which then
screws up the kiwi logic around that call.

So let's silently ignore that bogus ioctl.

Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-12 09:45:55 -06:00
Alexander Graf
5673fb20cd linux-user: arm: no tb_flush on reset
When running automoc4 as linux-user guest program, it segfaults right after
it creates a thread. Bisecting pointed to commit a84fac1426 which introduces
tb_flush on reset.

So something in our thread creation is broken. But for now, let's revert the
change to at least get a working build again.
2015-08-12 09:45:54 -06:00
Alexander Graf
f8898ec278 linux-user: binfmt: support host binaries
When we have a working host binary equivalent for the guest binary we're
trying to run, let's just use that instead as it will be a lot faster.

Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-12 09:45:54 -06:00
Alexander Graf
efa9daceb1 linux-user: fix segfault deadlock
When entering the guest we take a lock to ensure that nobody else messes
with our TB chaining while we're doing it. If we get a segfault inside that
code, we manage to work on, but will not unlock the lock.

This patch forces unlocking of that lock in the segv handler. I'm not sure
this is the right approach though. Maybe we should rather make sure we don't
segfault in the code? I would greatly appreciate someone more intelligible
than me to look at this :).

Example code to trigger this is at: http://csgraf.de/tmp/conftest.c

Reported-by: Fabio Erculiani <lxnay@sabayon.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-08-12 09:45:54 -06:00
Alexander Graf
ed19da1195 PPC: KVM: Disable mmu notifier check
When using hugetlbfs (which is required for HV mode KVM on 970), we
check for MMU notifiers that on 970 can not be implemented properly.

So disable the check for mmu notifiers on PowerPC guests, making
KVM guests work there, even if possibly racy in some odd circumstances.
2015-08-12 09:45:53 -06:00
Alexander Graf
b8ddc9be4c linux-user: add binfmt wrapper for argv[0] handling
When using qemu's linux-user binaries through binfmt, argv[0] gets lost
along the execution because qemu only gets passed in the full file name
to the executable while argv[0] can be something completely different.

This breaks in some subtile situations, such as the grep and make test
suites.

This patch adds a wrapper binary called qemu-$TARGET-binfmt that can be
used with binfmt's P flag which passes the full path _and_ argv[0] to
the binfmt handler.

The binary would be smart enough to be versatile and only exist in the
system once, creating the qemu binary path names from its own argv[0].
However, this seemed like it didn't fit the make system too well, so
we're currently creating a new binary for each target archictecture.

CC: Reinhard Max <max@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased onto new Makefile infrastructure, twice]
[AF: Updated for aarch64 for v2.0.0-rc1]
[AF: Rebased onto Makefile changes for v2.1.0-rc0]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-08-12 09:45:53 -06:00
Ulrich Hecht
0228bc7ae0 block/vmdk: Support creation of SCSI VMDK images in qemu-img
Signed-off-by: Ulrich Hecht <uli@suse.de>
[AF: Changed BLOCK_FLAG_SCSI from 8 to 16 for v1.2]
[AF: Rebased onto upstream VMDK SCSI support]
[AF: Rebased onto skipping of image creation in v1.7]
[AF: Simplified in preparation for v1.7.1/v2.0]
[BR: Rebased onto v2.0]
[BR: Rebased onto v2.1]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2015-08-12 09:45:53 -06:00
Alexander Graf
a5a901fc4c qemu-cvs-ioctl_nodirection
the direction given in the ioctl should be correct so we can assume the
communication is uni-directional. The alsa developers did not like this
concept though and declared ioctls IOC_R and IOC_W even though they were
IOC_RW.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2015-08-12 09:45:52 -06:00
Alexander Graf
ca05fad76e qemu-cvs-ioctl_debug
Extends unsupported ioctl debug output.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2015-08-12 09:45:52 -06:00
Ulrich Hecht
6855fbd119 qemu-cvs-gettimeofday
No clue what this is for.
2015-08-12 09:45:52 -06:00
Alexander Graf
02792b8b1d qemu-cvs-alsa_mmap
Hack to prevent ALSA from using mmap() interface to simplify emulation.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2015-08-12 09:45:51 -06:00
Alexander Graf
f72dd33b9a qemu-cvs-alsa_ioctl
Implements ALSA ioctls on PPC hosts.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2015-08-12 09:45:51 -06:00
Alexander Graf
cc2fb7f00d qemu-cvs-alsa_bitfield
Implements TYPE_INTBITFIELD partially. (required for ALSA support)

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2015-08-12 09:45:51 -06:00
Ulrich Hecht
bd2ce7536f qemu-0.9.0.cvs-binfmt
Fixes binfmt_misc setup script:
- x86_64 is i386-compatible
- m68k signature fixed
- path to QEMU

Signed-off-by: Ulrich Hecht <uli@suse.de>
[AF: Update path for qemu-aarch64 for v2.0.0-rc1]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-08-12 09:45:50 -06:00
Alexander Graf
f2b44e9116 XXX work around SA_RESTART race with boehm-gc (ARM only)
[AF: CPUState -> CPUArchState, adapt to reindentation]
[BR: fix references to opaque]
2015-08-12 09:45:50 -06:00
Alexander Graf
b5af05aeb7 XXX dont dump core on sigabort 2015-08-12 09:45:50 -06:00
Michael Roth
dfa83a6bae Update version for 2.3.1 release 2015-08-10 16:09:34 -05:00
Paolo Bonzini
35a616edef qemu-char: handle EINTR for TCP character devices
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 9172f428af)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-10 15:31:16 -05:00
Stefan Hajnoczi
35c30d3efd rtl8139: check TCP Data Offset field (CVE-2015-5165)
The TCP Data Offset field contains the length of the header.  Make sure
it is valid and does not exceed the IP data length.

Reported-by: 朱东海(启路) <donghai.zdh@alibaba-inc.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 8357946b15)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-04 12:34:00 -05:00
Stefan Hajnoczi
f4c861fd68 rtl8139: skip offload on short TCP header (CVE-2015-5165)
TCP Large Segment Offload accesses the TCP header in the packet.  If the
packet is too short we must not attempt to access header fields:

  tcp_header *p_tcp_hdr = (tcp_header*)(eth_payload_data + hlen);
  int tcp_hlen = TCP_HEADER_DATA_OFFSET(p_tcp_hdr);

Reported-by: 朱东海(启路) <donghai.zdh@alibaba-inc.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 4240be4563)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-04 12:33:54 -05:00
Stefan Hajnoczi
b7a197c39e rtl8139: check IP Total Length field (CVE-2015-5165)
The IP Total Length field includes the IP header and data.  Make sure it
is valid and does not exceed the Ethernet payload size.

Reported-by: 朱东海(启路) <donghai.zdh@alibaba-inc.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit c6296ea88d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-04 12:33:48 -05:00
Stefan Hajnoczi
85611098ff rtl8139: check IP Header Length field (CVE-2015-5165)
The IP Header Length field was only checked in the IP checksum case, but
is used in other cases too.

Reported-by: 朱东海(启路) <donghai.zdh@alibaba-inc.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 03247d43c5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-04 12:33:42 -05:00
Stefan Hajnoczi
ce4f451bbb rtl8139: skip offload on short Ethernet/IP header (CVE-2015-5165)
Transmit offload features access Ethernet and IP headers the packet.  If
the packet is too short we must not attempt to access header fields:

  int proto = be16_to_cpu(*(uint16_t *)(saved_buffer + 12));
  ...
  eth_payload_data = saved_buffer + ETH_HLEN;
  ...
  ip = (ip_header*)eth_payload_data;
  if (IP_HEADER_VERSION(ip) != IP_HEADER_VERSION_4) {

Reported-by: 朱东海(启路) <donghai.zdh@alibaba-inc.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit e1c120a9c5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-04 12:33:36 -05:00
Stefan Hajnoczi
6722c126f3 rtl8139: drop tautologous if (ip) {...} statement (CVE-2015-5165)
The previous patch stopped using the ip pointer as an indicator that the
IP header is present.  When we reach the if (ip) {...} statement we know
ip is always non-NULL.

Remove the if statement to reduce nesting.

Reported-by: 朱东海(启路) <donghai.zdh@alibaba-inc.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit d6812d60e7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-04 12:33:30 -05:00
Stefan Hajnoczi
8dd45dcd83 rtl8139: avoid nested ifs in IP header parsing (CVE-2015-5165)
Transmit offload needs to parse packet headers.  If header fields have
unexpected values the offload processing is skipped.

The code currently uses nested ifs because there is relatively little
input validation.  The next patches will add missing input validation
and a goto label is more appropriate to avoid deep if statement nesting.

Reported-by: 朱东海(启路) <donghai.zdh@alibaba-inc.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 39b8e7dcaf)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-04 12:32:40 -05:00
Aurelien Jarno
e750591c8a tcg/mips: fix add2
The add2 code in the tcg_out_addsub2 function doesn't take into account
the case where rl == al == bl. In that case we can't compute the carry
after the addition. As it corresponds to a multiplication by 2, the
carry bit is the bit 31.

While this is a corner case, this prevents x86-64 guests to boot on a
MIPS host.

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
(cherry picked from commit c99d69694a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-04 12:30:37 -05:00
Aurelien Jarno
f9c0ae2723 tcg/mips: fix TLB loading for BE host with 32-bit guests
For 32-bit guest, we load a 32-bit address from the TLB, so there is no
need to compensate for the low or high part. This fixes 32-bit guests on
big-endian hosts.

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
(cherry picked from commit e72c4fb81d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-04 12:30:20 -05:00
Stefano Stabellini
c8bd74d1d5 Fix release_drive on unplugged devices (pci_piix3_xen_ide_unplug)
pci_piix3_xen_ide_unplug should completely unhook the unplugged
IDEDevice from the corresponding BlockBackend, otherwise the next call
to release_drive will try to detach the drive again.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
(cherry picked from commit 6cd387833d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-08-04 12:25:11 -05:00
Kevin Wolf
d1557697fd ide: Clear DRQ after handling all expected accesses
This is additional hardening against an end_transfer_func that fails to
clear the DRQ status bit. The bit must be unset as soon as the PIO
transfer has completed, so it's better to do this in a central place
instead of duplicating the code in all commands (and forgetting it in
some).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
(cherry picked from commit cb72cba830)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:19:55 -05:00
Kevin Wolf
86d6fe4cb0 ide/atapi: Fix START STOP UNIT command completion
The command must be completed on all code paths. START STOP UNIT with
pwrcnd set should succeed without doing anything.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 03441c3a4a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:19:39 -05:00
Kevin Wolf
9634e45e0b ide: Check array bounds before writing to io_buffer (CVE-2015-5154)
If the end_transfer_func of a command is called because enough data has
been read or written for the current PIO transfer, and it fails to
correctly call the command completion functions, the DRQ bit in the
status register and s->end_transfer_func may remain set. This allows the
guest to access further bytes in s->io_buffer beyond s->data_end, and
eventually overflowing the io_buffer.

One case where this currently happens is emulation of the ATAPI command
START STOP UNIT.

This patch fixes the problem by adding explicit array bounds checks
before accessing the buffer instead of relying on end_transfer_func to
function correctly.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
(cherry picked from commit d2ff858545)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:19:29 -05:00
Jeff Cody
0dc545e977 block: qemu-iotests - add check for multiplication overflow in vpc
This checks that VPC is able to successfully fail (without segfault)
on an image file with a max_table_entries that exceeds 0x40000000.

This table entry is within the valid range for VPC (although too large
for this sample image).

Cc: qemu-stable@nongnu.org
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 77c102c26e)
Conflicts:
	tests/qemu-iotests/group

* removed context dependency on iotest not present in 2.3.0 group
  file

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:18:23 -05:00
Jeff Cody
358f0ee234 block: vpc - prevent overflow if max_table_entries >= 0x40000000
When we allocate the pagetable based on max_table_entries, we multiply
the max table entry value by 4 to accomodate a table of 32-bit integers.
However, max_table_entries is a uint32_t, and the VPC driver accepts
ranges for that entry over 0x40000000.  So during this allocation:

s->pagetable = qemu_try_blockalign(bs->file, s->max_table_entries * 4);

The size arg overflows, allocating significantly less memory than
expected.

Since qemu_try_blockalign() size argument is size_t, cast the
multiplication correctly to prevent overflow.

The value of "max_table_entries * 4" is used elsewhere in the code as
well, so store the correct value for use in all those cases.

We also check the Max Tables Entries value, to make sure that it is <
SIZE_MAX / 4, so we know the pagetable size will fit in size_t.

Cc: qemu-stable@nongnu.org
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b15deac795)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:16:01 -05:00
Paolo Bonzini
961c74a841 scsi: fix buffer overflow in scsi_req_parse_cdb (CVE-2015-5158)
This is a guest-triggerable buffer overflow present in QEMU 2.2.0
and newer.  scsi_cdb_length returns -1 as an error value, but the
caller does not check it.

Luckily, the massive overflow means that QEMU will just SIGSEGV,
making the impact much smaller.

Reported-by: Zhu Donghai (朱东海) <donghai.zdh@alibaba-inc.com>
Fixes: 1894df0281
Reviewed-by: Fam Zheng <famz@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit c170aad8b0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:13:25 -05:00
Alex Williamson
98fe91ed66 vfio/pci: Fix bootindex
bootindex was incorrectly changed to a device Property during the
platform code split, resulting in it no longer working.  Remove it.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org # v2.3+
(cherry picked from commit 759b484c5d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:12:30 -05:00
Jason Wang
46addaa0b5 virtio-net: unbreak any layout
Commit 032a74a1c0
("virtio-net: byteswap virtio-net header") breaks any layout by
requiring out_sg[0].iov_len >= n->guest_hdr_len. Fixing this by
copying header to temporary buffer if swap is needed, and then use
this buffer as part of out_sg.

Fixes 032a74a1c0
("virtio-net: byteswap virtio-net header")
Cc: qemu-stable@nongnu.org
Cc: clg@fr.ibm.com
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

(cherry picked from commit feb93f3617)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:10:36 -05:00
Alex Williamson
5a4568717c vfio/pci: Fix RTL8168 NIC quirks
The RTL8168 quirk correctly describes using bit 31 as a signal to
mark a latch/completion, but the code mistakenly uses bit 28.  This
causes the Realtek driver to spin on this register for quite a while,
20k cycles on Windows 7 v7.092 driver.  Then it gets frustrated and
tries to set the bit itself and spins for another 20k cycles.  For
some this still results in a working driver, for others not.  About
the only thing the code really does in its current form is protect
the guest from sneaking in writes to the real hardware MSI-X table.
The fix is obviously to use bit 31 as we document that we should.

The other problem doesn't seem to affect current drivers as nobody
seems to use these window registers for writes to the MSI-X table, but
we need to use the stored data when a write is triggered, not the
value of the current write, which only provides the offset.

Note that only the Windows drivers from Realtek seem to use these
registers, the Microsoft drivers provided with Windows 8.1 do not
access them, nor do Linux in-kernel drivers.

Link: https://bugs.launchpad.net/qemu/+bug/1384892
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org # v2.1+
(cherry picked from commit 69970fcef9)
Conflicts:
	hw/vfio/pci.c

* removed dependency on 3b643495

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:07:43 -05:00
James Hogan
87740cecc3 mips/kvm: Sign extend registers written to KVM
In case we're running on a 64-bit host, be sure to sign extend the
general purpose registers and hi/lo/pc before writing them to KVM, so as
to take advantage of MIPS32/MIPS64 compatibility.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: kvm@vger.kernel.org
Cc: qemu-stable@nongnu.org
Message-Id: <1429871214-23514-3-git-send-email-james.hogan@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 02dae26ac4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:00:11 -05:00
James Hogan
8df2a9acd2 mips/kvm: Fix Big endian 32-bit register access
Fix access to 32-bit registers on big endian targets. The pointer passed
to the kernel must be for the actual 32-bit value, not a temporary
64-bit value, otherwise on big endian systems the kernel will only
interpret the upper half.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: kvm@vger.kernel.org
Cc: qemu-stable@nongnu.org
Message-Id: <1429871214-23514-2-git-send-email-james.hogan@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f8b3e48b2d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 22:00:07 -05:00
Fam Zheng
c5c71e87aa block: Initialize local_err in bdrv_append_temp_snapshot
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1436156684-16526-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit c2e0dbbfd7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:56:40 -05:00
马文霜
2060efae47 Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES
Last month, we experienced several guests crash(6cores-8cores), qemu logs
display the following messages:

qemu-system-x86_64: /build/qemu-2.1.2/kvm-all.c:976:
kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

After analysis and verification, we can confirm it's irq-balance
daemon(in guest) leads to the assertion failure. Start a 8 core guest with
two disks, execute the following scripts will reproduce the BUG quickly:

irq_affinity.sh
========================================================================

vda_irq_num=25
vdb_irq_num=27
while [ 1 ]
do
    for irq in {1,2,4,8,10,20,40,80}
        do
            echo $irq > /proc/irq/$vda_irq_num/smp_affinity
            echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
            dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
            dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
        done
done
========================================================================

QEMU setup static irq route entries in kvm_pc_setup_irq_routing(), PIC and
IOAPIC share the first 15 GSI numbers, take up 23 GSI numbers, but take up
38 irq route entries. When change irq smp_affinity in guest, a dynamic route
entry may be setup, the current logic is: if allocate GSI number succeeds,
a new route entry can be added. The available dynamic GSI numbers is
1021(KVM_MAX_IRQ_ROUTES-23), but available irq route entries is only
986(KVM_MAX_IRQ_ROUTES-38), GSI numbers greater than route entries.
irq-balance's behavior will eventually leads to total irq route entries
exceed KVM_MAX_IRQ_ROUTES, ioctl(KVM_SET_GSI_ROUTING) fail and
kvm_irqchip_commit_routes() trigger assertion failure.

This patch fix the BUG.

Signed-off-by: Wenshuang Ma <kevinnma@tencent.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit bdf026317d)
Conflicts:
	kvm-all.c

* remove context dependency on bd2a8884
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:54:31 -05:00
Michael Roth
8d64975c98 target-ppc: fix hugepage support when using memory-backend-file
Current PPC code relies on -mem-path being used in order for
hugepage support to be detected. With the introduction of
MemoryBackendFile we can now handle this via:
  -object memory-file-backend,mem-path=...,id=hugemem0 \
  -numa node,id=mem0,memdev=hugemem0

Management tools like libvirt treat the 2 approaches as
interchangeable in some cases, which can lead to user-visible
regressions even for previously supported guest configurations.

Fix these by also iterating through any configured memory
backends that may be backed by hugepages.

Since the old code assumed hugepages always backed the entirety
of guest memory, play it safe an pick the minimum across the
max pages sizes for all backends, even ones that aren't backed
by hugepages.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
(cherry picked from commit 2d103aae87)
Conflicts:
	target-ppc/kvm.c

*remove context dependency on header includes not in 2.3.0

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:50:55 -05:00
David Gibson
9b4420ad62 spapr_vty: lookup should only return valid VTY objects
If a guest passes the reg property of a valid VIO object that is not a VTY
to either H_GET_TERM_CHAR or H_PUT_TERM_CHAR, QEMU hits a dynamic cast
assertion and aborts.

PAPR+ says "Hypervisor checks the termno parameter for validity against the
Vterm IOA unit addresses assigned to the partition, else return H_Parameter."

This patch adds a type check to ensure vty_lookup() either returns a pointer
to a valid VTY object or NULL.  H_GET_TERM_CHAR and H_PUT_TERM_CHAR will
now return H_PARAMETER to the guest instead of crashing.

The patch has no effect on the reg == 0 hack used to implement the RTAS call
display-character.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
(cherry picked from commit 0f888bfadd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:48:27 -05:00
Christian Borntraeger
99c3468d8f s390x/ipl: Fix boot if no bootindex was specified
commit fa92e218df ("s390x/ipl: avoid sign extension") introduced
a regression:

qemu-system-s390x -drive file=image.qcow,format=qcow2
does not boot, the bios states
"No virtio-blk device found!"

adding bootindex=1 does boot.

The reason is that the uint32_t as return value will not do the right
thing for the return -1 (default without bootindex).
The bios itself, will interpret a 64bit -1 as autodetect (but it will
interpret 32bit -1 as ccw device address ff.ff.ffff)

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: qemu-stable@nongnu.org # v2.3.0
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
(cherry picked from commit 6efd2c2a12)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:47:15 -05:00
Peter Lieven
1c17e8c7d3 block/nfs: limit maximum readahead size to 1MB
a malicious caller could otherwise specify a very
large value via the URI and force libnfs to allocate
a large amount of memory for the readahead buffer.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1435317241-25585-1-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 29c838cdc9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:46:36 -05:00
John Snow
ffd060d51f iotests: add QMP event waiting queue
A filter is added to allow callers to request very specific
events to be pulled from the event queue, while leaving undesired
events still in the stream.

This allows us to poll for completion data for multiple asynchronous
events in any arbitrary order.

A new timeout context is added to the qmp pull_event method's
wait parameter to allow tests to fail if they do not complete
within some expected period of time.

Also fixed is a bug in qmp.pull_event where we try to retrieve an event
from an empty list if we attempt to retrieve an event with wait=False
but no events have occurred.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1429314609-29776-19-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 7898f74e78)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:46:08 -05:00
Fam Zheng
e4fb4bea37 iotests: Use event_wait in wait_ready
Only poll the specific type of event we are interested in, to avoid
stealing events that should be consumed by someone else.

Suggested-by: John Snow <jsnow@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit d7b2529792)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:46:08 -05:00
Fam Zheng
edc0a65326 qemu-iotests: Add test case for mirror with unmap
This checks that the discard on mirror source that effectively zeroes
data is also reflected by the data of target.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit c615091793)
Conflicts:
	tests/qemu-iotests/group

*remove context dependencies on newer block tests

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:46:07 -05:00
Fam Zheng
c62f6c8f67 qemu-iotests: Make block job methods common
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 866323f39d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:46:07 -05:00
Fam Zheng
3d8b7aed60 block: Fix dirty bitmap in bdrv_co_discard
Unsetting dirty globally with discard is not very correct. The discard may zero
out sectors (depending on can_write_zeroes_with_unmap), we should replicate
this change to destination side to make sure that the guest sees the same data.

Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator
doesn't expect unsetting of bits after current position.

So let's do it the opposite way which fixes both problems: set the dirty bits
if we are to discard it.

Reported-by: wangxiaolong@ucloud.cn
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 508249952c)
Conflicts:
	block/io.c

* applied manually to avoid dependency on 61007b316
* squashed in 6e82e4b bdrv_reset_dirty() is static in
  2.3.0 and becomes unused as of this patch
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:46:07 -05:00
Fam Zheng
27ed14c4d7 mirror: Do zero write on target if sectors not allocated
If guest discards a source cluster, mirroring with bdrv_aio_readv is overkill.
Some protocols do zero upon discard, where it's best to use
bdrv_aio_write_zeroes, otherwise, bdrv_aio_discard will be enough.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit dcfb3beb51)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:46:07 -05:00
Fam Zheng
6a45a1b8e4 qmp: Add optional bool "unmap" to drive-mirror
If specified as "true", it allows discarding on target sectors where source is
not allocated.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 0fc9f8ea28)

* added to maintain any interdependencies between patches in the
  set. not intended as a new feature for 2.3.1, though it's there
  for anyone interested

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 21:43:36 -05:00
Fam Zheng
6cacd2651a block: Add bdrv_get_block_status_above
Like bdrv_is_allocated_above, this function follows the backing chain until seeing
BDRV_BLOCK_ALLOCATED.  Base is not included.

Reimplement bdrv_is_allocated on top.

[Initialized bdrv_co_get_block_status_above() ret to 0 to silence
mingw64 compiler warning about the unitialized variable.  assert(bs !=
base) prevents that case but I suppose the program could be compiled
with -DNDEBUG.
--Stefan]

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit ba3f0e2545)
Conflicts:
	block/io.c

* applied manually to avoid dependency on 61007b316
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 20:37:30 -05:00
Cornelia Huck
e8248a5af1 virtio-ccw: complete handling of guest-initiated resets
For a guest-initiated reset, we need to not only reset the virtio device,
but also reset the VirtioCcwDevice into a clean state. This includes
resetting the indicators, or else a guest will not be able to e.g.
switch from classic interrupts to adapter interrupts.

Split off this routine into a new function virtio_ccw_reset_virtio()
to make the distinction between resetting the virtio-related devices
and the base subchannel device clear.

CC: qemu-stable@nongnu.org
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit fa8b0ca5d1)
Conflicts:
	hw/s390x/virtio-ccw.c

*removed context dependency on 0b352fd

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:50:08 -05:00
Jason Wang
81cb0a5657 vhost: correctly pass error to caller in vhost_dev_enable_notifiers()
We override the error value r in fail_vq, this will cause the caller
can't detect the failure which may cause the caller may disable the
notifiers twice if vhost is failed to start. Fix this by using another
variable to keep track the return value of set_host_notifier().

Fixes b0b3db7955 ("vhost-net: cleanup
host notifiers at last step")

Cc: qemu-stable@nongnu.org
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 16617e36b0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:44:36 -05:00
Laszlo Ersek
6130c46232 hw/core: rebase sysbus_get_fw_dev_path() to g_strdup_printf()
This is done mainly for improving readability, and in preparation for the
next patch, but Markus pointed out another bonus for the string being
returned:

"No arbitrary length limit. Before the patch, it's 39 characters, and the
code breaks catastrophically when qdev_fw_name() is longer: the second
snprintf() is called with its first argument pointing beyond path[], and
its second argument underflowing to a huge size."

Cc: qemu-stable@nongnu.org
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 5ba03e2dd7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:41:25 -05:00
Petr Matousek
49ef542e41 i8254: fix out-of-bounds memory access in pit_ioport_read()
Due converting PIO to the new memory read/write api we no longer provide
separate I/O region lenghts for read and write operations. As a result,
reading from PIT Mode/Command register will end with accessing
pit->channels with invalid index.

Fix this by ignoring read from the Mode/Command register.

This is CVE-2015-3214.

Reported-by: Matt Tait <matttait@google.com>
Fixes: 0505bcdec8
Cc: qemu-stable@nongnu.org
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d4862a87e3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:40:21 -05:00
Gerd Hoffmann
c270245a53 spice-display: fix segfault in qemu_spice_create_update
Although it is pretty unusual the stride for the guest image and the
mirror image maintained by spice-display can be different.  So use
separate variables for them.

https://bugzilla.redhat.com/show_bug.cgi?id=1163047

Cc: qemu-stable@nongnu.org
Reported-by: perrier vincent <clownix@clownix.net>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit c6e484707f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:34:12 -05:00
Alberto Garcia
9272707a1f sdl2: fix crash in handle_windowevent() when restoring the screen size
The Ctrl-Alt-u keyboard shortcut restores the screen to its original
size. In the SDL2 UI this is done by destroying the window and
creating a new one. The old window emits SDL_WINDOWEVENT_HIDDEN when
it's destroyed, but trying to call SDL_GetWindowFromID() from that
event's window ID returns a null pointer. handle_windowevent() assumes
that the pointer is never null so it results in a crash.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 08d49df0db)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:33:50 -05:00
Fam Zheng
c759f1a078 vmdk: Use vmdk_find_index_in_cluster everywhere
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 90df601f06)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:33:01 -05:00
Fam Zheng
714b54401c vmdk: Fix index_in_cluster calculation in vmdk_co_get_block_status
It has the similar issue with b1649fae49. Since the calculation
is repeated for a few times already, introduce a function so it can be
reused.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 61f0ed1d54)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:32:49 -05:00
Max Reitz
e7e08380c3 iotests: qcow2 COW with minimal L2 cache size
This adds a test case to test 103 for performing a COW operation in a
qcow2 image using an L2 cache with minimal size (which should be at
least two clusters so the COW can access both source and destination
simultaneously).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit a4291eafc5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:27:34 -05:00
Max Reitz
c631ee6520 qcow2: Set MIN_L2_CACHE_SIZE to 2
The L2 cache must cover at least two L2 tables, because during COW two
L2 tables are accessed simultaneously.

Reported-by: Alexander Graf <agraf@suse.de>
Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Tested-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 57e2166959)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:26:34 -05:00
Gerd Hoffmann
b153c8d3f3 kbd: add brazil kbd keys to x11 evdev map
This patch adds the two extra brazilian keys to the evdev keymap for
X11.  This patch gets the two keys going with the vnc, gtk and sdl1
UIs.

The SDL2 library complains it doesn't know these keys, so the SDL2
library must be fixed before we can update ui/sdl2-keymap.h

Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 33aa30cafc)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:25:10 -05:00
Gerd Hoffmann
f45048225a kbd: add brazil kbd keys to qemu
The brazilian computer keyboard layout has two extra keys (compared to
the usual 105-key intl ps/2 keyboard).  This patch makes these two keys
known to qemu.

For historic reasons qemu has two ways to specify a key:  A QKeyCode
(name-based) or a number (ps/2 scancode based).  Therefore we have to
update multiple places to make new keys known to qemu:

  (1) The QKeyCode definition in qapi-schema.json
  (2) The QKeyCode <-> number mapping table in ui/input-keymap.c

This patch does just that.  With this patch applied you can send those
two keys to the guest using the send-key monitor command.

Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit b771f470f3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:25:03 -05:00
Justin Ossevoort
ae0fa48f51 qga/commands-posix: Fix bug in guest-fstrim
The FITRIM ioctl updates the fstrim_range structure it receives. This
way the caller can determine how many bytes were trimmed. The
guest-fstrim logic reuses the same fstrim_range for each filesystem,
effectively limiting each filesystem to trim at most as much as the
previous was able to trim.

If a previous filesystem would have trimmed 0 bytes, than the next
filesystem would report an error 'Invalid argument' because a FITRIM
request with length 0 is not valid.

This change resets the fstrim_range structure for each filesystem.

Signed-off-by: Justin Ossevoort <justin@quarantainenet.nl>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(cherry picked from commit 73a652a1b0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:24:00 -05:00
Shannon Zhao
bb3a1da4d4 hw/acpi/aml-build: Fix memory leak
Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
(cherry picked from commit afcf905cff)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:21:41 -05:00
Fam Zheng
b48a391cff qemu-iotests: Test unaligned sub-block zero write
Test zero write in byte range 512~1024 for 4k alignment.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1431522721-3266-4-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit ab53c44718)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:18:59 -05:00
Fam Zheng
cc883fe42d block: Fix NULL deference for unaligned write if qiov is NULL
For zero write, callers pass in NULL qiov (qemu-io "write -z" or
scsi-disk "write same").

Commit fc3959e466 fixed bdrv_co_write_zeroes which is the common case
for this bug, but it still exists in bdrv_aio_write_zeroes. A simpler
fix would be in bdrv_co_do_pwritev which is the NULL dereference point
and covers both cases.

So don't access it in bdrv_co_do_pwritev in this case, use three aligned
writes.

[Initialize ret to 0 in bdrv_co_do_zero_pwritev() to avoid uninitialized
variable warning with gcc 4.9.2.
--Stefan]

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1431522721-3266-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 9eeb6dd1b2)
Conflicts:
	block/io.c

* moved hunks into corresponding location in block.c due to lack of
  61007b316 in v2.3.0
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 18:15:27 -05:00
Michael Roth
4072585ecf Revert "block: Fix unaligned zero write"
This reverts commit fc3959e466.

From upstream commit d01c07f:
  This reverts commit fc3959e466.

  The core write code already handles the case, so remove this
  duplication.

  Because commit 61007b316 moved the touched code from block.c to
  block/io.c, the change is manually reverted.

  Signed-off-by: Fam Zheng <famz@redhat.com>
  Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
  Reviewed-by: Kevin Wolf <kwolf@redhat.com>

v2.3.0 does not contain 61007b316 so we can revert the change
directly.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-29 17:05:56 -05:00
Petr Matousek
959fad0ff1 fdc: force the fifo access to be in bounds of the allocated buffer
During processing of certain commands such as FD_CMD_READ_ID and
FD_CMD_DRIVE_SPECIFICATION_COMMAND the fifo memory access could
get out of bounds leading to memory corruption with values coming
from the guest.

Fix this by making sure that the index is always bounded by the
allocated memory.

This is CVE-2015-3456.

Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit e907746266)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 18:26:06 -05:00
Peter Maydell
a4bb522ee5 target-arm: Avoid buffer overrun on UNPREDICTABLE ldrd/strd
A LDRD or STRD where rd is not an even number is UNPREDICTABLE.
We were letting this fall through, which is OK unless rd is 15,
in which case we would attempt to do a load_reg or store_reg
to a nonexistent r16 for the second half of the double-word.
Catch the odd-numbered-rd cases and UNDEF them instead.

To do this we rearrange the structure of the code a little
so we can put the UNDEF catches at the top before we've
allocated TCG temporaries.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1431348973-21315-1-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 3960c336ad)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 18:23:18 -05:00
Jason Wang
cf6c213981 virtio-net: fix the upper bound when trying to delete queues
Virtqueue were indexed from zero, so don't delete virtqueue whose
index is n->max_queues * 2 + 1.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit 27a46dcf50)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 18:22:35 -05:00
Michal Kazior
cf3297868c usb: fix usb-net segfault
The dev->config pointer isn't set until guest
system initializes usb devices (via
usb_desc_set_config). However qemu networking can
go through some motions prior to that, e.g.:

 #0  is_rndis (s=0x555557261970) at hw/usb/dev-network.c:653
 #1  0x000055555585f723 in usbnet_can_receive (nc=0x55555641e820) at hw/usb/dev-network.c:1315
 #2  0x000055555587635e in qemu_can_send_packet (sender=0x5555572660a0) at net/net.c:470
 #3  0x0000555555878e34 in net_hub_port_can_receive (nc=0x5555562d7800) at net/hub.c:101
 #4  0x000055555587635e in qemu_can_send_packet (sender=0x5555562d7980) at net/net.c:470
 #5  0x000055555587dbca in tap_can_send (opaque=0x5555562d7980) at net/tap.c:172

The command to reproduce most reliably was:

 qemu-system-i386 -usb -device usb-net,vlan=0 -net tap,vlan=0

This wasn't strictly a problem with tap. Other
networking endpoints (vde, user) could trigger
this problem as well.

Fixes: https://bugs.launchpad.net/qemu/+bug/1050823
Cc: qemu-stable@nongnu.org
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 278412d0e7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 18:20:04 -05:00
Kevin Wolf
ad9c167fd2 qcow2: Flush pending discards before allocating cluster
Before a freed cluster can be reused, pending discards for this cluster
must be processed.

The original assumption was that this was not a problem because discards
are only cached during discard/write zeroes operations, which are
synchronous so that no concurrent write requests can cause cluster
allocations.

However, the discard/write zeroes operation itself can allocate a new L2
table (and it has to in order to put zero flags there), so make sure we
can cope with the situation.

This fixes https://bugs.launchpad.net/bugs/1349972.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit ecbda7a225)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 18:19:33 -05:00
Fam Zheng
d8e231fce2 vmdk: Fix overflow if l1_size is 0x20000000
Richard Jones caught this bug with afl fuzzer.

In fact, that's the only possible value to overflow (extent->l1_size =
0x20000000) l1_size:

l1_size = extent->l1_size * sizeof(long) => 0x80000000;

g_try_malloc returns NULL because l1_size is interpreted as negative
during type casting from 'int' to 'gsize', which yields a enormous
value. Hence, by coincidence, we get a "not too bad" behavior:

qemu-img: Could not open '/tmp/afl6.img': Could not open
'/tmp/afl6.img': Cannot allocate memory

Values larger than 0x20000000 will be refused by the validation in
vmdk_add_extent.

Values smaller than 0x20000000 will not overflow l1_size.

Cc: qemu-stable@nongnu.org
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 13c4941cdd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 18:19:02 -05:00
Fam Zheng
53cd79c117 vmdk: Fix next_cluster_sector for compressed write
This fixes the bug introduced by commit c6ac36e (vmdk: Optimize cluster
allocation).

Sometimes, write_len could be larger than cluster size, because it
contains both data and marker.  We must advance next_cluster_sector in
this case, otherwise the image gets corrupted.

Cc: qemu-stable@nongnu.org
Reported-by: Antoni Villalonga <qemu-list@friki.cat>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 5e82a31eb9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 18:18:27 -05:00
Bogdan Purcareata
3dd15f3e58 nbd/trivial: fix type cast for ioctl
This fixes ioctl behavior on powerpc e6500 platforms with 64bit kernel and 32bit
userspace. The current type cast has no effect there and the value passed to the
kernel is still 0. Probably an issue related to the compiler, since I'm assuming
the same configuration works on a similar setup on x86.

Also ensure consistency with previous type cast in TRACE message.

Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Message-Id: <1428058914-32050-1-git-send-email-bogdan.purcareata@freescale.com>
Cc: qemu-stable@nongnu.org
[Fix parens as noticed by Michael. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

(cherry picked from commit d064d9f381)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 18:14:07 -05:00
Ján Tomko
4c59860506 Strip brackets from vnc host
Commit v2.2.0-1530-ge556032 vnc: switch to inet_listen_opts
bypassed the use of inet_parse in inet_listen, making literal
IPv6 addresses enclosed in brackets fail:

qemu-kvm: -vnc [::1]:0: Failed to start VNC server on `(null)': address
resolution failed for [::1]:5900: Name or service not known

Strip the brackets to make it work again.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 274c3b52e1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 18:13:19 -05:00
Peter Lieven
b575af0730 block/iscsi: do not forget to logout from target
We actually were always impolitely dropping the connection and
not cleanly logging out.

CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1429193313-4263-2-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 20474e9aa0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 18:08:49 -05:00
Stefan Hajnoczi
d3b59789e8 bt-sdp: fix broken uuids power-of-2 calculation
The binary search in sdp_uuid_match() only works when the number of
elements to search is a power of two.

  lo = record->uuid;
  hi = record->uuids;
  while (hi >>= 1)
      if (lo[hi] <= val)
          lo += hi;

  return *lo == val;

I noticed that the record->uuids calculation in
sdp_service_record_build() was suspect:

  record->uuids = 1 << ffs(record->uuids - 1);

Unlike most ffs(val) - 1 users, the expression is ffs(val - 1)!

Actually ffs() is the wrong function to use for power-of-2.  Use
pow2ceil() to achieve the correct effect.  Now the record->uuid[] array
is sized correctly and the binary search in sdp_uuid_match() should
work.

I'm not sure how to run/test this code.

Cc: Andrzej Zaborowski <balrog@zabor.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1427124571-28598-2-git-send-email-stefanha@redhat.com
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 588ef9d411)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-07-28 17:46:44 -05:00
898 changed files with 16589 additions and 35115 deletions

View File

@@ -172,8 +172,7 @@ F: hw/unicore32/
X86
M: Paolo Bonzini <pbonzini@redhat.com>
M: Richard Henderson <rth@twiddle.net>
M: Eduardo Habkost <ehabkost@redhat.com>
S: Maintained
S: Odd Fixes
F: target-i386/
F: hw/i386/
@@ -704,13 +703,10 @@ F: tests/virtio-9p-test.c
T: git git://github.com/kvaneesh/QEMU.git
virtio-blk
M: Kevin Wolf <kwolf@redhat.com>
M: Stefan Hajnoczi <stefanha@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: hw/block/virtio-blk.c
F: hw/block/dataplane/*
F: hw/virtio/dataplane/*
T: git git://github.com/stefanha/qemu.git block
virtio-ccw
M: Cornelia Huck <cornelia.huck@de.ibm.com>
@@ -735,14 +731,12 @@ F: backends/rng*.c
nvme
M: Keith Busch <keith.busch@intel.com>
L: qemu-block@nongnu.org
S: Supported
F: hw/block/nvme*
F: tests/nvme-test.c
megasas
M: Hannes Reinecke <hare@suse.de>
L: qemu-block@nongnu.org
S: Supported
F: hw/scsi/megasas.c
F: hw/scsi/mfi.h
@@ -760,12 +754,6 @@ S: Maintained
F: hw/net/vmxnet*
F: hw/scsi/vmw_pvscsi*
Rocker
M: Scott Feldman <sfeldma@gmail.com>
M: Jiri Pirko <jiri@resnulli.us>
S: Maintained
F: hw/net/rocker/
Subsystems
----------
Audio
@@ -778,27 +766,21 @@ F: tests/ac97-test.c
F: tests/es1370-test.c
F: tests/intel-hda-test.c
Block layer core
Block
M: Kevin Wolf <kwolf@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block*
F: block/
F: hw/block/
F: include/block/
F: qemu-img*
F: qemu-io*
F: tests/qemu-iotests/
T: git git://repo.or.cz/qemu/kevin.git block
Block I/O path
M: Stefan Hajnoczi <stefanha@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: async.c
F: aio-*.c
F: block/io.c
F: block*
F: block/
F: hw/block/
F: migration/block*
F: qemu-img*
F: qemu-io*
F: tests/image-fuzzer/
F: tests/qemu-iotests/
T: git git://repo.or.cz/qemu/kevin.git block
T: git git://github.com/stefanha/qemu.git block
Block Jobs
@@ -813,14 +795,6 @@ F: block/stream.h
F: block/mirror.c
T: git git://github.com/codyprime/qemu-kvm-jtc.git block
Block QAPI, monitor, command line
M: Markus Armbruster <armbru@redhat.com>
S: Supported
F: blockdev.c
F: block/qapi.c
F: qapi/block*.json
T: git git://repo.or.cz/qemu/armbru.git block-next
Character Devices
M: Paolo Bonzini <pbonzini@redhat.com>
S: Maintained
@@ -931,29 +905,21 @@ F: nbd.*
F: qemu-nbd.c
T: git git://github.com/bonzini/qemu.git nbd-next
NUMA
M: Eduardo Habkost <ehabkost@redhat.com>
S: Maintained
F: numa.c
F: include/sysemu/numa.h
K: numa|NUMA
K: srat|SRAT
T: git git://github.com/ehabkost/qemu.git numa
QAPI
M: Markus Armbruster <armbru@redhat.com>
M: Luiz Capitulino <lcapitulino@redhat.com>
M: Michael Roth <mdroth@linux.vnet.ibm.com>
S: Supported
S: Maintained
F: qapi/
F: tests/qapi-schema/
T: git git://repo.or.cz/qemu/armbru.git qapi-next
T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
QAPI Schema
M: Eric Blake <eblake@redhat.com>
M: Luiz Capitulino <lcapitulino@redhat.com>
M: Markus Armbruster <armbru@redhat.com>
S: Supported
F: qapi-schema.json
T: git git://repo.or.cz/qemu/armbru.git qapi-next
T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
QObject
M: Luiz Capitulino <lcapitulino@redhat.com>
@@ -978,14 +944,13 @@ X: qom/cpu.c
F: tests/qom-test.c
QMP
M: Markus Armbruster <armbru@redhat.com>
S: Supported
M: Luiz Capitulino <lcapitulino@redhat.com>
S: Maintained
F: qmp.c
F: monitor.c
F: qmp-commands.hx
F: docs/qmp/
F: scripts/qmp/
T: git git://repo.or.cz/qemu/armbru.git qapi-next
F: QMP/
T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
SLIRP
M: Jan Kiszka <jan.kiszka@siemens.com>
@@ -1129,7 +1094,6 @@ Block drivers
-------------
VMDK
M: Fam Zheng <famz@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/vmdk.c
@@ -1160,7 +1124,6 @@ T: git git://github.com/codyprime/qemu-kvm-jtc.git block
VDI
M: Stefan Weil <sw@weilnetz.de>
L: qemu-block@nongnu.org
S: Maintained
F: block/vdi.c
@@ -1168,7 +1131,6 @@ iSCSI
M: Ronnie Sahlberg <ronniesahlberg@gmail.com>
M: Paolo Bonzini <pbonzini@redhat.com>
M: Peter Lieven <pl@kamp.de>
L: qemu-block@nongnu.org
S: Supported
F: block/iscsi.c
@@ -1210,102 +1172,7 @@ S: Supported
F: block/gluster.c
T: git git://github.com/codyprime/qemu-kvm-jtc.git block
Null Block Driver
M: Fam Zheng <famz@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/null.c
Bootdevice
M: Gonglei <arei.gonglei@huawei.com>
S: Maintained
F: bootdevice.c
Quorum
M: Alberto Garcia <berto@igalia.com>
S: Supported
F: block/quorum.c
L: qemu-block@nongnu.org
blkverify
M: Stefan Hajnoczi <stefanha@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/blkverify.c
bochs
M: Stefan Hajnoczi <stefanha@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/bochs.c
cloop
M: Stefan Hajnoczi <stefanha@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/cloop.c
dmg
M: Stefan Hajnoczi <stefanha@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/dmg.c
parallels
M: Stefan Hajnoczi <stefanha@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/parallels.c
qed
M: Stefan Hajnoczi <stefanha@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/qed.c
raw
M: Kevin Wolf <kwolf@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/linux-aio.c
F: block/raw-aio.h
F: block/raw-posix.c
F: block/raw-win32.c
F: block/raw_bsd.c
F: block/win32-aio.c
qcow2
M: Kevin Wolf <kwolf@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/qcow2*
qcow
M: Kevin Wolf <kwolf@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/qcow.c
blkdebug
M: Kevin Wolf <kwolf@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/blkdebug.c
vpc
M: Kevin Wolf <kwolf@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/vpc.c
vvfat
M: Kevin Wolf <kwolf@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/vvfat.c
Image format fuzzer
M: Stefan Hajnoczi <stefanha@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: tests/image-fuzzer/

View File

@@ -243,17 +243,17 @@ qapi-py = $(SRC_PATH)/scripts/qapi.py $(SRC_PATH)/scripts/ordereddict.py
qga/qapi-generated/qga-qapi-types.c qga/qapi-generated/qga-qapi-types.h :\
$(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \
$(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \
$(gen-out-type) -o qga/qapi-generated -p "qga-" -i $<, \
" GEN $@")
qga/qapi-generated/qga-qapi-visit.c qga/qapi-generated/qga-qapi-visit.h :\
$(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-visit.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-visit.py \
$(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \
$(gen-out-type) -o qga/qapi-generated -p "qga-" -i $<, \
" GEN $@")
qga/qapi-generated/qga-qmp-commands.h qga/qapi-generated/qga-qmp-marshal.c :\
$(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py \
$(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \
$(gen-out-type) -o qga/qapi-generated -p "qga-" -i $<, \
" GEN $@")
qapi-modules = $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/qapi/common.json \
@@ -263,22 +263,22 @@ qapi-modules = $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/qapi/common.json \
qapi-types.c qapi-types.h :\
$(qapi-modules) $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \
$(gen-out-type) -o "." -b $<, \
$(gen-out-type) -o "." -b -i $<, \
" GEN $@")
qapi-visit.c qapi-visit.h :\
$(qapi-modules) $(SRC_PATH)/scripts/qapi-visit.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-visit.py \
$(gen-out-type) -o "." -b $<, \
$(gen-out-type) -o "." -b -i $<, \
" GEN $@")
qapi-event.c qapi-event.h :\
$(qapi-modules) $(SRC_PATH)/scripts/qapi-event.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-event.py \
$(gen-out-type) -o "." $<, \
$(gen-out-type) -o "." -b -i $<, \
" GEN $@")
qmp-commands.h qmp-marshal.c :\
$(qapi-modules) $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py \
$(gen-out-type) -o "." -m $<, \
$(gen-out-type) -o "." -m -i $<, \
" GEN $@")
QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qmp-commands.h)
@@ -296,7 +296,6 @@ clean:
rm -f fsdev/*.pod
rm -rf .libs */.libs
rm -f qemu-img-cmds.h
rm -f ui/shader/*-vert.h ui/shader/*-frag.h
@# May not be present in GENERATED_HEADERS
rm -f trace/generated-tracers-dtrace.dtrace*
rm -f trace/generated-tracers-dtrace.h*
@@ -442,22 +441,6 @@ cscope:
find "$(SRC_PATH)" -name "*.[chsS]" -print | sed 's,^\./,,' > ./cscope.files
cscope -b
# opengl shader programs
ui/shader/%-vert.h: $(SRC_PATH)/ui/shader/%.vert $(SRC_PATH)/scripts/shaderinclude.pl
@mkdir -p $(dir $@)
$(call quiet-command,\
perl $(SRC_PATH)/scripts/shaderinclude.pl $< > $@,\
" VERT $@")
ui/shader/%-frag.h: $(SRC_PATH)/ui/shader/%.frag $(SRC_PATH)/scripts/shaderinclude.pl
@mkdir -p $(dir $@)
$(call quiet-command,\
perl $(SRC_PATH)/scripts/shaderinclude.pl $< > $@,\
" FRAG $@")
ui/console-gl.o: $(SRC_PATH)/ui/console-gl.c \
ui/shader/texture-blit-vert.h ui/shader/texture-blit-frag.h
# documentation
MAKEINFO=makeinfo
MAKEINFOFLAGS=--no-headers --no-split --number-sections

View File

@@ -34,6 +34,10 @@ endif
PROGS=$(QEMU_PROG) $(QEMU_PROGW)
STPFILES=
ifdef CONFIG_LINUX_USER
PROGS+=$(QEMU_PROG)-binfmt
endif
config-target.h: config-target.h-timestamp
config-target.h-timestamp: config-target.mak
@@ -108,6 +112,8 @@ QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) -I$(SRC_PATH)/linux-user
obj-y += linux-user/
obj-y += gdbstub.o thunk.o user-exec.o
obj-binfmt-y += linux-user/
endif #CONFIG_LINUX_USER
#########################################################
@@ -134,7 +140,7 @@ obj-$(CONFIG_KVM) += kvm-all.o
obj-y += memory.o savevm.o cputlb.o
obj-y += memory_mapping.o
obj-y += dump.o
LIBS := $(libs_softmmu) $(LIBS)
LIBS+=$(libs_softmmu)
# xen support
obj-$(CONFIG_XEN) += xen-common.o
@@ -156,7 +162,11 @@ endif # CONFIG_SOFTMMU
# Workaround for http://gcc.gnu.org/PR55489, see configure.
%/translate.o: QEMU_CFLAGS += $(TRANSLATE_OPT_CFLAGS)
ifdef CONFIG_LINUX_USER
dummy := $(call unnest-vars,,obj-y obj-binfmt-y)
else
dummy := $(call unnest-vars,,obj-y)
endif
all-obj-y := $(obj-y)
target-obj-y :=
@@ -180,10 +190,9 @@ $(QEMU_PROG_BUILD): config-devices.mak
# build either PROG or PROGW
$(QEMU_PROG_BUILD): $(all-obj-y) ../libqemuutil.a ../libqemustub.a
$(call LINK, $(filter-out %.mak, $^))
ifdef CONFIG_DARWIN
$(call quiet-command,Rez -append $(SRC_PATH)/pc-bios/qemu.rsrc -o $@," REZ $(TARGET_DIR)$@")
$(call quiet-command,SetFile -a C $@," SETFILE $(TARGET_DIR)$@")
endif
$(QEMU_PROG)-binfmt: $(obj-binfmt-y)
$(call LINK,$^)
gdbstub-xml.c: $(TARGET_XML_FILES) $(SRC_PATH)/scripts/feature_to_c.sh
$(call quiet-command,rm -f $@ && $(SHELL) $(SRC_PATH)/scripts/feature_to_c.sh $@ $(TARGET_XML_FILES)," GEN $(TARGET_DIR)$@")

View File

@@ -1 +1 @@
2.3.50
2.3.1

View File

@@ -24,6 +24,7 @@ struct AioHandler
IOHandler *io_read;
IOHandler *io_write;
int deleted;
int pollfds_idx;
void *opaque;
QLIST_ENTRY(AioHandler) node;
};
@@ -82,6 +83,7 @@ void aio_set_fd_handler(AioContext *ctx,
node->io_read = io_read;
node->io_write = io_write;
node->opaque = opaque;
node->pollfds_idx = -1;
node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
@@ -184,61 +186,13 @@ bool aio_dispatch(AioContext *ctx)
return progress;
}
/* These thread-local variables are used only in a small part of aio_poll
* around the call to the poll() system call. In particular they are not
* used while aio_poll is performing callbacks, which makes it much easier
* to think about reentrancy!
*
* Stack-allocated arrays would be perfect but they have size limitations;
* heap allocation is expensive enough that we want to reuse arrays across
* calls to aio_poll(). And because poll() has to be called without holding
* any lock, the arrays cannot be stored in AioContext. Thread-local data
* has none of the disadvantages of these three options.
*/
static __thread GPollFD *pollfds;
static __thread AioHandler **nodes;
static __thread unsigned npfd, nalloc;
static __thread Notifier pollfds_cleanup_notifier;
static void pollfds_cleanup(Notifier *n, void *unused)
{
g_assert(npfd == 0);
g_free(pollfds);
g_free(nodes);
nalloc = 0;
}
static void add_pollfd(AioHandler *node)
{
if (npfd == nalloc) {
if (nalloc == 0) {
pollfds_cleanup_notifier.notify = pollfds_cleanup;
qemu_thread_atexit_add(&pollfds_cleanup_notifier);
nalloc = 8;
} else {
g_assert(nalloc <= INT_MAX);
nalloc *= 2;
}
pollfds = g_renew(GPollFD, pollfds, nalloc);
nodes = g_renew(AioHandler *, nodes, nalloc);
}
nodes[npfd] = node;
pollfds[npfd] = (GPollFD) {
.fd = node->pfd.fd,
.events = node->pfd.events,
};
npfd++;
}
bool aio_poll(AioContext *ctx, bool blocking)
{
AioHandler *node;
bool was_dispatching;
int i, ret;
int ret;
bool progress;
int64_t timeout;
aio_context_acquire(ctx);
was_dispatching = ctx->dispatching;
progress = false;
@@ -256,36 +210,39 @@ bool aio_poll(AioContext *ctx, bool blocking)
ctx->walking_handlers++;
assert(npfd == 0);
g_array_set_size(ctx->pollfds, 0);
/* fill pollfds */
QLIST_FOREACH(node, &ctx->aio_handlers, node) {
node->pollfds_idx = -1;
if (!node->deleted && node->pfd.events) {
add_pollfd(node);
GPollFD pfd = {
.fd = node->pfd.fd,
.events = node->pfd.events,
};
node->pollfds_idx = ctx->pollfds->len;
g_array_append_val(ctx->pollfds, pfd);
}
}
timeout = blocking ? aio_compute_timeout(ctx) : 0;
ctx->walking_handlers--;
/* wait until next event */
if (timeout) {
aio_context_release(ctx);
}
ret = qemu_poll_ns((GPollFD *)pollfds, npfd, timeout);
if (timeout) {
aio_context_acquire(ctx);
}
ret = qemu_poll_ns((GPollFD *)ctx->pollfds->data,
ctx->pollfds->len,
blocking ? aio_compute_timeout(ctx) : 0);
/* if we have any readable fds, dispatch event */
if (ret > 0) {
for (i = 0; i < npfd; i++) {
nodes[i]->pfd.revents = pollfds[i].revents;
QLIST_FOREACH(node, &ctx->aio_handlers, node) {
if (node->pollfds_idx != -1) {
GPollFD *pfd = &g_array_index(ctx->pollfds, GPollFD,
node->pollfds_idx);
node->pfd.revents = pfd->revents;
}
}
}
npfd = 0;
ctx->walking_handlers--;
/* Run dispatch even if there were no readable fds to run timers */
aio_set_dispatching(ctx, true);
if (aio_dispatch(ctx)) {
@@ -293,7 +250,5 @@ bool aio_poll(AioContext *ctx, bool blocking)
}
aio_set_dispatching(ctx, was_dispatching);
aio_context_release(ctx);
return progress;
}

View File

@@ -283,7 +283,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
int count;
int timeout;
aio_context_acquire(ctx);
have_select_revents = aio_prepare(ctx);
if (have_select_revents) {
blocking = false;
@@ -324,13 +323,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
timeout = blocking
? qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)) : 0;
if (timeout) {
aio_context_release(ctx);
}
ret = WaitForMultipleObjects(count, events, FALSE, timeout);
if (timeout) {
aio_context_acquire(ctx);
}
aio_set_dispatching(ctx, true);
if (first && aio_bh_poll(ctx)) {
@@ -356,6 +349,5 @@ bool aio_poll(AioContext *ctx, bool blocking)
progress |= timerlistgroup_run_timers(&ctx->tlg);
aio_set_dispatching(ctx, was_dispatching);
aio_context_release(ctx);
return progress;
}

View File

@@ -24,7 +24,6 @@
#include <stdint.h>
#include <stdarg.h>
#include <stdlib.h>
#include <zlib.h>
#ifndef _WIN32
#include <sys/types.h>
#include <sys/mman.h>
@@ -128,7 +127,6 @@ static uint64_t bitmap_sync_count;
#define RAM_SAVE_FLAG_CONTINUE 0x20
#define RAM_SAVE_FLAG_XBZRLE 0x40
/* 0x80 is reserved in migration.h start with 0x100 next */
#define RAM_SAVE_FLAG_COMPRESS_PAGE 0x100
static struct defconfig_file {
const char *filename;
@@ -318,147 +316,6 @@ static uint64_t migration_dirty_pages;
static uint32_t last_version;
static bool ram_bulk_stage;
struct CompressParam {
bool start;
bool done;
QEMUFile *file;
QemuMutex mutex;
QemuCond cond;
RAMBlock *block;
ram_addr_t offset;
};
typedef struct CompressParam CompressParam;
struct DecompressParam {
bool start;
QemuMutex mutex;
QemuCond cond;
void *des;
uint8 *compbuf;
int len;
};
typedef struct DecompressParam DecompressParam;
static CompressParam *comp_param;
static QemuThread *compress_threads;
/* comp_done_cond is used to wake up the migration thread when
* one of the compression threads has finished the compression.
* comp_done_lock is used to co-work with comp_done_cond.
*/
static QemuMutex *comp_done_lock;
static QemuCond *comp_done_cond;
/* The empty QEMUFileOps will be used by file in CompressParam */
static const QEMUFileOps empty_ops = { };
static bool compression_switch;
static bool quit_comp_thread;
static bool quit_decomp_thread;
static DecompressParam *decomp_param;
static QemuThread *decompress_threads;
static uint8_t *compressed_data_buf;
static int do_compress_ram_page(CompressParam *param);
static void *do_data_compress(void *opaque)
{
CompressParam *param = opaque;
while (!quit_comp_thread) {
qemu_mutex_lock(&param->mutex);
/* Re-check the quit_comp_thread in case of
* terminate_compression_threads is called just before
* qemu_mutex_lock(&param->mutex) and after
* while(!quit_comp_thread), re-check it here can make
* sure the compression thread terminate as expected.
*/
while (!param->start && !quit_comp_thread) {
qemu_cond_wait(&param->cond, &param->mutex);
}
if (!quit_comp_thread) {
do_compress_ram_page(param);
}
param->start = false;
qemu_mutex_unlock(&param->mutex);
qemu_mutex_lock(comp_done_lock);
param->done = true;
qemu_cond_signal(comp_done_cond);
qemu_mutex_unlock(comp_done_lock);
}
return NULL;
}
static inline void terminate_compression_threads(void)
{
int idx, thread_count;
thread_count = migrate_compress_threads();
quit_comp_thread = true;
for (idx = 0; idx < thread_count; idx++) {
qemu_mutex_lock(&comp_param[idx].mutex);
qemu_cond_signal(&comp_param[idx].cond);
qemu_mutex_unlock(&comp_param[idx].mutex);
}
}
void migrate_compress_threads_join(void)
{
int i, thread_count;
if (!migrate_use_compression()) {
return;
}
terminate_compression_threads();
thread_count = migrate_compress_threads();
for (i = 0; i < thread_count; i++) {
qemu_thread_join(compress_threads + i);
qemu_fclose(comp_param[i].file);
qemu_mutex_destroy(&comp_param[i].mutex);
qemu_cond_destroy(&comp_param[i].cond);
}
qemu_mutex_destroy(comp_done_lock);
qemu_cond_destroy(comp_done_cond);
g_free(compress_threads);
g_free(comp_param);
g_free(comp_done_cond);
g_free(comp_done_lock);
compress_threads = NULL;
comp_param = NULL;
comp_done_cond = NULL;
comp_done_lock = NULL;
}
void migrate_compress_threads_create(void)
{
int i, thread_count;
if (!migrate_use_compression()) {
return;
}
quit_comp_thread = false;
compression_switch = true;
thread_count = migrate_compress_threads();
compress_threads = g_new0(QemuThread, thread_count);
comp_param = g_new0(CompressParam, thread_count);
comp_done_cond = g_new0(QemuCond, 1);
comp_done_lock = g_new0(QemuMutex, 1);
qemu_cond_init(comp_done_cond);
qemu_mutex_init(comp_done_lock);
for (i = 0; i < thread_count; i++) {
/* com_param[i].file is just used as a dummy buffer to save data, set
* it's ops to empty.
*/
comp_param[i].file = qemu_fopen_ops(NULL, &empty_ops);
comp_param[i].done = true;
qemu_mutex_init(&comp_param[i].mutex);
qemu_cond_init(&comp_param[i].cond);
qemu_thread_create(compress_threads + i, "compress",
do_data_compress, comp_param + i,
QEMU_THREAD_JOINABLE);
}
}
/**
* save_page_header: Write page header to wire
*
@@ -663,16 +520,12 @@ static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
static int64_t start_time;
static int64_t bytes_xfer_prev;
static int64_t num_dirty_pages_period;
static uint64_t xbzrle_cache_miss_prev;
static uint64_t iterations_prev;
static void migration_bitmap_sync_init(void)
{
start_time = 0;
bytes_xfer_prev = 0;
num_dirty_pages_period = 0;
xbzrle_cache_miss_prev = 0;
iterations_prev = 0;
}
/* Called with iothread lock held, to protect ram_list.dirty_memory[] */
@@ -683,6 +536,8 @@ static void migration_bitmap_sync(void)
MigrationState *s = migrate_get_current();
int64_t end_time;
int64_t bytes_xfer_now;
static uint64_t xbzrle_cache_miss_prev;
static uint64_t iterations_prev;
bitmap_sync_count++;
@@ -730,7 +585,7 @@ static void migration_bitmap_sync(void)
mig_throttle_on = false;
}
if (migrate_use_xbzrle()) {
if (iterations_prev != acct_info.iterations) {
if (iterations_prev != 0) {
acct_info.xbzrle_cache_miss_rate =
(double)(acct_info.xbzrle_cache_miss -
xbzrle_cache_miss_prev) /
@@ -744,36 +599,8 @@ static void migration_bitmap_sync(void)
s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
start_time = end_time;
num_dirty_pages_period = 0;
s->dirty_sync_count = bitmap_sync_count;
}
s->dirty_sync_count = bitmap_sync_count;
}
/**
* save_zero_page: Send the zero page to the stream
*
* Returns: Number of pages written.
*
* @f: QEMUFile where to send the data
* @block: block that contains the page we want to send
* @offset: offset inside the block for the page
* @p: pointer to the page
* @bytes_transferred: increase it with the number of transferred bytes
*/
static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
uint8_t *p, uint64_t *bytes_transferred)
{
int pages = -1;
if (is_zero_range(p, TARGET_PAGE_SIZE)) {
acct_info.dup_pages++;
*bytes_transferred += save_page_header(f, block,
offset | RAM_SAVE_FLAG_COMPRESS);
qemu_put_byte(f, 0);
*bytes_transferred += 1;
pages = 1;
}
return pages;
}
/**
@@ -824,22 +651,25 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
acct_info.dup_pages++;
}
}
} else {
pages = save_zero_page(f, block, offset, p, bytes_transferred);
if (pages > 0) {
/* Must let xbzrle know, otherwise a previous (now 0'd) cached
* page would be stale
} else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
acct_info.dup_pages++;
*bytes_transferred += save_page_header(f, block,
offset | RAM_SAVE_FLAG_COMPRESS);
qemu_put_byte(f, 0);
*bytes_transferred += 1;
pages = 1;
/* Must let xbzrle know, otherwise a previous (now 0'd) cached
* page would be stale
*/
xbzrle_cache_zero_page(current_addr);
} else if (!ram_bulk_stage && migrate_use_xbzrle()) {
pages = save_xbzrle_page(f, &p, current_addr, block,
offset, last_stage, bytes_transferred);
if (!last_stage) {
/* Can't send this cached data async, since the cache page
* might get updated before it gets to the wire
*/
xbzrle_cache_zero_page(current_addr);
} else if (!ram_bulk_stage && migrate_use_xbzrle()) {
pages = save_xbzrle_page(f, &p, current_addr, block,
offset, last_stage, bytes_transferred);
if (!last_stage) {
/* Can't send this cached data async, since the cache page
* might get updated before it gets to the wire
*/
send_async = false;
}
send_async = false;
}
}
@@ -862,178 +692,6 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
return pages;
}
static int do_compress_ram_page(CompressParam *param)
{
int bytes_sent, blen;
uint8_t *p;
RAMBlock *block = param->block;
ram_addr_t offset = param->offset;
p = memory_region_get_ram_ptr(block->mr) + (offset & TARGET_PAGE_MASK);
bytes_sent = save_page_header(param->file, block, offset |
RAM_SAVE_FLAG_COMPRESS_PAGE);
blen = qemu_put_compression_data(param->file, p, TARGET_PAGE_SIZE,
migrate_compress_level());
bytes_sent += blen;
return bytes_sent;
}
static inline void start_compression(CompressParam *param)
{
param->done = false;
qemu_mutex_lock(&param->mutex);
param->start = true;
qemu_cond_signal(&param->cond);
qemu_mutex_unlock(&param->mutex);
}
static inline void start_decompression(DecompressParam *param)
{
qemu_mutex_lock(&param->mutex);
param->start = true;
qemu_cond_signal(&param->cond);
qemu_mutex_unlock(&param->mutex);
}
static uint64_t bytes_transferred;
static void flush_compressed_data(QEMUFile *f)
{
int idx, len, thread_count;
if (!migrate_use_compression()) {
return;
}
thread_count = migrate_compress_threads();
for (idx = 0; idx < thread_count; idx++) {
if (!comp_param[idx].done) {
qemu_mutex_lock(comp_done_lock);
while (!comp_param[idx].done && !quit_comp_thread) {
qemu_cond_wait(comp_done_cond, comp_done_lock);
}
qemu_mutex_unlock(comp_done_lock);
}
if (!quit_comp_thread) {
len = qemu_put_qemu_file(f, comp_param[idx].file);
bytes_transferred += len;
}
}
}
static inline void set_compress_params(CompressParam *param, RAMBlock *block,
ram_addr_t offset)
{
param->block = block;
param->offset = offset;
}
static int compress_page_with_multi_thread(QEMUFile *f, RAMBlock *block,
ram_addr_t offset,
uint64_t *bytes_transferred)
{
int idx, thread_count, bytes_xmit = -1, pages = -1;
thread_count = migrate_compress_threads();
qemu_mutex_lock(comp_done_lock);
while (true) {
for (idx = 0; idx < thread_count; idx++) {
if (comp_param[idx].done) {
bytes_xmit = qemu_put_qemu_file(f, comp_param[idx].file);
set_compress_params(&comp_param[idx], block, offset);
start_compression(&comp_param[idx]);
pages = 1;
acct_info.norm_pages++;
*bytes_transferred += bytes_xmit;
break;
}
}
if (pages > 0) {
break;
} else {
qemu_cond_wait(comp_done_cond, comp_done_lock);
}
}
qemu_mutex_unlock(comp_done_lock);
return pages;
}
/**
* ram_save_compressed_page: compress the given page and send it to the stream
*
* Returns: Number of pages written.
*
* @f: QEMUFile where to send the data
* @block: block that contains the page we want to send
* @offset: offset inside the block for the page
* @last_stage: if we are at the completion stage
* @bytes_transferred: increase it with the number of transferred bytes
*/
static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
ram_addr_t offset, bool last_stage,
uint64_t *bytes_transferred)
{
int pages = -1;
uint64_t bytes_xmit;
MemoryRegion *mr = block->mr;
uint8_t *p;
int ret;
p = memory_region_get_ram_ptr(mr) + offset;
bytes_xmit = 0;
ret = ram_control_save_page(f, block->offset,
offset, TARGET_PAGE_SIZE, &bytes_xmit);
if (bytes_xmit) {
*bytes_transferred += bytes_xmit;
pages = 1;
}
if (block == last_sent_block) {
offset |= RAM_SAVE_FLAG_CONTINUE;
}
if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
if (ret != RAM_SAVE_CONTROL_DELAYED) {
if (bytes_xmit > 0) {
acct_info.norm_pages++;
} else if (bytes_xmit == 0) {
acct_info.dup_pages++;
}
}
} else {
/* When starting the process of a new block, the first page of
* the block should be sent out before other pages in the same
* block, and all the pages in last block should have been sent
* out, keeping this order is important, because the 'cont' flag
* is used to avoid resending the block name.
*/
if (block != last_sent_block) {
flush_compressed_data(f);
pages = save_zero_page(f, block, offset, p, bytes_transferred);
if (pages == -1) {
set_compress_params(&comp_param[0], block, offset);
/* Use the qemu thread to compress the data to make sure the
* first page is sent out before other pages
*/
bytes_xmit = do_compress_ram_page(&comp_param[0]);
acct_info.norm_pages++;
qemu_put_qemu_file(f, comp_param[0].file);
*bytes_transferred += bytes_xmit;
pages = 1;
}
} else {
pages = save_zero_page(f, block, offset, p, bytes_transferred);
if (pages == -1) {
pages = compress_page_with_multi_thread(f, block, offset,
bytes_transferred);
}
}
}
return pages;
}
/**
* ram_find_and_save_block: Finds a dirty page and sends it to f
*
@@ -1073,22 +731,10 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
block = QLIST_FIRST_RCU(&ram_list.blocks);
complete_round = true;
ram_bulk_stage = false;
if (migrate_use_xbzrle()) {
/* If xbzrle is on, stop using the data compression at this
* point. In theory, xbzrle can do better than compression.
*/
flush_compressed_data(f);
compression_switch = false;
}
}
} else {
if (compression_switch && migrate_use_compression()) {
pages = ram_save_compressed_page(f, block, offset, last_stage,
bytes_transferred);
} else {
pages = ram_save_page(f, block, offset, last_stage,
bytes_transferred);
}
pages = ram_save_page(f, block, offset, last_stage,
bytes_transferred);
/* if page is unmodified, continue to the next */
if (pages > 0) {
@@ -1104,6 +750,8 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
return pages;
}
static uint64_t bytes_transferred;
void acct_update_position(QEMUFile *f, size_t size, bool zero)
{
uint64_t pages = size / TARGET_PAGE_SIZE;
@@ -1317,7 +965,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
}
i++;
}
flush_compressed_data(f);
rcu_read_unlock();
/*
@@ -1359,7 +1006,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
}
}
flush_compressed_data(f);
ram_control_after_iterate(f, RAM_CONTROL_FINISH);
migration_end();
@@ -1467,104 +1113,10 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
}
}
static void *do_data_decompress(void *opaque)
{
DecompressParam *param = opaque;
unsigned long pagesize;
while (!quit_decomp_thread) {
qemu_mutex_lock(&param->mutex);
while (!param->start && !quit_decomp_thread) {
qemu_cond_wait(&param->cond, &param->mutex);
pagesize = TARGET_PAGE_SIZE;
if (!quit_decomp_thread) {
/* uncompress() will return failed in some case, especially
* when the page is dirted when doing the compression, it's
* not a problem because the dirty page will be retransferred
* and uncompress() won't break the data in other pages.
*/
uncompress((Bytef *)param->des, &pagesize,
(const Bytef *)param->compbuf, param->len);
}
param->start = false;
}
qemu_mutex_unlock(&param->mutex);
}
return NULL;
}
void migrate_decompress_threads_create(void)
{
int i, thread_count;
thread_count = migrate_decompress_threads();
decompress_threads = g_new0(QemuThread, thread_count);
decomp_param = g_new0(DecompressParam, thread_count);
compressed_data_buf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
quit_decomp_thread = false;
for (i = 0; i < thread_count; i++) {
qemu_mutex_init(&decomp_param[i].mutex);
qemu_cond_init(&decomp_param[i].cond);
decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
qemu_thread_create(decompress_threads + i, "decompress",
do_data_decompress, decomp_param + i,
QEMU_THREAD_JOINABLE);
}
}
void migrate_decompress_threads_join(void)
{
int i, thread_count;
quit_decomp_thread = true;
thread_count = migrate_decompress_threads();
for (i = 0; i < thread_count; i++) {
qemu_mutex_lock(&decomp_param[i].mutex);
qemu_cond_signal(&decomp_param[i].cond);
qemu_mutex_unlock(&decomp_param[i].mutex);
}
for (i = 0; i < thread_count; i++) {
qemu_thread_join(decompress_threads + i);
qemu_mutex_destroy(&decomp_param[i].mutex);
qemu_cond_destroy(&decomp_param[i].cond);
g_free(decomp_param[i].compbuf);
}
g_free(decompress_threads);
g_free(decomp_param);
g_free(compressed_data_buf);
decompress_threads = NULL;
decomp_param = NULL;
compressed_data_buf = NULL;
}
static void decompress_data_with_multi_threads(uint8_t *compbuf,
void *host, int len)
{
int idx, thread_count;
thread_count = migrate_decompress_threads();
while (true) {
for (idx = 0; idx < thread_count; idx++) {
if (!decomp_param[idx].start) {
memcpy(decomp_param[idx].compbuf, compbuf, len);
decomp_param[idx].des = host;
decomp_param[idx].len = len;
start_decompression(&decomp_param[idx]);
break;
}
}
if (idx < thread_count) {
break;
}
}
}
static int ram_load(QEMUFile *f, void *opaque, int version_id)
{
int flags = 0, ret = 0;
static uint64_t seq_iter;
int len = 0;
seq_iter++;
@@ -1644,23 +1196,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
}
qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
break;
case RAM_SAVE_FLAG_COMPRESS_PAGE:
host = host_from_stream_offset(f, addr, flags);
if (!host) {
error_report("Invalid RAM offset " RAM_ADDR_FMT, addr);
ret = -EINVAL;
break;
}
len = qemu_get_be32(f);
if (len < 0 || len > compressBound(TARGET_PAGE_SIZE)) {
error_report("Invalid compressed data length: %d", len);
ret = -EINVAL;
break;
}
qemu_get_buffer(f, compressed_data_buf, len);
decompress_data_with_multi_threads(compressed_data_buf, host, len);
break;
case RAM_SAVE_FLAG_XBZRLE:
host = host_from_stream_offset(f, addr, flags);
if (!host) {

10
async.c
View File

@@ -230,6 +230,7 @@ aio_ctx_finalize(GSource *source)
event_notifier_cleanup(&ctx->notifier);
rfifolock_destroy(&ctx->lock);
qemu_mutex_destroy(&ctx->bh_lock);
g_array_free(ctx->pollfds, TRUE);
timerlistgroup_deinit(&ctx->tlg);
}
@@ -280,6 +281,12 @@ static void aio_timerlist_notify(void *opaque)
aio_notify(opaque);
}
static void aio_rfifolock_cb(void *opaque)
{
/* Kick owner thread in case they are blocked in aio_poll() */
aio_notify(opaque);
}
AioContext *aio_context_new(Error **errp)
{
int ret;
@@ -295,9 +302,10 @@ AioContext *aio_context_new(Error **errp)
aio_set_event_notifier(ctx, &ctx->notifier,
(EventNotifierHandler *)
event_notifier_test_and_clear);
ctx->pollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
ctx->thread_pool = NULL;
qemu_mutex_init(&ctx->bh_lock);
rfifolock_init(&ctx->lock, NULL, NULL);
rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
return ctx;

View File

@@ -43,7 +43,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
return;
}
if (!fb->mem_path) {
error_setg(errp, "mem-path property not set");
error_setg(errp, "mem_path property not set");
return;
}
#ifndef CONFIG_LINUX

View File

@@ -96,20 +96,6 @@ bool tpm_backend_get_tpm_established_flag(TPMBackend *s)
return k->ops->get_tpm_established_flag(s);
}
int tpm_backend_reset_tpm_established_flag(TPMBackend *s, uint8_t locty)
{
TPMBackendClass *k = TPM_BACKEND_GET_CLASS(s);
return k->ops->reset_tpm_established_flag(s, locty);
}
TPMVersion tpm_backend_get_tpm_version(TPMBackend *s)
{
TPMBackendClass *k = TPM_BACKEND_GET_CLASS(s);
return k->ops->get_tpm_version(s);
}
static bool tpm_backend_prop_get_opened(Object *obj, Error **errp)
{
TPMBackend *s = TPM_BACKEND(obj);
@@ -179,6 +165,17 @@ void tpm_backend_thread_end(TPMBackendThread *tbt)
}
}
void tpm_backend_thread_tpm_reset(TPMBackendThread *tbt,
GFunc func, gpointer user_data)
{
if (!tbt->pool) {
tpm_backend_thread_create(tbt, func, user_data);
} else {
g_thread_pool_push(tbt->pool, (gpointer)TPM_BACKEND_CMD_TPM_RESET,
NULL);
}
}
static const TypeInfo tpm_backend_info = {
.name = TYPE_TPM_BACKEND,
.parent = TYPE_OBJECT,

View File

@@ -58,6 +58,7 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
/* We're already registered one balloon handler. How many can
* a guest really have?
*/
error_report("Another balloon device already registered");
return -1;
}
balloon_event_fn = event_func;

2866
block.c

File diff suppressed because it is too large Load Diff

View File

@@ -1,4 +1,4 @@
block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o
block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
@@ -9,7 +9,7 @@ block-obj-y += block-backend.o snapshot.o qapi.o
block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
block-obj-$(CONFIG_POSIX) += raw-posix.o
block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
block-obj-y += null.o mirror.o io.o
block-obj-y += null.o mirror.o
block-obj-y += nbd.o nbd-client.o sheepdog.o
block-obj-$(CONFIG_LIBISCSI) += iscsi.o
@@ -19,6 +19,8 @@ block-obj-$(CONFIG_RBD) += rbd.o
block-obj-$(CONFIG_GLUSTERFS) += gluster.o
block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
block-obj-$(CONFIG_LIBSSH2) += ssh.o
block-obj-y += dictzip.o
block-obj-y += tar.o
block-obj-y += accounting.o
block-obj-y += write-threshold.o
@@ -37,7 +39,6 @@ gluster.o-libs := $(GLUSTERFS_LIBS)
ssh.o-cflags := $(LIBSSH2_CFLAGS)
ssh.o-libs := $(LIBSSH2_LIBS)
archipelago.o-libs := $(ARCHIPELAGO_LIBS)
block-obj-m += dmg.o
dmg.o-libs := $(BZIP2_LIBS)
qcow.o-libs := -lz
linux-aio.o-libs := -laio

View File

@@ -37,8 +37,6 @@ typedef struct CowRequest {
typedef struct BackupBlockJob {
BlockJob common;
BlockDriverState *target;
/* bitmap for sync=dirty-bitmap */
BdrvDirtyBitmap *sync_bitmap;
MirrorSyncMode sync_mode;
RateLimit limit;
BlockdevOnError on_source_error;
@@ -244,91 +242,6 @@ static void backup_complete(BlockJob *job, void *opaque)
g_free(data);
}
static bool coroutine_fn yield_and_check(BackupBlockJob *job)
{
if (block_job_is_cancelled(&job->common)) {
return true;
}
/* we need to yield so that bdrv_drain_all() returns.
* (without, VM does not reboot)
*/
if (job->common.speed) {
uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
job->sectors_read);
job->sectors_read = 0;
block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
} else {
block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
}
if (block_job_is_cancelled(&job->common)) {
return true;
}
return false;
}
static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
{
bool error_is_read;
int ret = 0;
int clusters_per_iter;
uint32_t granularity;
int64_t sector;
int64_t cluster;
int64_t end;
int64_t last_cluster = -1;
BlockDriverState *bs = job->common.bs;
HBitmapIter hbi;
granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
clusters_per_iter = MAX((granularity / BACKUP_CLUSTER_SIZE), 1);
bdrv_dirty_iter_init(job->sync_bitmap, &hbi);
/* Find the next dirty sector(s) */
while ((sector = hbitmap_iter_next(&hbi)) != -1) {
cluster = sector / BACKUP_SECTORS_PER_CLUSTER;
/* Fake progress updates for any clusters we skipped */
if (cluster != last_cluster + 1) {
job->common.offset += ((cluster - last_cluster - 1) *
BACKUP_CLUSTER_SIZE);
}
for (end = cluster + clusters_per_iter; cluster < end; cluster++) {
do {
if (yield_and_check(job)) {
return ret;
}
ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER,
BACKUP_SECTORS_PER_CLUSTER, &error_is_read);
if ((ret < 0) &&
backup_error_action(job, error_is_read, -ret) ==
BLOCK_ERROR_ACTION_REPORT) {
return ret;
}
} while (ret < 0);
}
/* If the bitmap granularity is smaller than the backup granularity,
* we need to advance the iterator pointer to the next cluster. */
if (granularity < BACKUP_CLUSTER_SIZE) {
bdrv_set_dirty_iter(&hbi, cluster * BACKUP_SECTORS_PER_CLUSTER);
}
last_cluster = cluster - 1;
}
/* Play some final catchup with the progress meter */
end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
if (last_cluster + 1 < end) {
job->common.offset += ((end - last_cluster - 1) * BACKUP_CLUSTER_SIZE);
}
return ret;
}
static void coroutine_fn backup_run(void *opaque)
{
BackupBlockJob *job = opaque;
@@ -346,7 +259,8 @@ static void coroutine_fn backup_run(void *opaque)
qemu_co_rwlock_init(&job->flush_rwlock);
start = 0;
end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
end = DIV_ROUND_UP(job->common.len / BDRV_SECTOR_SIZE,
BACKUP_SECTORS_PER_CLUSTER);
job->bitmap = hbitmap_alloc(end, 0);
@@ -364,13 +278,28 @@ static void coroutine_fn backup_run(void *opaque)
qemu_coroutine_yield();
job->common.busy = true;
}
} else if (job->sync_mode == MIRROR_SYNC_MODE_DIRTY_BITMAP) {
ret = backup_run_incremental(job);
} else {
/* Both FULL and TOP SYNC_MODE's require copying.. */
for (; start < end; start++) {
bool error_is_read;
if (yield_and_check(job)) {
if (block_job_is_cancelled(&job->common)) {
break;
}
/* we need to yield so that qemu_aio_flush() returns.
* (without, VM does not reboot)
*/
if (job->common.speed) {
uint64_t delay_ns = ratelimit_calculate_delay(
&job->limit, job->sectors_read);
job->sectors_read = 0;
block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
} else {
block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
}
if (block_job_is_cancelled(&job->common)) {
break;
}
@@ -428,18 +357,6 @@ static void coroutine_fn backup_run(void *opaque)
qemu_co_rwlock_wrlock(&job->flush_rwlock);
qemu_co_rwlock_unlock(&job->flush_rwlock);
if (job->sync_bitmap) {
BdrvDirtyBitmap *bm;
if (ret < 0) {
/* Merge the successor back into the parent, delete nothing. */
bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
assert(bm);
} else {
/* Everything is fine, delete this bitmap and install the backup. */
bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
assert(bm);
}
}
hbitmap_free(job->bitmap);
bdrv_iostatus_disable(target);
@@ -452,7 +369,6 @@ static void coroutine_fn backup_run(void *opaque)
void backup_start(BlockDriverState *bs, BlockDriverState *target,
int64_t speed, MirrorSyncMode sync_mode,
BdrvDirtyBitmap *sync_bitmap,
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
BlockCompletionFunc *cb, void *opaque,
@@ -496,36 +412,17 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
return;
}
if (sync_mode == MIRROR_SYNC_MODE_DIRTY_BITMAP) {
if (!sync_bitmap) {
error_setg(errp, "must provide a valid bitmap name for "
"\"dirty-bitmap\" sync mode");
return;
}
/* Create a new bitmap, and freeze/disable this one. */
if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
return;
}
} else if (sync_bitmap) {
error_setg(errp,
"a sync_bitmap was provided to backup_run, "
"but received an incompatible sync_mode (%s)",
MirrorSyncMode_lookup[sync_mode]);
return;
}
len = bdrv_getlength(bs);
if (len < 0) {
error_setg_errno(errp, -len, "unable to get length for '%s'",
bdrv_get_device_name(bs));
goto error;
return;
}
BackupBlockJob *job = block_job_create(&backup_job_driver, bs, speed,
cb, opaque, errp);
if (!job) {
goto error;
return;
}
bdrv_op_block_all(target, job->common.blocker);
@@ -534,15 +431,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
job->on_target_error = on_target_error;
job->target = target;
job->sync_mode = sync_mode;
job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_DIRTY_BITMAP ?
sync_bitmap : NULL;
job->common.len = len;
job->common.co = qemu_coroutine_create(backup_run);
qemu_coroutine_enter(job->common.co, job);
return;
error:
if (sync_bitmap) {
bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
}
}

View File

@@ -721,11 +721,6 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
return bdrv_getlength(bs->file);
}
static int blkdebug_truncate(BlockDriverState *bs, int64_t offset)
{
return bdrv_truncate(bs->file, offset);
}
static void blkdebug_refresh_filename(BlockDriverState *bs)
{
QDict *opts;
@@ -784,7 +779,6 @@ static BlockDriver bdrv_blkdebug = {
.bdrv_file_open = blkdebug_open,
.bdrv_close = blkdebug_close,
.bdrv_getlength = blkdebug_getlength,
.bdrv_truncate = blkdebug_truncate,
.bdrv_refresh_filename = blkdebug_refresh_filename,
.bdrv_aio_readv = blkdebug_aio_readv,

View File

@@ -515,17 +515,6 @@ int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
return bdrv_write(blk->bs, sector_num, buf, nb_sectors);
}
int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
int nb_sectors, BdrvRequestFlags flags)
{
int ret = blk_check_request(blk, sector_num, nb_sectors);
if (ret < 0) {
return ret;
}
return bdrv_write_zeroes(blk->bs, sector_num, nb_sectors, flags);
}
static void error_callback_bh(void *opaque)
{
struct BlockBackendAIOCB *acb = opaque;

584
block/dictzip.c Normal file
View File

@@ -0,0 +1,584 @@
/*
* DictZip Block driver for dictzip enabled gzip files
*
* Use the "dictzip" tool from the "dictd" package to create gzip files that
* contain the extra DictZip headers.
*
* dictzip(1) is a compression program which creates compressed files in the
* gzip format (see RFC 1952). However, unlike gzip(1), dictzip(1) compresses
* the file in pieces and stores an index to the pieces in the gzip header.
* This allows random access to the file at the granularity of the compressed
* pieces (currently about 64kB) while maintaining good compression ratios
* (within 5% of the expected ratio for dictionary data).
* dictd(8) uses files stored in this format.
*
* For details on DictZip see http://dict.org/.
*
* Copyright (c) 2009 Alexander Graf <agraf@suse.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu-common.h"
#include "block/block_int.h"
#include <zlib.h>
// #define DEBUG
#ifdef DEBUG
#define dprintf(fmt, ...) do { printf("dzip: " fmt, ## __VA_ARGS__); } while (0)
#else
#define dprintf(fmt, ...) do { } while (0)
#endif
#define SECTOR_SIZE 512
#define Z_STREAM_COUNT 4
#define CACHE_COUNT 20
/* magic values */
#define GZ_MAGIC1 0x1f
#define GZ_MAGIC2 0x8b
#define DZ_MAGIC1 'R'
#define DZ_MAGIC2 'A'
#define GZ_FEXTRA 0x04 /* Optional field (random access index) */
#define GZ_FNAME 0x08 /* Original name */
#define GZ_COMMENT 0x10 /* Zero-terminated, human-readable comment */
#define GZ_FHCRC 0x02 /* Header CRC16 */
/* offsets */
#define GZ_ID 0 /* GZ_MAGIC (16bit) */
#define GZ_FLG 3 /* FLaGs (see above) */
#define GZ_XLEN 10 /* eXtra LENgth (16bit) */
#define GZ_SI 12 /* Subfield ID (16bit) */
#define GZ_VERSION 16 /* Version for subfield format */
#define GZ_CHUNKSIZE 18 /* Chunk size (16bit) */
#define GZ_CHUNKCNT 20 /* Number of chunks (16bit) */
#define GZ_RNDDATA 22 /* Random access data (16bit) */
#define GZ_99_CHUNKSIZE 18 /* Chunk size (32bit) */
#define GZ_99_CHUNKCNT 22 /* Number of chunks (32bit) */
#define GZ_99_FILESIZE 26 /* Size of unpacked file (64bit) */
#define GZ_99_RNDDATA 34 /* Random access data (32bit) */
struct BDRVDictZipState;
typedef struct DictZipAIOCB {
BlockAIOCB common;
struct BDRVDictZipState *s;
QEMUIOVector *qiov; /* QIOV of the original request */
QEMUIOVector *qiov_gz; /* QIOV of the gz subrequest */
QEMUBH *bh; /* BH for cache */
z_stream *zStream; /* stream to use for decoding */
int zStream_id; /* stream id of the above pointer */
size_t start; /* offset into the uncompressed file */
size_t len; /* uncompressed bytes to read */
uint8_t *gzipped; /* the gzipped data */
uint8_t *buf; /* cached result */
size_t gz_len; /* amount of gzip data */
size_t gz_start; /* uncompressed starting point of gzip data */
uint64_t offset; /* offset for "start" into the uncompressed chunk */
int chunks_len; /* amount of uncompressed data in all gzip data */
} DictZipAIOCB;
typedef struct dict_cache {
size_t start;
size_t len;
uint8_t *buf;
} DictCache;
typedef struct BDRVDictZipState {
BlockDriverState *hd;
z_stream zStream[Z_STREAM_COUNT];
DictCache cache[CACHE_COUNT];
int cache_index;
uint8_t stream_in_use;
uint64_t chunk_len;
uint32_t chunk_cnt;
uint16_t *chunks;
uint32_t *chunks32;
uint64_t *offsets;
int64_t file_len;
} BDRVDictZipState;
static int start_zStream(z_stream *zStream)
{
zStream->zalloc = NULL;
zStream->zfree = NULL;
zStream->opaque = NULL;
zStream->next_in = 0;
zStream->avail_in = 0;
zStream->next_out = NULL;
zStream->avail_out = 0;
return inflateInit2( zStream, -15 );
}
static QemuOptsList runtime_opts = {
.name = "dzip",
.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
.desc = {
{
.name = "filename",
.type = QEMU_OPT_STRING,
.help = "URL to the dictzip file",
},
{ /* end of list */ }
},
};
static int dictzip_open(BlockDriverState *bs, QDict *options, int flags, Error **errp)
{
BDRVDictZipState *s = bs->opaque;
const char *err = "Unknown (read error?)";
uint8_t magic[2];
char buf[100];
uint8_t header_flags;
uint16_t chunk_len16;
uint16_t chunk_cnt16;
uint32_t chunk_len32;
uint16_t header_ver;
uint16_t tmp_short;
uint64_t offset;
int chunks_len;
int headerLength = GZ_XLEN - 1;
int rnd_offs;
int ret;
int i;
QemuOpts *opts;
Error *local_err = NULL;
const char *filename;
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err != NULL) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
filename = qemu_opt_get(opts, "filename");
if (!strncmp(filename, "dzip://", 7))
filename += 7;
else if (!strncmp(filename, "dzip:", 5))
filename += 5;
ret = bdrv_open(&s->hd, filename, NULL, NULL, flags | BDRV_O_PROTOCOL, NULL, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
qemu_opts_del(opts);
return ret;
}
/* initialize zlib streams */
for (i = 0; i < Z_STREAM_COUNT; i++) {
if (start_zStream( &s->zStream[i] ) != Z_OK) {
err = s->zStream[i].msg;
goto fail;
}
}
/* gzip header */
if (bdrv_pread(s->hd, GZ_ID, &magic, sizeof(magic)) != sizeof(magic))
goto fail;
if (!((magic[0] == GZ_MAGIC1) && (magic[1] == GZ_MAGIC2))) {
err = "No gzip file";
goto fail;
}
/* dzip header */
if (bdrv_pread(s->hd, GZ_FLG, &header_flags, 1) != 1)
goto fail;
if (!(header_flags & GZ_FEXTRA)) {
err = "Not a dictzip file (wrong flags)";
goto fail;
}
/* extra length */
if (bdrv_pread(s->hd, GZ_XLEN, &tmp_short, 2) != 2)
goto fail;
headerLength += le16_to_cpu(tmp_short) + 2;
/* DictZip magic */
if (bdrv_pread(s->hd, GZ_SI, &magic, 2) != 2)
goto fail;
if (magic[0] != DZ_MAGIC1 || magic[1] != DZ_MAGIC2) {
err = "Not a dictzip file (missing extra magic)";
goto fail;
}
/* DictZip version */
if (bdrv_pread(s->hd, GZ_VERSION, &header_ver, 2) != 2)
goto fail;
header_ver = le16_to_cpu(header_ver);
switch (header_ver) {
case 1: /* Normal DictZip */
/* number of chunks */
if (bdrv_pread(s->hd, GZ_CHUNKSIZE, &chunk_len16, 2) != 2)
goto fail;
s->chunk_len = le16_to_cpu(chunk_len16);
/* chunk count */
if (bdrv_pread(s->hd, GZ_CHUNKCNT, &chunk_cnt16, 2) != 2)
goto fail;
s->chunk_cnt = le16_to_cpu(chunk_cnt16);
chunks_len = sizeof(short) * s->chunk_cnt;
rnd_offs = GZ_RNDDATA;
break;
case 99: /* Special Alex pigz version */
/* number of chunks */
if (bdrv_pread(s->hd, GZ_99_CHUNKSIZE, &chunk_len32, 4) != 4)
goto fail;
dprintf("chunk len [%#x] = %d\n", GZ_99_CHUNKSIZE, chunk_len32);
s->chunk_len = le32_to_cpu(chunk_len32);
/* chunk count */
if (bdrv_pread(s->hd, GZ_99_CHUNKCNT, &s->chunk_cnt, 4) != 4)
goto fail;
s->chunk_cnt = le32_to_cpu(s->chunk_cnt);
dprintf("chunk len | count = %"PRId64" | %d\n", s->chunk_len, s->chunk_cnt);
/* file size */
if (bdrv_pread(s->hd, GZ_99_FILESIZE, &s->file_len, 8) != 8)
goto fail;
s->file_len = le64_to_cpu(s->file_len);
chunks_len = sizeof(int) * s->chunk_cnt;
rnd_offs = GZ_99_RNDDATA;
break;
default:
err = "Invalid DictZip version";
goto fail;
}
/* random access data */
s->chunks = g_malloc(chunks_len);
if (header_ver == 99)
s->chunks32 = (uint32_t *)s->chunks;
if (bdrv_pread(s->hd, rnd_offs, s->chunks, chunks_len) != chunks_len)
goto fail;
/* orig filename */
if (header_flags & GZ_FNAME) {
if (bdrv_pread(s->hd, headerLength + 1, buf, sizeof(buf)) != sizeof(buf))
goto fail;
buf[sizeof(buf) - 1] = '\0';
headerLength += strlen(buf) + 1;
if (strlen(buf) == sizeof(buf))
goto fail;
dprintf("filename: %s\n", buf);
}
/* comment field */
if (header_flags & GZ_COMMENT) {
if (bdrv_pread(s->hd, headerLength, buf, sizeof(buf)) != sizeof(buf))
goto fail;
buf[sizeof(buf) - 1] = '\0';
headerLength += strlen(buf) + 1;
if (strlen(buf) == sizeof(buf))
goto fail;
dprintf("comment: %s\n", buf);
}
if (header_flags & GZ_FHCRC)
headerLength += 2;
/* uncompressed file length*/
if (!s->file_len) {
uint32_t file_len;
if (bdrv_pread(s->hd, bdrv_getlength(s->hd) - 4, &file_len, 4) != 4)
goto fail;
s->file_len = le32_to_cpu(file_len);
}
/* compute offsets */
s->offsets = g_malloc(sizeof( *s->offsets ) * s->chunk_cnt);
for (offset = headerLength + 1, i = 0; i < s->chunk_cnt; i++) {
s->offsets[i] = offset;
switch (header_ver) {
case 1:
offset += le16_to_cpu(s->chunks[i]);
break;
case 99:
offset += le32_to_cpu(s->chunks32[i]);
break;
}
dprintf("chunk %#"PRIx64" - %#"PRIx64" = offset %#"PRIx64" -> %#"PRIx64"\n", i * s->chunk_len, (i+1) * s->chunk_len, s->offsets[i], offset);
}
qemu_opts_del(opts);
return 0;
fail:
fprintf(stderr, "DictZip: Error opening file: %s\n", err);
bdrv_unref(s->hd);
if (s->chunks)
g_free(s->chunks);
qemu_opts_del(opts);
return -EINVAL;
}
/* This callback gets invoked when we have the result in cache already */
static void dictzip_cache_cb(void *opaque)
{
DictZipAIOCB *acb = (DictZipAIOCB *)opaque;
qemu_iovec_from_buf(acb->qiov, 0, acb->buf, acb->len);
acb->common.cb(acb->common.opaque, 0);
qemu_bh_delete(acb->bh);
qemu_aio_unref(acb);
}
/* This callback gets invoked by the underlying block reader when we have
* all compressed data. We uncompress in here. */
static void dictzip_read_cb(void *opaque, int ret)
{
DictZipAIOCB *acb = (DictZipAIOCB *)opaque;
struct BDRVDictZipState *s = acb->s;
uint8_t *buf;
DictCache *cache;
int r, i;
buf = g_malloc(acb->chunks_len);
/* try to find zlib stream for decoding */
do {
for (i = 0; i < Z_STREAM_COUNT; i++) {
if (!(s->stream_in_use & (1 << i))) {
s->stream_in_use |= (1 << i);
acb->zStream_id = i;
acb->zStream = &s->zStream[i];
break;
}
}
} while(!acb->zStream);
/* sure, we could handle more streams, but this callback should be single
threaded and when it's not, we really want to know! */
assert(i == 0);
/* uncompress the chunk */
acb->zStream->next_in = acb->gzipped;
acb->zStream->avail_in = acb->gz_len;
acb->zStream->next_out = buf;
acb->zStream->avail_out = acb->chunks_len;
r = inflate( acb->zStream, Z_PARTIAL_FLUSH );
if ( (r != Z_OK) && (r != Z_STREAM_END) )
fprintf(stderr, "Error inflating: [%d] %s\n", r, acb->zStream->msg);
if ( r == Z_STREAM_END )
inflateReset(acb->zStream);
dprintf("inflating [%d] left: %d | %d bytes\n", r, acb->zStream->avail_in, acb->zStream->avail_out);
s->stream_in_use &= ~(1 << acb->zStream_id);
/* nofity the caller */
qemu_iovec_from_buf(acb->qiov, 0, buf + acb->offset, acb->len);
acb->common.cb(acb->common.opaque, 0);
/* fill the cache */
cache = &s->cache[s->cache_index];
s->cache_index++;
if (s->cache_index == CACHE_COUNT)
s->cache_index = 0;
cache->len = 0;
if (cache->buf)
g_free(cache->buf);
cache->start = acb->gz_start;
cache->buf = buf;
cache->len = acb->chunks_len;
/* free occupied ressources */
g_free(acb->qiov_gz);
qemu_aio_unref(acb);
}
static const AIOCBInfo dictzip_aiocb_info = {
.aiocb_size = sizeof(DictZipAIOCB),
};
/* This is where we get a request from a caller to read something */
static BlockAIOCB *dictzip_aio_readv(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque)
{
BDRVDictZipState *s = bs->opaque;
DictZipAIOCB *acb;
QEMUIOVector *qiov_gz;
struct iovec *iov;
uint8_t *buf;
size_t start = sector_num * SECTOR_SIZE;
size_t len = nb_sectors * SECTOR_SIZE;
size_t end = start + len;
size_t gz_start;
size_t gz_len;
int64_t gz_sector_num;
int gz_nb_sectors;
int first_chunk, last_chunk;
int first_offset;
int i;
acb = qemu_aio_get(&dictzip_aiocb_info, bs, cb, opaque);
if (!acb)
return NULL;
/* Search Cache */
for (i = 0; i < CACHE_COUNT; i++) {
if (!s->cache[i].len)
continue;
if ((start >= s->cache[i].start) &&
(end <= (s->cache[i].start + s->cache[i].len))) {
acb->buf = s->cache[i].buf + (start - s->cache[i].start);
acb->len = len;
acb->qiov = qiov;
acb->bh = qemu_bh_new(dictzip_cache_cb, acb);
qemu_bh_schedule(acb->bh);
return &acb->common;
}
}
/* No cache, so let's decode */
/* We need to read these chunks */
first_chunk = start / s->chunk_len;
first_offset = start - first_chunk * s->chunk_len;
last_chunk = end / s->chunk_len;
gz_start = s->offsets[first_chunk];
gz_len = 0;
for (i = first_chunk; i <= last_chunk; i++) {
if (s->chunks32)
gz_len += le32_to_cpu(s->chunks32[i]);
else
gz_len += le16_to_cpu(s->chunks[i]);
}
gz_sector_num = gz_start / SECTOR_SIZE;
gz_nb_sectors = (gz_len / SECTOR_SIZE);
/* account for tail and heads */
while ((gz_start + gz_len) > ((gz_sector_num + gz_nb_sectors) * SECTOR_SIZE))
gz_nb_sectors++;
/* Allocate qiov, iov and buf in one chunk so we only need to free qiov */
qiov_gz = g_malloc0(sizeof(QEMUIOVector) + sizeof(struct iovec) +
(gz_nb_sectors * SECTOR_SIZE));
iov = (struct iovec *)(((char *)qiov_gz) + sizeof(QEMUIOVector));
buf = ((uint8_t *)iov) + sizeof(struct iovec *);
/* Kick off the read by the backing file, so we can start decompressing */
iov->iov_base = (void *)buf;
iov->iov_len = gz_nb_sectors * 512;
qemu_iovec_init_external(qiov_gz, iov, 1);
dprintf("read %zd - %zd => %zd - %zd\n", start, end, gz_start, gz_start + gz_len);
acb->s = s;
acb->qiov = qiov;
acb->qiov_gz = qiov_gz;
acb->start = start;
acb->len = len;
acb->gzipped = buf + (gz_start % SECTOR_SIZE);
acb->gz_len = gz_len;
acb->gz_start = first_chunk * s->chunk_len;
acb->offset = first_offset;
acb->chunks_len = (last_chunk - first_chunk + 1) * s->chunk_len;
return bdrv_aio_readv(s->hd, gz_sector_num, qiov_gz, gz_nb_sectors,
dictzip_read_cb, acb);
}
static void dictzip_close(BlockDriverState *bs)
{
BDRVDictZipState *s = bs->opaque;
int i;
for (i = 0; i < CACHE_COUNT; i++) {
if (!s->cache[i].len)
continue;
g_free(s->cache[i].buf);
}
for (i = 0; i < Z_STREAM_COUNT; i++) {
inflateEnd(&s->zStream[i]);
}
if (s->chunks)
g_free(s->chunks);
if (s->offsets)
g_free(s->offsets);
dprintf("Close\n");
}
static int64_t dictzip_getlength(BlockDriverState *bs)
{
BDRVDictZipState *s = bs->opaque;
dprintf("getlength -> %ld\n", s->file_len);
return s->file_len;
}
static BlockDriver bdrv_dictzip = {
.format_name = "dzip",
.protocol_name = "dzip",
.instance_size = sizeof(BDRVDictZipState),
.bdrv_file_open = dictzip_open,
.bdrv_close = dictzip_close,
.bdrv_getlength = dictzip_getlength,
.bdrv_aio_readv = dictzip_aio_readv,
};
static void dictzip_block_init(void)
{
bdrv_register(&bdrv_dictzip);
}
block_init(dictzip_block_init);

2603
block/io.c

File diff suppressed because it is too large Load Diff

View File

@@ -2,7 +2,7 @@
* QEMU Block driver for iSCSI images
*
* Copyright (c) 2010-2011 Ronnie Sahlberg <ronniesahlberg@gmail.com>
* Copyright (c) 2012-2015 Peter Lieven <pl@kamp.de>
* Copyright (c) 2012-2014 Peter Lieven <pl@kamp.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -57,6 +57,9 @@ typedef struct IscsiLun {
int events;
QEMUTimer *nop_timer;
QEMUTimer *event_timer;
uint8_t lbpme;
uint8_t lbprz;
uint8_t has_write_same;
struct scsi_inquiry_logical_block_provisioning lbp;
struct scsi_inquiry_block_limits bl;
unsigned char *zeroblock;
@@ -64,11 +67,6 @@ typedef struct IscsiLun {
int cluster_sectors;
bool use_16_for_rw;
bool write_protected;
bool lbpme;
bool lbprz;
bool dpofua;
bool has_write_same;
bool force_next_flush;
} IscsiLun;
typedef struct IscsiTask {
@@ -81,7 +79,6 @@ typedef struct IscsiTask {
QEMUBH *bh;
IscsiLun *iscsilun;
QEMUTimer retry_timer;
bool force_next_flush;
} IscsiTask;
typedef struct IscsiAIOCB {
@@ -103,7 +100,7 @@ typedef struct IscsiAIOCB {
#define NOP_INTERVAL 5000
#define MAX_NOP_FAILURES 3
#define ISCSI_CMD_RETRIES ARRAY_SIZE(iscsi_retry_times)
static const unsigned iscsi_retry_times[] = {8, 32, 128, 512, 2048, 8192, 32768};
static const unsigned iscsi_retry_times[] = {8, 32, 128, 512, 2048};
/* this threshold is a trade-off knob to choose between
* the potential additional overhead of an extra GET_LBA_STATUS request
@@ -186,13 +183,10 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
iTask->do_retry = 1;
goto out;
}
/* status 0x28 is SCSI_TASK_SET_FULL. It was first introduced
* in libiscsi 1.10.0. Hardcode this value here to avoid
* the need to bump the libiscsi requirement to 1.10.0 */
if (status == SCSI_STATUS_BUSY || status == 0x28) {
if (status == SCSI_STATUS_BUSY) {
unsigned retry_time =
exp_random(iscsi_retry_times[iTask->retries - 1]);
error_report("iSCSI Busy/TaskSetFull (retry #%u in %u ms): %s",
error_report("iSCSI Busy (retry #%u in %u ms): %s",
iTask->retries, retry_time,
iscsi_get_error(iscsi));
aio_timer_init(iTask->iscsilun->aio_context,
@@ -205,8 +199,6 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
}
}
error_report("iSCSI Failure: %s", iscsi_get_error(iscsi));
} else {
iTask->iscsilun->force_next_flush |= iTask->force_next_flush;
}
out:
@@ -377,7 +369,6 @@ static int coroutine_fn iscsi_co_writev(BlockDriverState *bs,
struct IscsiTask iTask;
uint64_t lba;
uint32_t num_sectors;
int fua;
if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
return -EINVAL;
@@ -393,17 +384,15 @@ static int coroutine_fn iscsi_co_writev(BlockDriverState *bs,
num_sectors = sector_qemu2lun(nb_sectors, iscsilun);
iscsi_co_init_iscsitask(iscsilun, &iTask);
retry:
fua = iscsilun->dpofua && !bs->enable_write_cache;
iTask.force_next_flush = !fua;
if (iscsilun->use_16_for_rw) {
iTask.task = iscsi_write16_task(iscsilun->iscsi, iscsilun->lun, lba,
NULL, num_sectors * iscsilun->block_size,
iscsilun->block_size, 0, 0, fua, 0, 0,
iscsilun->block_size, 0, 0, 0, 0, 0,
iscsi_co_generic_cb, &iTask);
} else {
iTask.task = iscsi_write10_task(iscsilun->iscsi, iscsilun->lun, lba,
NULL, num_sectors * iscsilun->block_size,
iscsilun->block_size, 0, 0, fua, 0, 0,
iscsilun->block_size, 0, 0, 0, 0, 0,
iscsi_co_generic_cb, &iTask);
}
if (iTask.task == NULL) {
@@ -471,7 +460,7 @@ static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
*pnum = nb_sectors;
/* LUN does not support logical block provisioning */
if (!iscsilun->lbpme) {
if (iscsilun->lbpme == 0) {
goto out;
}
@@ -631,12 +620,8 @@ static int coroutine_fn iscsi_co_flush(BlockDriverState *bs)
return 0;
}
if (!iscsilun->force_next_flush) {
return 0;
}
iscsilun->force_next_flush = false;
iscsi_co_init_iscsitask(iscsilun, &iTask);
retry:
if (iscsi_synchronizecache10_task(iscsilun->iscsi, iscsilun->lun, 0, 0, 0,
0, iscsi_co_generic_cb, &iTask) == NULL) {
@@ -932,7 +917,6 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
}
iscsi_co_init_iscsitask(iscsilun, &iTask);
iTask.force_next_flush = true;
retry:
if (use_16_for_ws) {
iTask.task = iscsi_writesame16_task(iscsilun->iscsi, iscsilun->lun, lba,
@@ -1137,8 +1121,8 @@ static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)
} else {
iscsilun->block_size = rc16->block_length;
iscsilun->num_blocks = rc16->returned_lba + 1;
iscsilun->lbpme = !!rc16->lbpme;
iscsilun->lbprz = !!rc16->lbprz;
iscsilun->lbpme = rc16->lbpme;
iscsilun->lbprz = rc16->lbprz;
iscsilun->use_16_for_rw = (rc16->returned_lba > 0xffffffff);
}
}
@@ -1269,12 +1253,11 @@ static void iscsi_attach_aio_context(BlockDriverState *bs,
iscsi_timed_set_events, iscsilun);
}
static void iscsi_modesense_sync(IscsiLun *iscsilun)
static bool iscsi_is_write_protected(IscsiLun *iscsilun)
{
struct scsi_task *task;
struct scsi_mode_sense *ms = NULL;
iscsilun->write_protected = false;
iscsilun->dpofua = false;
bool wrprotected = false;
task = iscsi_modesense6_sync(iscsilun->iscsi, iscsilun->lun,
1, SCSI_MODESENSE_PC_CURRENT,
@@ -1295,13 +1278,13 @@ static void iscsi_modesense_sync(IscsiLun *iscsilun)
iscsi_get_error(iscsilun->iscsi));
goto out;
}
iscsilun->write_protected = ms->device_specific_parameter & 0x80;
iscsilun->dpofua = ms->device_specific_parameter & 0x10;
wrprotected = ms->device_specific_parameter & 0x80;
out:
if (task) {
scsi_free_scsi_task(task);
}
return wrprotected;
}
/*
@@ -1420,8 +1403,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
scsi_free_scsi_task(task);
task = NULL;
iscsi_modesense_sync(iscsilun);
iscsilun->write_protected = iscsi_is_write_protected(iscsilun);
/* Check the write protect flag of the LUN if we want to write */
if (iscsilun->type == TYPE_DISK && (flags & BDRV_O_RDWR) &&
iscsilun->write_protected) {
@@ -1499,7 +1481,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
iscsilun->bl.opt_unmap_gran * iscsilun->block_size <= 16 * 1024 * 1024) {
iscsilun->cluster_sectors = (iscsilun->bl.opt_unmap_gran *
iscsilun->block_size) >> BDRV_SECTOR_BITS;
if (iscsilun->lbprz) {
if (iscsilun->lbprz && !(bs->open_flags & BDRV_O_NOCACHE)) {
iscsilun->allocationmap = iscsi_allocationmap_init(iscsilun);
if (iscsilun->allocationmap == NULL) {
ret = -ENOMEM;
@@ -1673,7 +1655,7 @@ out:
static int iscsi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
{
IscsiLun *iscsilun = bs->opaque;
bdi->unallocated_blocks_are_zero = iscsilun->lbprz;
bdi->unallocated_blocks_are_zero = !!iscsilun->lbprz;
bdi->can_write_zeroes_with_unmap = iscsilun->lbprz && iscsilun->lbp.lbpws;
bdi->cluster_size = iscsilun->cluster_sectors * BDRV_SECTOR_SIZE;
return 0;

View File

@@ -57,6 +57,7 @@ typedef struct MirrorBlockJob {
int in_flight;
int sectors_in_flight;
int ret;
bool unmap;
} MirrorBlockJob;
typedef struct MirrorOp {
@@ -125,9 +126,11 @@ static void mirror_write_complete(void *opaque, int ret)
MirrorOp *op = opaque;
MirrorBlockJob *s = op->s;
if (ret < 0) {
BlockDriverState *source = s->common.bs;
BlockErrorAction action;
bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
bdrv_set_dirty_bitmap(source, s->dirty_bitmap, op->sector_num,
op->nb_sectors);
action = mirror_error_action(s, false, -ret);
if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
s->ret = ret;
@@ -141,9 +144,11 @@ static void mirror_read_complete(void *opaque, int ret)
MirrorOp *op = opaque;
MirrorBlockJob *s = op->s;
if (ret < 0) {
BlockDriverState *source = s->common.bs;
BlockErrorAction action;
bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
bdrv_set_dirty_bitmap(source, s->dirty_bitmap, op->sector_num,
op->nb_sectors);
action = mirror_error_action(s, true, -ret);
if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
s->ret = ret;
@@ -163,12 +168,15 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
uint64_t delay_ns = 0;
MirrorOp *op;
int pnum;
int64_t ret;
s->sector_num = hbitmap_iter_next(&s->hbi);
if (s->sector_num < 0) {
bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
bdrv_dirty_iter_init(source, s->dirty_bitmap, &s->hbi);
s->sector_num = hbitmap_iter_next(&s->hbi);
trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
trace_mirror_restart_iter(s,
bdrv_get_dirty_count(source, s->dirty_bitmap));
assert(s->sector_num >= 0);
}
@@ -283,14 +291,29 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
next_sector += sectors_per_chunk;
}
bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num, nb_sectors);
bdrv_reset_dirty_bitmap(source, s->dirty_bitmap, sector_num,
nb_sectors);
/* Copy the dirty cluster. */
s->in_flight++;
s->sectors_in_flight += nb_sectors;
trace_mirror_one_iteration(s, sector_num, nb_sectors);
bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
mirror_read_complete, op);
ret = bdrv_get_block_status_above(source, NULL, sector_num,
nb_sectors, &pnum);
if (ret < 0 || pnum < nb_sectors ||
(ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
mirror_read_complete, op);
} else if (ret & BDRV_BLOCK_ZERO) {
bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
mirror_write_complete, op);
} else {
assert(!(ret & BDRV_BLOCK_DATA));
bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
mirror_write_complete, op);
}
return delay_ns;
}
@@ -440,7 +463,7 @@ static void coroutine_fn mirror_run(void *opaque)
assert(n > 0);
if (ret == 1) {
bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
bdrv_set_dirty_bitmap(bs, s->dirty_bitmap, sector_num, n);
sector_num = next;
} else {
sector_num += n;
@@ -448,7 +471,7 @@ static void coroutine_fn mirror_run(void *opaque)
}
}
bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
bdrv_dirty_iter_init(bs, s->dirty_bitmap, &s->hbi);
last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
for (;;) {
uint64_t delay_ns = 0;
@@ -460,7 +483,7 @@ static void coroutine_fn mirror_run(void *opaque)
goto immediate_exit;
}
cnt = bdrv_get_dirty_count(s->dirty_bitmap);
cnt = bdrv_get_dirty_count(bs, s->dirty_bitmap);
/* s->common.offset contains the number of bytes already processed so
* far, cnt is the number of dirty sectors remaining and
* s->sectors_in_flight is the number of sectors currently being
@@ -469,7 +492,7 @@ static void coroutine_fn mirror_run(void *opaque)
(cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE;
/* Note that even when no rate limit is applied we need to yield
* periodically with no pending I/O so that bdrv_drain_all() returns.
* periodically with no pending I/O so that qemu_aio_flush() returns.
* We do so every SLICE_TIME nanoseconds, or when there is an error,
* or when the source is clean, whichever comes first.
*/
@@ -482,6 +505,9 @@ static void coroutine_fn mirror_run(void *opaque)
continue;
} else if (cnt != 0) {
delay_ns = mirror_iteration(s);
if (delay_ns == 0) {
continue;
}
}
}
@@ -507,7 +533,7 @@ static void coroutine_fn mirror_run(void *opaque)
should_complete = s->should_complete ||
block_job_is_cancelled(&s->common);
cnt = bdrv_get_dirty_count(s->dirty_bitmap);
cnt = bdrv_get_dirty_count(bs, s->dirty_bitmap);
}
}
@@ -522,7 +548,7 @@ static void coroutine_fn mirror_run(void *opaque)
*/
trace_mirror_before_drain(s, cnt);
bdrv_drain(bs);
cnt = bdrv_get_dirty_count(s->dirty_bitmap);
cnt = bdrv_get_dirty_count(bs, s->dirty_bitmap);
}
ret = 0;
@@ -625,7 +651,7 @@ static void mirror_complete(BlockJob *job, Error **errp)
}
s->should_complete = true;
block_job_enter(&s->common);
block_job_resume(job);
}
static const BlockJobDriver mirror_job_driver = {
@@ -647,10 +673,11 @@ static const BlockJobDriver commit_active_job_driver = {
static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
const char *replaces,
int64_t speed, uint32_t granularity,
int64_t speed, int64_t granularity,
int64_t buf_size,
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
bool unmap,
BlockCompletionFunc *cb,
void *opaque, Error **errp,
const BlockJobDriver *driver,
@@ -659,7 +686,15 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
MirrorBlockJob *s;
if (granularity == 0) {
granularity = bdrv_get_default_bitmap_granularity(target);
/* Choose the default granularity based on the target file's cluster
* size, clamped between 4k and 64k. */
BlockDriverInfo bdi;
if (bdrv_get_info(target, &bdi) >= 0 && bdi.cluster_size != 0) {
granularity = MAX(4096, bdi.cluster_size);
granularity = MIN(65536, granularity);
} else {
granularity = 65536;
}
}
assert ((granularity & (granularity - 1)) == 0);
@@ -685,8 +720,9 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
s->base = base;
s->granularity = granularity;
s->buf_size = MAX(buf_size, granularity);
s->unmap = unmap;
s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, errp);
if (!s->dirty_bitmap) {
return;
}
@@ -700,24 +736,21 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
void mirror_start(BlockDriverState *bs, BlockDriverState *target,
const char *replaces,
int64_t speed, uint32_t granularity, int64_t buf_size,
int64_t speed, int64_t granularity, int64_t buf_size,
MirrorSyncMode mode, BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
bool unmap,
BlockCompletionFunc *cb,
void *opaque, Error **errp)
{
bool is_none_mode;
BlockDriverState *base;
if (mode == MIRROR_SYNC_MODE_DIRTY_BITMAP) {
error_setg(errp, "Sync mode 'dirty-bitmap' not supported");
return;
}
is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
base = mode == MIRROR_SYNC_MODE_TOP ? bs->backing_hd : NULL;
mirror_start_job(bs, target, replaces,
speed, granularity, buf_size,
on_source_error, on_target_error, cb, opaque, errp,
on_source_error, on_target_error, unmap, cb, opaque, errp,
&mirror_job_driver, is_none_mode, base);
}
@@ -765,7 +798,7 @@ void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
bdrv_ref(base);
mirror_start_job(bs, base, NULL, speed, 0, 0,
on_error, on_error, cb, opaque, &local_err,
on_error, on_error, false, cb, opaque, &local_err,
&commit_active_job_driver, false, base);
if (local_err) {
error_propagate(errp, local_err);

View File

@@ -35,6 +35,8 @@
#include "sysemu/sysemu.h"
#include <nfsc/libnfs.h>
#define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
typedef struct NFSClient {
struct nfs_context *context;
struct nfsfh *fh;
@@ -327,6 +329,11 @@ static int64_t nfs_client_open(NFSClient *client, const char *filename,
nfs_set_tcp_syncnt(client->context, val);
#ifdef LIBNFS_FEATURE_READAHEAD
} else if (!strcmp(qp->p[i].name, "readahead")) {
if (val > QEMU_NFS_MAX_READAHEAD_SIZE) {
error_report("NFS Warning: Truncating NFS readahead"
" size to %d", QEMU_NFS_MAX_READAHEAD_SIZE);
val = QEMU_NFS_MAX_READAHEAD_SIZE;
}
nfs_set_readahead(client->context, val);
#endif
} else {

View File

@@ -12,11 +12,8 @@
#include "block/block_int.h"
#define NULL_OPT_LATENCY "latency-ns"
typedef struct {
int64_t length;
int64_t latency_ns;
} BDRVNullState;
static QemuOptsList runtime_opts = {
@@ -33,12 +30,6 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_SIZE,
.help = "size of the null block",
},
{
.name = NULL_OPT_LATENCY,
.type = QEMU_OPT_NUMBER,
.help = "nanoseconds (approximated) to wait "
"before completing request",
},
{ /* end of list */ }
},
};
@@ -48,20 +39,13 @@ static int null_file_open(BlockDriverState *bs, QDict *options, int flags,
{
QemuOpts *opts;
BDRVNullState *s = bs->opaque;
int ret = 0;
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &error_abort);
s->length =
qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 1 << 30);
s->latency_ns =
qemu_opt_get_number(opts, NULL_OPT_LATENCY, 0);
if (s->latency_ns < 0) {
error_setg(errp, "latency-ns is invalid");
ret = -EINVAL;
}
qemu_opts_del(opts);
return ret;
return 0;
}
static void null_close(BlockDriverState *bs)
@@ -74,40 +58,28 @@ static int64_t null_getlength(BlockDriverState *bs)
return s->length;
}
static coroutine_fn int null_co_common(BlockDriverState *bs)
{
BDRVNullState *s = bs->opaque;
if (s->latency_ns) {
co_aio_sleep_ns(bdrv_get_aio_context(bs), QEMU_CLOCK_REALTIME,
s->latency_ns);
}
return 0;
}
static coroutine_fn int null_co_readv(BlockDriverState *bs,
int64_t sector_num, int nb_sectors,
QEMUIOVector *qiov)
{
return null_co_common(bs);
return 0;
}
static coroutine_fn int null_co_writev(BlockDriverState *bs,
int64_t sector_num, int nb_sectors,
QEMUIOVector *qiov)
{
return null_co_common(bs);
return 0;
}
static coroutine_fn int null_co_flush(BlockDriverState *bs)
{
return null_co_common(bs);
return 0;
}
typedef struct {
BlockAIOCB common;
QEMUBH *bh;
QEMUTimer timer;
} NullAIOCB;
static const AIOCBInfo null_aiocb_info = {
@@ -122,33 +94,15 @@ static void null_bh_cb(void *opaque)
qemu_aio_unref(acb);
}
static void null_timer_cb(void *opaque)
{
NullAIOCB *acb = opaque;
acb->common.cb(acb->common.opaque, 0);
timer_deinit(&acb->timer);
qemu_aio_unref(acb);
}
static inline BlockAIOCB *null_aio_common(BlockDriverState *bs,
BlockCompletionFunc *cb,
void *opaque)
{
NullAIOCB *acb;
BDRVNullState *s = bs->opaque;
acb = qemu_aio_get(&null_aiocb_info, bs, cb, opaque);
/* Only emulate latency after vcpu is running. */
if (s->latency_ns) {
aio_timer_init(bdrv_get_aio_context(bs), &acb->timer,
QEMU_CLOCK_REALTIME, SCALE_NS,
null_timer_cb, acb);
timer_mod_ns(&acb->timer,
qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + s->latency_ns);
} else {
acb->bh = aio_bh_new(bdrv_get_aio_context(bs), null_bh_cb, acb);
qemu_bh_schedule(acb->bh);
}
acb->bh = aio_bh_new(bdrv_get_aio_context(bs), null_bh_cb, acb);
qemu_bh_schedule(acb->bh);
return &acb->common;
}
@@ -177,12 +131,6 @@ static BlockAIOCB *null_aio_flush(BlockDriverState *bs,
return null_aio_common(bs, cb, opaque);
}
static int null_reopen_prepare(BDRVReopenState *reopen_state,
BlockReopenQueue *queue, Error **errp)
{
return 0;
}
static BlockDriver bdrv_null_co = {
.format_name = "null-co",
.protocol_name = "null-co",
@@ -195,7 +143,6 @@ static BlockDriver bdrv_null_co = {
.bdrv_co_readv = null_co_readv,
.bdrv_co_writev = null_co_writev,
.bdrv_co_flush_to_disk = null_co_flush,
.bdrv_reopen_prepare = null_reopen_prepare,
};
static BlockDriver bdrv_null_aio = {
@@ -210,7 +157,6 @@ static BlockDriver bdrv_null_aio = {
.bdrv_aio_readv = null_aio_readv,
.bdrv_aio_writev = null_aio_writev,
.bdrv_aio_flush = null_aio_flush,
.bdrv_reopen_prepare = null_reopen_prepare,
};
static void bdrv_null_init(void)

View File

@@ -2,12 +2,8 @@
* Block driver for Parallels disk image format
*
* Copyright (c) 2007 Alex Beregszaszi
* Copyright (c) 2015 Denis V. Lunev <den@openvz.org>
*
* This code was originally based on comparing different disk images created
* by Parallels. Currently it is based on opened OpenVZ sources
* available at
* http://git.openvz.org/?p=ploop;a=summary
* This code is based on comparing different disk images created by Parallels.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -30,539 +26,63 @@
#include "qemu-common.h"
#include "block/block_int.h"
#include "qemu/module.h"
#include "qemu/bitmap.h"
#include "qapi/util.h"
/**************************************************************/
#define HEADER_MAGIC "WithoutFreeSpace"
#define HEADER_MAGIC2 "WithouFreSpacExt"
#define HEADER_VERSION 2
#define HEADER_INUSE_MAGIC (0x746F6E59)
#define DEFAULT_CLUSTER_SIZE 1048576 /* 1 MiB */
#define HEADER_SIZE 64
// always little-endian
typedef struct ParallelsHeader {
struct parallels_header {
char magic[16]; // "WithoutFreeSpace"
uint32_t version;
uint32_t heads;
uint32_t cylinders;
uint32_t tracks;
uint32_t bat_entries;
uint32_t catalog_entries;
uint64_t nb_sectors;
uint32_t inuse;
uint32_t data_off;
char padding[12];
} QEMU_PACKED ParallelsHeader;
typedef enum ParallelsPreallocMode {
PRL_PREALLOC_MODE_FALLOCATE = 0,
PRL_PREALLOC_MODE_TRUNCATE = 1,
PRL_PREALLOC_MODE_MAX = 2,
} ParallelsPreallocMode;
static const char *prealloc_mode_lookup[] = {
"falloc",
"truncate",
NULL,
};
} QEMU_PACKED;
typedef struct BDRVParallelsState {
/** Locking is conservative, the lock protects
* - image file extending (truncate, fallocate)
* - any access to block allocation table
*/
CoMutex lock;
ParallelsHeader *header;
uint32_t header_size;
bool header_unclean;
unsigned long *bat_dirty_bmap;
unsigned int bat_dirty_block;
uint32_t *bat_bitmap;
unsigned int bat_size;
int64_t data_end;
uint64_t prealloc_size;
ParallelsPreallocMode prealloc_mode;
uint32_t *catalog_bitmap;
unsigned int catalog_size;
unsigned int tracks;
unsigned int off_multiplier;
} BDRVParallelsState;
#define PARALLELS_OPT_PREALLOC_MODE "prealloc-mode"
#define PARALLELS_OPT_PREALLOC_SIZE "prealloc-size"
static QemuOptsList parallels_runtime_opts = {
.name = "parallels",
.head = QTAILQ_HEAD_INITIALIZER(parallels_runtime_opts.head),
.desc = {
{
.name = PARALLELS_OPT_PREALLOC_SIZE,
.type = QEMU_OPT_SIZE,
.help = "Preallocation size on image expansion",
.def_value_str = "128MiB",
},
{
.name = PARALLELS_OPT_PREALLOC_MODE,
.type = QEMU_OPT_STRING,
.help = "Preallocation mode on image expansion "
"(allowed values: falloc, truncate)",
.def_value_str = "falloc",
},
{ /* end of list */ },
},
};
static int64_t bat2sect(BDRVParallelsState *s, uint32_t idx)
static int parallels_probe(const uint8_t *buf, int buf_size, const char *filename)
{
return (uint64_t)le32_to_cpu(s->bat_bitmap[idx]) * s->off_multiplier;
}
const struct parallels_header *ph = (const void *)buf;
static uint32_t bat_entry_off(uint32_t idx)
{
return sizeof(ParallelsHeader) + sizeof(uint32_t) * idx;
}
static int64_t seek_to_sector(BDRVParallelsState *s, int64_t sector_num)
{
uint32_t index, offset;
index = sector_num / s->tracks;
offset = sector_num % s->tracks;
/* not allocated */
if ((index >= s->bat_size) || (s->bat_bitmap[index] == 0)) {
return -1;
}
return bat2sect(s, index) + offset;
}
static int cluster_remainder(BDRVParallelsState *s, int64_t sector_num,
int nb_sectors)
{
int ret = s->tracks - sector_num % s->tracks;
return MIN(nb_sectors, ret);
}
static int64_t block_status(BDRVParallelsState *s, int64_t sector_num,
int nb_sectors, int *pnum)
{
int64_t start_off = -2, prev_end_off = -2;
*pnum = 0;
while (nb_sectors > 0 || start_off == -2) {
int64_t offset = seek_to_sector(s, sector_num);
int to_end;
if (start_off == -2) {
start_off = offset;
prev_end_off = offset;
} else if (offset != prev_end_off) {
break;
}
to_end = cluster_remainder(s, sector_num, nb_sectors);
nb_sectors -= to_end;
sector_num += to_end;
*pnum += to_end;
if (offset > 0) {
prev_end_off += to_end;
}
}
return start_off;
}
static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, int *pnum)
{
BDRVParallelsState *s = bs->opaque;
uint32_t idx, to_allocate, i;
int64_t pos, space;
pos = block_status(s, sector_num, nb_sectors, pnum);
if (pos > 0) {
return pos;
}
idx = sector_num / s->tracks;
if (idx >= s->bat_size) {
return -EINVAL;
}
to_allocate = (sector_num + *pnum + s->tracks - 1) / s->tracks - idx;
space = to_allocate * s->tracks;
if (s->data_end + space > bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS) {
int ret;
space += s->prealloc_size;
if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE) {
ret = bdrv_write_zeroes(bs->file, s->data_end, space, 0);
} else {
ret = bdrv_truncate(bs->file,
(s->data_end + space) << BDRV_SECTOR_BITS);
}
if (ret < 0) {
return ret;
}
}
for (i = 0; i < to_allocate; i++) {
s->bat_bitmap[idx + i] = cpu_to_le32(s->data_end / s->off_multiplier);
s->data_end += s->tracks;
bitmap_set(s->bat_dirty_bmap,
bat_entry_off(idx) / s->bat_dirty_block, 1);
}
return bat2sect(s, idx) + sector_num % s->tracks;
}
static coroutine_fn int parallels_co_flush_to_os(BlockDriverState *bs)
{
BDRVParallelsState *s = bs->opaque;
unsigned long size = DIV_ROUND_UP(s->header_size, s->bat_dirty_block);
unsigned long bit;
qemu_co_mutex_lock(&s->lock);
bit = find_first_bit(s->bat_dirty_bmap, size);
while (bit < size) {
uint32_t off = bit * s->bat_dirty_block;
uint32_t to_write = s->bat_dirty_block;
int ret;
if (off + to_write > s->header_size) {
to_write = s->header_size - off;
}
ret = bdrv_pwrite(bs->file, off, (uint8_t *)s->header + off, to_write);
if (ret < 0) {
qemu_co_mutex_unlock(&s->lock);
return ret;
}
bit = find_next_bit(s->bat_dirty_bmap, size, bit + 1);
}
bitmap_zero(s->bat_dirty_bmap, size);
qemu_co_mutex_unlock(&s->lock);
return 0;
}
static int64_t coroutine_fn parallels_co_get_block_status(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, int *pnum)
{
BDRVParallelsState *s = bs->opaque;
int64_t offset;
qemu_co_mutex_lock(&s->lock);
offset = block_status(s, sector_num, nb_sectors, pnum);
qemu_co_mutex_unlock(&s->lock);
if (offset < 0) {
if (buf_size < HEADER_SIZE)
return 0;
}
return (offset << BDRV_SECTOR_BITS) |
BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
}
static coroutine_fn int parallels_co_writev(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
{
BDRVParallelsState *s = bs->opaque;
uint64_t bytes_done = 0;
QEMUIOVector hd_qiov;
int ret = 0;
qemu_iovec_init(&hd_qiov, qiov->niov);
while (nb_sectors > 0) {
int64_t position;
int n, nbytes;
qemu_co_mutex_lock(&s->lock);
position = allocate_clusters(bs, sector_num, nb_sectors, &n);
qemu_co_mutex_unlock(&s->lock);
if (position < 0) {
ret = (int)position;
break;
}
nbytes = n << BDRV_SECTOR_BITS;
qemu_iovec_reset(&hd_qiov);
qemu_iovec_concat(&hd_qiov, qiov, bytes_done, nbytes);
ret = bdrv_co_writev(bs->file, position, n, &hd_qiov);
if (ret < 0) {
break;
}
nb_sectors -= n;
sector_num += n;
bytes_done += nbytes;
}
qemu_iovec_destroy(&hd_qiov);
return ret;
}
static coroutine_fn int parallels_co_readv(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
{
BDRVParallelsState *s = bs->opaque;
uint64_t bytes_done = 0;
QEMUIOVector hd_qiov;
int ret = 0;
qemu_iovec_init(&hd_qiov, qiov->niov);
while (nb_sectors > 0) {
int64_t position;
int n, nbytes;
qemu_co_mutex_lock(&s->lock);
position = block_status(s, sector_num, nb_sectors, &n);
qemu_co_mutex_unlock(&s->lock);
nbytes = n << BDRV_SECTOR_BITS;
if (position < 0) {
qemu_iovec_memset(qiov, bytes_done, 0, nbytes);
} else {
qemu_iovec_reset(&hd_qiov);
qemu_iovec_concat(&hd_qiov, qiov, bytes_done, nbytes);
ret = bdrv_co_readv(bs->file, position, n, &hd_qiov);
if (ret < 0) {
break;
}
}
nb_sectors -= n;
sector_num += n;
bytes_done += nbytes;
}
qemu_iovec_destroy(&hd_qiov);
return ret;
}
static int parallels_check(BlockDriverState *bs, BdrvCheckResult *res,
BdrvCheckMode fix)
{
BDRVParallelsState *s = bs->opaque;
int64_t size, prev_off, high_off;
int ret;
uint32_t i;
bool flush_bat = false;
int cluster_size = s->tracks << BDRV_SECTOR_BITS;
size = bdrv_getlength(bs->file);
if (size < 0) {
res->check_errors++;
return size;
}
if (s->header_unclean) {
fprintf(stderr, "%s image was not closed correctly\n",
fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
res->corruptions++;
if (fix & BDRV_FIX_ERRORS) {
/* parallels_close will do the job right */
res->corruptions_fixed++;
s->header_unclean = false;
}
}
res->bfi.total_clusters = s->bat_size;
res->bfi.compressed_clusters = 0; /* compression is not supported */
high_off = 0;
prev_off = 0;
for (i = 0; i < s->bat_size; i++) {
int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
if (off == 0) {
prev_off = 0;
continue;
}
/* cluster outside the image */
if (off > size) {
fprintf(stderr, "%s cluster %u is outside image\n",
fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
res->corruptions++;
if (fix & BDRV_FIX_ERRORS) {
prev_off = 0;
s->bat_bitmap[i] = 0;
res->corruptions_fixed++;
flush_bat = true;
continue;
}
}
res->bfi.allocated_clusters++;
if (off > high_off) {
high_off = off;
}
if (prev_off != 0 && (prev_off + cluster_size) != off) {
res->bfi.fragmented_clusters++;
}
prev_off = off;
}
if (flush_bat) {
ret = bdrv_pwrite_sync(bs->file, 0, s->header, s->header_size);
if (ret < 0) {
res->check_errors++;
return ret;
}
}
res->image_end_offset = high_off + cluster_size;
if (size > res->image_end_offset) {
int64_t count;
count = DIV_ROUND_UP(size - res->image_end_offset, cluster_size);
fprintf(stderr, "%s space leaked at the end of the image %" PRId64 "\n",
fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
size - res->image_end_offset);
res->leaks += count;
if (fix & BDRV_FIX_LEAKS) {
ret = bdrv_truncate(bs->file, res->image_end_offset);
if (ret < 0) {
res->check_errors++;
return ret;
}
res->leaks_fixed += count;
}
}
return 0;
}
static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
{
int64_t total_size, cl_size;
uint8_t tmp[BDRV_SECTOR_SIZE];
Error *local_err = NULL;
BlockDriverState *file;
uint32_t bat_entries, bat_sectors;
ParallelsHeader header;
int ret;
total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
BDRV_SECTOR_SIZE);
cl_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
DEFAULT_CLUSTER_SIZE), BDRV_SECTOR_SIZE);
ret = bdrv_create_file(filename, opts, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
return ret;
}
file = NULL;
ret = bdrv_open(&file, filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, NULL, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
return ret;
}
ret = bdrv_truncate(file, 0);
if (ret < 0) {
goto exit;
}
bat_entries = DIV_ROUND_UP(total_size, cl_size);
bat_sectors = DIV_ROUND_UP(bat_entry_off(bat_entries), cl_size);
bat_sectors = (bat_sectors * cl_size) >> BDRV_SECTOR_BITS;
memset(&header, 0, sizeof(header));
memcpy(header.magic, HEADER_MAGIC2, sizeof(header.magic));
header.version = cpu_to_le32(HEADER_VERSION);
/* don't care much about geometry, it is not used on image level */
header.heads = cpu_to_le32(16);
header.cylinders = cpu_to_le32(total_size / BDRV_SECTOR_SIZE / 16 / 32);
header.tracks = cpu_to_le32(cl_size >> BDRV_SECTOR_BITS);
header.bat_entries = cpu_to_le32(bat_entries);
header.nb_sectors = cpu_to_le64(DIV_ROUND_UP(total_size, BDRV_SECTOR_SIZE));
header.data_off = cpu_to_le32(bat_sectors);
/* write all the data */
memset(tmp, 0, sizeof(tmp));
memcpy(tmp, &header, sizeof(header));
ret = bdrv_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE);
if (ret < 0) {
goto exit;
}
ret = bdrv_write_zeroes(file, 1, bat_sectors - 1, 0);
if (ret < 0) {
goto exit;
}
ret = 0;
done:
bdrv_unref(file);
return ret;
exit:
error_setg_errno(errp, -ret, "Failed to create Parallels image");
goto done;
}
static int parallels_probe(const uint8_t *buf, int buf_size,
const char *filename)
{
const ParallelsHeader *ph = (const void *)buf;
if (buf_size < sizeof(ParallelsHeader)) {
return 0;
}
if ((!memcmp(ph->magic, HEADER_MAGIC, 16) ||
!memcmp(ph->magic, HEADER_MAGIC2, 16)) &&
(le32_to_cpu(ph->version) == HEADER_VERSION)) {
!memcmp(ph->magic, HEADER_MAGIC2, 16)) &&
(le32_to_cpu(ph->version) == HEADER_VERSION))
return 100;
}
return 0;
}
static int parallels_update_header(BlockDriverState *bs)
{
BDRVParallelsState *s = bs->opaque;
unsigned size = MAX(bdrv_opt_mem_align(bs->file), sizeof(ParallelsHeader));
if (size > s->header_size) {
size = s->header_size;
}
return bdrv_pwrite_sync(bs->file, 0, s->header, size);
}
static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVParallelsState *s = bs->opaque;
ParallelsHeader ph;
int ret, size, i;
QemuOpts *opts = NULL;
Error *local_err = NULL;
char *buf;
int i;
struct parallels_header ph;
int ret;
bs->read_only = 1; // no write support yet
ret = bdrv_pread(bs->file, 0, &ph, sizeof(ph));
if (ret < 0) {
@@ -595,90 +115,25 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
s->bat_size = le32_to_cpu(ph.bat_entries);
if (s->bat_size > INT_MAX / sizeof(uint32_t)) {
s->catalog_size = le32_to_cpu(ph.catalog_entries);
if (s->catalog_size > INT_MAX / 4) {
error_setg(errp, "Catalog too large");
ret = -EFBIG;
goto fail;
}
size = bat_entry_off(s->bat_size);
s->header_size = ROUND_UP(size, bdrv_opt_mem_align(bs->file));
s->header = qemu_try_blockalign(bs->file, s->header_size);
if (s->header == NULL) {
s->catalog_bitmap = g_try_new(uint32_t, s->catalog_size);
if (s->catalog_size && s->catalog_bitmap == NULL) {
ret = -ENOMEM;
goto fail;
}
s->data_end = le32_to_cpu(ph.data_off);
if (s->data_end == 0) {
s->data_end = ROUND_UP(bat_entry_off(s->bat_size), BDRV_SECTOR_SIZE);
}
if (s->data_end < s->header_size) {
/* there is not enough unused space to fit to block align between BAT
and actual data. We can't avoid read-modify-write... */
s->header_size = size;
}
ret = bdrv_pread(bs->file, 0, s->header, s->header_size);
ret = bdrv_pread(bs->file, 64, s->catalog_bitmap, s->catalog_size * 4);
if (ret < 0) {
goto fail;
}
s->bat_bitmap = (uint32_t *)(s->header + 1);
for (i = 0; i < s->bat_size; i++) {
int64_t off = bat2sect(s, i);
if (off >= s->data_end) {
s->data_end = off + s->tracks;
}
}
if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
/* Image was not closed correctly. The check is mandatory */
s->header_unclean = true;
if ((flags & BDRV_O_RDWR) && !(flags & BDRV_O_CHECK)) {
error_setg(errp, "parallels: Image was not closed correctly; "
"cannot be opened read/write");
ret = -EACCES;
goto fail;
}
}
opts = qemu_opts_create(&parallels_runtime_opts, NULL, 0, &local_err);
if (local_err != NULL) {
goto fail_options;
}
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err != NULL) {
goto fail_options;
}
s->prealloc_size =
qemu_opt_get_size_del(opts, PARALLELS_OPT_PREALLOC_SIZE, 0);
s->prealloc_size = MAX(s->tracks, s->prealloc_size >> BDRV_SECTOR_BITS);
buf = qemu_opt_get_del(opts, PARALLELS_OPT_PREALLOC_MODE);
s->prealloc_mode = qapi_enum_parse(prealloc_mode_lookup, buf,
PRL_PREALLOC_MODE_MAX, PRL_PREALLOC_MODE_FALLOCATE, &local_err);
g_free(buf);
if (local_err != NULL) {
goto fail_options;
}
if (!bdrv_has_zero_init(bs->file) ||
bdrv_truncate(bs->file, bdrv_getlength(bs->file)) != 0) {
s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
}
if (flags & BDRV_O_RDWR) {
s->header->inuse = cpu_to_le32(HEADER_INUSE_MAGIC);
ret = parallels_update_header(bs);
if (ret < 0) {
goto fail;
}
}
s->bat_dirty_block = 4 * getpagesize();
s->bat_dirty_bmap =
bitmap_new(DIV_ROUND_UP(s->header_size, s->bat_dirty_block));
for (i = 0; i < s->catalog_size; i++)
le32_to_cpus(&s->catalog_bitmap[i]);
qemu_co_mutex_init(&s->lock);
return 0;
@@ -687,67 +142,67 @@ fail_format:
error_setg(errp, "Image not in Parallels format");
ret = -EINVAL;
fail:
qemu_vfree(s->header);
g_free(s->catalog_bitmap);
return ret;
fail_options:
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
{
BDRVParallelsState *s = bs->opaque;
uint32_t index, offset;
index = sector_num / s->tracks;
offset = sector_num % s->tracks;
/* not allocated */
if ((index >= s->catalog_size) || (s->catalog_bitmap[index] == 0))
return -1;
return
((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 512;
}
static int parallels_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
{
while (nb_sectors > 0) {
int64_t position = seek_to_sector(bs, sector_num);
if (position >= 0) {
if (bdrv_pread(bs->file, position, buf, 512) != 512)
return -1;
} else {
memset(buf, 0, 512);
}
nb_sectors--;
sector_num++;
buf += 512;
}
return 0;
}
static coroutine_fn int parallels_co_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
{
int ret;
BDRVParallelsState *s = bs->opaque;
qemu_co_mutex_lock(&s->lock);
ret = parallels_read(bs, sector_num, buf, nb_sectors);
qemu_co_mutex_unlock(&s->lock);
return ret;
}
static void parallels_close(BlockDriverState *bs)
{
BDRVParallelsState *s = bs->opaque;
if (bs->open_flags & BDRV_O_RDWR) {
s->header->inuse = 0;
parallels_update_header(bs);
}
if (bs->open_flags & BDRV_O_RDWR) {
bdrv_truncate(bs->file, s->data_end << BDRV_SECTOR_BITS);
}
g_free(s->bat_dirty_bmap);
qemu_vfree(s->header);
g_free(s->catalog_bitmap);
}
static QemuOptsList parallels_create_opts = {
.name = "parallels-create-opts",
.head = QTAILQ_HEAD_INITIALIZER(parallels_create_opts.head),
.desc = {
{
.name = BLOCK_OPT_SIZE,
.type = QEMU_OPT_SIZE,
.help = "Virtual disk size",
},
{
.name = BLOCK_OPT_CLUSTER_SIZE,
.type = QEMU_OPT_SIZE,
.help = "Parallels image cluster size",
.def_value_str = stringify(DEFAULT_CLUSTER_SIZE),
},
{ /* end of list */ }
}
};
static BlockDriver bdrv_parallels = {
.format_name = "parallels",
.instance_size = sizeof(BDRVParallelsState),
.bdrv_probe = parallels_probe,
.bdrv_open = parallels_open,
.bdrv_read = parallels_co_read,
.bdrv_close = parallels_close,
.bdrv_co_get_block_status = parallels_co_get_block_status,
.bdrv_has_zero_init = bdrv_has_zero_init_1,
.bdrv_co_flush_to_os = parallels_co_flush_to_os,
.bdrv_co_readv = parallels_co_readv,
.bdrv_co_writev = parallels_co_writev,
.bdrv_create = parallels_create,
.bdrv_check = parallels_check,
.create_opts = &parallels_create_opts,
};
static void bdrv_parallels_init(void)

View File

@@ -31,10 +31,8 @@
#include "qapi/qmp/types.h"
#include "sysemu/block-backend.h"
BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs, Error **errp)
BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs)
{
ImageInfo **p_image_info;
BlockDriverState *bs0;
BlockDeviceInfo *info = g_malloc0(sizeof(*info));
info->file = g_strdup(bs->filename);
@@ -94,25 +92,6 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs, Error **errp)
info->write_threshold = bdrv_write_threshold_get(bs);
bs0 = bs;
p_image_info = &info->image;
while (1) {
Error *local_err = NULL;
bdrv_query_image_info(bs0, p_image_info, &local_err);
if (local_err) {
error_propagate(errp, local_err);
qapi_free_BlockDeviceInfo(info);
return NULL;
}
if (bs0->drv && bs0->backing_hd) {
bs0 = bs0->backing_hd;
(*p_image_info)->has_backing_image = true;
p_image_info = &((*p_image_info)->backing_image);
} else {
break;
}
}
return info;
}
@@ -285,6 +264,9 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
{
BlockInfo *info = g_malloc0(sizeof(*info));
BlockDriverState *bs = blk_bs(blk);
BlockDriverState *bs0;
ImageInfo **p_image_info;
Error *local_err = NULL;
info->device = g_strdup(blk_name(blk));
info->type = g_strdup("unknown");
info->locked = blk_dev_is_medium_locked(blk);
@@ -307,9 +289,23 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
if (bs->drv) {
info->has_inserted = true;
info->inserted = bdrv_block_device_info(bs, errp);
if (info->inserted == NULL) {
goto err;
info->inserted = bdrv_block_device_info(bs);
bs0 = bs;
p_image_info = &info->inserted->image;
while (1) {
bdrv_query_image_info(bs0, p_image_info, &local_err);
if (local_err) {
error_propagate(errp, local_err);
goto err;
}
if (bs0->drv && bs0->backing_hd) {
bs0 = bs0->backing_hd;
(*p_image_info)->has_backing_image = true;
p_image_info = &((*p_image_info)->backing_image);
} else {
break;
}
}
}
@@ -523,6 +519,9 @@ static void dump_qobject(fprintf_function func_fprintf, void *f,
QDECREF(value);
break;
}
case QTYPE_NONE:
break;
case QTYPE_MAX:
default:
abort();
}

View File

@@ -124,7 +124,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
snprintf(version, sizeof(version), "QCOW version %" PRIu32,
header.version);
error_set(errp, QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
bdrv_get_device_or_node_name(bs), "qcow", version);
bdrv_get_device_name(bs), "qcow", version);
ret = -ENOTSUP;
goto fail;
}
@@ -229,9 +229,9 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
}
/* Disable migration when qcow images are used */
error_setg(&s->migration_blocker, "The qcow format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
error_set(&s->migration_blocker,
QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
"qcow", bdrv_get_device_name(bs), "live migration");
migrate_add_blocker(s->migration_blocker);
qemu_co_mutex_init(&s->lock);
@@ -269,7 +269,6 @@ static int qcow_set_key(BlockDriverState *bs, const char *key)
for(i = 0;i < len;i++) {
keybuf[i] = key[i];
}
assert(bs->encrypted);
s->crypt_method = s->crypt_method_header;
if (AES_set_encrypt_key(keybuf, 128, &s->aes_encrypt_key) != 0)
@@ -412,10 +411,9 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
bdrv_truncate(bs->file, cluster_offset + s->cluster_size);
/* if encrypted, we must initialize the cluster
content which won't be written */
if (bs->encrypted &&
if (s->crypt_method &&
(n_end - n_start) < s->cluster_sectors) {
uint64_t start_sect;
assert(s->crypt_method);
start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
memset(s->cluster_data + 512, 0x00, 512);
for(i = 0; i < s->cluster_sectors; i++) {
@@ -592,8 +590,7 @@ static coroutine_fn int qcow_co_readv(BlockDriverState *bs, int64_t sector_num,
if (ret < 0) {
break;
}
if (bs->encrypted) {
assert(s->crypt_method);
if (s->crypt_method) {
encrypt_sectors(s, sector_num, buf, buf,
n, 0,
&s->aes_decrypt_key);
@@ -664,8 +661,7 @@ static coroutine_fn int qcow_co_writev(BlockDriverState *bs, int64_t sector_num,
ret = -EIO;
break;
}
if (bs->encrypted) {
assert(s->crypt_method);
if (s->crypt_method) {
if (!cluster_data) {
cluster_data = g_malloc0(s->cluster_size);
}

View File

@@ -28,68 +28,62 @@
#include "trace.h"
typedef struct Qcow2CachedTable {
int64_t offset;
bool dirty;
uint64_t lru_counter;
int ref;
void* table;
int64_t offset;
bool dirty;
int cache_hits;
int ref;
} Qcow2CachedTable;
struct Qcow2Cache {
Qcow2CachedTable *entries;
struct Qcow2Cache *depends;
Qcow2CachedTable* entries;
struct Qcow2Cache* depends;
int size;
bool depends_on_flush;
void *table_array;
uint64_t lru_counter;
};
static inline void *qcow2_cache_get_table_addr(BlockDriverState *bs,
Qcow2Cache *c, int table)
{
BDRVQcowState *s = bs->opaque;
return (uint8_t *) c->table_array + (size_t) table * s->cluster_size;
}
static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
Qcow2Cache *c, void *table)
{
BDRVQcowState *s = bs->opaque;
ptrdiff_t table_offset = (uint8_t *) table - (uint8_t *) c->table_array;
int idx = table_offset / s->cluster_size;
assert(idx >= 0 && idx < c->size && table_offset % s->cluster_size == 0);
return idx;
}
Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
{
BDRVQcowState *s = bs->opaque;
Qcow2Cache *c;
int i;
c = g_new0(Qcow2Cache, 1);
c->size = num_tables;
c->entries = g_try_new0(Qcow2CachedTable, num_tables);
c->table_array = qemu_try_blockalign(bs->file,
(size_t) num_tables * s->cluster_size);
if (!c->entries) {
goto fail;
}
if (!c->entries || !c->table_array) {
qemu_vfree(c->table_array);
g_free(c->entries);
g_free(c);
c = NULL;
for (i = 0; i < c->size; i++) {
c->entries[i].table = qemu_try_blockalign(bs->file, s->cluster_size);
if (c->entries[i].table == NULL) {
goto fail;
}
}
return c;
fail:
if (c->entries) {
for (i = 0; i < c->size; i++) {
qemu_vfree(c->entries[i].table);
}
}
g_free(c->entries);
g_free(c);
return NULL;
}
int qcow2_cache_destroy(BlockDriverState *bs, Qcow2Cache *c)
int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
{
int i;
for (i = 0; i < c->size; i++) {
assert(c->entries[i].ref == 0);
qemu_vfree(c->entries[i].table);
}
qemu_vfree(c->table_array);
g_free(c->entries);
g_free(c);
@@ -157,8 +151,8 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
}
ret = bdrv_pwrite(bs->file, c->entries[i].offset,
qcow2_cache_get_table_addr(bs, c, i), s->cluster_size);
ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
s->cluster_size);
if (ret < 0) {
return ret;
}
@@ -234,51 +228,63 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
for (i = 0; i < c->size; i++) {
assert(c->entries[i].ref == 0);
c->entries[i].offset = 0;
c->entries[i].lru_counter = 0;
c->entries[i].cache_hits = 0;
}
c->lru_counter = 0;
return 0;
}
static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
{
int i;
int min_count = INT_MAX;
int min_index = -1;
for (i = 0; i < c->size; i++) {
if (c->entries[i].ref) {
continue;
}
if (c->entries[i].cache_hits < min_count) {
min_index = i;
min_count = c->entries[i].cache_hits;
}
/* Give newer hits priority */
/* TODO Check how to optimize the replacement strategy */
if (c->entries[i].cache_hits > 1) {
c->entries[i].cache_hits /= 2;
}
}
if (min_index == -1) {
/* This can't happen in current synchronous code, but leave the check
* here as a reminder for whoever starts using AIO with the cache */
abort();
}
return min_index;
}
static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
uint64_t offset, void **table, bool read_from_disk)
{
BDRVQcowState *s = bs->opaque;
int i;
int ret;
int lookup_index;
uint64_t min_lru_counter = UINT64_MAX;
int min_lru_index = -1;
trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
offset, read_from_disk);
/* Check if the table is already cached */
i = lookup_index = (offset / s->cluster_size * 4) % c->size;
do {
const Qcow2CachedTable *t = &c->entries[i];
if (t->offset == offset) {
for (i = 0; i < c->size; i++) {
if (c->entries[i].offset == offset) {
goto found;
}
if (t->ref == 0 && t->lru_counter < min_lru_counter) {
min_lru_counter = t->lru_counter;
min_lru_index = i;
}
if (++i == c->size) {
i = 0;
}
} while (i != lookup_index);
if (min_lru_index == -1) {
/* This can't happen in current synchronous code, but leave the check
* here as a reminder for whoever starts using AIO with the cache */
abort();
}
/* Cache miss: write a table back and replace it */
i = min_lru_index;
/* If not, write a table back and replace it */
i = qcow2_cache_find_entry_to_replace(c);
trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
c == s->l2_table_cache, i);
if (i < 0) {
@@ -298,19 +304,22 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
}
ret = bdrv_pread(bs->file, offset, qcow2_cache_get_table_addr(bs, c, i),
s->cluster_size);
ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
if (ret < 0) {
return ret;
}
}
/* Give the table some hits for the start so that it won't be replaced
* immediately. The number 32 is completely arbitrary. */
c->entries[i].cache_hits = 32;
c->entries[i].offset = offset;
/* And return the right table */
found:
c->entries[i].cache_hits++;
c->entries[i].ref++;
*table = qcow2_cache_get_table_addr(bs, c, i);
*table = c->entries[i].table;
trace_qcow2_cache_get_done(qemu_coroutine_self(),
c == s->l2_table_cache, i);
@@ -330,24 +339,36 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
return qcow2_cache_do_get(bs, c, offset, table, false);
}
void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
{
int i = qcow2_cache_get_table_idx(bs, c, *table);
int i;
for (i = 0; i < c->size; i++) {
if (c->entries[i].table == *table) {
goto found;
}
}
return -ENOENT;
found:
c->entries[i].ref--;
*table = NULL;
if (c->entries[i].ref == 0) {
c->entries[i].lru_counter = ++c->lru_counter;
}
assert(c->entries[i].ref >= 0);
return 0;
}
void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
void *table)
void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
{
int i = qcow2_cache_get_table_idx(bs, c, table);
assert(c->entries[i].offset != 0);
int i;
for (i = 0; i < c->size; i++) {
if (c->entries[i].table == table) {
goto found;
}
}
abort();
found:
c->entries[i].dirty = true;
}

View File

@@ -253,14 +253,17 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
memcpy(l2_table, old_table, s->cluster_size);
qcow2_cache_put(bs, s->l2_table_cache, (void **) &old_table);
ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
if (ret < 0) {
goto fail;
}
}
/* write the l2 table to the file */
BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
trace_qcow2_l2_allocate_write_l2(bs, l1_index);
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
ret = qcow2_cache_flush(bs, s->l2_table_cache);
if (ret < 0) {
goto fail;
@@ -400,8 +403,7 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
goto out;
}
if (bs->encrypted) {
assert(s->crypt_method);
if (s->crypt_method) {
qcow2_encrypt_sectors(s, start_sect + n_start,
iov.iov_base, iov.iov_base, n, 1,
&s->aes_encrypt_key);
@@ -690,9 +692,12 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
/* compressed clusters never have the copied flag */
BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
l2_table[l2_index] = cpu_to_be64(cluster_offset);
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
if (ret < 0) {
return 0;
}
return cluster_offset;
}
@@ -766,7 +771,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
if (ret < 0) {
goto err;
}
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
assert(l2_index + m->nb_clusters <= s->l2_size);
for (i = 0; i < m->nb_clusters; i++) {
@@ -784,7 +789,10 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
}
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
if (ret < 0) {
goto err;
}
/*
* If this was a COW, we need to decrease the refcount of the old cluster.
@@ -936,7 +944,7 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
uint64_t *l2_table;
unsigned int nb_clusters;
unsigned int keep_clusters;
int ret;
int ret, pret;
trace_qcow2_handle_copied(qemu_coroutine_self(), guest_offset, *host_offset,
*bytes);
@@ -1003,7 +1011,10 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
/* Cleanup */
out:
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
pret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
if (pret < 0) {
return pret;
}
/* Only return a host offset if we actually made progress. Otherwise we
* would make requirements for handle_alloc() that it can't fulfill */
@@ -1128,7 +1139,10 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
* wrong with our code. */
assert(nb_clusters > 0);
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
if (ret < 0) {
return ret;
}
/* Allocate, if necessary at a given offset in the image file */
alloc_cluster_offset = start_of_cluster(s, *host_offset);
@@ -1456,7 +1470,7 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
}
/* First remove L2 entries */
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
if (!full_discard && s->qcow_version >= 3) {
l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
} else {
@@ -1467,7 +1481,10 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
qcow2_free_any_clusters(bs, old_l2_entry, 1, type);
}
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
if (ret < 0) {
return ret;
}
return nb_clusters;
}
@@ -1541,7 +1558,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
old_offset = be64_to_cpu(l2_table[l2_index + i]);
/* Update L2 entries */
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
if (old_offset & QCOW_OFLAG_COMPRESSED) {
l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
@@ -1550,7 +1567,10 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
}
}
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
if (ret < 0) {
return ret;
}
return nb_clusters;
}
@@ -1740,10 +1760,14 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
if (is_active_l1) {
if (l2_dirty) {
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
qcow2_cache_depends_on_flush(s->l2_table_cache);
}
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
ret = qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
if (ret < 0) {
l2_table = NULL;
goto fail;
}
} else {
if (l2_dirty) {
ret = qcow2_pre_write_overlap_check(bs,
@@ -1774,7 +1798,12 @@ fail:
if (!is_active_l1) {
qemu_vfree(l2_table);
} else {
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
if (ret < 0) {
qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
} else {
ret = qcow2_cache_put(bs, s->l2_table_cache,
(void **)&l2_table);
}
}
}
return ret;

View File

@@ -265,7 +265,10 @@ int qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index,
block_index = cluster_index & (s->refcount_block_size - 1);
*refcount = s->get_refcount(refcount_block, block_index);
qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
if (ret < 0) {
return ret;
}
return 0;
}
@@ -421,7 +424,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
/* Now the new refcount block needs to be written to disk */
BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
qcow2_cache_entry_mark_dirty(bs, s->refcount_block_cache, *refcount_block);
qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
ret = qcow2_cache_flush(bs, s->refcount_block_cache);
if (ret < 0) {
goto fail_block;
@@ -445,7 +448,10 @@ static int alloc_refcount_block(BlockDriverState *bs,
return -EAGAIN;
}
qcow2_cache_put(bs, s->refcount_block_cache, refcount_block);
ret = qcow2_cache_put(bs, s->refcount_block_cache, refcount_block);
if (ret < 0) {
goto fail_block;
}
/*
* If we come here, we need to grow the refcount table. Again, a new
@@ -717,8 +723,13 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
/* Load the refcount block and allocate it if needed */
if (table_index != old_table_index) {
if (refcount_block) {
qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
ret = qcow2_cache_put(bs, s->refcount_block_cache,
&refcount_block);
if (ret < 0) {
goto fail;
}
}
ret = alloc_refcount_block(bs, cluster_index, &refcount_block);
if (ret < 0) {
goto fail;
@@ -726,8 +737,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
}
old_table_index = table_index;
qcow2_cache_entry_mark_dirty(bs, s->refcount_block_cache,
refcount_block);
qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
/* we can update the count and save it */
block_index = cluster_index & (s->refcount_block_size - 1);
@@ -763,7 +773,11 @@ fail:
/* Write last changed block to disk */
if (refcount_block) {
qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
int wret;
wret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
if (wret < 0) {
return ret < 0 ? ret : wret;
}
}
/*
@@ -940,19 +954,27 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
}
free_in_cluster = s->cluster_size - offset_into_cluster(s, offset);
if (!offset || free_in_cluster < size) {
int64_t new_cluster = alloc_clusters_noref(bs, s->cluster_size);
if (new_cluster < 0) {
return new_cluster;
do {
if (!offset || free_in_cluster < size) {
int64_t new_cluster = alloc_clusters_noref(bs, s->cluster_size);
if (new_cluster < 0) {
return new_cluster;
}
if (!offset || ROUND_UP(offset, s->cluster_size) != new_cluster) {
offset = new_cluster;
free_in_cluster = s->cluster_size;
} else {
free_in_cluster += s->cluster_size;
}
}
if (!offset || ROUND_UP(offset, s->cluster_size) != new_cluster) {
offset = new_cluster;
assert(offset);
ret = update_refcount(bs, offset, size, 1, false, QCOW2_DISCARD_NEVER);
if (ret < 0) {
offset = 0;
}
}
assert(offset);
ret = update_refcount(bs, offset, size, 1, false, QCOW2_DISCARD_NEVER);
} while (ret == -EAGAIN);
if (ret < 0) {
return ret;
}
@@ -1168,12 +1190,15 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
s->refcount_block_cache);
}
l2_table[j] = cpu_to_be64(offset);
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache,
l2_table);
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
}
}
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
if (ret < 0) {
goto fail;
}
if (addend != 0) {
ret = qcow2_update_cluster_refcount(bs, l2_offset >>
@@ -2438,7 +2463,7 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
if (ret < 0) {
return ret;
} else if (ret > 0) {
int metadata_ol_bitnr = ctz32(ret);
int metadata_ol_bitnr = ffs(ret) - 1;
assert(metadata_ol_bitnr < QCOW2_OL_MAX_BITNR);
qcow2_signal_corruption(bs, true, offset, size, "Preventing invalid "

View File

@@ -351,8 +351,10 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
memset(sn, 0, sizeof(*sn));
/* Generate an ID */
find_new_snapshot_id(bs, sn_info->id_str, sizeof(sn_info->id_str));
/* Generate an ID if it wasn't passed */
if (sn_info->id_str[0] == '\0') {
find_new_snapshot_id(bs, sn_info->id_str, sizeof(sn_info->id_str));
}
/* Check that the ID is unique */
if (find_snapshot_by_id_and_name(bs, sn_info->id_str, NULL) >= 0) {

View File

@@ -208,7 +208,7 @@ static void GCC_FMT_ATTR(3, 4) report_unsupported(BlockDriverState *bs,
va_end(ap);
error_set(errp, QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
bdrv_get_device_or_node_name(bs), "qcow2", msg);
bdrv_get_device_name(bs), "qcow2", msg);
}
static void report_unsupported_feature(BlockDriverState *bs,
@@ -1037,7 +1037,6 @@ static int qcow2_set_key(BlockDriverState *bs, const char *key)
for(i = 0;i < len;i++) {
keybuf[i] = key[i];
}
assert(bs->encrypted);
s->crypt_method = s->crypt_method_header;
if (AES_set_encrypt_key(keybuf, 128, &s->aes_encrypt_key) != 0)
@@ -1225,9 +1224,7 @@ static coroutine_fn int qcow2_co_readv(BlockDriverState *bs, int64_t sector_num,
goto fail;
}
if (bs->encrypted) {
assert(s->crypt_method);
if (s->crypt_method) {
/*
* For encrypted images, read everything into a temporary
* contiguous buffer on which the AES functions can work.
@@ -1258,8 +1255,7 @@ static coroutine_fn int qcow2_co_readv(BlockDriverState *bs, int64_t sector_num,
if (ret < 0) {
goto fail;
}
if (bs->encrypted) {
assert(s->crypt_method);
if (s->crypt_method) {
qcow2_encrypt_sectors(s, sector_num, cluster_data,
cluster_data, cur_nr_sectors, 0, &s->aes_decrypt_key);
qemu_iovec_from_buf(qiov, bytes_done,
@@ -1319,7 +1315,7 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
trace_qcow2_writev_start_part(qemu_coroutine_self());
index_in_cluster = sector_num & (s->cluster_sectors - 1);
cur_nr_sectors = remaining_sectors;
if (bs->encrypted &&
if (s->crypt_method &&
cur_nr_sectors >
QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors - index_in_cluster) {
cur_nr_sectors =
@@ -1338,8 +1334,7 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
cur_nr_sectors * 512);
if (bs->encrypted) {
assert(s->crypt_method);
if (s->crypt_method) {
if (!cluster_data) {
cluster_data = qemu_try_blockalign(bs->file,
QCOW_MAX_CRYPT_CLUSTERS
@@ -1489,8 +1484,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
* that means we don't have to worry about reopening them here.
*/
if (bs->encrypted) {
assert(s->crypt_method);
if (s->crypt_method) {
crypt_method = s->crypt_method;
memcpy(&aes_encrypt_key, &s->aes_encrypt_key, sizeof(aes_encrypt_key));
memcpy(&aes_decrypt_key, &s->aes_decrypt_key, sizeof(aes_decrypt_key));
@@ -1519,7 +1513,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
return;
}
if (bs->encrypted) {
if (crypt_method) {
s->crypt_method = crypt_method;
memcpy(&s->aes_encrypt_key, &aes_encrypt_key, sizeof(aes_encrypt_key));
memcpy(&s->aes_decrypt_key, &aes_decrypt_key, sizeof(aes_decrypt_key));
@@ -1808,7 +1802,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
{
/* Calculate cluster_bits */
int cluster_bits;
cluster_bits = ctz32(cluster_size);
cluster_bits = ffs(cluster_size) - 1;
if (cluster_bits < MIN_CLUSTER_BITS || cluster_bits > MAX_CLUSTER_BITS ||
(1 << cluster_bits) != cluster_size)
{
@@ -2116,7 +2110,7 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
goto finish;
}
refcount_order = ctz32(refcount_bits);
refcount_order = ffs(refcount_bits) - 1;
ret = qcow2_create2(filename, size, backing_file, backing_fmt, flags,
cluster_size, prealloc, opts, version, refcount_order,
@@ -2830,7 +2824,6 @@ void qcow2_signal_corruption(BlockDriverState *bs, bool fatal, int64_t offset,
int64_t size, const char *message_format, ...)
{
BDRVQcowState *s = bs->opaque;
const char *node_name;
char *message;
va_list ap;
@@ -2854,11 +2847,8 @@ void qcow2_signal_corruption(BlockDriverState *bs, bool fatal, int64_t offset,
"corruption events will be suppressed\n", message);
}
node_name = bdrv_get_node_name(bs);
qapi_event_send_block_image_corrupted(bdrv_get_device_name(bs),
*node_name != '\0', node_name,
message, offset >= 0, offset,
size >= 0, size,
qapi_event_send_block_image_corrupted(bdrv_get_device_name(bs), message,
offset >= 0, offset, size >= 0, size,
fatal, &error_abort);
g_free(message);

View File

@@ -62,7 +62,8 @@
#define MIN_CLUSTER_BITS 9
#define MAX_CLUSTER_BITS 21
#define MIN_L2_CACHE_SIZE 1 /* cluster */
/* Must be at least 2 to cover COW */
#define MIN_L2_CACHE_SIZE 2 /* clusters */
/* Must be at least 4 to cover all cases of refcount table growth */
#define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
@@ -574,8 +575,7 @@ int qcow2_read_snapshots(BlockDriverState *bs);
Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
void *table);
void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
Qcow2Cache *dependency);
@@ -587,6 +587,6 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
void **table);
int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
void **table);
void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
#endif

View File

@@ -408,7 +408,7 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
snprintf(buf, sizeof(buf), "%" PRIx64,
s->header.features & ~QED_FEATURE_MASK);
error_set(errp, QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
bdrv_get_device_or_node_name(bs), "QED", buf);
bdrv_get_device_name(bs), "QED", buf);
return -ENOTSUP;
}
if (!qed_is_cluster_size_valid(s->header.cluster_size)) {
@@ -436,9 +436,9 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
s->table_nelems = (s->header.cluster_size * s->header.table_size) /
sizeof(uint64_t);
s->l2_shift = ctz32(s->header.cluster_size);
s->l2_shift = ffs(s->header.cluster_size) - 1;
s->l2_mask = s->table_nelems - 1;
s->l1_shift = s->l2_shift + ctz32(s->table_nelems);
s->l1_shift = s->l2_shift + ffs(s->table_nelems) - 1;
/* Header size calculation must not overflow uint32_t */
if (s->header.header_size > UINT32_MAX / s->header.cluster_size) {

View File

@@ -226,7 +226,10 @@ static void quorum_report_bad(QuorumAIOCB *acb, char *node_name, int ret)
static void quorum_report_failure(QuorumAIOCB *acb)
{
const char *reference = bdrv_get_device_or_node_name(acb->common.bs);
const char *reference = bdrv_get_device_name(acb->common.bs)[0] ?
bdrv_get_device_name(acb->common.bs) :
acb->common.bs->node_name;
qapi_event_send_quorum_failure(reference, acb->sector_num,
acb->nb_sectors, &error_abort);
}

View File

@@ -301,7 +301,6 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
{
BDRVRawState *s = bs->opaque;
char *buf;
size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
/* For /dev/sg devices the alignment is not really used.
With buffered I/O, we don't have any restrictions. */
@@ -331,9 +330,9 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
/* If we could not get the sizes so far, we can only guess them */
if (!s->buf_align) {
size_t align;
buf = qemu_memalign(max_align, 2 * max_align);
for (align = 512; align <= max_align; align <<= 1) {
if (raw_is_io_aligned(fd, buf + align, max_align)) {
buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
if (raw_is_io_aligned(fd, buf + align, MAX_BLOCKSIZE)) {
s->buf_align = align;
break;
}
@@ -343,8 +342,8 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
if (!bs->request_alignment) {
size_t align;
buf = qemu_memalign(s->buf_align, max_align);
for (align = 512; align <= max_align; align <<= 1) {
buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
if (raw_is_io_aligned(fd, buf, align)) {
bs->request_alignment = align;
break;
@@ -726,8 +725,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
BDRVRawState *s = bs->opaque;
raw_probe_alignment(bs, s->fd, errp);
bs->bl.min_mem_alignment = s->buf_align;
bs->bl.opt_mem_alignment = MAX(s->buf_align, getpagesize());
bs->bl.opt_mem_alignment = s->buf_align;
}
static int check_for_dasd(int fd)

View File

@@ -325,7 +325,7 @@ static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
error_setg(errp, "obj size too small");
return -EINVAL;
}
obj_order = ctz32(objsize);
obj_order = ffs(objsize) - 1;
}
clientname = qemu_rbd_parse_clientname(conf, clientname_buf);

View File

@@ -1716,7 +1716,7 @@ static int parse_block_size_shift(BDRVSheepdogState *s, QemuOpts *opt)
if ((object_size - 1) & object_size) { /* not a power of 2? */
return -EINVAL;
}
obj_order = ctz32(object_size);
obj_order = ffs(object_size) - 1;
if (obj_order < 20 || obj_order > 31) {
return -EINVAL;
}
@@ -2341,7 +2341,6 @@ static int sd_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
if (ret < 0) {
error_report("failed to create inode for snapshot: %s",
error_get_pretty(local_err));
error_free(local_err);
goto cleanup;
}

View File

@@ -246,9 +246,9 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
if (bs->file) {
return bdrv_snapshot_delete(bs->file, snapshot_id, name, errp);
}
error_setg(errp, "Block format '%s' used by device '%s' "
"does not support internal snapshot deletion",
drv->format_name, bdrv_get_device_name(bs));
error_set(errp, QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
drv->format_name, bdrv_get_device_name(bs),
"internal snapshot deletion");
return -ENOTSUP;
}
@@ -329,9 +329,9 @@ int bdrv_snapshot_load_tmp(BlockDriverState *bs,
if (drv->bdrv_snapshot_load_tmp) {
return drv->bdrv_snapshot_load_tmp(bs, snapshot_id, name, errp);
}
error_setg(errp, "Block format '%s' used by device '%s' "
"does not support temporarily loading internal snapshots",
drv->format_name, bdrv_get_device_name(bs));
error_set(errp, QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
drv->format_name, bdrv_get_device_name(bs),
"temporarily load internal snapshot");
return -ENOTSUP;
}

377
block/tar.c Normal file
View File

@@ -0,0 +1,377 @@
/*
* Tar block driver
*
* Copyright (c) 2009 Alexander Graf <agraf@suse.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu-common.h"
#include "block/block_int.h"
// #define DEBUG
#ifdef DEBUG
#define dprintf(fmt, ...) do { printf("tar: " fmt, ## __VA_ARGS__); } while (0)
#else
#define dprintf(fmt, ...) do { } while (0)
#endif
#define SECTOR_SIZE 512
#define POSIX_TAR_MAGIC "ustar"
#define OFFS_LENGTH 0x7c
#define OFFS_TYPE 0x9c
#define OFFS_MAGIC 0x101
#define OFFS_S_SP 0x182
#define OFFS_S_EXT 0x1e2
#define OFFS_S_LENGTH 0x1e3
#define OFFS_SX_EXT 0x1f8
typedef struct SparseCache {
uint64_t start;
uint64_t end;
} SparseCache;
typedef struct BDRVTarState {
BlockDriverState *hd;
size_t file_sec;
uint64_t file_len;
SparseCache *sparse;
int sparse_num;
uint64_t last_end;
char longfile[2048];
} BDRVTarState;
static int str_ends(char *str, const char *end)
{
int end_len = strlen(end);
int str_len = strlen(str);
if (str_len < end_len)
return 0;
return !strncmp(str + str_len - end_len, end, end_len);
}
static int is_target_file(BlockDriverState *bs, char *filename,
char *header)
{
int retval = 0;
if (str_ends(filename, ".raw"))
retval = 1;
if (str_ends(filename, ".qcow"))
retval = 1;
if (str_ends(filename, ".qcow2"))
retval = 1;
if (str_ends(filename, ".vmdk"))
retval = 1;
if (retval &&
(header[OFFS_TYPE] != '0') &&
(header[OFFS_TYPE] != 'S')) {
retval = 0;
}
dprintf("does filename %s match? %s\n", filename, retval ? "yes" : "no");
/* make sure we're not using this name again */
filename[0] = '\0';
return retval;
}
static uint64_t tar2u64(char *ptr)
{
uint64_t retval;
char oldend = ptr[12];
ptr[12] = '\0';
if (*ptr & 0x80) {
/* XXX we only support files up to 64 bit length */
retval = be64_to_cpu(*(uint64_t *)(ptr+4));
dprintf("Convert %lx -> %#lx\n", *(uint64_t*)(ptr+4), retval);
} else {
retval = strtol(ptr, NULL, 8);
dprintf("Convert %s -> %#lx\n", ptr, retval);
}
ptr[12] = oldend;
return retval;
}
static void tar_sparse(BDRVTarState *s, uint64_t offs, uint64_t len)
{
SparseCache *sparse;
if (!len)
return;
if (!(offs - s->last_end)) {
s->last_end += len;
return;
}
if (s->last_end > offs)
return;
dprintf("Last chunk until %lx new chunk at %lx\n", s->last_end, offs);
s->sparse = g_realloc(s->sparse, (s->sparse_num + 1) * sizeof(SparseCache));
sparse = &s->sparse[s->sparse_num];
sparse->start = s->last_end;
sparse->end = offs;
s->last_end = offs + len;
s->sparse_num++;
dprintf("Sparse at %lx end=%lx\n", sparse->start,
sparse->end);
}
static QemuOptsList runtime_opts = {
.name = "tar",
.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
.desc = {
{
.name = "filename",
.type = QEMU_OPT_STRING,
.help = "URL to the tar file",
},
{ /* end of list */ }
},
};
static int tar_open(BlockDriverState *bs, QDict *options, int flags, Error **errp)
{
BDRVTarState *s = bs->opaque;
char header[SECTOR_SIZE];
char *real_file = header;
char *magic;
size_t header_offs = 0;
int ret;
QemuOpts *opts;
Error *local_err = NULL;
const char *filename;
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err != NULL) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
filename = qemu_opt_get(opts, "filename");
if (!strncmp(filename, "tar://", 6))
filename += 6;
else if (!strncmp(filename, "tar:", 4))
filename += 4;
ret = bdrv_open(&s->hd, filename, NULL, NULL, flags | BDRV_O_PROTOCOL, NULL, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
qemu_opts_del(opts);
return ret;
}
/* Search the file for an image */
do {
/* tar header */
if (bdrv_pread(s->hd, header_offs, header, SECTOR_SIZE) != SECTOR_SIZE)
goto fail;
if ((header_offs > 1) && !header[0]) {
fprintf(stderr, "Tar: No image file found in archive\n");
goto fail;
}
magic = &header[OFFS_MAGIC];
if (strncmp(magic, POSIX_TAR_MAGIC, 5)) {
fprintf(stderr, "Tar: Invalid magic: %s\n", magic);
goto fail;
}
dprintf("file type: %c\n", header[OFFS_TYPE]);
/* file length*/
s->file_len = (tar2u64(&header[OFFS_LENGTH]) + (SECTOR_SIZE - 1)) &
~(SECTOR_SIZE - 1);
s->file_sec = (header_offs / SECTOR_SIZE) + 1;
header_offs += s->file_len + SECTOR_SIZE;
if (header[OFFS_TYPE] == 'L') {
bdrv_pread(s->hd, header_offs - s->file_len, s->longfile,
sizeof(s->longfile));
s->longfile[sizeof(s->longfile)-1] = '\0';
real_file = header;
} else if (s->longfile[0]) {
real_file = s->longfile;
} else {
real_file = header;
}
} while(!is_target_file(bs, real_file, header));
/* We found an image! */
if (header[OFFS_TYPE] == 'S') {
uint8_t isextended;
int i;
for (i = OFFS_S_SP; i < (OFFS_S_SP + (4 * 24)); i += 24)
tar_sparse(s, tar2u64(&header[i]), tar2u64(&header[i+12]));
s->file_len = tar2u64(&header[OFFS_S_LENGTH]);
isextended = header[OFFS_S_EXT];
while (isextended) {
if (bdrv_pread(s->hd, s->file_sec * SECTOR_SIZE, header,
SECTOR_SIZE) != SECTOR_SIZE)
goto fail;
for (i = 0; i < (21 * 24); i += 24)
tar_sparse(s, tar2u64(&header[i]), tar2u64(&header[i+12]));
isextended = header[OFFS_SX_EXT];
s->file_sec++;
}
tar_sparse(s, s->file_len, 1);
}
qemu_opts_del(opts);
return 0;
fail:
fprintf(stderr, "Tar: Error opening file\n");
bdrv_unref(s->hd);
qemu_opts_del(opts);
return -EINVAL;
}
typedef struct TarAIOCB {
BlockAIOCB common;
QEMUBH *bh;
} TarAIOCB;
/* This callback gets invoked when we have pure sparseness */
static void tar_sparse_cb(void *opaque)
{
TarAIOCB *acb = (TarAIOCB *)opaque;
acb->common.cb(acb->common.opaque, 0);
qemu_bh_delete(acb->bh);
qemu_aio_unref(acb);
}
static AIOCBInfo tar_aiocb_info = {
.aiocb_size = sizeof(TarAIOCB),
};
/* This is where we get a request from a caller to read something */
static BlockAIOCB *tar_aio_readv(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque)
{
BDRVTarState *s = bs->opaque;
SparseCache *sparse;
int64_t sec_file = sector_num + s->file_sec;
int64_t start = sector_num * SECTOR_SIZE;
int64_t end = start + (nb_sectors * SECTOR_SIZE);
int i;
TarAIOCB *acb;
for (i = 0; i < s->sparse_num; i++) {
sparse = &s->sparse[i];
if (sparse->start > end) {
/* We expect the cache to be start increasing */
break;
} else if ((sparse->start < start) && (sparse->end <= start)) {
/* sparse before our offset */
sec_file -= (sparse->end - sparse->start) / SECTOR_SIZE;
} else if ((sparse->start <= start) && (sparse->end >= end)) {
/* all our sectors are sparse */
char *buf = g_malloc0(nb_sectors * SECTOR_SIZE);
acb = qemu_aio_get(&tar_aiocb_info, bs, cb, opaque);
qemu_iovec_from_buf(qiov, 0, buf, nb_sectors * SECTOR_SIZE);
g_free(buf);
acb->bh = qemu_bh_new(tar_sparse_cb, acb);
qemu_bh_schedule(acb->bh);
return &acb->common;
} else if (((sparse->start >= start) && (sparse->start < end)) ||
((sparse->end >= start) && (sparse->end < end))) {
/* we're semi-sparse (worst case) */
/* let's go synchronous and read all sectors individually */
char *buf = g_malloc(nb_sectors * SECTOR_SIZE);
uint64_t offs;
for (offs = 0; offs < (nb_sectors * SECTOR_SIZE);
offs += SECTOR_SIZE) {
bdrv_pread(bs, (sector_num * SECTOR_SIZE) + offs,
buf + offs, SECTOR_SIZE);
}
qemu_iovec_from_buf(qiov, 0, buf, nb_sectors * SECTOR_SIZE);
acb = qemu_aio_get(&tar_aiocb_info, bs, cb, opaque);
acb->bh = qemu_bh_new(tar_sparse_cb, acb);
qemu_bh_schedule(acb->bh);
return &acb->common;
}
}
return bdrv_aio_readv(s->hd, sec_file, qiov, nb_sectors,
cb, opaque);
}
static void tar_close(BlockDriverState *bs)
{
dprintf("Close\n");
}
static int64_t tar_getlength(BlockDriverState *bs)
{
BDRVTarState *s = bs->opaque;
dprintf("getlength -> %ld\n", s->file_len);
return s->file_len;
}
static BlockDriver bdrv_tar = {
.format_name = "tar",
.protocol_name = "tar",
.instance_size = sizeof(BDRVTarState),
.bdrv_file_open = tar_open,
.bdrv_close = tar_close,
.bdrv_getlength = tar_getlength,
.bdrv_aio_readv = tar_aio_readv,
};
static void tar_block_init(void)
{
bdrv_register(&bdrv_tar);
}
block_init(tar_block_init);

View File

@@ -502,9 +502,9 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
}
/* Disable migration when vdi images are used */
error_setg(&s->migration_blocker, "The vdi format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
error_set(&s->migration_blocker,
QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
"vdi", bdrv_get_device_name(bs), "live migration");
migrate_add_blocker(s->migration_blocker);
qemu_co_mutex_init(&s->write_lock);

View File

@@ -1002,9 +1002,9 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags,
/* TODO: differencing files */
/* Disable migration when VHDX images are used */
error_setg(&s->migration_blocker, "The vhdx format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
error_set(&s->migration_blocker,
QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
"vhdx", bdrv_get_device_name(bs), "live migration");
migrate_add_blocker(s->migration_blocker);
return 0;
@@ -1269,7 +1269,7 @@ static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t sector_num,
iov1.iov_base = qemu_blockalign(bs, iov1.iov_len);
memset(iov1.iov_base, 0, iov1.iov_len);
qemu_iovec_concat_iov(&hd_qiov, &iov1, 1, 0,
iov1.iov_len);
sinfo.block_offset);
sectors_to_write += iov1.iov_len >> BDRV_SECTOR_BITS;
}
@@ -1285,7 +1285,7 @@ static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t sector_num,
iov2.iov_base = qemu_blockalign(bs, iov2.iov_len);
memset(iov2.iov_base, 0, iov2.iov_len);
qemu_iovec_concat_iov(&hd_qiov, &iov2, 1, 0,
iov2.iov_len);
sinfo.block_offset);
sectors_to_write += iov2.iov_len >> BDRV_SECTOR_BITS;
}
}

View File

@@ -524,7 +524,7 @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
}
ret = vmdk_add_extent(bs, file, false,
le32_to_cpu(header.disk_sectors),
(int64_t)le32_to_cpu(header.l1dir_offset) << 9,
le32_to_cpu(header.l1dir_offset) << 9,
0,
le32_to_cpu(header.l1dir_size),
4096,
@@ -670,7 +670,7 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
snprintf(buf, sizeof(buf), "VMDK version %" PRId32,
le32_to_cpu(header.version));
error_set(errp, QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
bdrv_get_device_or_node_name(bs), "vmdk", buf);
bdrv_get_device_name(bs), "vmdk", buf);
return -ENOTSUP;
} else if (le32_to_cpu(header.version) == 3 && (flags & BDRV_O_RDWR)) {
/* VMware KB 2064959 explains that version 3 added support for
@@ -963,9 +963,9 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, int flags,
qemu_co_mutex_init(&s->lock);
/* Disable migration when VMDK images are used */
error_setg(&s->migration_blocker, "The vmdk format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
error_set(&s->migration_blocker,
QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
"vmdk", bdrv_get_device_name(bs), "live migration");
migrate_add_blocker(s->migration_blocker);
g_free(buf);
return 0;
@@ -1248,6 +1248,17 @@ static VmdkExtent *find_extent(BDRVVmdkState *s,
return NULL;
}
static inline uint64_t vmdk_find_index_in_cluster(VmdkExtent *extent,
int64_t sector_num)
{
uint64_t index_in_cluster, extent_begin_sector, extent_relative_sector_num;
extent_begin_sector = extent->end_sector - extent->sectors;
extent_relative_sector_num = sector_num - extent_begin_sector;
index_in_cluster = extent_relative_sector_num % extent->cluster_sectors;
return index_in_cluster;
}
static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, int *pnum)
{
@@ -1285,7 +1296,7 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
break;
}
index_in_cluster = sector_num % extent->cluster_sectors;
index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
n = extent->cluster_sectors - index_in_cluster;
if (n > nb_sectors) {
n = nb_sectors;
@@ -1413,7 +1424,6 @@ static int vmdk_read(BlockDriverState *bs, int64_t sector_num,
BDRVVmdkState *s = bs->opaque;
int ret;
uint64_t n, index_in_cluster;
uint64_t extent_begin_sector, extent_relative_sector_num;
VmdkExtent *extent = NULL;
uint64_t cluster_offset;
@@ -1425,9 +1435,7 @@ static int vmdk_read(BlockDriverState *bs, int64_t sector_num,
ret = get_cluster_offset(bs, extent, NULL,
sector_num << 9, false, &cluster_offset,
0, 0);
extent_begin_sector = extent->end_sector - extent->sectors;
extent_relative_sector_num = sector_num - extent_begin_sector;
index_in_cluster = extent_relative_sector_num % extent->cluster_sectors;
index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
n = extent->cluster_sectors - index_in_cluster;
if (n > nb_sectors) {
n = nb_sectors;
@@ -1489,7 +1497,6 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
VmdkExtent *extent = NULL;
int ret;
int64_t index_in_cluster, n;
uint64_t extent_begin_sector, extent_relative_sector_num;
uint64_t cluster_offset;
VmdkMetaData m_data;
@@ -1505,9 +1512,7 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
if (!extent) {
return -EIO;
}
extent_begin_sector = extent->end_sector - extent->sectors;
extent_relative_sector_num = sector_num - extent_begin_sector;
index_in_cluster = extent_relative_sector_num % extent->cluster_sectors;
index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
n = extent->cluster_sectors - index_in_cluster;
if (n > nb_sectors) {
n = nb_sectors;
@@ -1851,13 +1856,16 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
if (qemu_opt_get_bool_del(opts, BLOCK_OPT_COMPAT6, false)) {
flags |= BLOCK_FLAG_COMPAT6;
}
if (qemu_opt_get_bool_del(opts, BLOCK_OPT_SCSI, false)) {
flags |= BLOCK_FLAG_SCSI;
}
fmt = qemu_opt_get_del(opts, BLOCK_OPT_SUBFMT);
if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ZEROED_GRAIN, false)) {
zeroed_grain = true;
}
if (!adapter_type) {
adapter_type = g_strdup("ide");
adapter_type = g_strdup(flags & BLOCK_FLAG_SCSI ? "lsilogic" : "ide");
} else if (strcmp(adapter_type, "ide") &&
strcmp(adapter_type, "buslogic") &&
strcmp(adapter_type, "lsilogic") &&
@@ -2272,6 +2280,12 @@ static QemuOptsList vmdk_create_opts = {
.help = "Enable efficient zero writes "
"using the zeroed-grain GTE feature"
},
{
.name = BLOCK_OPT_SCSI,
.type = QEMU_OPT_BOOL,
.help = "SCSI image",
.def_value_str = "off"
},
{ /* end of list */ }
}
};

View File

@@ -168,6 +168,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
uint8_t buf[HEADER_SIZE];
uint32_t checksum;
uint64_t computed_size;
uint64_t pagetable_size;
int disk_type = VHD_DYNAMIC;
int ret;
@@ -269,7 +270,17 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
s->pagetable = qemu_try_blockalign(bs->file, s->max_table_entries * 4);
if (s->max_table_entries > SIZE_MAX / 4 ||
s->max_table_entries > (int) INT_MAX / 4) {
error_setg(errp, "Max Table Entries too large (%" PRId32 ")",
s->max_table_entries);
ret = -EINVAL;
goto fail;
}
pagetable_size = (uint64_t) s->max_table_entries * 4;
s->pagetable = qemu_try_blockalign(bs->file, pagetable_size);
if (s->pagetable == NULL) {
ret = -ENOMEM;
goto fail;
@@ -277,14 +288,13 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
s->bat_offset = be64_to_cpu(dyndisk_header->table_offset);
ret = bdrv_pread(bs->file, s->bat_offset, s->pagetable,
s->max_table_entries * 4);
ret = bdrv_pread(bs->file, s->bat_offset, s->pagetable, pagetable_size);
if (ret < 0) {
goto fail;
}
s->free_data_block_offset =
(s->bat_offset + (s->max_table_entries * 4) + 511) & ~511;
ROUND_UP(s->bat_offset + pagetable_size, 512);
for (i = 0; i < s->max_table_entries; i++) {
be32_to_cpus(&s->pagetable[i]);
@@ -318,9 +328,9 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
qemu_co_mutex_init(&s->lock);
/* Disable migration when VHD images are used */
error_setg(&s->migration_blocker, "The vpc format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
error_set(&s->migration_blocker,
QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
"vpc", bdrv_get_device_name(bs), "live migration");
migrate_add_blocker(s->migration_blocker);
return 0;
@@ -813,7 +823,9 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
}
} else {
total_sectors = (int64_t)cyls * heads * secs_per_cyl;
total_size = total_sectors * BDRV_SECTOR_SIZE;
if (disk_type != VHD_FIXED) {
total_size = total_sectors * BDRV_SECTOR_SIZE;
}
}
/* Prepare the Hard Disk Footer */

View File

@@ -1180,10 +1180,9 @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
/* Disable migration when vvfat is used rw */
if (s->qcow) {
error_setg(&s->migration_blocker,
"The vvfat (rw) format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
error_set(&s->migration_blocker,
QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
"vvfat (rw)", bdrv_get_device_name(bs), "live migration");
migrate_add_blocker(s->migration_blocker);
}

View File

@@ -933,7 +933,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type)
devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0,
&error_abort);
if (arch_type == QEMU_ARCH_S390X) {
qemu_opt_set(devopts, "driver", "virtio-blk-s390", &error_abort);
qemu_opt_set(devopts, "driver", "virtio-blk-ccw", &error_abort);
} else {
qemu_opt_set(devopts, "driver", "virtio-blk-pci", &error_abort);
}
@@ -1164,68 +1164,6 @@ out_aio_context:
return NULL;
}
/**
* block_dirty_bitmap_lookup:
* Return a dirty bitmap (if present), after validating
* the node reference and bitmap names.
*
* @node: The name of the BDS node to search for bitmaps
* @name: The name of the bitmap to search for
* @pbs: Output pointer for BDS lookup, if desired. Can be NULL.
* @paio: Output pointer for aio_context acquisition, if desired. Can be NULL.
* @errp: Output pointer for error information. Can be NULL.
*
* @return: A bitmap object on success, or NULL on failure.
*/
static BdrvDirtyBitmap *block_dirty_bitmap_lookup(const char *node,
const char *name,
BlockDriverState **pbs,
AioContext **paio,
Error **errp)
{
BlockDriverState *bs;
BdrvDirtyBitmap *bitmap;
AioContext *aio_context;
if (!node) {
error_setg(errp, "Node cannot be NULL");
return NULL;
}
if (!name) {
error_setg(errp, "Bitmap name cannot be NULL");
return NULL;
}
bs = bdrv_lookup_bs(node, node, NULL);
if (!bs) {
error_setg(errp, "Node '%s' not found", node);
return NULL;
}
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
bitmap = bdrv_find_dirty_bitmap(bs, name);
if (!bitmap) {
error_setg(errp, "Dirty bitmap '%s' not found", name);
goto fail;
}
if (pbs) {
*pbs = bs;
}
if (paio) {
*paio = aio_context;
} else {
aio_context_release(aio_context);
}
return bitmap;
fail:
aio_context_release(aio_context);
return NULL;
}
/* New and old BlockDriverState structs for atomic group operations */
typedef struct BlkTransactionState BlkTransactionState;
@@ -1310,14 +1248,13 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
}
if (bdrv_is_read_only(bs)) {
error_setg(errp, "Device '%s' is read only", device);
error_set(errp, QERR_DEVICE_IS_READ_ONLY, device);
return;
}
if (!bdrv_can_snapshot(bs)) {
error_setg(errp, "Block format '%s' used by device '%s' "
"does not support internal snapshots",
bs->drv->format_name, device);
error_set(errp, QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
bs->drv->format_name, device, "internal snapshot");
return;
}
@@ -1585,7 +1522,6 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
backup->sync,
backup->has_mode, backup->mode,
backup->has_speed, backup->speed,
backup->has_bitmap, backup->bitmap,
backup->has_on_source_error, backup->on_source_error,
backup->has_on_target_error, backup->on_target_error,
&local_err);
@@ -2017,102 +1953,6 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
aio_context_release(aio_context);
}
void qmp_block_dirty_bitmap_add(const char *node, const char *name,
bool has_granularity, uint32_t granularity,
Error **errp)
{
AioContext *aio_context;
BlockDriverState *bs;
if (!name || name[0] == '\0') {
error_setg(errp, "Bitmap name cannot be empty");
return;
}
bs = bdrv_lookup_bs(node, node, errp);
if (!bs) {
return;
}
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
if (has_granularity) {
if (granularity < 512 || !is_power_of_2(granularity)) {
error_setg(errp, "Granularity must be power of 2 "
"and at least 512");
goto out;
}
} else {
/* Default to cluster size, if available: */
granularity = bdrv_get_default_bitmap_granularity(bs);
}
bdrv_create_dirty_bitmap(bs, granularity, name, errp);
out:
aio_context_release(aio_context);
}
void qmp_block_dirty_bitmap_remove(const char *node, const char *name,
Error **errp)
{
AioContext *aio_context;
BlockDriverState *bs;
BdrvDirtyBitmap *bitmap;
bitmap = block_dirty_bitmap_lookup(node, name, &bs, &aio_context, errp);
if (!bitmap || !bs) {
return;
}
if (bdrv_dirty_bitmap_frozen(bitmap)) {
error_setg(errp,
"Bitmap '%s' is currently frozen and cannot be removed",
name);
goto out;
}
bdrv_dirty_bitmap_make_anon(bitmap);
bdrv_release_dirty_bitmap(bs, bitmap);
out:
aio_context_release(aio_context);
}
/**
* Completely clear a bitmap, for the purposes of synchronizing a bitmap
* immediately after a full backup operation.
*/
void qmp_block_dirty_bitmap_clear(const char *node, const char *name,
Error **errp)
{
AioContext *aio_context;
BdrvDirtyBitmap *bitmap;
BlockDriverState *bs;
bitmap = block_dirty_bitmap_lookup(node, name, &bs, &aio_context, errp);
if (!bitmap || !bs) {
return;
}
if (bdrv_dirty_bitmap_frozen(bitmap)) {
error_setg(errp,
"Bitmap '%s' is currently frozen and cannot be modified",
name);
goto out;
} else if (!bdrv_dirty_bitmap_enabled(bitmap)) {
error_setg(errp,
"Bitmap '%s' is currently disabled and cannot be cleared",
name);
goto out;
}
bdrv_clear_dirty_bitmap(bitmap);
out:
aio_context_release(aio_context);
}
int hmp_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
{
const char *id = qdict_get_str(qdict, "id");
@@ -2215,7 +2055,7 @@ void qmp_block_resize(bool has_device, const char *device,
error_set(errp, QERR_UNSUPPORTED);
break;
case -EACCES:
error_setg(errp, "Device '%s' is read only", device);
error_set(errp, QERR_DEVICE_IS_READ_ONLY, device);
break;
case -EBUSY:
error_set(errp, QERR_DEVICE_IN_USE, device);
@@ -2430,7 +2270,6 @@ void qmp_drive_backup(const char *device, const char *target,
enum MirrorSyncMode sync,
bool has_mode, enum NewImageMode mode,
bool has_speed, int64_t speed,
bool has_bitmap, const char *bitmap,
bool has_on_source_error, BlockdevOnError on_source_error,
bool has_on_target_error, BlockdevOnError on_target_error,
Error **errp)
@@ -2439,7 +2278,6 @@ void qmp_drive_backup(const char *device, const char *target,
BlockDriverState *bs;
BlockDriverState *target_bs;
BlockDriverState *source = NULL;
BdrvDirtyBitmap *bmap = NULL;
AioContext *aio_context;
BlockDriver *drv = NULL;
Error *local_err = NULL;
@@ -2539,16 +2377,7 @@ void qmp_drive_backup(const char *device, const char *target,
bdrv_set_aio_context(target_bs, aio_context);
if (has_bitmap) {
bmap = bdrv_find_dirty_bitmap(bs, bitmap);
if (!bmap) {
error_setg(errp, "Bitmap '%s' could not be found", bitmap);
goto out;
}
}
backup_start(bs, target_bs, speed, sync, bmap,
on_source_error, on_target_error,
backup_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
block_job_cb, bs, &local_err);
if (local_err != NULL) {
bdrv_unref(target_bs);
@@ -2562,7 +2391,7 @@ out:
BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
{
return bdrv_named_nodes_list(errp);
return bdrv_named_nodes_list();
}
void qmp_blockdev_backup(const char *device, const char *target,
@@ -2609,8 +2438,8 @@ void qmp_blockdev_backup(const char *device, const char *target,
bdrv_ref(target_bs);
bdrv_set_aio_context(target_bs, aio_context);
backup_start(bs, target_bs, speed, sync, NULL, on_source_error,
on_target_error, block_job_cb, bs, &local_err);
backup_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
block_job_cb, bs, &local_err);
if (local_err != NULL) {
bdrv_unref(target_bs);
error_propagate(errp, local_err);
@@ -2632,6 +2461,7 @@ void qmp_drive_mirror(const char *device, const char *target,
bool has_buf_size, int64_t buf_size,
bool has_on_source_error, BlockdevOnError on_source_error,
bool has_on_target_error, BlockdevOnError on_target_error,
bool has_unmap, bool unmap,
Error **errp)
{
BlockBackend *blk;
@@ -2663,6 +2493,9 @@ void qmp_drive_mirror(const char *device, const char *target,
if (!has_buf_size) {
buf_size = DEFAULT_MIRROR_BUF_SIZE;
}
if (!has_unmap) {
unmap = true;
}
if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
error_set(errp, QERR_INVALID_PARAMETER_VALUE, "granularity",
@@ -2802,6 +2635,7 @@ void qmp_drive_mirror(const char *device, const char *target,
has_replaces ? replaces : NULL,
speed, granularity, buf_size, sync,
on_source_error, on_target_error,
unmap,
block_job_cb, bs, &local_err);
if (local_err != NULL) {
bdrv_unref(target_bs);
@@ -2870,7 +2704,7 @@ void qmp_block_job_cancel(const char *device,
force = false;
}
if (job->user_paused && !force) {
if (job->paused && !force) {
error_setg(errp, "The block job for device '%s' is currently paused",
device);
goto out;
@@ -2887,11 +2721,10 @@ void qmp_block_job_pause(const char *device, Error **errp)
AioContext *aio_context;
BlockJob *job = find_block_job(device, &aio_context, errp);
if (!job || job->user_paused) {
if (!job) {
return;
}
job->user_paused = true;
trace_qmp_block_job_pause(job);
block_job_pause(job);
aio_context_release(aio_context);
@@ -2902,11 +2735,10 @@ void qmp_block_job_resume(const char *device, Error **errp)
AioContext *aio_context;
BlockJob *job = find_block_job(device, &aio_context, errp);
if (!job || !job->user_paused) {
if (!job) {
return;
}
job->user_paused = false;
trace_qmp_block_job_resume(job);
block_job_resume(job);
aio_context_release(aio_context);

View File

@@ -107,7 +107,7 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
void block_job_complete(BlockJob *job, Error **errp)
{
if (job->pause_count || job->cancelled || !job->driver->complete) {
if (job->paused || job->cancelled || !job->driver->complete) {
error_set(errp, QERR_BLOCK_JOB_NOT_READY,
bdrv_get_device_name(job->bs));
return;
@@ -118,26 +118,17 @@ void block_job_complete(BlockJob *job, Error **errp)
void block_job_pause(BlockJob *job)
{
job->pause_count++;
job->paused = true;
}
bool block_job_is_paused(BlockJob *job)
{
return job->pause_count > 0;
return job->paused;
}
void block_job_resume(BlockJob *job)
{
assert(job->pause_count > 0);
job->pause_count--;
if (job->pause_count) {
return;
}
block_job_enter(job);
}
void block_job_enter(BlockJob *job)
{
job->paused = false;
block_job_iostatus_reset(job);
if (job->co && !job->busy) {
qemu_coroutine_enter(job->co, NULL);
@@ -147,7 +138,7 @@ void block_job_enter(BlockJob *job)
void block_job_cancel(BlockJob *job)
{
job->cancelled = true;
block_job_enter(job);
block_job_resume(job);
}
bool block_job_is_cancelled(BlockJob *job)
@@ -267,7 +258,7 @@ BlockJobInfo *block_job_query(BlockJob *job)
info->device = g_strdup(bdrv_get_device_name(job->bs));
info->len = job->len;
info->busy = job->busy;
info->paused = job->pause_count > 0;
info->paused = job->paused;
info->offset = job->offset;
info->speed = job->speed;
info->io_status = job->iostatus;
@@ -344,8 +335,6 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
IO_OPERATION_TYPE_WRITE,
action, &error_abort);
if (action == BLOCK_ERROR_ACTION_STOP) {
/* make the pause user visible, which will be resumed from QMP. */
job->user_paused = true;
block_job_pause(job);
block_job_iostatus_set_err(job, error);
if (bs != job->bs) {

View File

@@ -905,6 +905,7 @@ int main(int argc, char **argv)
#endif
}
tcg_exec_init(0);
cpu_exec_init_all();
/* NOTE: we need to init the CPU at this stage to get
qemu_host_page_size */
cpu = cpu_init(cpu_model);

137
configure vendored
View File

@@ -103,8 +103,7 @@ update_cxxflags() {
}
compile_object() {
local_cflags="$1"
do_cc $QEMU_CFLAGS $local_cflags -c -o $TMPO $TMPC
do_cc $QEMU_CFLAGS -c -o $TMPO $TMPC
}
compile_prog() {
@@ -337,7 +336,6 @@ libssh2=""
vhdx=""
quorum=""
numa=""
tcmalloc="no"
# parse CC options first
for opt do
@@ -437,12 +435,6 @@ EOF
compile_object
}
write_c_skeleton() {
cat > $TMPC <<EOF
int main(void) { return 0; }
EOF
}
if check_define __linux__ ; then
targetos="Linux"
elif check_define _WIN32 ; then
@@ -712,7 +704,9 @@ if test "$mingw32" = "yes" ; then
# enable C99/POSIX format strings (needs mingw32-runtime 3.15 or later)
QEMU_CFLAGS="-D__USE_MINGW_ANSI_STDIO=1 $QEMU_CFLAGS"
LIBS="-lwinmm -lws2_32 -liphlpapi $LIBS"
write_c_skeleton;
cat > $TMPC << EOF
int main(void) { return 0; }
EOF
if compile_prog "" "-liberty" ; then
LIBS="-liberty $LIBS"
fi
@@ -1140,10 +1134,6 @@ for opt do
;;
--enable-numa) numa="yes"
;;
--disable-tcmalloc) tcmalloc="no"
;;
--enable-tcmalloc) tcmalloc="yes"
;;
*)
echo "ERROR: unknown option $opt"
echo "Try '$0 --help' for more information"
@@ -1417,8 +1407,6 @@ Advanced options (experts only):
--enable-quorum enable quorum block filter support
--disable-numa disable libnuma support
--enable-numa enable libnuma support
--disable-tcmalloc disable tcmalloc support
--enable-tcmalloc enable tcmalloc support
NOTE: The object files are built at the place where configure is launched
EOF
@@ -1450,7 +1438,10 @@ if test -z "$werror" ; then
fi
# check that the C compiler works.
write_c_skeleton;
cat > $TMPC <<EOF
int main(void) { return 0; }
EOF
if compile_object ; then
: C compiler works ok
else
@@ -1498,20 +1489,16 @@ gcc_flags="-Wno-string-plus-int $gcc_flags"
# enable it for all configure tests. If a configure test failed due
# to -Werror this would just silently disable some features,
# so it's too error prone.
cc_has_warning_flag() {
write_c_skeleton;
cat > $TMPC << EOF
int main(void) { return 0; }
EOF
for flag in $gcc_flags; do
# Use the positive sense of the flag when testing for -Wno-wombat
# support (gcc will happily accept the -Wno- form of unknown
# warning options).
optflag="$(echo $1 | sed -e 's/^-Wno-/-W/')"
compile_prog "-Werror $optflag" ""
}
for flag in $gcc_flags; do
if cc_has_warning_flag $flag ; then
QEMU_CFLAGS="$QEMU_CFLAGS $flag"
optflag="$(echo $flag | sed -e 's/^-Wno-/-W/')"
if compile_prog "-Werror $optflag" "" ; then
QEMU_CFLAGS="$QEMU_CFLAGS $flag"
fi
done
@@ -1562,20 +1549,9 @@ if test "$static" = "yes" ; then
fi
fi
# Unconditional check for compiler __thread support
cat > $TMPC << EOF
static __thread int tls_var;
int main(void) { return tls_var; }
EOF
if ! compile_prog "-Werror" "" ; then
error_exit "Your compiler does not support the __thread specifier for " \
"Thread-Local Storage (TLS). Please upgrade to a version that does."
fi
if test "$pie" = ""; then
case "$cpu-$targetos" in
i386-Linux|x86_64-Linux|x32-Linux|i386-OpenBSD|x86_64-OpenBSD)
i386-Linux|x86_64-Linux|x32-Linux|ppc*-Linux|i386-OpenBSD|x86_64-OpenBSD)
;;
*)
pie="no"
@@ -1613,7 +1589,7 @@ EOF
fi
fi
if compile_prog "-Werror -fno-pie" "-nopie"; then
if compile_prog "-fno-pie" "-nopie"; then
CFLAGS_NOPIE="-fno-pie"
LDFLAGS_NOPIE="-nopie"
fi
@@ -1873,7 +1849,7 @@ fi
if test "$seccomp" != "no" ; then
if test "$cpu" = "i386" || test "$cpu" = "x86_64" &&
$pkg_config --atleast-version=2.1.1 libseccomp; then
$pkg_config --atleast-version=2.1.0 libseccomp; then
libs_softmmu="$libs_softmmu `$pkg_config --libs libseccomp`"
QEMU_CFLAGS="$QEMU_CFLAGS `$pkg_config --cflags libseccomp`"
seccomp="yes"
@@ -2779,7 +2755,12 @@ fi
##########################################
# glib support probe
glib_req_ver=2.22
if test "$mingw32" = yes; then
# g_poll is required in order to integrate with the glib main loop.
glib_req_ver=2.20
else
glib_req_ver=2.12
fi
glib_modules=gthread-2.0
if test "$modules" = yes; then
glib_modules="$glib_modules gmodule-2.0"
@@ -2803,18 +2784,6 @@ if ! $pkg_config --atleast-version=2.38 glib-2.0; then
glib_subprocess=no
fi
# Silence clang 3.5.0 warnings about glib attribute __alloc_size__ usage
cat > $TMPC << EOF
#include <glib.h>
int main(void) { return 0; }
EOF
if ! compile_prog "$glib_cflags -Werror" "$glib_libs" ; then
if cc_has_warning_flag "-Wno-unknown-attributes"; then
glib_cflags="-Wno-unknown-attributes $glib_cflags"
CFLAGS="-Wno-unknown-attributes $CFLAGS"
fi
fi
##########################################
# SHA command probe for modules
if test "$modules" = yes; then
@@ -3166,14 +3135,14 @@ else
fi
if test "$opengl" != "no" ; then
opengl_pkgs="gl glesv2 epoxy egl"
opengl_pkgs="gl"
if $pkg_config $opengl_pkgs x11 && test "$have_glx" = "yes"; then
opengl_cflags="$($pkg_config --cflags $opengl_pkgs) $x11_cflags"
opengl_libs="$($pkg_config --libs $opengl_pkgs) $x11_libs"
opengl=yes
else
if test "$opengl" = "yes" ; then
feature_not_found "opengl" "Please install opengl (mesa) devel pkgs: $opengl_pkgs"
feature_not_found "opengl" "Install GL devel (e.g. MESA)"
fi
opengl_cflags=""
opengl_libs=""
@@ -3361,22 +3330,6 @@ EOF
fi
fi
##########################################
# tcmalloc probe
if test "$tcmalloc" = "yes" ; then
cat > $TMPC << EOF
#include <stdlib.h>
int main(void) { malloc(1); return 0; }
EOF
if compile_prog "" "-ltcmalloc" ; then
LIBS="-ltcmalloc $LIBS"
else
feature_not_found "tcmalloc" "install gperftools devel"
fi
fi
##########################################
# signalfd probe
signalfd="no"
@@ -4205,33 +4158,6 @@ if compile_prog "" "" ; then
getauxval=yes
fi
########################################
# check if ccache is interfering with
# semantic analysis of macros
ccache_cpp2=no
cat > $TMPC << EOF
static const int Z = 1;
#define fn() ({ Z; })
#define TAUT(X) ((X) == Z)
#define PAREN(X, Y) (X == Y)
#define ID(X) (X)
int main(int argc, char *argv[])
{
int x = 0, y = 0;
x = ID(x);
x = fn();
fn();
if (PAREN(x, y)) return 0;
if (TAUT(Z)) return 0;
return 0;
}
EOF
if ! compile_object "-Werror"; then
ccache_cpp2=yes
fi
##########################################
# End of CC checks
# After here, no more $cc or $ld runs
@@ -4515,7 +4441,6 @@ echo "lzo support $lzo"
echo "snappy support $snappy"
echo "bzip2 support $bzip2"
echo "NUMA host support $numa"
echo "tcmalloc support $tcmalloc"
if test "$sdl_too_old" = "yes"; then
echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -5244,6 +5169,8 @@ case "$target_name" in
TARGET_BASE_ARCH=mips
echo "TARGET_ABI_MIPSN64=y" >> $config_target_mak
;;
tricore)
;;
moxie)
;;
or32)
@@ -5292,9 +5219,7 @@ case "$target_name" in
echo "TARGET_ABI32=y" >> $config_target_mak
;;
s390x)
gdb_xml_files="s390x-core64.xml s390-acr.xml s390-fpr.xml s390-vx.xml"
;;
tricore)
gdb_xml_files="s390x-core64.xml s390-acr.xml s390-fpr.xml"
;;
unicore32)
;;
@@ -5525,10 +5450,6 @@ if test "$numa" = "yes"; then
echo "CONFIG_NUMA=y" >> $config_host_mak
fi
if test "$ccache_cpp2" = "yes"; then
echo "export CCACHE_CPP2=y" >> $config_host_mak
fi
# build tree in object directory in case the source is not in the current directory
DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32 tests/libqos tests/qapi-schema tests/tcg/xtensa tests/qemu-iotests"
DIRS="$DIRS fsdev"

3
cpus.c
View File

@@ -1016,7 +1016,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
qemu_cond_signal(&qemu_cpu_cond);
/* wait for initial kick-off after machine start */
while (first_cpu->stopped) {
while (QTAILQ_FIRST(&cpus)->stopped) {
qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
/* process any pending work */
@@ -1435,7 +1435,6 @@ CpuInfoList *qmp_query_cpus(Error **errp)
info->value->CPU = cpu->cpu_index;
info->value->current = (cpu == first_cpu);
info->value->halted = cpu->halted;
info->value->qom_path = object_get_canonical_path(OBJECT(cpu));
info->value->thread_id = cpu->thread_id;
#if defined(TARGET_I386)
info->value->has_pc = true;

View File

@@ -249,9 +249,9 @@ static void tlb_add_large_page(CPUArchState *env, target_ulong vaddr,
* Called from TCG-generated code, which is under an RCU read-side
* critical section.
*/
void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
hwaddr paddr, MemTxAttrs attrs, int prot,
int mmu_idx, target_ulong size)
void tlb_set_page(CPUState *cpu, target_ulong vaddr,
hwaddr paddr, int prot,
int mmu_idx, target_ulong size)
{
CPUArchState *env = cpu->env_ptr;
MemoryRegionSection *section;
@@ -301,8 +301,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
/* refill the tlb */
env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
env->iotlb[mmu_idx][index].attrs = attrs;
env->iotlb[mmu_idx][index] = iotlb - vaddr;
te->addend = addend - vaddr;
if (prot & PAGE_READ) {
te->addr_read = address;
@@ -332,17 +331,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
}
}
/* Add a new TLB entry, but without specifying the memory
* transaction attributes to be used.
*/
void tlb_set_page(CPUState *cpu, target_ulong vaddr,
hwaddr paddr, int prot,
int mmu_idx, target_ulong size)
{
tlb_set_page_with_attrs(cpu, vaddr, paddr, MEMTXATTRS_UNSPECIFIED,
prot, mmu_idx, size);
}
/* NOTE: this function can trigger an exception */
/* NOTE2: the returned address is not exactly the physical address: it
* is actually a ram_addr_t (in system mode; the user mode emulation
@@ -361,7 +349,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
(addr & TARGET_PAGE_MASK))) {
cpu_ldub_code(env1, addr);
}
pd = env1->iotlb[mmu_idx][page_index].addr & ~TARGET_PAGE_MASK;
pd = env1->iotlb[mmu_idx][page_index] & ~TARGET_PAGE_MASK;
mr = iotlb_to_region(cpu, pd);
if (memory_region_is_unassigned(mr)) {
CPUClass *cc = CPU_GET_CLASS(cpu);

View File

@@ -3,4 +3,4 @@
# We support all the 32 bit boards so need all their config
include arm-softmmu.mak
CONFIG_XLNX_ZYNQMP=y
# Currently no 64-bit specific config requirements

View File

@@ -101,4 +101,3 @@ CONFIG_ALLWINNER_A10=y
CONFIG_XIO3130=y
CONFIG_IOH3420=y
CONFIG_I82801B11=y
CONFIG_ACPI=y

View File

@@ -15,9 +15,6 @@ CONFIG_PCSPK=y
CONFIG_PCKBD=y
CONFIG_FDC=y
CONFIG_ACPI=y
CONFIG_ACPI_X86=y
CONFIG_ACPI_MEMORY_HOTPLUG=y
CONFIG_ACPI_CPU_HOTPLUG=y
CONFIG_APM=y
CONFIG_I8257=y
CONFIG_IDE_ISA=y

View File

@@ -1,3 +1,11 @@
# Default configuration for microblazeel-softmmu
include microblaze-softmmu.mak
CONFIG_PTIMER=y
CONFIG_PFLASH_CFI01=y
CONFIG_SERIAL=y
CONFIG_XILINX=y
CONFIG_XILINX_AXI=y
CONFIG_XILINX_SPI=y
CONFIG_XILINX_ETHLITE=y
CONFIG_SSI=y
CONFIG_SSI_M25P80=y

View File

@@ -15,9 +15,6 @@ CONFIG_PCSPK=y
CONFIG_PCKBD=y
CONFIG_FDC=y
CONFIG_ACPI=y
CONFIG_ACPI_X86=y
CONFIG_ACPI_MEMORY_HOTPLUG=y
CONFIG_ACPI_CPU_HOTPLUG=y
CONFIG_APM=y
CONFIG_I8257=y
CONFIG_PIIX4=y

View File

@@ -15,9 +15,6 @@ CONFIG_PCSPK=y
CONFIG_PCKBD=y
CONFIG_FDC=y
CONFIG_ACPI=y
CONFIG_ACPI_X86=y
CONFIG_ACPI_MEMORY_HOTPLUG=y
CONFIG_ACPI_CPU_HOTPLUG=y
CONFIG_APM=y
CONFIG_I8257=y
CONFIG_PIIX4=y

View File

@@ -15,9 +15,6 @@ CONFIG_PCSPK=y
CONFIG_PCKBD=y
CONFIG_FDC=y
CONFIG_ACPI=y
CONFIG_ACPI_X86=y
CONFIG_ACPI_MEMORY_HOTPLUG=y
CONFIG_ACPI_CPU_HOTPLUG=y
CONFIG_APM=y
CONFIG_I8257=y
CONFIG_PIIX4=y

View File

@@ -15,9 +15,6 @@ CONFIG_PCSPK=y
CONFIG_PCKBD=y
CONFIG_FDC=y
CONFIG_ACPI=y
CONFIG_ACPI_X86=y
CONFIG_ACPI_MEMORY_HOTPLUG=y
CONFIG_ACPI_CPU_HOTPLUG=y
CONFIG_APM=y
CONFIG_I8257=y
CONFIG_PIIX4=y

View File

@@ -36,4 +36,3 @@ CONFIG_EDU=y
CONFIG_VGA=y
CONFIG_VGA_PCI=y
CONFIG_IVSHMEM=$(CONFIG_KVM)
CONFIG_ROCKER=y

View File

@@ -15,9 +15,6 @@ CONFIG_PCSPK=y
CONFIG_PCKBD=y
CONFIG_FDC=y
CONFIG_ACPI=y
CONFIG_ACPI_X86=y
CONFIG_ACPI_MEMORY_HOTPLUG=y
CONFIG_ACPI_CPU_HOTPLUG=y
CONFIG_APM=y
CONFIG_I8257=y
CONFIG_IDE_ISA=y

View File

@@ -10,6 +10,7 @@
#include "sysemu/block-backend.h"
#include "sysemu/dma.h"
#include "trace.h"
#include "qemu/range.h"
#include "qemu/thread.h"
#include "qemu/main-loop.h"
@@ -27,8 +28,7 @@ int dma_memory_set(AddressSpace *as, dma_addr_t addr, uint8_t c, dma_addr_t len)
memset(fillbuf, c, FILLBUF_SIZE);
while (len > 0) {
l = len < FILLBUF_SIZE ? len : FILLBUF_SIZE;
error |= address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED,
fillbuf, l, true);
error |= address_space_rw(as, addr, fillbuf, l, true);
len -= l;
addr += l;
}
@@ -92,6 +92,14 @@ static void reschedule_dma(void *opaque)
dma_blk_cb(dbs, 0);
}
static void continue_after_map_failure(void *opaque)
{
DMAAIOCB *dbs = (DMAAIOCB *)opaque;
dbs->bh = qemu_bh_new(reschedule_dma, dbs);
qemu_bh_schedule(dbs->bh);
}
static void dma_blk_unmap(DMAAIOCB *dbs)
{
int i;
@@ -153,9 +161,7 @@ static void dma_blk_cb(void *opaque, int ret)
if (dbs->iov.size == 0) {
trace_dma_map_wait(dbs);
dbs->bh = aio_bh_new(blk_get_aio_context(dbs->blk),
reschedule_dma, dbs);
cpu_register_map_client(dbs->bh);
cpu_register_map_client(dbs, continue_after_map_failure);
return;
}
@@ -177,11 +183,6 @@ static void dma_aio_cancel(BlockAIOCB *acb)
if (dbs->acb) {
blk_aio_cancel_async(dbs->acb);
}
if (dbs->bh) {
cpu_unregister_map_client(dbs->bh);
qemu_bh_delete(dbs->bh);
dbs->bh = NULL;
}
}

View File

@@ -281,7 +281,7 @@ note that the other barrier may actually be in a driver that runs in
the guest!
For the purposes of pairing, smp_read_barrier_depends() and smp_rmb()
both count as read barriers. A read barrier shall pair with a write
both count as read barriers. A read barriers shall pair with a write
barrier or a full barrier; a write barrier shall pair with a read
barrier or a full barrier. A full barrier can pair with anything.
For example:
@@ -294,7 +294,7 @@ For example:
smp_rmb();
y = a;
Note that the "writing" thread is accessing the variables in the
Note that the "writing" thread are accessing the variables in the
opposite order as the "reading" thread. This is expected: stores
before the write barrier will normally match the loads after the
read barrier, and vice versa. The same is true for more than 2

View File

@@ -1,352 +0,0 @@
<!--
Copyright 2015 John Snow <jsnow@redhat.com> and Red Hat, Inc.
All rights reserved.
This file is licensed via The FreeBSD Documentation License, the full text of
which is included at the end of this document.
-->
# Dirty Bitmaps and Incremental Backup
* Dirty Bitmaps are objects that track which data needs to be backed up for the
next incremental backup.
* Dirty bitmaps can be created at any time and attached to any node
(not just complete drives.)
## Dirty Bitmap Names
* A dirty bitmap's name is unique to the node, but bitmaps attached to different
nodes can share the same name.
## Bitmap Modes
* A Bitmap can be "frozen," which means that it is currently in-use by a backup
operation and cannot be deleted, renamed, written to, reset,
etc.
## Basic QMP Usage
### Supported Commands ###
* block-dirty-bitmap-add
* block-dirty-bitmap-remove
* block-dirty-bitmap-clear
### Creation
* To create a new bitmap, enabled, on the drive with id=drive0:
```json
{ "execute": "block-dirty-bitmap-add",
"arguments": {
"node": "drive0",
"name": "bitmap0"
}
}
```
* This bitmap will have a default granularity that matches the cluster size of
its associated drive, if available, clamped to between [4KiB, 64KiB].
The current default for qcow2 is 64KiB.
* To create a new bitmap that tracks changes in 32KiB segments:
```json
{ "execute": "block-dirty-bitmap-add",
"arguments": {
"node": "drive0",
"name": "bitmap0",
"granularity": 32768
}
}
```
### Deletion
* Bitmaps that are frozen cannot be deleted.
* Deleting the bitmap does not impact any other bitmaps attached to the same
node, nor does it affect any backups already created from this node.
* Because bitmaps are only unique to the node to which they are attached,
you must specify the node/drive name here, too.
```json
{ "execute": "block-dirty-bitmap-remove",
"arguments": {
"node": "drive0",
"name": "bitmap0"
}
}
```
### Resetting
* Resetting a bitmap will clear all information it holds.
* An incremental backup created from an empty bitmap will copy no data,
as if nothing has changed.
```json
{ "execute": "block-dirty-bitmap-clear",
"arguments": {
"node": "drive0",
"name": "bitmap0"
}
}
```
## Transactions (Not yet implemented)
* Transactional commands are forthcoming in a future version,
and are not yet available for use. This section serves as
documentation of intent for their design and usage.
### Justification
Bitmaps can be safely modified when the VM is paused or halted by using
the basic QMP commands. For instance, you might perform the following actions:
1. Boot the VM in a paused state.
2. Create a full drive backup of drive0.
3. Create a new bitmap attached to drive0.
4. Resume execution of the VM.
5. Incremental backups are ready to be created.
At this point, the bitmap and drive backup would be correctly in sync,
and incremental backups made from this point forward would be correctly aligned
to the full drive backup.
This is not particularly useful if we decide we want to start incremental
backups after the VM has been running for a while, for which we will need to
perform actions such as the following:
1. Boot the VM and begin execution.
2. Using a single transaction, perform the following operations:
* Create bitmap0.
* Create a full drive backup of drive0.
3. Incremental backups are now ready to be created.
### Supported Bitmap Transactions
* block-dirty-bitmap-add
* block-dirty-bitmap-clear
The usages are identical to their respective QMP commands, but see below
for examples.
### Example: New Incremental Backup
As outlined in the justification, perhaps we want to create a new incremental
backup chain attached to a drive.
```json
{ "execute": "transaction",
"arguments": {
"actions": [
{"type": "block-dirty-bitmap-add",
"data": {"node": "drive0", "name": "bitmap0"} },
{"type": "drive-backup",
"data": {"device": "drive0", "target": "/path/to/full_backup.img",
"sync": "full", "format": "qcow2"} }
]
}
}
```
### Example: New Incremental Backup Anchor Point
Maybe we just want to create a new full backup with an existing bitmap and
want to reset the bitmap to track the new chain.
```json
{ "execute": "transaction",
"arguments": {
"actions": [
{"type": "block-dirty-bitmap-clear",
"data": {"node": "drive0", "name": "bitmap0"} },
{"type": "drive-backup",
"data": {"device": "drive0", "target": "/path/to/new_full_backup.img",
"sync": "full", "format": "qcow2"} }
]
}
}
```
## Incremental Backups
The star of the show.
**Nota Bene!** Only incremental backups of entire drives are supported for now.
So despite the fact that you can attach a bitmap to any arbitrary node, they are
only currently useful when attached to the root node. This is because
drive-backup only supports drives/devices instead of arbitrary nodes.
### Example: First Incremental Backup
1. Create a full backup and sync it to the dirty bitmap, as in the transactional
examples above; or with the VM offline, manually create a full copy and then
create a new bitmap before the VM begins execution.
* Let's assume the full backup is named 'full_backup.img'.
* Let's assume the bitmap you created is 'bitmap0' attached to 'drive0'.
2. Create a destination image for the incremental backup that utilizes the
full backup as a backing image.
* Let's assume it is named 'incremental.0.img'.
```sh
# qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
```
3. Issue the incremental backup command:
```json
{ "execute": "drive-backup",
"arguments": {
"device": "drive0",
"bitmap": "bitmap0",
"target": "incremental.0.img",
"format": "qcow2",
"sync": "dirty-bitmap",
"mode": "existing"
}
}
```
### Example: Second Incremental Backup
1. Create a new destination image for the incremental backup that points to the
previous one, e.g.: 'incremental.1.img'
```sh
# qemu-img create -f qcow2 incremental.1.img -b incremental.0.img -F qcow2
```
2. Issue a new incremental backup command. The only difference here is that we
have changed the target image below.
```json
{ "execute": "drive-backup",
"arguments": {
"device": "drive0",
"bitmap": "bitmap0",
"target": "incremental.1.img",
"format": "qcow2",
"sync": "dirty-bitmap",
"mode": "existing"
}
}
```
## Errors
* In the event of an error that occurs after a backup job is successfully
launched, either by a direct QMP command or a QMP transaction, the user
will receive a BLOCK_JOB_COMPLETE event with a failure message, accompanied
by a BLOCK_JOB_ERROR event.
* In the case of an event being cancelled, the user will receive a
BLOCK_JOB_CANCELLED event instead of a pair of COMPLETE and ERROR events.
* In either case, the incremental backup data contained within the bitmap is
safely rolled back, and the data within the bitmap is not lost. The image
file created for the failed attempt can be safely deleted.
* Once the underlying problem is fixed (e.g. more storage space is freed up),
you can simply retry the incremental backup command with the same bitmap.
### Example
1. Create a target image:
```sh
# qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
```
2. Attempt to create an incremental backup via QMP:
```json
{ "execute": "drive-backup",
"arguments": {
"device": "drive0",
"bitmap": "bitmap0",
"target": "incremental.0.img",
"format": "qcow2",
"sync": "dirty-bitmap",
"mode": "existing"
}
}
```
3. Receive an event notifying us of failure:
```json
{ "timestamp": { "seconds": 1424709442, "microseconds": 844524 },
"data": { "speed": 0, "offset": 0, "len": 67108864,
"error": "No space left on device",
"device": "drive1", "type": "backup" },
"event": "BLOCK_JOB_COMPLETED" }
```
4. Delete the failed incremental, and re-create the image.
```sh
# rm incremental.0.img
# qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
```
5. Retry the command after fixing the underlying problem,
such as freeing up space on the backup volume:
```json
{ "execute": "drive-backup",
"arguments": {
"device": "drive0",
"bitmap": "bitmap0",
"target": "incremental.0.img",
"format": "qcow2",
"sync": "dirty-bitmap",
"mode": "existing"
}
}
```
6. Receive confirmation that the job completed successfully:
```json
{ "timestamp": { "seconds": 1424709668, "microseconds": 526525 },
"data": { "device": "drive1", "type": "backup",
"speed": 0, "len": 67108864, "offset": 67108864},
"event": "BLOCK_JOB_COMPLETED" }
```
<!--
The FreeBSD Documentation License
Redistribution and use in source (Markdown) and 'compiled' forms (SGML, HTML,
PDF, PostScript, RTF and so forth) with or without modification, are permitted
provided that the following conditions are met:
Redistributions of source code (Markdown) must retain the above copyright
notice, this list of conditions and the following disclaimer of this file
unmodified.
Redistributions in compiled form (transformed to other DTDs, converted to PDF,
PostScript, RTF and other formats) must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.
THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

View File

@@ -4,7 +4,9 @@ QEMU memory hotplug
This document explains how to use the memory hotplug feature in QEMU,
which is present since v2.1.0.
Guest support is required for memory hotplug to work.
Please, note that memory hotunplug is not supported yet. This means
that you're able to add memory, but you're not able to remove it.
Also, proper guest support is required for memory hotplug to work.
Basic RAM hotplug
-----------------
@@ -72,22 +74,3 @@ comes from regular RAM, 1GB is a 1GB hugepage page and 256MB is from
-device pc-dimm,id=dimm1,memdev=mem1 \
-object memory-backend-file,id=mem2,size=256M,mem-path=/mnt/hugepages-2MB \
-device pc-dimm,id=dimm2,memdev=mem2
RAM hot-unplug
---------------
In order to be able to hot unplug pc-dimm device, QEMU has to be told the ids
of pc-dimm device and memory backend object. The ids were assigned when you hot
plugged memory.
Two monitor commands are used to hot unplug memory:
- "device_del": deletes a front-end pc-dimm device
- "object_del": deletes a memory backend object
For example, assuming that the pc-dimm device with id "dimm1" exists, and its memory
backend is "mem1", the following commands tries to remove it.
(qemu) device_del dimm1
(qemu) object_del mem1

View File

@@ -1,149 +0,0 @@
Use multiple thread (de)compression in live migration
=====================================================
Copyright (C) 2015 Intel Corporation
Author: Liang Li <liang.z.li@intel.com>
This work is licensed under the terms of the GNU GPLv2 or later. See
the COPYING file in the top-level directory.
Contents:
=========
* Introduction
* When to use
* Performance
* Usage
* TODO
Introduction
============
Instead of sending the guest memory directly, this solution will
compress the RAM page before sending; after receiving, the data will
be decompressed. Using compression in live migration can help
to reduce the data transferred about 60%, this is very useful when the
bandwidth is limited, and the total migration time can also be reduced
about 70% in a typical case. In addition to this, the VM downtime can be
reduced about 50%. The benefit depends on data's compressibility in VM.
The process of compression will consume additional CPU cycles, and the
extra CPU cycles will increase the migration time. On the other hand,
the amount of data transferred will decrease; this factor can reduce
the total migration time. If the process of the compression is quick
enough, then the total migration time can be reduced, and multiple
thread compression can be used to accelerate the compression process.
The decompression speed of Zlib is at least 4 times as quick as
compression, if the source and destination CPU have equal speed,
keeping the compression thread count 4 times the decompression
thread count can avoid resource waste.
Compression level can be used to control the compression speed and the
compression ratio. High compression ratio will take more time, level 0
stands for no compression, level 1 stands for the best compression
speed, and level 9 stands for the best compression ratio. Users can
select a level number between 0 and 9.
When to use the multiple thread compression in live migration
=============================================================
Compression of data will consume extra CPU cycles; so in a system with
high overhead of CPU, avoid using this feature. When the network
bandwidth is very limited and the CPU resource is adequate, use of
multiple thread compression will be very helpful. If both the CPU and
the network bandwidth are adequate, use of multiple thread compression
can still help to reduce the migration time.
Performance
===========
Test environment:
CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Socket Count: 2
RAM: 128G
NIC: Intel I350 (10/100/1000Mbps)
Host OS: CentOS 7 64-bit
Guest OS: RHEL 6.5 64-bit
Parameter: qemu-system-x86_64 -enable-kvm -smp 4 -m 4096
/share/ia32e_rhel6u5.qcow -monitor stdio
There is no additional application is running on the guest when doing
the test.
Speed limit: 1000Gb/s
---------------------------------------------------------------
| original | compress thread: 8
| way | decompress thread: 2
| | compression level: 1
---------------------------------------------------------------
total time(msec): | 3333 | 1833
---------------------------------------------------------------
downtime(msec): | 100 | 27
---------------------------------------------------------------
transferred ram(kB):| 363536 | 107819
---------------------------------------------------------------
throughput(mbps): | 893.73 | 482.22
---------------------------------------------------------------
total ram(kB): | 4211524 | 4211524
---------------------------------------------------------------
There is an application running on the guest which write random numbers
to RAM block areas periodically.
Speed limit: 1000Gb/s
---------------------------------------------------------------
| original | compress thread: 8
| way | decompress thread: 2
| | compression level: 1
---------------------------------------------------------------
total time(msec): | 37369 | 15989
---------------------------------------------------------------
downtime(msec): | 337 | 173
---------------------------------------------------------------
transferred ram(kB):| 4274143 | 1699824
---------------------------------------------------------------
throughput(mbps): | 936.99 | 870.95
---------------------------------------------------------------
total ram(kB): | 4211524 | 4211524
---------------------------------------------------------------
Usage
=====
1. Verify both the source and destination QEMU are able
to support the multiple thread compression migration:
{qemu} info_migrate_capabilities
{qemu} ... compress: off ...
2. Activate compression on the source:
{qemu} migrate_set_capability compress on
3. Set the compression thread count on source:
{qemu} migrate_set_parameter compress_threads 12
4. Set the compression level on the source:
{qemu} migrate_set_parameter compress_level 1
5. Set the decompression thread count on destination:
{qemu} migrate_set_parameter decompress_threads 3
6. Start outgoing migration:
{qemu} migrate -d tcp:destination.host:4444
{qemu} info migrate
Capabilities: ... compress: on
...
The following are the default settings:
compress: off
compress_threads: 8
decompress_threads: 2
compress_level: 1 (which means best speed)
So, only the first two steps are required to use the multiple
thread compression in migration. You can do more if the default
settings are not appropriate.
TODO
====
Some faster (de)compression method such as LZ4 and Quicklz can help
to reduce the CPU consumption when doing (de)compression. If using
these faster (de)compression method, less (de)compression threads
are needed when doing the migration.

View File

@@ -1,193 +1,61 @@
= How to use the QAPI code generator =
Copyright IBM Corp. 2011
Copyright (C) 2012-2015 Red Hat, Inc.
This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.
== Introduction ==
QAPI is a native C API within QEMU which provides management-level
functionality to internal and external users. For external
users/processes, this interface is made available by a JSON-based wire
format for the QEMU Monitor Protocol (QMP) for controlling qemu, as
well as the QEMU Guest Agent (QGA) for communicating with the guest.
The remainder of this document uses "Client JSON Protocol" when
referring to the wire contents of a QMP or QGA connection.
functionality to internal/external users. For external
users/processes, this interface is made available by a JSON-based
QEMU Monitor protocol that is provided by the QMP server.
To map Client JSON Protocol interfaces to the native C QAPI
implementations, a JSON-based schema is used to define types and
function signatures, and a set of scripts is used to generate types,
signatures, and marshaling/dispatch code. This document will describe
how the schemas, scripts, and resulting code are used.
To map QMP-defined interfaces to the native C QAPI implementations,
a JSON-based schema is used to define types and function
signatures, and a set of scripts is used to generate types/signatures,
and marshaling/dispatch code. The QEMU Guest Agent also uses these
scripts, paired with a separate schema, to generate
marshaling/dispatch code for the guest agent server running in the
guest.
This document will describe how the schemas, scripts, and resulting
code are used.
== QMP/Guest agent schema ==
A QAPI schema file is designed to be loosely based on JSON
(http://www.ietf.org/rfc/rfc7159.txt) with changes for quoting style
and the use of comments; a QAPI schema file is then parsed by a python
code generation program. A valid QAPI schema consists of a series of
top-level expressions, with no commas between them. Where
dictionaries (JSON objects) are used, they are parsed as python
OrderedDicts so that ordering is preserved (for predictable layout of
generated C structs and parameter lists). Ordering doesn't matter
between top-level expressions or the keys within an expression, but
does matter within dictionary values for 'data' and 'returns' members
of a single expression. QAPI schema input is written using 'single
quotes' instead of JSON's "double quotes" (in contrast, Client JSON
Protocol uses no comments, and while input accepts 'single quotes' as
an extension, output is strict JSON using only "double quotes"). As
in JSON, trailing commas are not permitted in arrays or dictionaries.
Input must be ASCII (although QMP supports full Unicode strings, the
QAPI parser does not). At present, there is no place where a QAPI
schema requires the use of JSON numbers or null.
This file defines the types, commands, and events used by QMP. It should
fully describe the interface used by QMP.
Comments are allowed; anything between an unquoted # and the following
newline is ignored. Although there is not yet a documentation
generator, a form of stylized comments has developed for consistently
documenting details about an expression and when it was added to the
schema. The documentation is delimited between two lines of ##, then
the first line names the expression, an optional overview is provided,
then individual documentation about each member of 'data' is provided,
and finally, a 'Since: x.y.z' tag lists the release that introduced
the expression. Optional fields are tagged with the phrase
'#optional', often with their default value; and extensions added
after the expression was first released are also given a '(since
x.y.z)' comment. For example:
This file is designed to be loosely based on JSON although it's technically
executable Python. While dictionaries are used, they are parsed as
OrderedDicts so that ordering is preserved.
##
# @BlockStats:
#
# Statistics of a virtual block device or a block backing device.
#
# @device: #optional If the stats are for a virtual block device, the name
# corresponding to the virtual block device.
#
# @stats: A @BlockDeviceStats for the device.
#
# @parent: #optional This describes the file block device if it has one.
#
# @backing: #optional This describes the backing block device if it has one.
# (Since 2.0)
#
# Since: 0.14.0
##
{ 'struct': 'BlockStats',
'data': {'*device': 'str', 'stats': 'BlockDeviceStats',
'*parent': 'BlockStats',
'*backing': 'BlockStats'} }
There are two basic syntaxes used, type definitions and command definitions.
The schema sets up a series of types, as well as commands and events
that will use those types. Forward references are allowed: the parser
scans in two passes, where the first pass learns all type names, and
the second validates the schema and generates the code. This allows
the definition of complex structs that can have mutually recursive
types, and allows for indefinite nesting of Client JSON Protocol that
satisfies the schema. A type name should not be defined more than
once. It is permissible for the schema to contain additional types
not used by any commands or events in the Client JSON Protocol, for
the side effect of generated C code used internally.
The first syntax defines a type and is represented by a dictionary. There are
three kinds of user-defined types that are supported: complex types,
enumeration types and union types.
There are seven top-level expressions recognized by the parser:
'include', 'command', 'struct', 'enum', 'union', 'alternate', and
'event'. There are several groups of types: simple types (a number of
built-in types, such as 'int' and 'str'; as well as enumerations),
complex types (structs and two flavors of unions), and alternate types
(a choice between other types). The 'command' and 'event' expressions
can refer to existing types by name, or list an anonymous type as a
dictionary. Listing a type name inside an array refers to a
single-dimension array of that type; multi-dimension arrays are not
directly supported (although an array of a complex struct that
contains an array member is possible).
Types, commands, and events share a common namespace. Therefore,
generally speaking, type definitions should always use CamelCase for
user-defined type names, while built-in types are lowercase. Type
definitions should not end in 'Kind', as this namespace is used for
creating implicit C enums for visiting union types. Command names,
and field names within a type, should be all lower case with words
separated by a hyphen. However, some existing older commands and
complex types use underscore; when extending such expressions,
consistency is preferred over blindly avoiding underscore. Event
names should be ALL_CAPS with words separated by underscore. The
special string '**' appears for some commands that manually perform
their own type checking rather than relying on the type-safe code
produced by the qapi code generators.
Any name (command, event, type, field, or enum value) beginning with
"x-" is marked experimental, and may be withdrawn or changed
incompatibly in a future release. Downstream vendors may add
extensions; such extensions should begin with a prefix matching
"__RFQDN_" (for the reverse-fully-qualified-domain-name of the
vendor), even if the rest of the name uses dash (example:
__com.redhat_drive-mirror). Other than downstream extensions (with
leading underscore and the use of dots), all names should begin with a
letter, and contain only ASCII letters, digits, dash, and underscore.
It is okay to reuse names that match C keywords; the generator will
rename a field named "default" in the QAPI to "q_default" in the
generated C code.
In the rest of this document, usage lines are given for each
expression type, with literal strings written in lower case and
placeholders written in capitals. If a literal string includes a
prefix of '*', that key/value pair can be omitted from the expression.
For example, a usage statement that includes '*base':STRUCT-NAME
means that an expression has an optional key 'base', which if present
must have a value that forms a struct name.
=== Built-in Types ===
The following types are built-in to the parser:
'str' - arbitrary UTF-8 string
'int' - 64-bit signed integer (although the C code may place further
restrictions on acceptable range)
'number' - floating point number
'bool' - JSON value of true or false
'int8', 'int16', 'int32', 'int64' - like 'int', but enforce maximum
bit size
'uint8', 'uint16', 'uint32', 'uint64' - unsigned counterparts
'size' - like 'uint64', but allows scaled suffix from command line
visitor
Generally speaking, types definitions should always use CamelCase for the type
names. Command names should be all lower case with words separated by a hyphen.
=== Includes ===
Usage: { 'include': STRING }
The QAPI schema definitions can be modularized using the 'include' directive:
{ 'include': 'path/to/file.json' }
{ 'include': 'path/to/file.json'}
The directive is evaluated recursively, and include paths are relative to the
file using the directive. Multiple includes of the same file are
safe. No other keys should appear in the expression, and the include
value should be a string.
As a matter of style, it is a good idea to have all files be
self-contained, but at the moment, nothing prevents an included file
from making a forward reference to a type that is only introduced by
an outer file. The parser may be made stricter in the future to
prevent incomplete include files.
file using the directive. Multiple includes of the same file are safe.
=== Struct types ===
=== Complex types ===
Usage: { 'struct': STRING, 'data': DICT, '*base': STRUCT-NAME }
A complex type is a dictionary containing a single key whose value is a
dictionary. This corresponds to a struct in C or an Object in JSON. An
example of a complex type is:
A struct is a dictionary containing a single 'data' key whose
value is a dictionary. This corresponds to a struct in C or an Object
in JSON. Each value of the 'data' dictionary must be the name of a
type, or a one-element array containing a type name. An example of a
struct is:
{ 'struct': 'MyType',
{ 'type': 'MyType',
'data': { 'member1': 'str', 'member2': 'int', '*member3': 'str' } }
The use of '*' as a prefix to the name means the member is optional in
the corresponding JSON protocol usage.
The use of '*' as a prefix to the name means the member is optional.
The default initialization value of an optional argument should not be changed
between versions of QEMU unless the new default maintains backward
@@ -216,13 +84,13 @@ A structure that is used in both input and output of various commands
must consider the backwards compatibility constraints of both directions
of use.
A struct definition can specify another struct as its base.
A complex type definition can specify another complex type as its base.
In this case, the fields of the base type are included as top-level fields
of the new struct's dictionary in the Client JSON Protocol wire
format. An example definition is:
of the new complex type's dictionary in the QMP wire format. An example
definition is:
{ 'struct': 'BlockdevOptionsGenericFormat', 'data': { 'file': 'str' } }
{ 'struct': 'BlockdevOptionsGenericCOWFormat',
{ 'type': 'BlockdevOptionsGenericFormat', 'data': { 'file': 'str' } }
{ 'type': 'BlockdevOptionsGenericCOWFormat',
'base': 'BlockdevOptionsGenericFormat',
'data': { '*backing': 'str' } }
@@ -232,158 +100,97 @@ both fields like this:
{ "file": "/some/place/my-image",
"backing": "/some/place/my-backing-file" }
=== Enumeration types ===
Usage: { 'enum': STRING, 'data': ARRAY-OF-STRING }
An enumeration type is a dictionary containing a single 'data' key
whose value is a list of strings. An example enumeration is:
An enumeration type is a dictionary containing a single key whose value is a
list of strings. An example enumeration is:
{ 'enum': 'MyEnum', 'data': [ 'value1', 'value2', 'value3' ] }
Nothing prevents an empty enumeration, although it is probably not
useful. The list of strings should be lower case; if an enum name
represents multiple words, use '-' between words. The string 'max' is
not allowed as an enum value, and values should not be repeated.
The enumeration values are passed as strings over the Client JSON
Protocol, but are encoded as C enum integral values in generated code.
While the C code starts numbering at 0, it is better to use explicit
comparisons to enum values than implicit comparisons to 0; the C code
will also include a generated enum member ending in _MAX for tracking
the size of the enum, useful when using common functions for
converting between strings and enum values. Since the wire format
always passes by name, it is acceptable to reorder or add new
enumeration members in any location without breaking clients of Client
JSON Protocol; however, removing enum values would break
compatibility. For any struct that has a field that will only contain
a finite set of string values, using an enum type for that field is
better than open-coding the field to be type 'str'.
=== Union types ===
Usage: { 'union': STRING, 'data': DICT }
or: { 'union': STRING, 'data': DICT, 'base': STRUCT-NAME,
'discriminator': ENUM-MEMBER-OF-BASE }
Union types are used to let the user choose between several different data
types. A union type is defined using a dictionary as explained in the
following paragraphs.
Union types are used to let the user choose between several different
variants for an object. There are two flavors: simple (no
discriminator or base), flat (both discriminator and base). A union
type is defined using a data dictionary as explained in the following
paragraphs.
A simple union type defines a mapping from automatic discriminator
values to data types like in this example:
A simple union type defines a mapping from discriminator values to data types
like in this example:
{ 'struct': 'FileOptions', 'data': { 'filename': 'str' } }
{ 'struct': 'Qcow2Options',
{ 'type': 'FileOptions', 'data': { 'filename': 'str' } }
{ 'type': 'Qcow2Options',
'data': { 'backing-file': 'str', 'lazy-refcounts': 'bool' } }
{ 'union': 'BlockdevOptions',
'data': { 'file': 'FileOptions',
'qcow2': 'Qcow2Options' } }
In the Client JSON Protocol, a simple union is represented by a
dictionary that contains the 'type' field as a discriminator, and a
'data' field that is of the specified data type corresponding to the
discriminator value, as in these examples:
In the QMP wire format, a simple union is represented by a dictionary that
contains the 'type' field as a discriminator, and a 'data' field that is of the
specified data type corresponding to the discriminator value:
{ "type": "file", "data" : { "filename": "/some/place/my-image" } }
{ "type": "qcow2", "data" : { "backing-file": "/some/place/my-image",
"lazy-refcounts": true } }
The generated C code uses a struct containing a union. Additionally,
an implicit C enum 'NameKind' is created, corresponding to the union
'Name', for accessing the various branches of the union. No branch of
the union can be named 'max', as this would collide with the implicit
enum. The value for each branch can be of any type.
A union definition can specify a complex type as its base. In this case, the
fields of the complex type are included as top-level fields of the union
dictionary in the QMP wire format. An example definition is:
{ 'type': 'BlockdevCommonOptions', 'data': { 'readonly': 'bool' } }
{ 'union': 'BlockdevOptions',
'base': 'BlockdevCommonOptions',
'data': { 'raw': 'RawOptions',
'qcow2': 'Qcow2Options' } }
And it looks like this on the wire:
{ "type": "qcow2",
"readonly": false,
"data" : { "backing-file": "/some/place/my-image",
"lazy-refcounts": true } }
A flat union definition specifies a struct as its base, and
avoids nesting on the wire. All branches of the union must be
complex types, and the top-level fields of the union dictionary on
the wire will be combination of fields from both the base type and the
appropriate branch type (when merging two dictionaries, there must be
no keys in common). The 'discriminator' field must be the name of an
enum-typed member of the base struct.
The following example enhances the above simple union example by
adding a common field 'readonly', renaming the discriminator to
something more applicable, and reducing the number of {} required on
the wire:
Flat union types avoid the nesting on the wire. They are used whenever a
specific field of the base type is declared as the discriminator ('type' is
then no longer generated). The discriminator must be of enumeration type.
The above example can then be modified as follows:
{ 'enum': 'BlockdevDriver', 'data': [ 'raw', 'qcow2' ] }
{ 'struct': 'BlockdevCommonOptions',
{ 'type': 'BlockdevCommonOptions',
'data': { 'driver': 'BlockdevDriver', 'readonly': 'bool' } }
{ 'union': 'BlockdevOptions',
'base': 'BlockdevCommonOptions',
'discriminator': 'driver',
'data': { 'file': 'FileOptions',
'data': { 'raw': 'RawOptions',
'qcow2': 'Qcow2Options' } }
Resulting in these JSON objects:
Resulting in this JSON object:
{ "driver": "file", "readonly": true,
"filename": "/some/place/my-image" }
{ "driver": "qcow2", "readonly": false,
"backing-file": "/some/place/my-image", "lazy-refcounts": true }
Notice that in a flat union, the discriminator name is controlled by
the user, but because it must map to a base member with enum type, the
code generator can ensure that branches exist for all values of the
enum (although the order of the keys need not match the declaration of
the enum). In the resulting generated C data types, a flat union is
represented as a struct with the base member fields included directly,
and then a union of structures for each branch of the struct.
A simple union can always be re-written as a flat union where the base
class has a single member named 'type', and where each branch of the
union has a struct with a single member named 'data'. That is,
{ 'union': 'Simple', 'data': { 'one': 'str', 'two': 'int' } }
is identical on the wire to:
{ 'enum': 'Enum', 'data': ['one', 'two'] }
{ 'struct': 'Base', 'data': { 'type': 'Enum' } }
{ 'struct': 'Branch1', 'data': { 'data': 'str' } }
{ 'struct': 'Branch2', 'data': { 'data': 'int' } }
{ 'union': 'Flat': 'base': 'Base', 'discriminator': 'type',
'data': { 'one': 'Branch1', 'two': 'Branch2' } }
{ "driver": "qcow2",
"readonly": false,
"backing-file": "/some/place/my-image",
"lazy-refcounts": true }
=== Alternate types ===
A special type of unions are anonymous unions. They don't form a dictionary in
the wire format but allow the direct use of different types in their place. As
they aren't structured, they don't have any explicit discriminator but use
the (QObject) data type of their value as an implicit discriminator. This means
that they are restricted to using only one discriminator value per QObject
type. For example, you cannot have two different complex types in an anonymous
union, or two different integer types.
Usage: { 'alternate': STRING, 'data': DICT }
Anonymous unions are declared using an empty dictionary as their discriminator.
The discriminator values never appear on the wire, they are only used in the
generated C code. Anonymous unions cannot have a base type.
An alternate type is one that allows a choice between two or more JSON
data types (string, integer, number, or object, but currently not
array) on the wire. The definition is similar to a simple union type,
where each branch of the union names a QAPI type. For example:
{ 'alternate': 'BlockRef',
{ 'union': 'BlockRef',
'discriminator': {},
'data': { 'definition': 'BlockdevOptions',
'reference': 'str' } }
Just like for a simple union, an implicit C enum 'NameKind' is created
to enumerate the branches for the alternate 'Name'.
Unlike a union, the discriminator string is never passed on the wire
for the Client JSON Protocol. Instead, the value's JSON type serves
as an implicit discriminator, which in turn means that an alternate
can only express a choice between types represented differently in
JSON. If a branch is typed as the 'bool' built-in, the alternate
accepts true and false; if it is typed as any of the various numeric
built-ins, it accepts a JSON number; if it is typed as a 'str'
built-in or named enum type, it accepts a JSON string; and if it is
typed as a complex type (struct or union), it accepts a JSON object.
Two different complex types, for instance, aren't permitted, because
both are represented as a JSON object.
The example alternate declaration above allows using both of the
following example objects:
This example allows using both of the following example objects:
{ "file": "my_existing_block_device_id" }
{ "file": { "driver": "file",
@@ -393,95 +200,23 @@ following example objects:
=== Commands ===
Usage: { 'command': STRING, '*data': COMPLEX-TYPE-NAME-OR-DICT,
'*returns': TYPE-NAME-OR-DICT,
'*gen': false, '*success-response': false }
Commands are defined by using a list containing three members. The first
member is the command name, the second member is a dictionary containing
arguments, and the third member is the return type.
Commands are defined by using a dictionary containing several members,
where three members are most common. The 'command' member is a
mandatory string, and determines the "execute" value passed in a
Client JSON Protocol command exchange.
The 'data' argument maps to the "arguments" dictionary passed in as
part of a Client JSON Protocol command. The 'data' member is optional
and defaults to {} (an empty dictionary). If present, it must be the
string name of a complex type, a one-element array containing the name
of a complex type, or a dictionary that declares an anonymous type
with the same semantics as a 'struct' expression, with one exception
noted below when 'gen' is used.
The 'returns' member describes what will appear in the "return" field
of a Client JSON Protocol reply on successful completion of a command.
The member is optional from the command declaration; if absent, the
"return" field will be an empty dictionary. If 'returns' is present,
it must be the string name of a complex or built-in type, a
one-element array containing the name of a complex or built-in type,
or a dictionary that declares an anonymous type with the same
semantics as a 'struct' expression, with one exception noted below
when 'gen' is used. Although it is permitted to have the 'returns'
member name a built-in type or an array of built-in types, any command
that does this cannot be extended to return additional information in
the future; thus, new commands should strongly consider returning a
dictionary-based type or an array of dictionaries, even if the
dictionary only contains one field at the present.
All commands in Client JSON Protocol use a dictionary to report
failure, with no way to specify that in QAPI. Where the error return
is different than the usual GenericError class in order to help the
client react differently to certain error conditions, it is worth
documenting this in the comments before the command declaration.
Some example commands:
{ 'command': 'my-first-command',
'data': { 'arg1': 'str', '*arg2': 'str' } }
{ 'struct': 'MyType', 'data': { '*value': 'str' } }
{ 'command': 'my-second-command',
'returns': [ 'MyType' ] }
which would validate this Client JSON Protocol transaction:
=> { "execute": "my-first-command",
"arguments": { "arg1": "hello" } }
<= { "return": { } }
=> { "execute": "my-second-command" }
<= { "return": [ { "value": "one" }, { } ] }
In rare cases, QAPI cannot express a type-safe representation of a
corresponding Client JSON Protocol command. In these cases, if the
command expression includes the key 'gen' with boolean value false,
then the 'data' or 'returns' member that intends to bypass generated
type-safety and do its own manual validation should use an inline
dictionary definition, with a value of '**' rather than a valid type
name for the keys that the generated code will not validate. Please
try to avoid adding new commands that rely on this, and instead use
type-safe unions. For an example of bypass usage:
{ 'command': 'netdev_add',
'data': {'type': 'str', 'id': 'str', '*props': '**'},
'gen': false }
Normally, the QAPI schema is used to describe synchronous exchanges,
where a response is expected. But in some cases, the action of a
command is expected to change state in a way that a successful
response is not possible (although the command will still return a
normal dictionary error on failure). When a successful reply is not
possible, the command expression should include the optional key
'success-response' with boolean value false. So far, only QGA makes
use of this field.
An example command is:
{ 'command': 'my-command',
'data': { 'arg1': 'str', '*arg2': 'str' },
'returns': 'str' }
=== Events ===
Usage: { 'event': STRING, '*data': COMPLEX-TYPE-NAME-OR-DICT }
Events are defined with the keyword 'event'. It is not allowed to
name an event 'MAX', since the generator also produces a C enumeration
of all event names with a generated _MAX value at the end. When
'data' is also specified, additional info will be included in the
event, with similar semantics to a 'struct' expression. Finally there
will be C API generated in qapi-event.h; when called by QEMU code, a
message with timestamp will be emitted on the wire.
Events are defined with the keyword 'event'. When 'data' is also specified,
additional info will be included in the event. Finally there will be C API
generated in qapi-event.h; when called by QEMU code, a message with timestamp
will be emitted on the wire. If timestamp is -1, it means failure to retrieve
host time.
An example event is:
@@ -499,9 +234,9 @@ Resulting in this JSON object:
Schemas are fed into 3 scripts to generate all the code/files that, paired
with the core QAPI libraries, comprise everything required to take JSON
commands read in by a Client JSON Protocol server, unmarshal the arguments into
commands read in by a QMP/guest agent server, unmarshal the arguments into
the underlying C types, call into the corresponding C function, and map the
response back to a Client JSON Protocol response to be returned to the user.
response back to a QMP/guest agent response to be returned to the user.
As an example, we'll use the following schema, which describes a single
complex user-defined type (which will produce a C struct, along with a list
@@ -510,7 +245,7 @@ case we want to accept/return a list of this type with a command), and a
command which takes that type as a parameter and returns the same type:
$ cat example-schema.json
{ 'struct': 'UserDefOne',
{ 'type': 'UserDefOne',
'data': { 'integer': 'int', 'string': 'str' } }
{ 'command': 'my-command',
@@ -536,7 +271,7 @@ created code.
Example:
$ python scripts/qapi-types.py --output-dir="qapi-generated" \
--prefix="example-" example-schema.json
--prefix="example-" --input-file=example-schema.json
$ cat qapi-generated/example-qapi-types.c
[Uninteresting stuff omitted...]
@@ -576,7 +311,7 @@ Example:
#ifndef EXAMPLE_QAPI_TYPES_H
#define EXAMPLE_QAPI_TYPES_H
[Built-in types omitted...]
[Builtin types omitted...]
typedef struct UserDefOne UserDefOne;
@@ -589,7 +324,7 @@ Example:
struct UserDefOneList *next;
} UserDefOneList;
[Functions on built-in types omitted...]
[Functions on builtin types omitted...]
struct UserDefOne
{
@@ -623,7 +358,7 @@ $(prefix)qapi-visit.h: declarations for previously mentioned visitor
Example:
$ python scripts/qapi-visit.py --output-dir="qapi-generated"
--prefix="example-" example-schema.json
--prefix="example-" --input-file=example-schema.json
$ cat qapi-generated/example-qapi-visit.c
[Uninteresting stuff omitted...]
@@ -681,14 +416,14 @@ Example:
error_propagate(errp, err);
}
$ python scripts/qapi-commands.py --output-dir="qapi-generated" \
--prefix="example-" example-schema.json
--prefix="example-" --input-file=example-schema.json
$ cat qapi-generated/example-qapi-visit.h
[Uninteresting stuff omitted...]
#ifndef EXAMPLE_QAPI_VISIT_H
#define EXAMPLE_QAPI_VISIT_H
[Visitors for built-in types omitted...]
[Visitors for builtin types omitted...]
void visit_type_UserDefOne(Visitor *m, UserDefOne **obj, const char *name, Error **errp);
void visit_type_UserDefOneList(Visitor *m, UserDefOneList **obj, const char *name, Error **errp);
@@ -715,7 +450,7 @@ $(prefix)qmp-commands.h: Function prototypes for the QMP commands
Example:
$ python scripts/qapi-commands.py --output-dir="qapi-generated"
--prefix="example-" example-schema.json
--prefix="example-" --input-file=example-schema.json
$ cat qapi-generated/example-qmp-marshal.c
[Uninteresting stuff omitted...]
@@ -806,7 +541,7 @@ $(prefix)qapi-event.c - Implementation of functions to send an event
Example:
$ python scripts/qapi-event.py --output-dir="qapi-generated"
--prefix="example-" example-schema.json
--prefix="example-" --input-file=example-schema.json
$ cat qapi-generated/example-qapi-event.c
[Uninteresting stuff omitted...]

View File

@@ -31,27 +31,21 @@ Example:
BLOCK_IMAGE_CORRUPTED
---------------------
Emitted when a disk image is being marked corrupt. The image can be
identified by its device or node name. The 'device' field is always
present for compatibility reasons, but it can be empty ("") if the
image does not have a device name associated.
Emitted when a disk image is being marked corrupt.
Data:
- "device": Device name (json-string)
- "node-name": Node name (json-string, optional)
- "msg": Informative message (e.g., reason for the corruption)
(json-string)
- "offset": If the corruption resulted from an image access, this
is the host's access offset into the image
(json-int, optional)
- "size": If the corruption resulted from an image access, this
is the access size (json-int, optional)
- "device": Device name (json-string)
- "msg": Informative message (e.g., reason for the corruption) (json-string)
- "offset": If the corruption resulted from an image access, this is the access
offset into the image (json-int)
- "size": If the corruption resulted from an image access, this is the access
size (json-int)
Example:
{ "event": "BLOCK_IMAGE_CORRUPTED",
"data": { "device": "ide0-hd0", "node-name": "node0",
"data": { "device": "ide0-hd0",
"msg": "Prevented active L1 table overwrite", "offset": 196608,
"size": 65536 },
"timestamp": { "seconds": 1378126126, "microseconds": 966463 } }
@@ -232,23 +226,6 @@ Example:
{ "event": "GUEST_PANICKED",
"data": { "action": "pause" } }
MEM_UNPLUG_ERROR
--------------------
Emitted when memory hot unplug error occurs.
Data:
- "device": device name (json-string)
- "msg": Informative message (e.g., reason for the error) (json-string)
Example:
{ "event": "MEM_UNPLUG_ERROR"
"data": { "device": "dimm1",
"msg": "acpi: device unplug for unsupported device"
},
"timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
NIC_RX_FILTER_CHANGED
---------------------

View File

@@ -1,21 +1,10 @@
QEMU Machine Protocol Specification
0. About This Document
======================
Copyright (C) 2009-2015 Red Hat, Inc.
This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.
1. Introduction
===============
This document specifies the QEMU Machine Protocol (QMP), a JSON-based
protocol which is available for applications to operate QEMU at the
machine-level. It is also in use by the QEMU Guest Agent (QGA), which
is available for host applications to interact with the guest
operating system.
This document specifies the QEMU Machine Protocol (QMP), a JSON-based protocol
which is available for applications to operate QEMU at the machine-level.
2. Protocol Specification
=========================
@@ -29,27 +18,14 @@ following format:
json-DATA-STRUCTURE-NAME
Where DATA-STRUCTURE-NAME is any valid JSON data structure, as defined
by the JSON standard:
Where DATA-STRUCTURE-NAME is any valid JSON data structure, as defined by
the JSON standard:
http://www.ietf.org/rfc/rfc7159.txt
http://www.ietf.org/rfc/rfc4627.txt
The protocol is always encoded in UTF-8 except for synchronization
bytes (documented below); although thanks to json-string escape
sequences, the server will reply using only the strict ASCII subset.
For convenience, json-object members mentioned in this document will
be in a certain order. However, in real protocol usage they can be in
ANY order, thus no particular order should be assumed. On the other
hand, use of json-array elements presumes that preserving order is
important unless specifically documented otherwise. Repeating a key
within a json-object gives unpredictable results.
Also for convenience, the server will accept an extension of
'single-quoted' strings in place of the usual "double-quoted"
json-string, and both input forms of strings understand an additional
escape sequence of "\'" for a single quote. The server will only use
double quoting on output.
For convenience, json-object members and json-array elements mentioned in
this document will be in a certain order. However, in real protocol usage
they can be in ANY order, thus no particular order should be assumed.
2.1 General Definitions
-----------------------
@@ -76,16 +52,7 @@ The greeting message format is:
- The "version" member contains the Server's version information (the format
is the same of the query-version command)
- The "capabilities" member specify the availability of features beyond the
baseline specification; the order of elements in this array has no
particular significance, so a client must search the entire array
when looking for a particular capability
2.2.1 Capabilities
------------------
As of the date this document was last revised, no server or client
capability strings have been defined.
baseline specification
2.3 Issuing Commands
--------------------
@@ -98,14 +65,10 @@ The format for command execution is:
- The "execute" member identifies the command to be executed by the Server
- The "arguments" member is used to pass any arguments required for the
execution of the command, it is optional when no arguments are
required. Each command documents what contents will be considered
valid when handling the json-argument
execution of the command, it is optional when no arguments are required
- The "id" member is a transaction identification associated with the
command execution, it is optional and will be part of the response if
provided. The "id" member can be any json-value, although most
clients merely use a json-number incremented for each successive
command
provided
2.4 Commands Responses
----------------------
@@ -118,15 +81,13 @@ of a command execution: success or error.
The format of a success response is:
{ "return": json-value, "id": json-value }
{ "return": json-object, "id": json-value }
Where,
- The "return" member contains the data returned by the command, which
is defined on a per-command basis (usually a json-object or
json-array of json-objects, but sometimes a json-number, json-string,
or json-array of json-strings); it is an empty json-object if the
command does not return data
- The "return" member contains the command returned data, which is defined
in a per-command basis or an empty json-object if the command does not
return data
- The "id" member contains the transaction identification associated
with the command execution if issued by the Client
@@ -153,8 +114,7 @@ if provided by the client.
-----------------------
As a result of state changes, the Server may send messages unilaterally
to the Client at any time, when not in the middle of any other
response. They are called "asynchronous events".
to the Client at any time. They are called "asynchronous events".
The format of asynchronous events is:
@@ -166,27 +126,13 @@ The format of asynchronous events is:
- The "event" member contains the event's name
- The "data" member contains event specific data, which is defined in a
per-event basis, it is optional
- The "timestamp" member contains the exact time of when the event
occurred in the Server. It is a fixed json-object with time in
seconds and microseconds relative to the Unix Epoch (1 Jan 1970); if
there is a failure to retrieve host time, both members of the
timestamp will be set to -1.
- The "timestamp" member contains the exact time of when the event occurred
in the Server. It is a fixed json-object with time in seconds and
microseconds
For a listing of supported asynchronous events, please, refer to the
qmp-events.txt file.
2.5 QGA Synchronization
-----------------------
When using QGA, an additional synchronization feature is built into
the protocol. If the Client sends a raw 0xFF sentinel byte (not valid
JSON), then the Server will reset its state and discard all pending
data prior to the sentinel. Conversely, if the Client makes use of
the 'guest-sync-delimited' command, the Server will send a raw 0xFF
sentinel byte prior to its response, to aid the Client in discarding
any data prior to the sentinel.
3. QMP Examples
===============
@@ -199,37 +145,32 @@ This section provides some examples of real QMP usage, in all of them
S: { "QMP": { "version": { "qemu": { "micro": 50, "minor": 6, "major": 1 },
"package": ""}, "capabilities": []}}
3.2 Client QMP negotiation
--------------------------
C: { "execute": "qmp_capabilities" }
S: { "return": {}}
3.3 Simple 'stop' execution
3.2 Simple 'stop' execution
---------------------------
C: { "execute": "stop" }
S: { "return": {} }
3.4 KVM information
3.3 KVM information
-------------------
C: { "execute": "query-kvm", "id": "example" }
S: { "return": { "enabled": true, "present": true }, "id": "example"}
3.5 Parsing error
3.4 Parsing error
------------------
C: { "execute": }
S: { "error": { "class": "GenericError", "desc": "Invalid JSON syntax" } }
3.6 Powerdown event
3.5 Powerdown event
-------------------
S: { "timestamp": { "seconds": 1258551470, "microseconds": 802384 },
"event": "POWERDOWN" }
4. Capabilities Negotiation
===========================
----------------------------
When a Client successfully establishes a connection, the Server is in
Capabilities Negotiation mode.
@@ -248,7 +189,7 @@ effect, all commands (except qmp_capabilities) are allowed and asynchronous
messages are delivered.
5 Compatibility Considerations
==============================
------------------------------
All protocol changes or new features which modify the protocol format in an
incompatible way are disabled by default and will be advertised by the
@@ -272,16 +213,12 @@ However, Clients must not assume any particular:
- Amount of errors generated by a command, that is, new errors can be added
to any existing command in newer versions of the Server
Any command or field name beginning with "x-" is deemed experimental,
and may be withdrawn or changed in an incompatible manner in a future
release.
Of course, the Server does guarantee to send valid JSON. But apart from
this, a Client should be "conservative in what they send, and liberal in
what they accept".
6. Downstream extension of QMP
==============================
------------------------------
We recommend that downstream consumers of QEMU do *not* modify QMP.
Management tools should be able to support both upstream and downstream

View File

@@ -2,7 +2,7 @@ QEMU<->ACPI BIOS memory hotplug interface
--------------------------------------
ACPI BIOS GPE.3 handler is dedicated for notifying OS about memory hot-add
and hot-remove events.
events.
Memory hot-plug interface (IO port 0xa00-0xa17, 1-4 byte access):
---------------------------------------------------------------
@@ -19,9 +19,7 @@ Memory hot-plug interface (IO port 0xa00-0xa17, 1-4 byte access):
1: Device insert event, used to distinguish device for which
no device check event to OSPM was issued.
It's valid only when bit 1 is set.
2: Device remove event, used to distinguish device for which
no device eject request to OSPM was issued.
3-7: reserved and should be ignored by OSPM
2-7: reserved and should be ignored by OSPM
[0x15-0x17] reserved
write access:
@@ -33,62 +31,14 @@ Memory hot-plug interface (IO port 0xa00-0xa17, 1-4 byte access):
[0xc-0x13] reserved, writes into it are ignored
[0x14] Memory device control fields
bits:
0: reserved, OSPM must clear it before writing to register.
Due to BUG in versions prior 2.4 that field isn't cleared
when other fields are written. Keep it reserved and don't
try to reuse it.
0: reserved, OSPM must clear it before writing to register
1: if set to 1 clears device insert event, set by OSPM
after it has emitted device check event for the
selected memory device
2: if set to 1 clears device remove event, set by OSPM
after it has emitted device eject request for the
selected memory device
3: if set to 1 initiates device eject, set by OSPM when it
triggers memory device removal and calls _EJ0 method
4-7: reserved, OSPM must clear them before writing to register
2-7: reserved, OSPM must clear them before writing to register
Selecting memory device slot beyond present range has no effect on platform:
- write accesses to memory hot-plug registers not documented above are
ignored
- read accesses to memory hot-plug registers not documented above return
all bits set to 1.
Memory hot remove process diagram:
----------------------------------
+-------------++-----------------------++------------------+
|1.QEMU||2.QEMU||3.QEMU|
|device_del+---->+deviceunplugrequest+----->+SendSCItoguest,|
|||cb||returncontrolto|
+-------------++-----------------------+|management|
+------------------+
+---------------------------------------------------------------------+
+---------------------++-------------------------+
|OSPM:|removeevent|OSPM:|
|sendEjectRequest,||Scanmemorydevices|
|clearremoveevent+<-------------+foreventflags|
||||
+---------------------++-------------------------+
|
|
+---------v--------++-----------------------+
|GuestOS:|success|OSPM:|
|processEjection+----------->+Execute_EJ0method, |
|request ||setejectbitinflags|
+------------------++-----------------------+
|failure|
vv
+------------------------++-----------------------+
|OSPM:||QEMU:|
|setOSTevent&status||calldeviceunplugcb|
|fields|||
+------------------------++-----------------------+
||
vv
+------------------++-------------------+
|QEMU:||QEMU:|
|SendOSTQMPevent||Senddevicedeleted|
|||QMPevent|
+------------------+||
+-------------------+

View File

@@ -45,7 +45,6 @@ PCI devices (other than virtio):
1b36:0003 PCI Dual-port 16550A adapter (docs/specs/pci-serial.txt)
1b36:0004 PCI Quad-port 16550A adapter (docs/specs/pci-serial.txt)
1b36:0005 PCI test device (docs/specs/pci-testdev.txt)
1b36:0006 PCI Rocker Ethernet switch device
1b36:0007 PCI SD Card Host Controller Interface (SDHCI)
All these devices are documented in docs/specs.

File diff suppressed because it is too large Load Diff

View File

@@ -127,11 +127,6 @@ in the ancillary data:
If Master is unable to send the full message or receives a wrong reply it will
close the connection. An optional reconnection mechanism can be implemented.
Multi queue support
-------------------
The protocol supports multiple queues by setting all index fields in the sent
messages to a properly calculated value.
Message types
-------------

546
exec.c

File diff suppressed because it is too large Load Diff

View File

@@ -9,6 +9,13 @@
* the COPYING file in the top-level directory.
*/
/* work around a broken sys/capability.h */
#if defined(__i386__)
typedef unsigned long long __u64;
#endif
#if defined(__powerpc64__)
#include <asm/types.h>
#endif
#include <sys/resource.h>
#include <getopt.h>
#include <syslog.h>

View File

@@ -1,59 +0,0 @@
<?xml version="1.0"?>
<!-- Copyright (C) 2010-2014 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved. -->
<!DOCTYPE feature SYSTEM "gdb-target.dtd">
<feature name="org.gnu.gdb.s390.vx">
<vector id="v4f" type="ieee_single" count="4"/>
<vector id="v2d" type="ieee_double" count="2"/>
<vector id="v16i8" type="int8" count="16"/>
<vector id="v8i16" type="int16" count="8"/>
<vector id="v4i32" type="int32" count="4"/>
<vector id="v2i64" type="int64" count="2"/>
<union id="vec128">
<field name="v4_float" type="v4f"/>
<field name="v2_double" type="v2d"/>
<field name="v16_int8" type="v16i8"/>
<field name="v8_int16" type="v8i16"/>
<field name="v4_int32" type="v4i32"/>
<field name="v2_int64" type="v2i64"/>
<field name="uint128" type="uint128"/>
</union>
<reg name="v0l" bitsize="64" type="uint64"/>
<reg name="v1l" bitsize="64" type="uint64"/>
<reg name="v2l" bitsize="64" type="uint64"/>
<reg name="v3l" bitsize="64" type="uint64"/>
<reg name="v4l" bitsize="64" type="uint64"/>
<reg name="v5l" bitsize="64" type="uint64"/>
<reg name="v6l" bitsize="64" type="uint64"/>
<reg name="v7l" bitsize="64" type="uint64"/>
<reg name="v8l" bitsize="64" type="uint64"/>
<reg name="v9l" bitsize="64" type="uint64"/>
<reg name="v10l" bitsize="64" type="uint64"/>
<reg name="v11l" bitsize="64" type="uint64"/>
<reg name="v12l" bitsize="64" type="uint64"/>
<reg name="v13l" bitsize="64" type="uint64"/>
<reg name="v14l" bitsize="64" type="uint64"/>
<reg name="v15l" bitsize="64" type="uint64"/>
<reg name="v16" bitsize="128" type="vec128"/>
<reg name="v17" bitsize="128" type="vec128"/>
<reg name="v18" bitsize="128" type="vec128"/>
<reg name="v19" bitsize="128" type="vec128"/>
<reg name="v20" bitsize="128" type="vec128"/>
<reg name="v21" bitsize="128" type="vec128"/>
<reg name="v22" bitsize="128" type="vec128"/>
<reg name="v23" bitsize="128" type="vec128"/>
<reg name="v24" bitsize="128" type="vec128"/>
<reg name="v25" bitsize="128" type="vec128"/>
<reg name="v26" bitsize="128" type="vec128"/>
<reg name="v27" bitsize="128" type="vec128"/>
<reg name="v28" bitsize="128" type="vec128"/>
<reg name="v29" bitsize="128" type="vec128"/>
<reg name="v30" bitsize="128" type="vec128"/>
<reg name="v31" bitsize="128" type="vec128"/>
</feature>

View File

@@ -41,12 +41,6 @@
#include "qemu/sockets.h"
#include "sysemu/kvm.h"
#ifdef CONFIG_USER_ONLY
#define GDB_ATTACHED "0"
#else
#define GDB_ATTACHED "1"
#endif
static inline int target_memory_rw_debug(CPUState *cpu, target_ulong addr,
uint8_t *buf, int len, bool is_write)
{
@@ -775,14 +769,6 @@ static CPUState *find_cpu(uint32_t thread_id)
return NULL;
}
static int is_query_packet(const char *p, const char *query, char separator)
{
unsigned int query_len = strlen(query);
return strncmp(p, query, query_len) == 0 &&
(p[query_len] == '\0' || p[query_len] == separator);
}
static int gdb_handle_packet(GDBState *s, const char *line_buf)
{
CPUState *cpu;
@@ -888,9 +874,11 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
goto unknown_command;
}
case 'k':
#ifdef CONFIG_USER_ONLY
/* Kill the target */
fprintf(stderr, "\nQEMU: Terminated via GDBstub\n");
exit(0);
#endif
case 'D':
/* Detach packet */
gdb_breakpoint_remove_all();
@@ -1074,7 +1062,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
SSTEP_NOTIMER);
put_packet(s, buf);
break;
} else if (is_query_packet(p, "qemu.sstep", '=')) {
} else if (strncmp(p,"qemu.sstep",10) == 0) {
/* Display or change the sstep_flags */
p += 10;
if (*p != '=') {
@@ -1119,7 +1107,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
break;
}
#ifdef CONFIG_USER_ONLY
else if (strcmp(p, "Offsets") == 0) {
else if (strncmp(p, "Offsets", 7) == 0) {
TaskState *ts = s->c_cpu->opaque;
snprintf(buf, sizeof(buf),
@@ -1147,7 +1135,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
break;
}
#endif /* !CONFIG_USER_ONLY */
if (is_query_packet(p, "Supported", ':')) {
if (strncmp(p, "Supported", 9) == 0) {
snprintf(buf, sizeof(buf), "PacketSize=%x", MAX_PACKET_LENGTH);
cc = CPU_GET_CLASS(first_cpu);
if (cc->gdb_core_xml_file != NULL) {
@@ -1199,10 +1187,6 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
put_packet_binary(s, buf, len + 1);
break;
}
if (is_query_packet(p, "Attached", ':')) {
put_packet(s, GDB_ATTACHED);
break;
}
/* Unrecognised 'q' command. */
goto unknown_command;

View File

@@ -990,21 +990,6 @@ STEXI
@item migrate_set_capability @var{capability} @var{state}
@findex migrate_set_capability
Enable/Disable the usage of a capability @var{capability} for migration.
ETEXI
{
.name = "migrate_set_parameter",
.args_type = "parameter:s,value:i",
.params = "parameter value",
.help = "Set the parameter for migration",
.mhandler.cmd = hmp_migrate_set_parameter,
.command_completion = migrate_set_parameter_completion,
},
STEXI
@item migrate_set_parameter @var{parameter} @var{value}
@findex migrate_set_parameter
Set the parameter @var{parameter} for migration.
ETEXI
{
@@ -1013,7 +998,8 @@ ETEXI
.params = "protocol hostname port tls-port cert-subject",
.help = "send migration info to spice/vnc client",
.user_print = monitor_user_noop,
.mhandler.cmd_new = client_migrate_info,
.mhandler.cmd_async = client_migrate_info,
.flags = MONITOR_CMD_ASYNC,
},
STEXI
@@ -1776,8 +1762,6 @@ show user network stack connection states
show migration status
@item info migrate_capabilities
show current migration capabilities
@item info migrate_parameters
show current migration parameters
@item info migrate_cache_size
show current migration XBZRLE cache size
@item info balloon

101
hmp.c
View File

@@ -60,7 +60,7 @@ void hmp_info_version(Monitor *mon, const QDict *qdict)
info = qmp_query_version(NULL);
monitor_printf(mon, "%" PRId64 ".%" PRId64 ".%" PRId64 "%s\n",
info->qemu->major, info->qemu->minor, info->qemu->micro,
info->qemu.major, info->qemu.minor, info->qemu.micro,
info->package);
qapi_free_VersionInfo(info);
@@ -252,29 +252,6 @@ void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict)
qapi_free_MigrationCapabilityStatusList(caps);
}
void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
{
MigrationParameters *params;
params = qmp_query_migrate_parameters(NULL);
if (params) {
monitor_printf(mon, "parameters:");
monitor_printf(mon, " %s: %" PRId64,
MigrationParameter_lookup[MIGRATION_PARAMETER_COMPRESS_LEVEL],
params->compress_level);
monitor_printf(mon, " %s: %" PRId64,
MigrationParameter_lookup[MIGRATION_PARAMETER_COMPRESS_THREADS],
params->compress_threads);
monitor_printf(mon, " %s: %" PRId64,
MigrationParameter_lookup[MIGRATION_PARAMETER_DECOMPRESS_THREADS],
params->decompress_threads);
monitor_printf(mon, "\n");
}
qapi_free_MigrationParameters(params);
}
void hmp_info_migrate_cache_size(Monitor *mon, const QDict *qdict)
{
monitor_printf(mon, "xbzrel cache size: %" PRId64 " kbytes\n",
@@ -414,7 +391,8 @@ static void print_block_info(Monitor *mon, BlockInfo *info,
inserted->iops_size);
}
if (verbose) {
/* TODO: inserted->image should never be null */
if (verbose && inserted->image) {
monitor_printf(mon, "\nImages:\n");
image_info = inserted->image;
while (1) {
@@ -671,14 +649,14 @@ static void hmp_info_pci_device(Monitor *mon, const PciDeviceInfo *dev)
dev->slot, dev->function);
monitor_printf(mon, " ");
if (dev->class_info->has_desc) {
monitor_printf(mon, "%s", dev->class_info->desc);
if (dev->class_info.has_desc) {
monitor_printf(mon, "%s", dev->class_info.desc);
} else {
monitor_printf(mon, "Class %04" PRId64, dev->class_info->q_class);
monitor_printf(mon, "Class %04" PRId64, dev->class_info.q_class);
}
monitor_printf(mon, ": PCI device %04" PRIx64 ":%04" PRIx64 "\n",
dev->id->vendor, dev->id->device);
dev->id.vendor, dev->id.device);
if (dev->has_irq) {
monitor_printf(mon, " IRQ %" PRId64 ".\n", dev->irq);
@@ -686,25 +664,25 @@ static void hmp_info_pci_device(Monitor *mon, const PciDeviceInfo *dev)
if (dev->has_pci_bridge) {
monitor_printf(mon, " BUS %" PRId64 ".\n",
dev->pci_bridge->bus->number);
dev->pci_bridge->bus.number);
monitor_printf(mon, " secondary bus %" PRId64 ".\n",
dev->pci_bridge->bus->secondary);
dev->pci_bridge->bus.secondary);
monitor_printf(mon, " subordinate bus %" PRId64 ".\n",
dev->pci_bridge->bus->subordinate);
dev->pci_bridge->bus.subordinate);
monitor_printf(mon, " IO range [0x%04"PRIx64", 0x%04"PRIx64"]\n",
dev->pci_bridge->bus->io_range->base,
dev->pci_bridge->bus->io_range->limit);
dev->pci_bridge->bus.io_range->base,
dev->pci_bridge->bus.io_range->limit);
monitor_printf(mon,
" memory range [0x%08"PRIx64", 0x%08"PRIx64"]\n",
dev->pci_bridge->bus->memory_range->base,
dev->pci_bridge->bus->memory_range->limit);
dev->pci_bridge->bus.memory_range->base,
dev->pci_bridge->bus.memory_range->limit);
monitor_printf(mon, " prefetchable memory range "
"[0x%08"PRIx64", 0x%08"PRIx64"]\n",
dev->pci_bridge->bus->prefetchable_range->base,
dev->pci_bridge->bus->prefetchable_range->limit);
dev->pci_bridge->bus.prefetchable_range->base,
dev->pci_bridge->bus.prefetchable_range->limit);
}
for (region = dev->regions; region; region = region->next) {
@@ -1056,7 +1034,7 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
false, NULL, false, NULL,
full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
true, mode, false, 0, false, 0, false, 0,
false, 0, false, 0, &err);
false, 0, false, 0, false, true, &err);
hmp_handle_error(mon, &err);
}
@@ -1084,8 +1062,7 @@ void hmp_drive_backup(Monitor *mon, const QDict *qdict)
qmp_drive_backup(device, filename, !!format, format,
full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
true, mode, false, 0, false, NULL,
false, 0, false, 0, &err);
true, mode, false, 0, false, 0, false, 0, &err);
hmp_handle_error(mon, &err);
}
@@ -1208,48 +1185,6 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict)
}
}
void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
{
const char *param = qdict_get_str(qdict, "parameter");
int value = qdict_get_int(qdict, "value");
Error *err = NULL;
bool has_compress_level = false;
bool has_compress_threads = false;
bool has_decompress_threads = false;
int i;
for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
if (strcmp(param, MigrationParameter_lookup[i]) == 0) {
switch (i) {
case MIGRATION_PARAMETER_COMPRESS_LEVEL:
has_compress_level = true;
break;
case MIGRATION_PARAMETER_COMPRESS_THREADS:
has_compress_threads = true;
break;
case MIGRATION_PARAMETER_DECOMPRESS_THREADS:
has_decompress_threads = true;
break;
}
qmp_migrate_set_parameters(has_compress_level, value,
has_compress_threads, value,
has_decompress_threads, value,
&err);
break;
}
}
if (i == MIGRATION_PARAMETER_MAX) {
error_set(&err, QERR_INVALID_PARAMETER, param);
}
if (err) {
monitor_printf(mon, "migrate_set_parameter: %s\n",
error_get_pretty(err));
error_free(err);
}
}
void hmp_set_password(Monitor *mon, const QDict *qdict)
{
const char *protocol = qdict_get_str(qdict, "protocol");

5
hmp.h
View File

@@ -28,7 +28,6 @@ void hmp_info_chardev(Monitor *mon, const QDict *qdict);
void hmp_info_mice(Monitor *mon, const QDict *qdict);
void hmp_info_migrate(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_cache_size(Monitor *mon, const QDict *qdict);
void hmp_info_cpus(Monitor *mon, const QDict *qdict);
void hmp_info_block(Monitor *mon, const QDict *qdict);
@@ -65,7 +64,6 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
void hmp_set_password(Monitor *mon, const QDict *qdict);
void hmp_expire_password(Monitor *mon, const QDict *qdict);
@@ -111,12 +109,11 @@ void set_link_completion(ReadLineState *rs, int nb_args, const char *str);
void netdev_add_completion(ReadLineState *rs, int nb_args, const char *str);
void netdev_del_completion(ReadLineState *rs, int nb_args, const char *str);
void ringbuf_write_completion(ReadLineState *rs, int nb_args, const char *str);
void ringbuf_read_completion(ReadLineState *rs, int nb_args, const char *str);
void watchdog_action_completion(ReadLineState *rs, int nb_args,
const char *str);
void migrate_set_capability_completion(ReadLineState *rs, int nb_args,
const char *str);
void migrate_set_parameter_completion(ReadLineState *rs, int nb_args,
const char *str);
void host_net_add_completion(ReadLineState *rs, int nb_args, const char *str);
void host_net_remove_completion(ReadLineState *rs, int nb_args,
const char *str);

View File

@@ -21,7 +21,7 @@
#include "virtio-9p-coth.h"
#include "hw/virtio/virtio-access.h"
static uint64_t virtio_9p_get_features(VirtIODevice *vdev, uint64_t features)
static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features)
{
virtio_add_feature(&features, VIRTIO_9P_MOUNT_TAG);
return features;

View File

@@ -1,6 +1,5 @@
common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o ich9.o pcihp.o
common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
common-obj-$(CONFIG_ACPI) += core.o piix4.o ich9.o pcihp.o cpu_hotplug.o
common-obj-$(CONFIG_ACPI) += memory_hotplug.o
common-obj-$(CONFIG_ACPI) += acpi_interface.o
common-obj-$(CONFIG_ACPI) += bios-linker-loader.o
common-obj-$(CONFIG_ACPI) += aml-build.o

View File

@@ -19,7 +19,6 @@
* with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include <glib/gprintf.h>
#include <stdio.h>
#include <stdarg.h>
#include <assert.h>
@@ -27,8 +26,6 @@
#include <string.h>
#include "hw/acpi/aml-build.h"
#include "qemu/bswap.h"
#include "qemu/bitops.h"
#include "hw/acpi/bios-linker-loader.h"
static GArray *build_alloc_array(void)
{
@@ -60,6 +57,7 @@ static void build_append_array(GArray *array, GArray *val)
static void
build_append_nameseg(GArray *array, const char *seg)
{
/* It would be nicer to use g_string_vprintf but it's only there in 2.22 */
int len;
len = strlen(seg);
@@ -73,12 +71,22 @@ build_append_nameseg(GArray *array, const char *seg)
static void GCC_FMT_ATTR(2, 0)
build_append_namestringv(GArray *array, const char *format, va_list ap)
{
/* It would be nicer to use g_string_vprintf but it's only there in 2.22 */
char *s;
int len;
va_list va_len;
char **segs;
char **segs_iter;
int seg_count = 0;
s = g_strdup_vprintf(format, ap);
va_copy(va_len, ap);
len = vsnprintf(NULL, 0, format, va_len);
va_end(va_len);
len += 1;
s = g_new(typeof(*s), len);
len = vsnprintf(s, len, format, ap);
segs = g_strsplit(s, ".", 0);
g_free(s);
@@ -446,73 +454,6 @@ Aml *aml_and(Aml *arg1, Aml *arg2)
return var;
}
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefOr */
Aml *aml_or(Aml *arg1, Aml *arg2)
{
Aml *var = aml_opcode(0x7D /* OrOp */);
aml_append(var, arg1);
aml_append(var, arg2);
build_append_byte(var->buf, 0x00 /* NullNameOp */);
return var;
}
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefShiftLeft */
Aml *aml_shiftleft(Aml *arg1, Aml *count)
{
Aml *var = aml_opcode(0x79 /* ShiftLeftOp */);
aml_append(var, arg1);
aml_append(var, count);
build_append_byte(var->buf, 0x00); /* NullNameOp */
return var;
}
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefShiftRight */
Aml *aml_shiftright(Aml *arg1, Aml *count)
{
Aml *var = aml_opcode(0x7A /* ShiftRightOp */);
aml_append(var, arg1);
aml_append(var, count);
build_append_byte(var->buf, 0x00); /* NullNameOp */
return var;
}
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefLLess */
Aml *aml_lless(Aml *arg1, Aml *arg2)
{
Aml *var = aml_opcode(0x95 /* LLessOp */);
aml_append(var, arg1);
aml_append(var, arg2);
return var;
}
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefAdd */
Aml *aml_add(Aml *arg1, Aml *arg2)
{
Aml *var = aml_opcode(0x72 /* AddOp */);
aml_append(var, arg1);
aml_append(var, arg2);
build_append_byte(var->buf, 0x00 /* NullNameOp */);
return var;
}
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefIncrement */
Aml *aml_increment(Aml *arg)
{
Aml *var = aml_opcode(0x75 /* IncrementOp */);
aml_append(var, arg);
return var;
}
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefIndex */
Aml *aml_index(Aml *arg1, Aml *idx)
{
Aml *var = aml_opcode(0x88 /* IndexOp */);
aml_append(var, arg1);
aml_append(var, idx);
build_append_byte(var->buf, 0x00 /* NullNameOp */);
return var;
}
/* ACPI 1.0b: 16.2.5.3 Type 1 Opcodes Encoding: DefNotify */
Aml *aml_notify(Aml *arg1, Aml *arg2)
{
@@ -564,60 +505,6 @@ Aml *aml_call4(const char *method, Aml *arg1, Aml *arg2, Aml *arg3, Aml *arg4)
return var;
}
/*
* ACPI 1.0b: 6.4.3.4 32-Bit Fixed Location Memory Range Descriptor
* (Type 1, Large Item Name 0x6)
*/
Aml *aml_memory32_fixed(uint32_t addr, uint32_t size,
AmlReadAndWrite read_and_write)
{
Aml *var = aml_alloc();
build_append_byte(var->buf, 0x86); /* Memory32Fixed Resource Descriptor */
build_append_byte(var->buf, 9); /* Length, bits[7:0] value = 9 */
build_append_byte(var->buf, 0); /* Length, bits[15:8] value = 0 */
build_append_byte(var->buf, read_and_write); /* Write status, 1 rw 0 ro */
/* Range base address */
build_append_byte(var->buf, extract32(addr, 0, 8)); /* bits[7:0] */
build_append_byte(var->buf, extract32(addr, 8, 8)); /* bits[15:8] */
build_append_byte(var->buf, extract32(addr, 16, 8)); /* bits[23:16] */
build_append_byte(var->buf, extract32(addr, 24, 8)); /* bits[31:24] */
/* Range length */
build_append_byte(var->buf, extract32(size, 0, 8)); /* bits[7:0] */
build_append_byte(var->buf, extract32(size, 8, 8)); /* bits[15:8] */
build_append_byte(var->buf, extract32(size, 16, 8)); /* bits[23:16] */
build_append_byte(var->buf, extract32(size, 24, 8)); /* bits[31:24] */
return var;
}
/*
* ACPI 5.0: 6.4.3.6 Extended Interrupt Descriptor
* Type 1, Large Item Name 0x9
*/
Aml *aml_interrupt(AmlConsumerAndProducer con_and_pro,
AmlLevelAndEdge level_and_edge,
AmlActiveHighAndLow high_and_low, AmlShared shared,
uint32_t irq)
{
Aml *var = aml_alloc();
uint8_t irq_flags = con_and_pro | (level_and_edge << 1)
| (high_and_low << 2) | (shared << 3);
build_append_byte(var->buf, 0x89); /* Extended irq descriptor */
build_append_byte(var->buf, 6); /* Length, bits[7:0] minimum value = 6 */
build_append_byte(var->buf, 0); /* Length, bits[15:8] minimum value = 0 */
build_append_byte(var->buf, irq_flags); /* Interrupt Vector Information. */
build_append_byte(var->buf, 0x01); /* Interrupt table length = 1 */
/* Interrupt Number */
build_append_byte(var->buf, extract32(irq, 0, 8)); /* bits[7:0] */
build_append_byte(var->buf, extract32(irq, 8, 8)); /* bits[15:8] */
build_append_byte(var->buf, extract32(irq, 16, 8)); /* bits[23:16] */
build_append_byte(var->buf, extract32(irq, 24, 8)); /* bits[31:24] */
return var;
}
/* ACPI 1.0b: 6.4.2.5 I/O Port Descriptor */
Aml *aml_io(AmlIODecode dec, uint16_t min_base, uint16_t max_base,
uint8_t aln, uint8_t len)
@@ -655,14 +542,6 @@ Aml *aml_irq_no_flags(uint8_t irq)
return var;
}
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefLNot */
Aml *aml_lnot(Aml *arg)
{
Aml *var = aml_opcode(0x92 /* LNotOp */);
aml_append(var, arg);
return var;
}
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefLEqual */
Aml *aml_equal(Aml *arg1, Aml *arg2)
{
@@ -680,13 +559,6 @@ Aml *aml_if(Aml *predicate)
return var;
}
/* ACPI 1.0b: 16.2.5.3 Type 1 Opcodes Encoding: DefElse */
Aml *aml_else(void)
{
Aml *var = aml_bundle(0xA1 /* ElseOp */, AML_PACKAGE);
return var;
}
/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefMethod */
Aml *aml_method(const char *name, int arg_count)
{
@@ -715,22 +587,10 @@ Aml *aml_resource_template(void)
return var;
}
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefBuffer
* Pass byte_list as NULL to request uninitialized buffer to reserve space.
*/
Aml *aml_buffer(int buffer_size, uint8_t *byte_list)
/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefBuffer */
Aml *aml_buffer(void)
{
int i;
Aml *var = aml_bundle(0x11 /* BufferOp */, AML_BUFFER);
for (i = 0; i < buffer_size; i++) {
if (byte_list == NULL) {
build_append_byte(var->buf, 0x0);
} else {
build_append_byte(var->buf, byte_list[i]);
}
}
return var;
}
@@ -776,40 +636,34 @@ Aml *aml_reserved_field(unsigned length)
}
/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefField */
Aml *aml_field(const char *name, AmlAccessType type, AmlUpdateRule rule)
Aml *aml_field(const char *name, AmlFieldFlags flags)
{
Aml *var = aml_bundle(0x81 /* FieldOp */, AML_EXT_PACKAGE);
uint8_t flags = rule << 5 | type;
build_append_namestring(var->buf, "%s", name);
build_append_byte(var->buf, flags);
return var;
}
/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefCreateDWordField */
Aml *aml_create_dword_field(Aml *srcbuf, Aml *index, const char *name)
{
Aml *var = aml_alloc();
build_append_byte(var->buf, 0x8A); /* CreateDWordFieldOp */
aml_append(var, srcbuf);
aml_append(var, index);
build_append_namestring(var->buf, "%s", name);
return var;
}
/* ACPI 1.0b: 16.2.3 Data Objects Encoding: String */
Aml *aml_string(const char *name_format, ...)
{
Aml *var = aml_opcode(0x0D /* StringPrefix */);
va_list ap;
va_list ap, va_len;
char *s;
int len;
va_start(ap, name_format);
len = g_vasprintf(&s, name_format, ap);
va_copy(va_len, ap);
len = vsnprintf(NULL, 0, name_format, va_len);
va_end(va_len);
len += 1;
s = g_new0(typeof(*s), len);
len = vsnprintf(s, len, name_format, ap);
va_end(ap);
g_array_append_vals(var->buf, s, len + 1);
g_array_append_vals(var->buf, s, len);
build_append_byte(var->buf, 0x0); /* NullChar */
g_free(s);
return var;
@@ -977,7 +831,7 @@ Aml *aml_word_bus_number(AmlMinFixed min_fixed, AmlMaxFixed max_fixed,
uint16_t addr_trans, uint16_t len)
{
return aml_word_as_desc(AML_BUS_NUMBER_RANGE, min_fixed, max_fixed, dec,
return aml_word_as_desc(aml_bus_number_range, min_fixed, max_fixed, dec,
addr_gran, addr_min, addr_max, addr_trans, len, 0);
}
@@ -994,25 +848,7 @@ Aml *aml_word_io(AmlMinFixed min_fixed, AmlMaxFixed max_fixed,
uint16_t len)
{
return aml_word_as_desc(AML_IO_RANGE, min_fixed, max_fixed, dec,
addr_gran, addr_min, addr_max, addr_trans, len,
isa_ranges);
}
/*
* ACPI 1.0b: 6.4.3.5.4 ASL Macros for DWORD Address Descriptor
*
* More verbose description at:
* ACPI 5.0: 19.5.33 DWordIO (DWord IO Resource Descriptor Macro)
*/
Aml *aml_dword_io(AmlMinFixed min_fixed, AmlMaxFixed max_fixed,
AmlDecode dec, AmlISARanges isa_ranges,
uint32_t addr_gran, uint32_t addr_min,
uint32_t addr_max, uint32_t addr_trans,
uint32_t len)
{
return aml_dword_as_desc(AML_IO_RANGE, min_fixed, max_fixed, dec,
return aml_word_as_desc(aml_io_range, min_fixed, max_fixed, dec,
addr_gran, addr_min, addr_max, addr_trans, len,
isa_ranges);
}
@@ -1024,7 +860,7 @@ Aml *aml_dword_io(AmlMinFixed min_fixed, AmlMaxFixed max_fixed,
* ACPI 5.0: 19.5.34 DWordMemory (DWord Memory Resource Descriptor Macro)
*/
Aml *aml_dword_memory(AmlDecode dec, AmlMinFixed min_fixed,
AmlMaxFixed max_fixed, AmlCacheable cacheable,
AmlMaxFixed max_fixed, AmlCacheble cacheable,
AmlReadAndWrite read_and_write,
uint32_t addr_gran, uint32_t addr_min,
uint32_t addr_max, uint32_t addr_trans,
@@ -1032,7 +868,7 @@ Aml *aml_dword_memory(AmlDecode dec, AmlMinFixed min_fixed,
{
uint8_t flags = read_and_write | (cacheable << 1);
return aml_dword_as_desc(AML_MEMORY_RANGE, min_fixed, max_fixed,
return aml_dword_as_desc(aml_memory_range, min_fixed, max_fixed,
dec, addr_gran, addr_min, addr_max,
addr_trans, len, flags);
}
@@ -1044,7 +880,7 @@ Aml *aml_dword_memory(AmlDecode dec, AmlMinFixed min_fixed,
* ACPI 5.0: 19.5.102 QWordMemory (QWord Memory Resource Descriptor Macro)
*/
Aml *aml_qword_memory(AmlDecode dec, AmlMinFixed min_fixed,
AmlMaxFixed max_fixed, AmlCacheable cacheable,
AmlMaxFixed max_fixed, AmlCacheble cacheable,
AmlReadAndWrite read_and_write,
uint64_t addr_gran, uint64_t addr_min,
uint64_t addr_max, uint64_t addr_trans,
@@ -1052,158 +888,7 @@ Aml *aml_qword_memory(AmlDecode dec, AmlMinFixed min_fixed,
{
uint8_t flags = read_and_write | (cacheable << 1);
return aml_qword_as_desc(AML_MEMORY_RANGE, min_fixed, max_fixed,
return aml_qword_as_desc(aml_memory_range, min_fixed, max_fixed,
dec, addr_gran, addr_min, addr_max,
addr_trans, len, flags);
}
static uint8_t Hex2Byte(const char *src)
{
int hi, lo;
hi = Hex2Digit(src[0]);
assert(hi >= 0);
assert(hi <= 15);
lo = Hex2Digit(src[1]);
assert(lo >= 0);
assert(lo <= 15);
return (hi << 4) | lo;
}
/*
* ACPI 3.0: 17.5.124 ToUUID (Convert String to UUID Macro)
* e.g. UUID: aabbccdd-eeff-gghh-iijj-kkllmmnnoopp
* call aml_touuid("aabbccdd-eeff-gghh-iijj-kkllmmnnoopp");
*/
Aml *aml_touuid(const char *uuid)
{
Aml *var = aml_bundle(0x11 /* BufferOp */, AML_BUFFER);
assert(strlen(uuid) == 36);
assert(uuid[8] == '-');
assert(uuid[13] == '-');
assert(uuid[18] == '-');
assert(uuid[23] == '-');
build_append_byte(var->buf, Hex2Byte(uuid + 6)); /* dd - at offset 00 */
build_append_byte(var->buf, Hex2Byte(uuid + 4)); /* cc - at offset 01 */
build_append_byte(var->buf, Hex2Byte(uuid + 2)); /* bb - at offset 02 */
build_append_byte(var->buf, Hex2Byte(uuid + 0)); /* aa - at offset 03 */
build_append_byte(var->buf, Hex2Byte(uuid + 11)); /* ff - at offset 04 */
build_append_byte(var->buf, Hex2Byte(uuid + 9)); /* ee - at offset 05 */
build_append_byte(var->buf, Hex2Byte(uuid + 16)); /* hh - at offset 06 */
build_append_byte(var->buf, Hex2Byte(uuid + 14)); /* gg - at offset 07 */
build_append_byte(var->buf, Hex2Byte(uuid + 19)); /* ii - at offset 08 */
build_append_byte(var->buf, Hex2Byte(uuid + 21)); /* jj - at offset 09 */
build_append_byte(var->buf, Hex2Byte(uuid + 24)); /* kk - at offset 10 */
build_append_byte(var->buf, Hex2Byte(uuid + 26)); /* ll - at offset 11 */
build_append_byte(var->buf, Hex2Byte(uuid + 28)); /* mm - at offset 12 */
build_append_byte(var->buf, Hex2Byte(uuid + 30)); /* nn - at offset 13 */
build_append_byte(var->buf, Hex2Byte(uuid + 32)); /* oo - at offset 14 */
build_append_byte(var->buf, Hex2Byte(uuid + 34)); /* pp - at offset 15 */
return var;
}
/*
* ACPI 2.0b: 16.2.3.6.4.3 Unicode Macro (Convert Ascii String To Unicode)
*/
Aml *aml_unicode(const char *str)
{
int i = 0;
Aml *var = aml_bundle(0x11 /* BufferOp */, AML_BUFFER);
do {
build_append_byte(var->buf, str[i]);
build_append_byte(var->buf, 0);
i++;
} while (i <= strlen(str));
return var;
}
void
build_header(GArray *linker, GArray *table_data,
AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
{
memcpy(&h->signature, sig, 4);
h->length = cpu_to_le32(len);
h->revision = rev;
memcpy(h->oem_id, ACPI_BUILD_APPNAME6, 6);
memcpy(h->oem_table_id, ACPI_BUILD_APPNAME4, 4);
memcpy(h->oem_table_id + 4, sig, 4);
h->oem_revision = cpu_to_le32(1);
memcpy(h->asl_compiler_id, ACPI_BUILD_APPNAME4, 4);
h->asl_compiler_revision = cpu_to_le32(1);
h->checksum = 0;
/* Checksum to be filled in by Guest linker */
bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
table_data->data, h, len, &h->checksum);
}
void *acpi_data_push(GArray *table_data, unsigned size)
{
unsigned off = table_data->len;
g_array_set_size(table_data, off + size);
return table_data->data + off;
}
unsigned acpi_data_len(GArray *table)
{
#if GLIB_CHECK_VERSION(2, 22, 0)
assert(g_array_get_element_size(table) == 1);
#endif
return table->len;
}
void acpi_add_table(GArray *table_offsets, GArray *table_data)
{
uint32_t offset = cpu_to_le32(table_data->len);
g_array_append_val(table_offsets, offset);
}
void acpi_build_tables_init(AcpiBuildTables *tables)
{
tables->rsdp = g_array_new(false, true /* clear */, 1);
tables->table_data = g_array_new(false, true /* clear */, 1);
tables->tcpalog = g_array_new(false, true /* clear */, 1);
tables->linker = bios_linker_loader_init();
}
void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
{
void *linker_data = bios_linker_loader_cleanup(tables->linker);
g_free(linker_data);
g_array_free(tables->rsdp, true);
g_array_free(tables->table_data, true);
g_array_free(tables->tcpalog, mfre);
}
/* Build rsdt table */
void
build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets)
{
AcpiRsdtDescriptorRev1 *rsdt;
size_t rsdt_len;
int i;
const int table_data_len = (sizeof(uint32_t) * table_offsets->len);
rsdt_len = sizeof(*rsdt) + table_data_len;
rsdt = acpi_data_push(table_data, rsdt_len);
memcpy(rsdt->table_offset_entry, table_offsets->data, table_data_len);
for (i = 0; i < table_offsets->len; ++i) {
/* rsdt->table_offset_entry to be filled by Guest linker */
bios_linker_loader_add_pointer(linker,
ACPI_BUILD_TABLE_FILE,
ACPI_BUILD_TABLE_FILE,
table_data, &rsdt->table_offset_entry[i],
sizeof(uint32_t));
}
build_header(linker, table_data,
(void *)rsdt, "RSDT", rsdt_len, 1);
}

View File

@@ -400,26 +400,15 @@ void ich9_pm_device_plug_cb(ICH9LPCPMRegs *pm, DeviceState *dev, Error **errp)
void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
Error **errp)
{
if (pm->acpi_memory_hotplug.is_enabled &&
object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
acpi_memory_unplug_request_cb(&pm->acpi_regs, pm->irq,
&pm->acpi_memory_hotplug, dev, errp);
} else {
error_setg(errp, "acpi: device unplug request for not supported device"
" type: %s", object_get_typename(OBJECT(dev)));
}
error_setg(errp, "acpi: device unplug request for not supported device"
" type: %s", object_get_typename(OBJECT(dev)));
}
void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
Error **errp)
{
if (pm->acpi_memory_hotplug.is_enabled &&
object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
acpi_memory_unplug_cb(&pm->acpi_memory_hotplug, dev, errp);
} else {
error_setg(errp, "acpi: device unplug for not supported device"
" type: %s", object_get_typename(OBJECT(dev)));
}
error_setg(errp, "acpi: device unplug for not supported device"
" type: %s", object_get_typename(OBJECT(dev)));
}
void ich9_pm_ospm_status(AcpiDeviceIf *adev, ACPIOSTInfoList ***list)

View File

@@ -2,7 +2,6 @@
#include "hw/acpi/pc-hotplug.h"
#include "hw/mem/pc-dimm.h"
#include "hw/boards.h"
#include "hw/qdev-core.h"
#include "trace.h"
#include "qapi-event.h"
@@ -76,7 +75,6 @@ static uint64_t acpi_memory_hotplug_read(void *opaque, hwaddr addr,
case 0x14: /* pack and return is_* fields */
val |= mdev->is_enabled ? 1 : 0;
val |= mdev->is_inserting ? 2 : 0;
val |= mdev->is_removing ? 4 : 0;
trace_mhp_acpi_read_flags(mem_st->selector, val);
break;
default:
@@ -92,9 +90,6 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
MemHotplugState *mem_st = opaque;
MemStatus *mdev;
ACPIOSTInfo *info;
DeviceState *dev = NULL;
HotplugHandler *hotplug_ctrl = NULL;
Error *local_err = NULL;
if (!mem_st->dev_count) {
return;
@@ -132,36 +127,13 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
qapi_event_send_acpi_device_ost(info, &error_abort);
qapi_free_ACPIOSTInfo(info);
break;
case 0x14: /* set is_* fields */
case 0x14:
mdev = &mem_st->devs[mem_st->selector];
if (data & 2) { /* clear insert event */
mdev->is_inserting = false;
trace_mhp_acpi_clear_insert_evt(mem_st->selector);
} else if (data & 4) {
mdev->is_removing = false;
trace_mhp_acpi_clear_remove_evt(mem_st->selector);
} else if (data & 8) {
if (!mdev->is_enabled) {
trace_mhp_acpi_ejecting_invalid_slot(mem_st->selector);
break;
}
dev = DEVICE(mdev->dimm);
hotplug_ctrl = qdev_get_hotplug_handler(dev);
/* call pc-dimm unplug cb */
hotplug_handler_unplug(hotplug_ctrl, dev, &local_err);
if (local_err) {
trace_mhp_acpi_pc_dimm_delete_failed(mem_st->selector);
qapi_event_send_mem_unplug_error(dev->id,
error_get_pretty(local_err),
&error_abort);
break;
}
trace_mhp_acpi_pc_dimm_deleted(mem_st->selector);
}
break;
default:
break;
}
}
@@ -191,51 +163,29 @@ void acpi_memory_hotplug_init(MemoryRegion *as, Object *owner,
memory_region_add_subregion(as, ACPI_MEMORY_HOTPLUG_BASE, &state->io);
}
/**
* acpi_memory_slot_status:
* @mem_st: memory hotplug state
* @dev: device
* @errp: set in case of an error
*
* Obtain a single memory slot status.
*
* This function will be called by memory unplug request cb and unplug cb.
*/
static MemStatus *
acpi_memory_slot_status(MemHotplugState *mem_st,
DeviceState *dev, Error **errp)
void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
DeviceState *dev, Error **errp)
{
MemStatus *mdev;
Error *local_err = NULL;
int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
&local_err);
if (local_err) {
error_propagate(errp, local_err);
return NULL;
return;
}
if (slot >= mem_st->dev_count) {
char *dev_path = object_get_canonical_path(OBJECT(dev));
error_setg(errp, "acpi_memory_slot_status: "
error_setg(errp, "acpi_memory_plug_cb: "
"device [%s] returned invalid memory slot[%d]",
dev_path, slot);
g_free(dev_path);
return NULL;
}
return &mem_st->devs[slot];
}
void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
DeviceState *dev, Error **errp)
{
MemStatus *mdev;
mdev = acpi_memory_slot_status(mem_st, dev, errp);
if (!mdev) {
return;
}
mdev = &mem_st->devs[slot];
mdev->dimm = dev;
mdev->is_enabled = true;
mdev->is_inserting = true;
@@ -246,38 +196,6 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
return;
}
void acpi_memory_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
MemHotplugState *mem_st,
DeviceState *dev, Error **errp)
{
MemStatus *mdev;
mdev = acpi_memory_slot_status(mem_st, dev, errp);
if (!mdev) {
return;
}
mdev->is_removing = true;
/* Do ACPI magic */
ar->gpe.sts[0] |= ACPI_MEMORY_HOTPLUG_STATUS;
acpi_update_sci(ar, irq);
}
void acpi_memory_unplug_cb(MemHotplugState *mem_st,
DeviceState *dev, Error **errp)
{
MemStatus *mdev;
mdev = acpi_memory_slot_status(mem_st, dev, errp);
if (!mdev) {
return;
}
mdev->is_enabled = false;
mdev->dimm = NULL;
}
static const VMStateDescription vmstate_memhp_sts = {
.name = "memory hotplug device state",
.version_id = 1,

View File

@@ -31,6 +31,7 @@
#include "hw/pci/pci.h"
#include "hw/acpi/acpi.h"
#include "sysemu/sysemu.h"
#include "qemu/range.h"
#include "exec/ioport.h"
#include "exec/address-spaces.h"
#include "hw/pci/pci_bus.h"
@@ -119,7 +120,7 @@ static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, PCIDevice *dev)
static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slots)
{
BusChild *kid, *next;
int slot = ctz32(slots);
int slot = ffs(slots) - 1;
PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
if (!bus) {

View File

@@ -275,7 +275,7 @@ static const VMStateDescription vmstate_memhp_state = {
static const VMStateDescription vmstate_acpi = {
.name = "piix4_pm",
.version_id = 3,
.minimum_version_id = 3,
.minimum_version_id = 2, /* qemu-kvm */
.minimum_version_id_old = 1,
.load_state_old = acpi_load_old,
.post_load = vmstate_acpi_post_load,
@@ -361,11 +361,7 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
{
PIIX4PMState *s = PIIX4_PM(hotplug_dev);
if (s->acpi_memory_hotplug.is_enabled &&
object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
acpi_memory_unplug_request_cb(&s->ar, s->irq, &s->acpi_memory_hotplug,
dev, errp);
} else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
acpi_pcihp_device_unplug_cb(&s->ar, s->irq, &s->acpi_pci_hotplug, dev,
errp);
} else {
@@ -377,15 +373,8 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
static void piix4_device_unplug_cb(HotplugHandler *hotplug_dev,
DeviceState *dev, Error **errp)
{
PIIX4PMState *s = PIIX4_PM(hotplug_dev);
if (s->acpi_memory_hotplug.is_enabled &&
object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
acpi_memory_unplug_cb(&s->acpi_memory_hotplug, dev, errp);
} else {
error_setg(errp, "acpi: device unplug for not supported device"
" type: %s", object_get_typename(OBJECT(dev)));
}
error_setg(errp, "acpi: device unplug for not supported device"
" type: %s", object_get_typename(OBJECT(dev)));
}
static void piix4_update_bus_hotplug(PCIBus *pci_bus, void *opaque)

View File

@@ -157,12 +157,9 @@ static void clipper_init(MachineState *machine)
load_image_targphys(initrd_filename, initrd_base,
ram_size - initrd_base);
address_space_stq(&address_space_memory, param_offset + 0x100,
initrd_base + 0xfffffc0000000000ULL,
MEMTXATTRS_UNSPECIFIED,
NULL);
address_space_stq(&address_space_memory, param_offset + 0x108,
initrd_size, MEMTXATTRS_UNSPECIFIED, NULL);
stq_phys(&address_space_memory,
param_offset + 0x100, initrd_base + 0xfffffc0000000000ULL);
stq_phys(&address_space_memory, param_offset + 0x108, initrd_size);
}
}
}

View File

@@ -613,8 +613,7 @@ static bool make_iommu_tlbe(hwaddr taddr, hwaddr mask, IOMMUTLBEntry *ret)
translation, given the address of the PTE. */
static bool pte_translate(hwaddr pte_addr, IOMMUTLBEntry *ret)
{
uint64_t pte = address_space_ldq(&address_space_memory, pte_addr,
MEMTXATTRS_UNSPECIFIED, NULL);
uint64_t pte = ldq_phys(&address_space_memory, pte_addr);
/* Check valid bit. */
if ((pte & 1) == 0) {

View File

@@ -3,7 +3,6 @@ obj-$(CONFIG_DIGIC) += digic_boards.o
obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o
obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
obj-$(CONFIG_ACPI) += virt-acpi-build.o
obj-y += netduino2.o
obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
@@ -11,4 +10,3 @@ obj-$(CONFIG_DIGIC) += digic.o
obj-y += omap1.o omap2.o strongarm.o
obj-$(CONFIG_ALLWINNER_A10) += allwinner-a10.o cubieboard.o
obj-$(CONFIG_STM32F205_SOC) += stm32f205_soc.o
obj-$(CONFIG_XLNX_ZYNQMP) += xlnx-zynqmp.o xlnx-ep108.o

View File

@@ -170,8 +170,7 @@ static void default_reset_secondary(ARMCPU *cpu,
{
CPUARMState *env = &cpu->env;
address_space_stl_notdirty(&address_space_memory, info->smp_bootreg_addr,
0, MEMTXATTRS_UNSPECIFIED, NULL);
stl_phys_notdirty(&address_space_memory, info->smp_bootreg_addr, 0);
env->regs[15] = info->smp_loader_start;
}
@@ -181,8 +180,7 @@ static inline bool have_dtb(const struct arm_boot_info *info)
}
#define WRITE_WORD(p, value) do { \
address_space_stl_notdirty(&address_space_memory, p, value, \
MEMTXATTRS_UNSPECIFIED, NULL); \
stl_phys_notdirty(&address_space_memory, p, value); \
p += 4; \
} while (0)

View File

@@ -69,17 +69,11 @@ static void hb_reset_secondary(ARMCPU *cpu, const struct arm_boot_info *info)
switch (info->nb_cpus) {
case 4:
address_space_stl_notdirty(&address_space_memory,
SMP_BOOT_REG + 0x30, 0,
MEMTXATTRS_UNSPECIFIED, NULL);
stl_phys_notdirty(&address_space_memory, SMP_BOOT_REG + 0x30, 0);
case 3:
address_space_stl_notdirty(&address_space_memory,
SMP_BOOT_REG + 0x20, 0,
MEMTXATTRS_UNSPECIFIED, NULL);
stl_phys_notdirty(&address_space_memory, SMP_BOOT_REG + 0x20, 0);
case 2:
address_space_stl_notdirty(&address_space_memory,
SMP_BOOT_REG + 0x10, 0,
MEMTXATTRS_UNSPECIFIED, NULL);
stl_phys_notdirty(&address_space_memory, SMP_BOOT_REG + 0x10, 0);
env->regs[15] = SMP_BOOT_ADDR;
break;
default:
@@ -217,7 +211,6 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
qemu_irq pic[128];
int n;
qemu_irq cpu_irq[4];
qemu_irq cpu_fiq[4];
MemoryRegion *sysram;
MemoryRegion *dram;
MemoryRegion *sysmem;
@@ -270,7 +263,6 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
exit(1);
}
cpu_irq[n] = qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ);
cpu_fiq[n] = qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ);
}
sysmem = get_system_memory();
@@ -315,7 +307,6 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
sysbus_mmio_map(busdev, 0, MPCORE_PERIPHBASE);
for (n = 0; n < smp_cpus; n++) {
sysbus_connect_irq(busdev, n, cpu_irq[n]);
sysbus_connect_irq(busdev, n + smp_cpus, cpu_fiq[n]);
}
for (n = 0; n < 128; n++) {

View File

@@ -579,10 +579,7 @@ static uint32_t mipid_txrx(void *opaque, uint32_t cmd, int len)
case 0x26: /* GAMSET */
if (!s->pm) {
s->gamma = ctz32(s->param[0] & 0xf);
if (s->gamma == 32) {
s->gamma = -1; /* XXX: should this be 0? */
}
s->gamma = ffs(s->param[0] & 0xf) - 1;
} else if (s->pm < 0) {
s->pm = 1;
}

View File

@@ -2004,7 +2004,8 @@ static void omap_mpuio_write(void *opaque, hwaddr addr,
case 0x04: /* OUTPUT_REG */
diff = (s->outputs ^ value) & ~s->dir;
s->outputs = value;
while ((ln = ctz32(diff)) != 32) {
while ((ln = ffs(diff))) {
ln --;
if (s->handler[ln])
qemu_set_irq(s->handler[ln], (value >> ln) & 1);
diff &= ~(1 << ln);
@@ -2016,7 +2017,8 @@ static void omap_mpuio_write(void *opaque, hwaddr addr,
s->dir = value;
value = s->outputs & ~s->dir;
while ((ln = ctz32(diff)) != 32) {
while ((ln = ffs(diff))) {
ln --;
if (s->handler[ln])
qemu_set_irq(s->handler[ln], (value >> ln) & 1);
diff &= ~(1 << ln);

View File

@@ -274,7 +274,7 @@ static void pxa2xx_pwrmode_write(CPUARMState *env, const ARMCPRegInfo *ri,
s->cpu->env.uncached_cpsr = ARM_CPU_MODE_SVC;
s->cpu->env.daif = PSTATE_A | PSTATE_F | PSTATE_I;
s->cpu->env.cp15.sctlr_ns = 0;
s->cpu->env.cp15.cpacr_el1 = 0;
s->cpu->env.cp15.c1_coproc = 0;
s->cpu->env.cp15.ttbr0_el[1] = 0;
s->cpu->env.cp15.dacr_ns = 0;
s->pm_regs[PSSR >> 2] |= 0x8; /* Set STS */

View File

@@ -137,7 +137,7 @@ static void pxa2xx_gpio_handler_update(PXA2xxGPIOInfo *s) {
level = s->olevel[i] & s->dir[i];
for (diff = s->prev_level[i] ^ level; diff; diff ^= 1 << bit) {
bit = ctz32(diff);
bit = ffs(diff) - 1;
line = bit + 32 * i;
qemu_set_irq(s->handler[line], (level >> bit) & 1);
}

View File

@@ -528,7 +528,7 @@ static void strongarm_gpio_handler_update(StrongARMGPIOInfo *s)
level = s->olevel & s->dir;
for (diff = s->prev_level ^ level; diff; diff ^= 1 << bit) {
bit = ctz32(diff);
bit = ffs(diff) - 1;
qemu_set_irq(s->handler[bit], (level >> bit) & 1);
}
@@ -745,7 +745,7 @@ static void strongarm_ppc_handler_update(StrongARMPPCInfo *s)
level = s->olevel & s->dir;
for (diff = s->prev_level ^ level; diff; diff ^= 1 << bit) {
bit = ctz32(diff);
bit = ffs(diff) - 1;
qemu_set_irq(s->handler[bit], (level >> bit) & 1);
}

View File

@@ -253,8 +253,6 @@ static void init_cpus(const char *cpu_model, const char *privdev,
DeviceState *cpudev = DEVICE(qemu_get_cpu(n));
sysbus_connect_irq(busdev, n, qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
sysbus_connect_irq(busdev, n + smp_cpus,
qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
}
}

View File

@@ -1,644 +0,0 @@
/* Support for generating ACPI tables and passing them to Guests
*
* ARM virt ACPI generation
*
* Copyright (C) 2008-2010 Kevin O'Connor <kevin@koconnor.net>
* Copyright (C) 2006 Fabrice Bellard
* Copyright (C) 2013 Red Hat Inc
*
* Author: Michael S. Tsirkin <mst@redhat.com>
*
* Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
*
* Author: Shannon Zhao <zhaoshenglong@huawei.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
* You should have received a copy of the GNU General Public License along
* with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "qemu-common.h"
#include "hw/arm/virt-acpi-build.h"
#include "qemu/bitmap.h"
#include "trace.h"
#include "qom/cpu.h"
#include "target-arm/cpu.h"
#include "hw/acpi/acpi-defs.h"
#include "hw/acpi/acpi.h"
#include "hw/nvram/fw_cfg.h"
#include "hw/acpi/bios-linker-loader.h"
#include "hw/loader.h"
#include "hw/hw.h"
#include "hw/acpi/aml-build.h"
#include "hw/pci/pcie_host.h"
#include "hw/pci/pci.h"
#define ARM_SPI_BASE 32
typedef struct VirtAcpiCpuInfo {
DECLARE_BITMAP(found_cpus, VIRT_ACPI_CPU_ID_LIMIT);
} VirtAcpiCpuInfo;
static void virt_acpi_get_cpu_info(VirtAcpiCpuInfo *cpuinfo)
{
CPUState *cpu;
memset(cpuinfo->found_cpus, 0, sizeof cpuinfo->found_cpus);
CPU_FOREACH(cpu) {
set_bit(cpu->cpu_index, cpuinfo->found_cpus);
}
}
static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)
{
uint16_t i;
for (i = 0; i < smp_cpus; i++) {
Aml *dev = aml_device("C%03x", i);
aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
aml_append(dev, aml_name_decl("_UID", aml_int(i)));
aml_append(scope, dev);
}
}
static void acpi_dsdt_add_uart(Aml *scope, const MemMapEntry *uart_memmap,
int uart_irq)
{
Aml *dev = aml_device("COM0");
aml_append(dev, aml_name_decl("_HID", aml_string("ARMH0011")));
aml_append(dev, aml_name_decl("_UID", aml_int(0)));
Aml *crs = aml_resource_template();
aml_append(crs, aml_memory32_fixed(uart_memmap->base,
uart_memmap->size, AML_READ_WRITE));
aml_append(crs,
aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
AML_EXCLUSIVE, uart_irq));
aml_append(dev, aml_name_decl("_CRS", crs));
aml_append(scope, dev);
}
static void acpi_dsdt_add_rtc(Aml *scope, const MemMapEntry *rtc_memmap,
int rtc_irq)
{
Aml *dev = aml_device("RTC0");
aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0013")));
aml_append(dev, aml_name_decl("_UID", aml_int(0)));
Aml *crs = aml_resource_template();
aml_append(crs, aml_memory32_fixed(rtc_memmap->base,
rtc_memmap->size, AML_READ_WRITE));
aml_append(crs,
aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
AML_EXCLUSIVE, rtc_irq));
aml_append(dev, aml_name_decl("_CRS", crs));
aml_append(scope, dev);
}
static void acpi_dsdt_add_flash(Aml *scope, const MemMapEntry *flash_memmap)
{
Aml *dev, *crs;
hwaddr base = flash_memmap->base;
hwaddr size = flash_memmap->size;
dev = aml_device("FLS0");
aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
aml_append(dev, aml_name_decl("_UID", aml_int(0)));
crs = aml_resource_template();
aml_append(crs, aml_memory32_fixed(base, size, AML_READ_WRITE));
aml_append(dev, aml_name_decl("_CRS", crs));
aml_append(scope, dev);
dev = aml_device("FLS1");
aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
aml_append(dev, aml_name_decl("_UID", aml_int(1)));
crs = aml_resource_template();
aml_append(crs, aml_memory32_fixed(base + size, size, AML_READ_WRITE));
aml_append(dev, aml_name_decl("_CRS", crs));
aml_append(scope, dev);
}
static void acpi_dsdt_add_virtio(Aml *scope,
const MemMapEntry *virtio_mmio_memmap,
int mmio_irq, int num)
{
hwaddr base = virtio_mmio_memmap->base;
hwaddr size = virtio_mmio_memmap->size;
int irq = mmio_irq;
int i;
for (i = 0; i < num; i++) {
Aml *dev = aml_device("VR%02u", i);
aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0005")));
aml_append(dev, aml_name_decl("_UID", aml_int(i)));
Aml *crs = aml_resource_template();
aml_append(crs, aml_memory32_fixed(base, size, AML_READ_WRITE));
aml_append(crs,
aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
AML_EXCLUSIVE, irq + i));
aml_append(dev, aml_name_decl("_CRS", crs));
aml_append(scope, dev);
base += size;
}
}
static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap, int irq)
{
Aml *method, *crs, *ifctx, *UUID, *ifctx1, *elsectx, *buf;
int i, bus_no;
hwaddr base_mmio = memmap[VIRT_PCIE_MMIO].base;
hwaddr size_mmio = memmap[VIRT_PCIE_MMIO].size;
hwaddr base_pio = memmap[VIRT_PCIE_PIO].base;
hwaddr size_pio = memmap[VIRT_PCIE_PIO].size;
hwaddr base_ecam = memmap[VIRT_PCIE_ECAM].base;
hwaddr size_ecam = memmap[VIRT_PCIE_ECAM].size;
int nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
Aml *dev = aml_device("%s", "PCI0");
aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
aml_append(dev, aml_name_decl("_SEG", aml_int(0)));
aml_append(dev, aml_name_decl("_BBN", aml_int(0)));
aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
aml_append(dev, aml_name_decl("_UID", aml_string("PCI0")));
aml_append(dev, aml_name_decl("_STR", aml_unicode("PCIe 0 Device")));
/* Declare the PCI Routing Table. */
Aml *rt_pkg = aml_package(nr_pcie_buses * PCI_NUM_PINS);
for (bus_no = 0; bus_no < nr_pcie_buses; bus_no++) {
for (i = 0; i < PCI_NUM_PINS; i++) {
int gsi = (i + bus_no) % PCI_NUM_PINS;
Aml *pkg = aml_package(4);
aml_append(pkg, aml_int((bus_no << 16) | 0xFFFF));
aml_append(pkg, aml_int(i));
aml_append(pkg, aml_name("GSI%d", gsi));
aml_append(pkg, aml_int(0));
aml_append(rt_pkg, pkg);
}
}
aml_append(dev, aml_name_decl("_PRT", rt_pkg));
/* Create GSI link device */
for (i = 0; i < PCI_NUM_PINS; i++) {
Aml *dev_gsi = aml_device("GSI%d", i);
aml_append(dev_gsi, aml_name_decl("_HID", aml_string("PNP0C0F")));
aml_append(dev_gsi, aml_name_decl("_UID", aml_int(0)));
crs = aml_resource_template();
aml_append(crs,
aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
AML_EXCLUSIVE, irq + i));
aml_append(dev_gsi, aml_name_decl("_PRS", crs));
crs = aml_resource_template();
aml_append(crs,
aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
AML_EXCLUSIVE, irq + i));
aml_append(dev_gsi, aml_name_decl("_CRS", crs));
method = aml_method("_SRS", 1);
aml_append(dev_gsi, method);
aml_append(dev, dev_gsi);
}
method = aml_method("_CBA", 0);
aml_append(method, aml_return(aml_int(base_ecam)));
aml_append(dev, method);
method = aml_method("_CRS", 0);
Aml *rbuf = aml_resource_template();
aml_append(rbuf,
aml_word_bus_number(AML_MIN_FIXED, AML_MAX_FIXED, AML_POS_DECODE,
0x0000, 0x0000, nr_pcie_buses - 1, 0x0000,
nr_pcie_buses));
aml_append(rbuf,
aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
AML_NON_CACHEABLE, AML_READ_WRITE, 0x0000, base_mmio,
base_mmio + size_mmio - 1, 0x0000, size_mmio));
aml_append(rbuf,
aml_dword_io(AML_MIN_FIXED, AML_MAX_FIXED, AML_POS_DECODE,
AML_ENTIRE_RANGE, 0x0000, 0x0000, size_pio - 1, base_pio,
size_pio));
aml_append(method, aml_name_decl("RBUF", rbuf));
aml_append(method, aml_return(rbuf));
aml_append(dev, method);
/* Declare an _OSC (OS Control Handoff) method */
aml_append(dev, aml_name_decl("SUPP", aml_int(0)));
aml_append(dev, aml_name_decl("CTRL", aml_int(0)));
method = aml_method("_OSC", 4);
aml_append(method,
aml_create_dword_field(aml_arg(3), aml_int(0), "CDW1"));
/* PCI Firmware Specification 3.0
* 4.5.1. _OSC Interface for PCI Host Bridge Devices
* The _OSC interface for a PCI/PCI-X/PCI Express hierarchy is
* identified by the Universal Unique IDentifier (UUID)
* 33DB4D5B-1FF7-401C-9657-7441C03DD766
*/
UUID = aml_touuid("33DB4D5B-1FF7-401C-9657-7441C03DD766");
ifctx = aml_if(aml_equal(aml_arg(0), UUID));
aml_append(ifctx,
aml_create_dword_field(aml_arg(3), aml_int(4), "CDW2"));
aml_append(ifctx,
aml_create_dword_field(aml_arg(3), aml_int(8), "CDW3"));
aml_append(ifctx, aml_store(aml_name("CDW2"), aml_name("SUPP")));
aml_append(ifctx, aml_store(aml_name("CDW3"), aml_name("CTRL")));
aml_append(ifctx, aml_store(aml_and(aml_name("CTRL"), aml_int(0x1D)),
aml_name("CTRL")));
ifctx1 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(0x1))));
aml_append(ifctx1, aml_store(aml_or(aml_name("CDW1"), aml_int(0x08)),
aml_name("CDW1")));
aml_append(ifctx, ifctx1);
ifctx1 = aml_if(aml_lnot(aml_equal(aml_name("CDW3"), aml_name("CTRL"))));
aml_append(ifctx1, aml_store(aml_or(aml_name("CDW1"), aml_int(0x10)),
aml_name("CDW1")));
aml_append(ifctx, ifctx1);
aml_append(ifctx, aml_store(aml_name("CTRL"), aml_name("CDW3")));
aml_append(ifctx, aml_return(aml_arg(3)));
aml_append(method, ifctx);
elsectx = aml_else();
aml_append(elsectx, aml_store(aml_or(aml_name("CDW1"), aml_int(4)),
aml_name("CDW1")));
aml_append(elsectx, aml_return(aml_arg(3)));
aml_append(method, elsectx);
aml_append(dev, method);
method = aml_method("_DSM", 4);
/* PCI Firmware Specification 3.0
* 4.6.1. _DSM for PCI Express Slot Information
* The UUID in _DSM in this context is
* {E5C937D0-3553-4D7A-9117-EA4D19C3434D}
*/
UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
ifctx = aml_if(aml_equal(aml_arg(0), UUID));
ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
uint8_t byte_list[1] = {1};
buf = aml_buffer(1, byte_list);
aml_append(ifctx1, aml_return(buf));
aml_append(ifctx, ifctx1);
aml_append(method, ifctx);
byte_list[0] = 0;
buf = aml_buffer(1, byte_list);
aml_append(method, aml_return(buf));
aml_append(dev, method);
Aml *dev_rp0 = aml_device("%s", "RP0");
aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0)));
aml_append(dev, dev_rp0);
aml_append(scope, dev);
}
/* RSDP */
static GArray *
build_rsdp(GArray *rsdp_table, GArray *linker, unsigned rsdt)
{
AcpiRsdpDescriptor *rsdp = acpi_data_push(rsdp_table, sizeof *rsdp);
bios_linker_loader_alloc(linker, ACPI_BUILD_RSDP_FILE, 16,
true /* fseg memory */);
memcpy(&rsdp->signature, "RSD PTR ", sizeof(rsdp->signature));
memcpy(rsdp->oem_id, ACPI_BUILD_APPNAME6, sizeof(rsdp->oem_id));
rsdp->length = cpu_to_le32(sizeof(*rsdp));
rsdp->revision = 0x02;
/* Point to RSDT */
rsdp->rsdt_physical_address = cpu_to_le32(rsdt);
/* Address to be filled by Guest linker */
bios_linker_loader_add_pointer(linker, ACPI_BUILD_RSDP_FILE,
ACPI_BUILD_TABLE_FILE,
rsdp_table, &rsdp->rsdt_physical_address,
sizeof rsdp->rsdt_physical_address);
rsdp->checksum = 0;
/* Checksum to be filled by Guest linker */
bios_linker_loader_add_checksum(linker, ACPI_BUILD_RSDP_FILE,
rsdp, rsdp, sizeof *rsdp, &rsdp->checksum);
return rsdp_table;
}
static void
build_mcfg(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info)
{
AcpiTableMcfg *mcfg;
const MemMapEntry *memmap = guest_info->memmap;
int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
mcfg = acpi_data_push(table_data, len);
mcfg->allocation[0].address = cpu_to_le64(memmap[VIRT_PCIE_ECAM].base);
/* Only a single allocation so no need to play with segments */
mcfg->allocation[0].pci_segment = cpu_to_le16(0);
mcfg->allocation[0].start_bus_number = 0;
mcfg->allocation[0].end_bus_number = (memmap[VIRT_PCIE_ECAM].size
/ PCIE_MMCFG_SIZE_MIN) - 1;
build_header(linker, table_data, (void *)mcfg, "MCFG", len, 5);
}
/* GTDT */
static void
build_gtdt(GArray *table_data, GArray *linker)
{
int gtdt_start = table_data->len;
AcpiGenericTimerTable *gtdt;
gtdt = acpi_data_push(table_data, sizeof *gtdt);
/* The interrupt values are the same with the device tree when adding 16 */
gtdt->secure_el1_interrupt = ARCH_TIMER_S_EL1_IRQ + 16;
gtdt->secure_el1_flags = ACPI_EDGE_SENSITIVE;
gtdt->non_secure_el1_interrupt = ARCH_TIMER_NS_EL1_IRQ + 16;
gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE;
gtdt->virtual_timer_interrupt = ARCH_TIMER_VIRT_IRQ + 16;
gtdt->virtual_timer_flags = ACPI_EDGE_SENSITIVE;
gtdt->non_secure_el2_interrupt = ARCH_TIMER_NS_EL2_IRQ + 16;
gtdt->non_secure_el2_flags = ACPI_EDGE_SENSITIVE;
build_header(linker, table_data,
(void *)(table_data->data + gtdt_start), "GTDT",
table_data->len - gtdt_start, 5);
}
/* MADT */
static void
build_madt(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info,
VirtAcpiCpuInfo *cpuinfo)
{
int madt_start = table_data->len;
const MemMapEntry *memmap = guest_info->memmap;
AcpiMultipleApicTable *madt;
AcpiMadtGenericDistributor *gicd;
int i;
madt = acpi_data_push(table_data, sizeof *madt);
for (i = 0; i < guest_info->smp_cpus; i++) {
AcpiMadtGenericInterrupt *gicc = acpi_data_push(table_data,
sizeof *gicc);
gicc->type = ACPI_APIC_GENERIC_INTERRUPT;
gicc->length = sizeof(*gicc);
gicc->base_address = memmap[VIRT_GIC_CPU].base;
gicc->cpu_interface_number = i;
gicc->arm_mpidr = i;
gicc->uid = i;
if (test_bit(i, cpuinfo->found_cpus)) {
gicc->flags = cpu_to_le32(ACPI_GICC_ENABLED);
}
}
gicd = acpi_data_push(table_data, sizeof *gicd);
gicd->type = ACPI_APIC_GENERIC_DISTRIBUTOR;
gicd->length = sizeof(*gicd);
gicd->base_address = memmap[VIRT_GIC_DIST].base;
build_header(linker, table_data,
(void *)(table_data->data + madt_start), "APIC",
table_data->len - madt_start, 5);
}
/* FADT */
static void
build_fadt(GArray *table_data, GArray *linker, unsigned dsdt)
{
AcpiFadtDescriptorRev5_1 *fadt = acpi_data_push(table_data, sizeof(*fadt));
/* Hardware Reduced = 1 and use PSCI 0.2+ and with HVC */
fadt->flags = cpu_to_le32(1 << ACPI_FADT_F_HW_REDUCED_ACPI);
fadt->arm_boot_flags = cpu_to_le16((1 << ACPI_FADT_ARM_USE_PSCI_G_0_2) |
(1 << ACPI_FADT_ARM_PSCI_USE_HVC));
/* ACPI v5.1 (fadt->revision.fadt->minor_revision) */
fadt->minor_revision = 0x1;
fadt->dsdt = cpu_to_le32(dsdt);
/* DSDT address to be filled by Guest linker */
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
ACPI_BUILD_TABLE_FILE,
table_data, &fadt->dsdt,
sizeof fadt->dsdt);
build_header(linker, table_data,
(void *)fadt, "FACP", sizeof(*fadt), 5);
}
/* DSDT */
static void
build_dsdt(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info)
{
Aml *scope, *dsdt;
const MemMapEntry *memmap = guest_info->memmap;
const int *irqmap = guest_info->irqmap;
dsdt = init_aml_allocator();
/* Reserve space for header */
acpi_data_push(dsdt->buf, sizeof(AcpiTableHeader));
scope = aml_scope("\\_SB");
acpi_dsdt_add_cpus(scope, guest_info->smp_cpus);
acpi_dsdt_add_uart(scope, &memmap[VIRT_UART],
(irqmap[VIRT_UART] + ARM_SPI_BASE));
acpi_dsdt_add_rtc(scope, &memmap[VIRT_RTC],
(irqmap[VIRT_RTC] + ARM_SPI_BASE));
acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH]);
acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
(irqmap[VIRT_MMIO] + ARM_SPI_BASE), NUM_VIRTIO_TRANSPORTS);
acpi_dsdt_add_pci(scope, memmap, (irqmap[VIRT_PCIE] + ARM_SPI_BASE));
aml_append(dsdt, scope);
/* copy AML table into ACPI tables blob and patch header there */
g_array_append_vals(table_data, dsdt->buf->data, dsdt->buf->len);
build_header(linker, table_data,
(void *)(table_data->data + table_data->len - dsdt->buf->len),
"DSDT", dsdt->buf->len, 5);
free_aml_allocator();
}
typedef
struct AcpiBuildState {
/* Copy of table in RAM (for patching). */
MemoryRegion *table_mr;
MemoryRegion *rsdp_mr;
MemoryRegion *linker_mr;
/* Is table patched? */
bool patched;
VirtGuestInfo *guest_info;
} AcpiBuildState;
static
void virt_acpi_build(VirtGuestInfo *guest_info, AcpiBuildTables *tables)
{
GArray *table_offsets;
unsigned dsdt, rsdt;
VirtAcpiCpuInfo cpuinfo;
GArray *tables_blob = tables->table_data;
virt_acpi_get_cpu_info(&cpuinfo);
table_offsets = g_array_new(false, true /* clear */,
sizeof(uint32_t));
bios_linker_loader_alloc(tables->linker, ACPI_BUILD_TABLE_FILE,
64, false /* high memory */);
/*
* The ACPI v5.1 tables for Hardware-reduced ACPI platform are:
* RSDP
* RSDT
* FADT
* GTDT
* MADT
* DSDT
*/
/* DSDT is pointed to by FADT */
dsdt = tables_blob->len;
build_dsdt(tables_blob, tables->linker, guest_info);
/* FADT MADT GTDT pointed to by RSDT */
acpi_add_table(table_offsets, tables_blob);
build_fadt(tables_blob, tables->linker, dsdt);
acpi_add_table(table_offsets, tables_blob);
build_madt(tables_blob, tables->linker, guest_info, &cpuinfo);
acpi_add_table(table_offsets, tables_blob);
build_gtdt(tables_blob, tables->linker);
acpi_add_table(table_offsets, tables_blob);
build_mcfg(tables_blob, tables->linker, guest_info);
/* RSDT is pointed to by RSDP */
rsdt = tables_blob->len;
build_rsdt(tables_blob, tables->linker, table_offsets);
/* RSDP is in FSEG memory, so allocate it separately */
build_rsdp(tables->rsdp, tables->linker, rsdt);
/* Cleanup memory that's no longer used. */
g_array_free(table_offsets, true);
}
static void acpi_ram_update(MemoryRegion *mr, GArray *data)
{
uint32_t size = acpi_data_len(data);
/* Make sure RAM size is correct - in case it got changed
* e.g. by migration */
memory_region_ram_resize(mr, size, &error_abort);
memcpy(memory_region_get_ram_ptr(mr), data->data, size);
memory_region_set_dirty(mr, 0, size);
}
static void virt_acpi_build_update(void *build_opaque, uint32_t offset)
{
AcpiBuildState *build_state = build_opaque;
AcpiBuildTables tables;
/* No state to update or already patched? Nothing to do. */
if (!build_state || build_state->patched) {
return;
}
build_state->patched = true;
acpi_build_tables_init(&tables);
virt_acpi_build(build_state->guest_info, &tables);
acpi_ram_update(build_state->table_mr, tables.table_data);
acpi_ram_update(build_state->rsdp_mr, tables.rsdp);
acpi_ram_update(build_state->linker_mr, tables.linker);
acpi_build_tables_cleanup(&tables, true);
}
static void virt_acpi_build_reset(void *build_opaque)
{
AcpiBuildState *build_state = build_opaque;
build_state->patched = false;
}
static MemoryRegion *acpi_add_rom_blob(AcpiBuildState *build_state,
GArray *blob, const char *name,
uint64_t max_size)
{
return rom_add_blob(name, blob->data, acpi_data_len(blob), max_size, -1,
name, virt_acpi_build_update, build_state);
}
static const VMStateDescription vmstate_virt_acpi_build = {
.name = "virt_acpi_build",
.version_id = 1,
.minimum_version_id = 1,
.fields = (VMStateField[]) {
VMSTATE_BOOL(patched, AcpiBuildState),
VMSTATE_END_OF_LIST()
},
};
void virt_acpi_setup(VirtGuestInfo *guest_info)
{
AcpiBuildTables tables;
AcpiBuildState *build_state;
if (!guest_info->fw_cfg) {
trace_virt_acpi_setup();
return;
}
if (!acpi_enabled) {
trace_virt_acpi_setup();
return;
}
build_state = g_malloc0(sizeof *build_state);
build_state->guest_info = guest_info;
acpi_build_tables_init(&tables);
virt_acpi_build(build_state->guest_info, &tables);
/* Now expose it all to Guest */
build_state->table_mr = acpi_add_rom_blob(build_state, tables.table_data,
ACPI_BUILD_TABLE_FILE,
ACPI_BUILD_TABLE_MAX_SIZE);
assert(build_state->table_mr != NULL);
build_state->linker_mr =
acpi_add_rom_blob(build_state, tables.linker, "etc/table-loader", 0);
fw_cfg_add_file(guest_info->fw_cfg, ACPI_BUILD_TPMLOG_FILE,
tables.tcpalog->data, acpi_data_len(tables.tcpalog));
build_state->rsdp_mr = acpi_add_rom_blob(build_state, tables.rsdp,
ACPI_BUILD_RSDP_FILE, 0);
qemu_register_reset(virt_acpi_build_reset, build_state);
virt_acpi_build_reset(build_state);
vmstate_register(NULL, 0, &vmstate_virt_acpi_build, build_state);
/* Cleanup tables but don't free the memory: we track it
* in build_state.
*/
acpi_build_tables_cleanup(&tables, false);
}

View File

@@ -31,7 +31,6 @@
#include "hw/sysbus.h"
#include "hw/arm/arm.h"
#include "hw/arm/primecell.h"
#include "hw/arm/virt.h"
#include "hw/devices.h"
#include "net/net.h"
#include "sysemu/block-backend.h"
@@ -44,7 +43,8 @@
#include "qemu/bitops.h"
#include "qemu/error-report.h"
#include "hw/pci-host/gpex.h"
#include "hw/arm/virt-acpi-build.h"
#define NUM_VIRTIO_TRANSPORTS 32
/* Number of external interrupt lines to configure the GIC with */
#define NUM_IRQS 128
@@ -60,6 +60,24 @@
#define GIC_FDT_IRQ_PPI_CPU_START 8
#define GIC_FDT_IRQ_PPI_CPU_WIDTH 8
enum {
VIRT_FLASH,
VIRT_MEM,
VIRT_CPUPERIPHS,
VIRT_GIC_DIST,
VIRT_GIC_CPU,
VIRT_UART,
VIRT_MMIO,
VIRT_RTC,
VIRT_FW_CFG,
VIRT_PCIE,
};
typedef struct MemMapEntry {
hwaddr base;
hwaddr size;
} MemMapEntry;
typedef struct VirtBoardInfo {
struct arm_boot_info bootinfo;
const char *cpu_model;
@@ -113,9 +131,14 @@ static const MemMapEntry a15memmap[] = {
[VIRT_FW_CFG] = { 0x09020000, 0x0000000a },
[VIRT_MMIO] = { 0x0a000000, 0x00000200 },
/* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
[VIRT_PCIE_MMIO] = { 0x10000000, 0x2eff0000 },
[VIRT_PCIE_PIO] = { 0x3eff0000, 0x00010000 },
[VIRT_PCIE_ECAM] = { 0x3f000000, 0x01000000 },
/*
* PCIE verbose map:
*
* MMIO window { 0x10000000, 0x2eff0000 },
* PIO window { 0x3eff0000, 0x00010000 },
* ECAM { 0x3f000000, 0x01000000 },
*/
[VIRT_PCIE] = { 0x10000000, 0x30000000 },
[VIRT_MEM] = { 0x40000000, 30ULL * 1024 * 1024 * 1024 },
};
@@ -266,10 +289,10 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi)
"arm,armv7-timer");
}
qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags,
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags,
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_VIRT_IRQ, irqflags,
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL2_IRQ, irqflags);
GIC_FDT_IRQ_TYPE_PPI, 13, irqflags,
GIC_FDT_IRQ_TYPE_PPI, 14, irqflags,
GIC_FDT_IRQ_TYPE_PPI, 11, irqflags,
GIC_FDT_IRQ_TYPE_PPI, 10, irqflags);
}
static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
@@ -363,8 +386,6 @@ static uint32_t create_gic(const VirtBoardInfo *vbi, qemu_irq *pic)
qdev_get_gpio_in(gicdev, ppibase + 27));
sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
sysbus_connect_irq(gicbusdev, i + smp_cpus,
qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
}
for (i = 0; i < NUM_IRQS; i++) {
@@ -621,14 +642,16 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle,
static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
uint32_t gic_phandle)
{
hwaddr base_mmio = vbi->memmap[VIRT_PCIE_MMIO].base;
hwaddr size_mmio = vbi->memmap[VIRT_PCIE_MMIO].size;
hwaddr base_pio = vbi->memmap[VIRT_PCIE_PIO].base;
hwaddr size_pio = vbi->memmap[VIRT_PCIE_PIO].size;
hwaddr base_ecam = vbi->memmap[VIRT_PCIE_ECAM].base;
hwaddr size_ecam = vbi->memmap[VIRT_PCIE_ECAM].size;
hwaddr base = base_mmio;
int nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
hwaddr base = vbi->memmap[VIRT_PCIE].base;
hwaddr size = vbi->memmap[VIRT_PCIE].size;
hwaddr end = base + size;
hwaddr size_mmio;
hwaddr size_ioport = 64 * 1024;
int nr_pcie_buses = 16;
hwaddr size_ecam = PCIE_MMCFG_SIZE_MIN * nr_pcie_buses;
hwaddr base_mmio = base;
hwaddr base_ioport;
hwaddr base_ecam;
int irq = vbi->irqmap[VIRT_PCIE];
MemoryRegion *mmio_alias;
MemoryRegion *mmio_reg;
@@ -638,6 +661,10 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
char *nodename;
int i;
base_ecam = QEMU_ALIGN_DOWN(end - size_ecam, size_ecam);
base_ioport = QEMU_ALIGN_DOWN(base_ecam - size_ioport, size_ioport);
size_mmio = base_ioport - base;
dev = qdev_create(NULL, TYPE_GPEX_HOST);
qdev_init_nofail(dev);
@@ -660,7 +687,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
memory_region_add_subregion(get_system_memory(), base_mmio, mmio_alias);
/* Map IO port space */
sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_ioport);
for (i = 0; i < GPEX_NUM_IRQS; i++) {
sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pic[irq + i]);
@@ -680,7 +707,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
2, base_ecam, 2, size_ecam);
qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "ranges",
1, FDT_PCI_RANGE_IOPORT, 2, 0,
2, base_pio, 2, size_pio,
2, base_ioport, 2, size_ioport,
1, FDT_PCI_RANGE_MMIO, 2, base_mmio,
2, base_mmio, 2, size_mmio);
@@ -698,14 +725,6 @@ static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
return board->fdt;
}
static
void virt_guest_info_machine_done(Notifier *notifier, void *data)
{
VirtGuestInfoState *guest_info_state = container_of(notifier,
VirtGuestInfoState, machine_done);
virt_acpi_setup(&guest_info_state->info);
}
static void machvirt_init(MachineState *machine)
{
VirtMachineState *vms = VIRT_MACHINE(machine);
@@ -715,8 +734,6 @@ static void machvirt_init(MachineState *machine)
MemoryRegion *ram = g_new(MemoryRegion, 1);
const char *cpu_model = machine->cpu_model;
VirtBoardInfo *vbi;
VirtGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state);
VirtGuestInfo *guest_info = &guest_info_state->info;
uint32_t gic_phandle;
char **cpustr;
@@ -809,14 +826,6 @@ static void machvirt_init(MachineState *machine)
create_virtio_devices(vbi, pic);
create_fw_cfg(vbi);
rom_set_fw(fw_cfg_find());
guest_info->smp_cpus = smp_cpus;
guest_info->fw_cfg = fw_cfg_find();
guest_info->memmap = vbi->memmap;
guest_info->irqmap = vbi->irqmap;
guest_info_state->machine_done.notify = virt_guest_info_machine_done;
qemu_add_machine_init_done_notifier(&guest_info_state->machine_done);
vbi->bootinfo.ram_size = machine->ram_size;
vbi->bootinfo.kernel_filename = machine->kernel_filename;

View File

@@ -1,82 +0,0 @@
/*
* Xilinx ZynqMP EP108 board
*
* Copyright (C) 2015 Xilinx Inc
* Written by Peter Crosthwaite <peter.crosthwaite@xilinx.com>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
* Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* for more details.
*/
#include "hw/arm/xlnx-zynqmp.h"
#include "hw/boards.h"
#include "qemu/error-report.h"
#include "exec/address-spaces.h"
typedef struct XlnxEP108 {
XlnxZynqMPState soc;
MemoryRegion ddr_ram;
} XlnxEP108;
/* Max 2GB RAM */
#define EP108_MAX_RAM_SIZE 0x80000000ull
static struct arm_boot_info xlnx_ep108_binfo;
static void xlnx_ep108_init(MachineState *machine)
{
XlnxEP108 *s = g_new0(XlnxEP108, 1);
Error *err = NULL;
object_initialize(&s->soc, sizeof(s->soc), TYPE_XLNX_ZYNQMP);
object_property_add_child(OBJECT(machine), "soc", OBJECT(&s->soc),
&error_abort);
object_property_set_bool(OBJECT(&s->soc), true, "realized", &err);
if (err) {
error_report("%s", error_get_pretty(err));
exit(1);
}
if (machine->ram_size > EP108_MAX_RAM_SIZE) {
error_report("WARNING: RAM size " RAM_ADDR_FMT " above max supported, "
"reduced to %llx", machine->ram_size, EP108_MAX_RAM_SIZE);
machine->ram_size = EP108_MAX_RAM_SIZE;
}
if (machine->ram_size <= 0x08000000) {
qemu_log("WARNING: RAM size " RAM_ADDR_FMT " is small for EP108",
machine->ram_size);
}
memory_region_allocate_system_memory(&s->ddr_ram, NULL, "ddr-ram",
machine->ram_size);
memory_region_add_subregion(get_system_memory(), 0, &s->ddr_ram);
xlnx_ep108_binfo.ram_size = machine->ram_size;
xlnx_ep108_binfo.kernel_filename = machine->kernel_filename;
xlnx_ep108_binfo.kernel_cmdline = machine->kernel_cmdline;
xlnx_ep108_binfo.initrd_filename = machine->initrd_filename;
xlnx_ep108_binfo.loader_start = 0;
arm_load_kernel(&s->soc.cpu[0], &xlnx_ep108_binfo);
}
static QEMUMachine xlnx_ep108_machine = {
.name = "xlnx-ep108",
.desc = "Xilinx ZynqMP EP108 board",
.init = xlnx_ep108_init,
};
static void xlnx_ep108_machine_init(void)
{
qemu_register_machine(&xlnx_ep108_machine);
}
machine_init(xlnx_ep108_machine_init);

View File

@@ -1,211 +0,0 @@
/*
* Xilinx Zynq MPSoC emulation
*
* Copyright (C) 2015 Xilinx Inc
* Written by Peter Crosthwaite <peter.crosthwaite@xilinx.com>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
* Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* for more details.
*/
#include "hw/arm/xlnx-zynqmp.h"
#include "hw/intc/arm_gic_common.h"
#include "exec/address-spaces.h"
#define GIC_NUM_SPI_INTR 160
#define ARM_PHYS_TIMER_PPI 30
#define ARM_VIRT_TIMER_PPI 27
#define GIC_BASE_ADDR 0xf9000000
#define GIC_DIST_ADDR 0xf9010000
#define GIC_CPU_ADDR 0xf9020000
static const uint64_t gem_addr[XLNX_ZYNQMP_NUM_GEMS] = {
0xFF0B0000, 0xFF0C0000, 0xFF0D0000, 0xFF0E0000,
};
static const int gem_intr[XLNX_ZYNQMP_NUM_GEMS] = {
57, 59, 61, 63,
};
static const uint64_t uart_addr[XLNX_ZYNQMP_NUM_UARTS] = {
0xFF000000, 0xFF010000,
};
static const int uart_intr[XLNX_ZYNQMP_NUM_UARTS] = {
21, 22,
};
typedef struct XlnxZynqMPGICRegion {
int region_index;
uint32_t address;
} XlnxZynqMPGICRegion;
static const XlnxZynqMPGICRegion xlnx_zynqmp_gic_regions[] = {
{ .region_index = 0, .address = GIC_DIST_ADDR, },
{ .region_index = 1, .address = GIC_CPU_ADDR, },
};
static inline int arm_gic_ppi_index(int cpu_nr, int ppi_index)
{
return GIC_NUM_SPI_INTR + cpu_nr * GIC_INTERNAL + ppi_index;
}
static void xlnx_zynqmp_init(Object *obj)
{
XlnxZynqMPState *s = XLNX_ZYNQMP(obj);
int i;
for (i = 0; i < XLNX_ZYNQMP_NUM_CPUS; i++) {
object_initialize(&s->cpu[i], sizeof(s->cpu[i]),
"cortex-a53-" TYPE_ARM_CPU);
object_property_add_child(obj, "cpu[*]", OBJECT(&s->cpu[i]),
&error_abort);
}
object_initialize(&s->gic, sizeof(s->gic), TYPE_ARM_GIC);
qdev_set_parent_bus(DEVICE(&s->gic), sysbus_get_default());
for (i = 0; i < XLNX_ZYNQMP_NUM_GEMS; i++) {
object_initialize(&s->gem[i], sizeof(s->gem[i]), TYPE_CADENCE_GEM);
qdev_set_parent_bus(DEVICE(&s->gem[i]), sysbus_get_default());
}
for (i = 0; i < XLNX_ZYNQMP_NUM_UARTS; i++) {
object_initialize(&s->uart[i], sizeof(s->uart[i]), TYPE_CADENCE_UART);
qdev_set_parent_bus(DEVICE(&s->uart[i]), sysbus_get_default());
}
}
static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
{
XlnxZynqMPState *s = XLNX_ZYNQMP(dev);
MemoryRegion *system_memory = get_system_memory();
uint8_t i;
qemu_irq gic_spi[GIC_NUM_SPI_INTR];
Error *err = NULL;
qdev_prop_set_uint32(DEVICE(&s->gic), "num-irq", GIC_NUM_SPI_INTR + 32);
qdev_prop_set_uint32(DEVICE(&s->gic), "revision", 2);
qdev_prop_set_uint32(DEVICE(&s->gic), "num-cpu", XLNX_ZYNQMP_NUM_CPUS);
object_property_set_bool(OBJECT(&s->gic), true, "realized", &err);
if (err) {
error_propagate((errp), (err));
return;
}
assert(ARRAY_SIZE(xlnx_zynqmp_gic_regions) == XLNX_ZYNQMP_GIC_REGIONS);
for (i = 0; i < XLNX_ZYNQMP_GIC_REGIONS; i++) {
SysBusDevice *gic = SYS_BUS_DEVICE(&s->gic);
const XlnxZynqMPGICRegion *r = &xlnx_zynqmp_gic_regions[i];
MemoryRegion *mr = sysbus_mmio_get_region(gic, r->region_index);
uint32_t addr = r->address;
int j;
sysbus_mmio_map(gic, r->region_index, addr);
for (j = 0; j < XLNX_ZYNQMP_GIC_ALIASES; j++) {
MemoryRegion *alias = &s->gic_mr[i][j];
addr += XLNX_ZYNQMP_GIC_REGION_SIZE;
memory_region_init_alias(alias, OBJECT(s), "zynqmp-gic-alias", mr,
0, XLNX_ZYNQMP_GIC_REGION_SIZE);
memory_region_add_subregion(system_memory, addr, alias);
}
}
for (i = 0; i < XLNX_ZYNQMP_NUM_CPUS; i++) {
qemu_irq irq;
object_property_set_int(OBJECT(&s->cpu[i]), QEMU_PSCI_CONDUIT_SMC,
"psci-conduit", &error_abort);
if (i > 0) {
/* Secondary CPUs start in PSCI powered-down state */
object_property_set_bool(OBJECT(&s->cpu[i]), true,
"start-powered-off", &error_abort);
}
object_property_set_int(OBJECT(&s->cpu[i]), GIC_BASE_ADDR,
"reset-cbar", &err);
if (err) {
error_propagate((errp), (err));
return;
}
object_property_set_bool(OBJECT(&s->cpu[i]), true, "realized", &err);
if (err) {
error_propagate((errp), (err));
return;
}
sysbus_connect_irq(SYS_BUS_DEVICE(&s->gic), i,
qdev_get_gpio_in(DEVICE(&s->cpu[i]), ARM_CPU_IRQ));
irq = qdev_get_gpio_in(DEVICE(&s->gic),
arm_gic_ppi_index(i, ARM_PHYS_TIMER_PPI));
qdev_connect_gpio_out(DEVICE(&s->cpu[i]), 0, irq);
irq = qdev_get_gpio_in(DEVICE(&s->gic),
arm_gic_ppi_index(i, ARM_VIRT_TIMER_PPI));
qdev_connect_gpio_out(DEVICE(&s->cpu[i]), 1, irq);
}
for (i = 0; i < GIC_NUM_SPI_INTR; i++) {
gic_spi[i] = qdev_get_gpio_in(DEVICE(&s->gic), i);
}
for (i = 0; i < XLNX_ZYNQMP_NUM_GEMS; i++) {
NICInfo *nd = &nd_table[i];
if (nd->used) {
qemu_check_nic_model(nd, TYPE_CADENCE_GEM);
qdev_set_nic_properties(DEVICE(&s->gem[i]), nd);
}
object_property_set_bool(OBJECT(&s->gem[i]), true, "realized", &err);
if (err) {
error_propagate((errp), (err));
return;
}
sysbus_mmio_map(SYS_BUS_DEVICE(&s->gem[i]), 0, gem_addr[i]);
sysbus_connect_irq(SYS_BUS_DEVICE(&s->gem[i]), 0,
gic_spi[gem_intr[i]]);
}
for (i = 0; i < XLNX_ZYNQMP_NUM_UARTS; i++) {
object_property_set_bool(OBJECT(&s->uart[i]), true, "realized", &err);
if (err) {
error_propagate((errp), (err));
return;
}
sysbus_mmio_map(SYS_BUS_DEVICE(&s->uart[i]), 0, uart_addr[i]);
sysbus_connect_irq(SYS_BUS_DEVICE(&s->uart[i]), 0,
gic_spi[uart_intr[i]]);
}
}
static void xlnx_zynqmp_class_init(ObjectClass *oc, void *data)
{
DeviceClass *dc = DEVICE_CLASS(oc);
dc->realize = xlnx_zynqmp_realize;
}
static const TypeInfo xlnx_zynqmp_type_info = {
.name = TYPE_XLNX_ZYNQMP,
.parent = TYPE_DEVICE,
.instance_size = sizeof(XlnxZynqMPState),
.instance_init = xlnx_zynqmp_init,
.class_init = xlnx_zynqmp_class_init,
};
static void xlnx_zynqmp_register_types(void)
{
type_register_static(&xlnx_zynqmp_type_info);
}
type_init(xlnx_zynqmp_register_types)

Some files were not shown because too many files have changed in this diff Show More