sdk/emulator/qemu.git
8 years agoblock: add support for encryption secrets in block I/O tests
Daniel P. Berrange [Tue, 10 May 2016 16:11:28 +0000 (17:11 +0100)]
block: add support for encryption secrets in block I/O tests

The LUKS block driver tests will require the ability to specify
encryption secrets with block devices. This requires using the
--object argument to qemu-img/qemu-io to create a 'secret'
object.

When the IMGKEYSECRET env variable is set, it provides the
password to be associated with a secret called 'keysec0'

The _qemu_img_wrapper function isn't modified as that needs
to cope with differing syntax for subcommands, so can't be
made to use the image opts syntax unconditionally.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1462896689-18450-3-git-send-email-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoblock: add support for --image-opts in block I/O tests
Daniel P. Berrange [Tue, 10 May 2016 16:11:27 +0000 (17:11 +0100)]
block: add support for --image-opts in block I/O tests

Currently all block tests use the traditional syntax for images
just specifying a filename. To support the LUKS driver without
resorting to JSON, the tests need to be able to use the new
--image-opts argument to qemu-img and qemu-io.

This introduces a new env variable IMGOPTSSYNTAX. If this is
set to 'true', then qemu-img/qemu-io should use --image-opts.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1462896689-18450-2-git-send-email-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoqemu-io: Add 'write -z -u' to test MAY_UNMAP flag
Eric Blake [Sun, 8 May 2016 03:16:45 +0000 (21:16 -0600)]
qemu-io: Add 'write -z -u' to test MAY_UNMAP flag

Make it easier to control whether the BDRV_REQ_MAY_UNMAP flag
can be passed through a write_zeroes command, by adding the '-u'
flag to qemu-io 'write -z' and 'aio_write -z'.  To be useful,
the device has to be opened with BDRV_O_UNMAP (done by default
in qemu-io, but can be made explicit with '-d unmap').

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1462677405-4752-7-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoqemu-io: Add 'write -f' to test FUA flag
Eric Blake [Sun, 8 May 2016 03:16:44 +0000 (21:16 -0600)]
qemu-io: Add 'write -f' to test FUA flag

Make it easier to test block drivers with BDRV_REQ_FUA in
.supported_write_flags, by adding the '-f' flag to qemu-io to
conditionally pass the flag through to specific writes ('write',
'write -z', 'writev', 'aio_write', 'aio_write -z'). You'll want
to use 'qemu-io -t none' to actually make -f useful (as
otherwise, the default writethrough mode automatically sets the
FUA bit on every write).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1462677405-4752-6-git-send-email-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoqemu-io: Allow unaligned access by default
Eric Blake [Sun, 8 May 2016 03:16:43 +0000 (21:16 -0600)]
qemu-io: Allow unaligned access by default

There's no reason to require the user to specify a flag just so
they can pass in unaligned numbers.  Keep 'read -p' and 'write -p'
as no-ops so that I don't have to hunt down and update all users
of qemu-io, but otherwise make their behavior default as 'read' and
'write'.  Also fix 'write -z', 'readv', 'writev', 'writev',
'aio_read', 'aio_write', and 'aio_write -z'.  For now, 'read -b',
'write -b', and 'write -c' still require alignment (and 'multiwrite',
but that's slated to die soon).

qemu-iotest 23 is updated to match, as the only test that was
previously explicitly expecting an error on an unaligned request.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1462677405-4752-5-git-send-email-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoqemu-io: Use bool for command line flags
Eric Blake [Sun, 8 May 2016 03:16:42 +0000 (21:16 -0600)]
qemu-io: Use bool for command line flags

We require a C99 compiler; let's use it to express what we
really mean.

(Yes, we now have an instance of 'if (bool + bool + bool > 1)',
which, although semantically valid C, looks ugly; it gets
cleaned up later.)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1462677405-4752-4-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoqemu-io: Make 'open' subcommand more like command line
Eric Blake [Sun, 8 May 2016 03:16:41 +0000 (21:16 -0600)]
qemu-io: Make 'open' subcommand more like command line

The command line defaults to BDRV_O_UNMAP, but can use
-d to reset it.  Meanwhile, the 'open' subcommand was
defaulting to no discards, with no way to set it.

The command line has both -n and -tMODE to set a variety
of cache modes, but the 'open' subcommand had only -n.

The 'open' subcommand had no way to set BDRV_O_NATIVE_AIO.

Note that the 'reopen' subcommand uses '-c' where the
command line and 'open' use -t.  Making that consistent
would be a separate patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1462677405-4752-3-git-send-email-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoqemu-io: Add missing option documentation
Eric Blake [Sun, 8 May 2016 03:16:40 +0000 (21:16 -0600)]
qemu-io: Add missing option documentation

The Usage: summary is missing several options, but rather than
having to maintain it, it's simpler to just state [OPTIONS],
since the options are spelled out below.

Commit 499afa2 added --image-opts, but forgot to document it in
--help.  Likewise for commit 9e8f183 and -d/--discard.

Commit e3aff4f6 put "-o/--offset" in the long opts, but it has
never been honored.

Add a note that '-n' is short for '-t none'.

Commit 9a2d77ad killed the -C option, but forgot to undocument
it for the 'open' subcommand.

Finally, commit 10d9d75 removed -g/--growable, but forgot to
cull it from the valid short options.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1462677405-4752-2-git-send-email-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoqmp: add monitor command to add/remove a child
Wen Congyang [Tue, 10 May 2016 07:36:39 +0000 (15:36 +0800)]
qmp: add monitor command to add/remove a child

The new QMP command name is x-blockdev-change. It's just for adding/removing
quorum's child now, and doesn't support all kinds of children, all kinds of
operations, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1462865799-19402-4-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoquorum: implement bdrv_add_child() and bdrv_del_child()
Wen Congyang [Tue, 10 May 2016 07:36:38 +0000 (15:36 +0800)]
quorum: implement bdrv_add_child() and bdrv_del_child()

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Message-id: 1462865799-19402-3-git-send-email-xiecl.fnst@cn.fujitsu.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoAdd new block driver interface to add/delete a BDS's child
Wen Congyang [Tue, 10 May 2016 07:36:37 +0000 (15:36 +0800)]
Add new block driver interface to add/delete a BDS's child

In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoqemu-img: check block status of backing file when converting.
Ren Kimura [Wed, 27 Apr 2016 16:04:58 +0000 (01:04 +0900)]
qemu-img: check block status of backing file when converting.

When converting images, check the block status of its backing file chain
to avoid needlessly reading zeros.

Signed-off-by: Ren Kimura <rkx1209dev@gmail.com>
Message-id: 1461773098-20356-1-git-send-email-rkx1209dev@gmail.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoiotests: fix the redirection order in 083
Wei Jiangang [Tue, 26 Apr 2016 10:13:21 +0000 (18:13 +0800)]
iotests: fix the redirection order in 083

It should redirect stdout to /dev/null first,
then redirect stderr to whatever stdout currently points at.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Message-id: 1461665601-14908-1-git-send-email-weijg.fnst@cn.fujitsu.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
8 years agoblock: Inactivate all children
Fam Zheng [Wed, 11 May 2016 02:45:35 +0000 (10:45 +0800)]
block: Inactivate all children

Currently we only inactivate the top BDS. Actually bdrv_inactivate
should be the opposite of bdrv_invalidate_cache.

Recurse into the whole subtree instead.

Because a node may have multiple parents, and because once
BDRV_O_INACTIVE is set for a node, further writes are not allowed, we
cannot interleave flag settings and .bdrv_inactivate calls (that may
submit write to other nodes in a graph) within a single pass. Therefore
two passes are used here.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Drop superfluous invalidating bs->file from drivers
Fam Zheng [Wed, 11 May 2016 02:45:34 +0000 (10:45 +0800)]
block: Drop superfluous invalidating bs->file from drivers

Now they are invalidated by the block layer, so it's not necessary to
do this in block drivers' implementations of .bdrv_invalidate_cache.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Invalidate all children
Fam Zheng [Wed, 11 May 2016 02:45:33 +0000 (10:45 +0800)]
block: Invalidate all children

Currently we only recurse to bs->file, which will miss the children in quorum
and VMDK.

Recurse into the whole subtree to avoid that.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agonbd: Simplify client FUA handling
Eric Blake [Tue, 3 May 2016 22:39:08 +0000 (16:39 -0600)]
nbd: Simplify client FUA handling

Now that the block layer honors per-bds FUA support, we don't
have to duplicate the fallback flush at the NBD layer.  The
static function nbd_co_writev_flags() is no longer needed, and
the driver can just directly use nbd_client_co_writev().

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Honor BDRV_REQ_FUA during write_zeroes
Eric Blake [Tue, 3 May 2016 22:39:07 +0000 (16:39 -0600)]
block: Honor BDRV_REQ_FUA during write_zeroes

The block layer has a couple of cases where it can lose
Force Unit Access semantics when writing a large block of
zeroes, such that the request returns before the zeroes
have been guaranteed to land on underlying media.

SCSI does not support FUA during WRITESAME(10/16); FUA is only
supported if it falls back to WRITE(10/16).  But where the
underlying device is new enough to not need a fallback, it
means that any upper layer request with FUA semantics was
silently ignoring BDRV_REQ_FUA.

Conversely, NBD has situations where it can support FUA but not
ZERO_WRITE; when that happens, the generic block layer fallback
to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu
2.6) was losing the FUA flag.

The problem of losing flags unrelated to ZERO_WRITE has been
latent in bdrv_co_do_write_zeroes() since commit aa7bfbff, but
back then, it did not matter because there was no FUA flag.  It
became observable when commit 93f5e6d8 paved the way for flags
that can impact correctness, when we should have been using
bdrv_co_writev_flags() with modified flags.  Compare to commit
9eeb6dd, which got flag manipulation right in
bdrv_co_do_zero_pwritev().

Symptoms: I tested with qemu-io with default writethrough cache
(which is supposed to use FUA semantics on every write), and
targetted an NBD client connected to a server that intentionally
did not advertise NBD_FLAG_SEND_FUA.  When doing 'write 0 512',
the NBD client sent two operations (NBD_CMD_WRITE then
NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
'write -z 0 512', the NBD client sent only NBD_CMD_WRITE.

The fix is do to a cleanup bdrv_co_flush() at the end of the
operation if any step in the middle relied on a BDS that does
not natively support FUA for that step (note that we don't
need to flush after every operation, if the operation is broken
into chunks based on bounce-buffer sizing).  Each BDS gains a
new flag .supported_zero_flags, which parallels the use of
.supported_write_flags but only when accessing a zero write
operation (the flags MUST be different, because of SCSI having
different semantics based on WRITE vs. WRITESAME; and also
because BDRV_REQ_MAY_UNMAP only makes sense on zero writes).

Also fix some documentation to describe -ENOTSUP semantics,
particularly since iscsi depends on those semantics.

Down the road, we may want to add a driver where its
.bdrv_co_pwritev() honors all three of BDRV_REQ_FUA,
BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise
this via bs->supported_write_flags for blocks opened by that
driver; such a driver should NOT supply .bdrv_co_write_zeroes
nor .supported_zero_flags.  But none of the drivers touched
in this patch want to do that (the act of writing zeroes is
different enough from normal writes to deserve a second
callback).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Make supported_write_flags a per-bds property
Eric Blake [Tue, 3 May 2016 22:39:06 +0000 (16:39 -0600)]
block: Make supported_write_flags a per-bds property

Pre-patch, .supported_write_flags lives at the driver level, which
means we are blindly declaring that all block devices using a
given driver will either equally support FUA, or that we need a
fallback at the block layer.  But there are drivers where FUA
support is a per-block decision: the NBD block driver is dependent
on the remote server advertising NBD_FLAG_SEND_FUA (and has
fallback code to duplicate the flush that the block layer would do
if NBD had not set .supported_write_flags); and the iscsi block
driver is dependent on the mode sense bits advertised by the
underlying device (and is currently silently ignoring FUA requests
if the underlying device does not support FUA).

The fix is to make supported flags as a per-BDS option, set during
.bdrv_open().  This patch moves the variable and fixes NBD and iscsi
to set it only conditionally; later patches will then further
simplify the NBD driver to quit duplicating work done at the block
layer, as well as tackle the fact that SCSI does not support FUA
semantics on WRITESAME(10/16) but only on WRITE(10/16).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoqcow2: improve qcow2_co_write_zeroes()
Denis V. Lunev [Wed, 11 May 2016 07:00:14 +0000 (10:00 +0300)]
qcow2: improve qcow2_co_write_zeroes()

There is a possibility that qcow2_co_write_zeroes() will be called
with the partial block. This could be synthetically triggered with
    qemu-io -c "write -z 32k 4k"
and can happen in the real life in qemu-nbd. The latter happens under
the following conditions:
    (1) qemu-nbd is started with --detect-zeroes=on and is connected to the
        kernel NBD client
    (2) third party program opens kernel NBD device with O_DIRECT
    (3) third party program performs write operation with memory buffer
        not aligned to the page
In this case qcow2_co_write_zeroes() is unable to perform the operation
and mark entire cluster as zeroed and returns ENOTSUP. Thus the caller
switches to non-optimized version and writes real zeroes to the disk.

The patch creates a shortcut. If the block is read as zeroes, f.e. if
it is unallocated, the request is extended to cover full block.
User-visible situation with this block is not changed. Before the patch
the block is filled in the image with real zeroes. After that patch the
block is marked as zeroed in metadata. Thus any subsequent changes in
backing store chain are not affected.

Kevin, thank you for a cool suggestion.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Kill unused sector-based blk_* functions
Eric Blake [Fri, 6 May 2016 16:26:45 +0000 (10:26 -0600)]
block: Kill unused sector-based blk_* functions

Now that there are no remaining clients, we can drop the
sector-based blk_read(), blk_write(), blk_aio_readv(), and
blk_aio_writev().  Sadly, there are still remaining
sector-based interfaces, such as blk_*discard(), or
blk_write_compressed(); those will have to wait for another
day.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoqemu-io: Switch to byte-based block access
Eric Blake [Fri, 6 May 2016 16:26:44 +0000 (10:26 -0600)]
qemu-io: Switch to byte-based block access

qemu-io is the last user of several sector-based interfaces.
This patch upgrades to the new interfaces under the hood,
then deletes the resulting dead code.  Note that for maximum
back-compat, while the -p option is no longer required to get
blk_pread(), it is still needed to allow for unaligned access;
this is because qemu-iotest 23 relies on qemu-io rejecting
unaligned accesses without -p.  A later patch may clean up the
interface to be more user-friendly, but it's better to separate
what's done under the hood from what the user sees.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoqemu-img: Switch to byte-based block access
Eric Blake [Fri, 6 May 2016 16:26:43 +0000 (10:26 -0600)]
qemu-img: Switch to byte-based block access

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agonbd: Switch to byte-based block access
Eric Blake [Fri, 6 May 2016 16:26:42 +0000 (10:26 -0600)]
nbd: Switch to byte-based block access

Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Add a constant for our magic number 512, to make it obvious
that this size will NOT change even if BDRV_SECTOR_SIZE does,
even though the two happen to be the same for now.  Split
assignments from conditionals to keep checkpatch.pl happy.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoatapi: Switch to byte-based block access
Eric Blake [Fri, 6 May 2016 16:26:41 +0000 (10:26 -0600)]
atapi: Switch to byte-based block access

Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Add new defines ATAPI_SECTOR_BITS and ATAPI_SECTOR_SIZE to
use anywhere we were previously scaling BDRV_SECTOR_* by 4,
for better legibility.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agom25p80: Switch to byte-based block access
Eric Blake [Fri, 6 May 2016 16:26:40 +0000 (10:26 -0600)]
m25p80: Switch to byte-based block access

Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Likewise for blk_aio_readv() and blk_aio_writev().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agosd: Switch to byte-based block access
Eric Blake [Fri, 6 May 2016 16:26:39 +0000 (10:26 -0600)]
sd: Switch to byte-based block access

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Greatly simplifies the code, now that we let the block layer
take care of alignment and read-modify-write on our behalf :)
In fact, we no longer need to include 'buf' in the migration
stream (although we do have to ensure that the stream remains
compatible).

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agopflash: Switch to byte-based block access
Eric Blake [Fri, 6 May 2016 16:26:38 +0000 (10:26 -0600)]
pflash: Switch to byte-based block access

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoonenand: Switch to byte-based block access
Eric Blake [Fri, 6 May 2016 16:26:37 +0000 (10:26 -0600)]
onenand: Switch to byte-based block access

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

This particular device picks its size during onenand_initfn(),
and can be at most 0x80000000 bytes; therefore, shifting an
'int sec' request to get back to a byte offset should never
overflow 32 bits.  But adding assertions to document that point
should not hurt.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agonand: Switch to byte-based block access
Eric Blake [Fri, 6 May 2016 16:26:36 +0000 (10:26 -0600)]
nand: Switch to byte-based block access

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

This file is doing some complex computations to map various
flash page sizes (256, 512, and 2048) atop generic uses of
512-byte sector operations.  Perhaps someone will want to tidy
up the file for fewer gymnastics in managing addresses and
offsets, and less wasteful visits of 256-byte pages, but it
was out of scope for this series, where I just went with the
mechanical conversion.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agofdc: Switch to byte-based block access
Eric Blake [Fri, 6 May 2016 16:26:35 +0000 (10:26 -0600)]
fdc: Switch to byte-based block access

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoxen_disk: Switch to byte-based aio block access
Eric Blake [Fri, 6 May 2016 16:26:34 +0000 (10:26 -0600)]
xen_disk: Switch to byte-based aio block access

Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agovirtio: Switch to byte-based aio block access
Eric Blake [Fri, 6 May 2016 16:26:33 +0000 (10:26 -0600)]
virtio: Switch to byte-based aio block access

Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

The trace is modified at the same time, and nb_sectors is now
unused.  Fix a comment typo while in the vicinity.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoscsi-disk: Switch to byte-based aio block access
Eric Blake [Fri, 6 May 2016 16:26:32 +0000 (10:26 -0600)]
scsi-disk: Switch to byte-based aio block access

Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

As part of the cleanup, scsi_init_iovec() no longer needs to return
a value, and reword a comment.

[ kwolf: Fix read accounting change ]

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoide: Switch to byte-based aio block access
Eric Blake [Fri, 6 May 2016 16:26:31 +0000 (10:26 -0600)]
ide: Switch to byte-based aio block access

Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

The patch had to touch multiple files at once, because dma_blk_io()
takes pointers to the functions, and ide_issue_trim() piggybacks on
the same interface (while ignoring offset under the hood).

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Introduce byte-based aio read/write
Eric Blake [Fri, 6 May 2016 16:26:30 +0000 (10:26 -0600)]
block: Introduce byte-based aio read/write

blk_aio_readv() and blk_aio_writev() are annoying in that they
can't access sub-sector granularity, and cannot pass flags.
Also, they require the caller to pass redundant information
about the size of the I/O (qiov->size in bytes must match
nb_sectors in sectors).

Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix
the flaws. The next few patches will upgrade callers, then
finally delete the old interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Switch blk_*write_zeroes() to byte interface
Eric Blake [Fri, 6 May 2016 16:26:29 +0000 (10:26 -0600)]
block: Switch blk_*write_zeroes() to byte interface

Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes() to use an offset/count interface
instead.  Likewise for blk_co_write_zeroes() and
blk_aio_write_zeroes().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Switch blk_read_unthrottled() to byte interface
Eric Blake [Fri, 6 May 2016 16:26:28 +0000 (10:26 -0600)]
block: Switch blk_read_unthrottled() to byte interface

Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Allow BDRV_REQ_FUA through blk_pwrite()
Eric Blake [Fri, 6 May 2016 16:26:27 +0000 (10:26 -0600)]
block: Allow BDRV_REQ_FUA through blk_pwrite()

We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.

This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoqemu-io: Fix memory leak in 'aio_write -z'
Kevin Wolf [Mon, 9 May 2016 10:03:04 +0000 (12:03 +0200)]
qemu-io: Fix memory leak in 'aio_write -z'

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
8 years agoAllow users to specify the vmdk virtual hardware version.
Janne Karhunen [Tue, 3 May 2016 09:43:30 +0000 (02:43 -0700)]
Allow users to specify the vmdk virtual hardware version.

Vmdk images have metadata to indicate the vmware virtual
hardware version image was created/tested to run with.
Allow users to specify that version via new 'hwversion'
option.

[ kwolf: Adjust qemu-iotests common.filter ]

Signed-off-by: Janne Karhunen <Janne.Karhunen@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: always compile-check debug prints
Zhou Jie [Fri, 29 Apr 2016 05:12:01 +0000 (13:12 +0800)]
block: always compile-check debug prints

Files with conditional debug statements should ensure that the printf is
always compiled. This prevents bitrot of the format string of the debug
statement. And switch debug output to stderr.

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Fix typo in comment
Wei Jiangang [Tue, 26 Apr 2016 10:12:43 +0000 (18:12 +0800)]
block: Fix typo in comment

s/imlement/implement/

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Remove BlockDriver.bdrv_read/write
Kevin Wolf [Tue, 26 Apr 2016 15:28:25 +0000 (17:28 +0200)]
block: Remove BlockDriver.bdrv_read/write

There are no block drivers left that implement the old .bdrv_read/write
interface, so it can be removed now. This gets us rid of the
corresponding emulation functions, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agovvfat: Implement .bdrv_co_preadv/pwritev interfaces
Kevin Wolf [Tue, 26 Apr 2016 15:14:08 +0000 (17:14 +0200)]
vvfat: Implement .bdrv_co_preadv/pwritev interfaces

This doesn't really convert any of the actual vvfat logic to use
vectored I/O (and it's doubtful whether that would make sense), but
instead just adapts the wrappers to the modern interface.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agovpc: Implement .bdrv_co_pwritev() interface
Kevin Wolf [Tue, 26 Apr 2016 14:24:46 +0000 (16:24 +0200)]
vpc: Implement .bdrv_co_pwritev() interface

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agovpc: Implement .bdrv_co_preadv() interface
Kevin Wolf [Tue, 26 Apr 2016 14:24:46 +0000 (16:24 +0200)]
vpc: Implement .bdrv_co_preadv() interface

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agovmdk: Implement .bdrv_co_pwritev() interface
Kevin Wolf [Tue, 26 Apr 2016 11:39:11 +0000 (13:39 +0200)]
vmdk: Implement .bdrv_co_pwritev() interface

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agovmdk: Implement .bdrv_co_preadv() interface
Kevin Wolf [Mon, 25 Apr 2016 15:34:41 +0000 (17:34 +0200)]
vmdk: Implement .bdrv_co_preadv() interface

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agovmdk: Add vmdk_find_offset_in_cluster()
Kevin Wolf [Mon, 25 Apr 2016 15:14:48 +0000 (17:14 +0200)]
vmdk: Add vmdk_find_offset_in_cluster()

This is a byte granularity version of vmdk_find_index_in_cluster().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agovdi: Implement .bdrv_co_pwritev() interface
Kevin Wolf [Mon, 25 Apr 2016 14:22:39 +0000 (16:22 +0200)]
vdi: Implement .bdrv_co_pwritev() interface

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agovdi: Implement .bdrv_co_preadv() interface
Kevin Wolf [Mon, 25 Apr 2016 14:22:39 +0000 (16:22 +0200)]
vdi: Implement .bdrv_co_preadv() interface

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agodmg: Implement .bdrv_co_preadv() interface
Kevin Wolf [Mon, 25 Apr 2016 13:43:09 +0000 (15:43 +0200)]
dmg: Implement .bdrv_co_preadv() interface

This implements .bdrv_co_preadv() for the cloop block driver. While
updating the error paths, change -1 to a valid -errno code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agocloop: Implement .bdrv_co_preadv() interface
Kevin Wolf [Mon, 25 Apr 2016 13:21:46 +0000 (15:21 +0200)]
cloop: Implement .bdrv_co_preadv() interface

This implements .bdrv_co_preadv() for the cloop block driver. While
updating the error paths, change -1 to a valid -errno code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agobochs: Implement .bdrv_co_preadv() interface
Kevin Wolf [Mon, 25 Apr 2016 12:53:22 +0000 (14:53 +0200)]
bochs: Implement .bdrv_co_preadv() interface

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agoblock: Introduce .bdrv_co_preadv/pwritev BlockDriver function
Kevin Wolf [Mon, 25 Apr 2016 09:25:18 +0000 (11:25 +0200)]
block: Introduce .bdrv_co_preadv/pwritev BlockDriver function

Many parts of the block layer are already byte granularity. The block
driver interface, however, was still missing an interface that allows
making use of this. This patch introduces a new BlockDriver interface,
which is based on coroutines, vectored, has flags and uses a byte
granularity. This is now the preferred interface for new drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agoblock: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev
Kevin Wolf [Mon, 25 Apr 2016 12:57:23 +0000 (14:57 +0200)]
block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev

It used to be an internal helper function just for implementing
bdrv_co_do_readv/writev(), but now that it's a public interface, it
deserves a name without "do" in it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agoblock: Support AIO drivers in bdrv_driver_preadv/pwritev()
Kevin Wolf [Mon, 25 Apr 2016 12:13:12 +0000 (14:13 +0200)]
block: Support AIO drivers in bdrv_driver_preadv/pwritev()

Instead of registering emulation functions as .bdrv_co_writev, just
directly check whether the function is there or not, and use the AIO
interface if it isn't. This makes the read/write functions more
consistent with how things are done in other places (flush, discard,
etc.)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agoblock: Introduce bdrv_driver_pwritev()
Kevin Wolf [Mon, 25 Apr 2016 10:13:39 +0000 (12:13 +0200)]
block: Introduce bdrv_driver_pwritev()

This is a function that simply calls into the block driver for doing a
write, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.

This one is a bit more interesting than the version for reads: It adds
support for .bdrv_co_writev_flags() everywhere, so that drivers
implementing this function can drop .bdrv_co_writev() now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agoblock: Introduce bdrv_driver_preadv()
Kevin Wolf [Mon, 25 Apr 2016 09:46:41 +0000 (11:46 +0200)]
block: Introduce bdrv_driver_preadv()

This is a function that simply calls into the block driver for doing a
read, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.

For now, this is just a wrapper for calling bs->drv->bdrv_co_readv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
8 years agolinux-aio: make it more type safe
Paolo Bonzini [Thu, 7 Apr 2016 16:33:35 +0000 (18:33 +0200)]
linux-aio: make it more type safe

Replace void* with an opaque LinuxAioState type.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
Paolo Bonzini [Thu, 7 Apr 2016 16:33:34 +0000 (18:33 +0200)]
block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end

Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.

Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children.  The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: introduce bdrv_no_throttling_begin/end
Paolo Bonzini [Thu, 7 Apr 2016 16:33:33 +0000 (18:33 +0200)]
block: introduce bdrv_no_throttling_begin/end

Extract the handling of throttling from bdrv_flush_io_queue.  These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".

Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain
Paolo Bonzini [Thu, 7 Apr 2016 16:33:32 +0000 (18:33 +0200)]
block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain

Do not call bdrv_drain_recurse twice in bdrv_co_drain.  A small
tweak to the logic in Fam's patch, which is harmless since no
one implements bdrv_drain anyway.  But better get it right.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: move restarting of throttled reqs to block/throttle-groups.c
Paolo Bonzini [Thu, 7 Apr 2016 16:33:31 +0000 (18:33 +0200)]
block: move restarting of throttled reqs to block/throttle-groups.c

We want to remove throttled_reqs from block/io.c.  This is the easy
part---hide the handling of throttled_reqs during disable/enable of
throttling within throttle-groups.c.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: make bdrv_start_throttled_reqs return void
Paolo Bonzini [Thu, 7 Apr 2016 16:33:30 +0000 (18:33 +0200)]
block: make bdrv_start_throttled_reqs return void

The return value is unused and I am not sure why it would be useful.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoblock: Don't disable I/O throttling on sync requests
Kevin Wolf [Thu, 7 Apr 2016 16:33:29 +0000 (18:33 +0200)]
block: Don't disable I/O throttling on sync requests

We had to disable I/O throttling with synchronous requests because we
didn't use to run timers in nested event loops when the code was
introduced. This isn't true any more, and throttling works just fine
even when using the synchronous API.

The removed code is in fact dead code since commit a8823a3b ('block: Use
blk_co_pwritev() for blk_write()') because I/O throttling can only be
set on the top layer, but BlockBackend always uses the coroutine
interface now instead of using the sync API emulation in block.c.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
8 years agoOpen 2.7 development tree
Peter Maydell [Thu, 12 May 2016 11:35:25 +0000 (12:35 +0100)]
Open 2.7 development tree

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoUpdate version for v2.6.0 release
Peter Maydell [Wed, 11 May 2016 15:44:26 +0000 (16:44 +0100)]
Update version for v2.6.0 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoUpdate version for v2.6.0-rc5 release
Peter Maydell [Mon, 9 May 2016 13:08:12 +0000 (14:08 +0100)]
Update version for v2.6.0-rc5 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoMerge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160509-1' into staging
Peter Maydell [Mon, 9 May 2016 12:42:25 +0000 (13:42 +0100)]
Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160509-1' into staging

vga security fixes (CVE-2016-3710, CVE-2016-3712)

# gpg: Signature made Mon 09 May 2016 13:39:30 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-vga-20160509-1:
  vga: make sure vga register setup for vbe stays intact (CVE-2016-3712).
  vga: update vga register setup on vbe changes
  vga: factor out vga register setup
  vga: add vbe_enabled() helper
  vga: fix banked access bounds checking (CVE-2016-3710)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoUpdate version for v2.6.0-rc4 release
Peter Maydell [Mon, 2 May 2016 16:27:01 +0000 (17:27 +0100)]
Update version for v2.6.0-rc4 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoRevert "acpi: mark PMTIMER as unlocked"
Gerd Hoffmann [Fri, 15 Apr 2016 06:43:29 +0000 (08:43 +0200)]
Revert "acpi: mark PMTIMER as unlocked"

This reverts commit 7070e085d490c396f9237c8f10bf8b6e69cd0066.

Commit message claims locking is not needed, but that appears
to not be true, seabios ehci driver runs into timekeeping problems
with this, see
https://bugzilla.redhat.com/show_bug.cgi?id=1322713

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1460702609-25971-1-git-send-email-kraxel@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agovga: make sure vga register setup for vbe stays intact (CVE-2016-3712).
Gerd Hoffmann [Tue, 26 Apr 2016 12:48:06 +0000 (14:48 +0200)]
vga: make sure vga register setup for vbe stays intact (CVE-2016-3712).

Call vbe_update_vgaregs() when the guest touches GFX, SEQ or CRT
registers, to make sure the vga registers will always have the
values needed by vbe mode.  This makes sure the sanity checks
applied by vbe_fixup_regs() are effective.

Without this guests can muck with shift_control, can turn on planar
vga modes or text mode emulation while VBE is active, making qemu
take code paths meant for CGA compatibility, but with the very
large display widths and heigts settable using VBE registers.

Which is good for one or another buffer overflow.  Not that
critical as they typically read overflows happening somewhere
in the display code.  So guests can DoS by crashing qemu with a
segfault, but it is probably not possible to break out of the VM.

Fixes: CVE-2016-3712
Reported-by: Zuozhi Fzz <zuozhi.fzz@alibaba-inc.com>
Reported-by: P J P <ppandit@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovga: update vga register setup on vbe changes
Gerd Hoffmann [Tue, 26 Apr 2016 13:39:22 +0000 (15:39 +0200)]
vga: update vga register setup on vbe changes

Call the new vbe_update_vgaregs() function on vbe configuration
changes, to make sure vga registers are up-to-date.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovga: factor out vga register setup
Gerd Hoffmann [Tue, 26 Apr 2016 13:24:18 +0000 (15:24 +0200)]
vga: factor out vga register setup

When enabling vbe mode qemu will setup a bunch of vga registers to make
sure the vga emulation operates in correct mode for a linear
framebuffer.  Move that code to a separate function so we can call it
from other places too.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovga: add vbe_enabled() helper
Gerd Hoffmann [Tue, 26 Apr 2016 12:11:34 +0000 (14:11 +0200)]
vga: add vbe_enabled() helper

Makes code a bit easier to read.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovga: fix banked access bounds checking (CVE-2016-3710)
Gerd Hoffmann [Tue, 26 Apr 2016 06:49:10 +0000 (08:49 +0200)]
vga: fix banked access bounds checking (CVE-2016-3710)

vga allows banked access to video memory using the window at 0xa00000
and it supports a different access modes with different address
calculations.

The VBE bochs extentions support banked access too, using the
VBE_DISPI_INDEX_BANK register.  The code tries to take the different
address calculations into account and applies different limits to
VBE_DISPI_INDEX_BANK depending on the current access mode.

Which is probably effective in stopping misprogramming by accident.
But from a security point of view completely useless as an attacker
can easily change access modes after setting the bank register.

Drop the bogus check, add range checks to vga_mem_{readb,writeb}
instead.

Fixes: CVE-2016-3710
Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agoconfigure: Check if struct fsxattr is available from linux header
Jan Vesely [Fri, 29 Apr 2016 17:15:23 +0000 (13:15 -0400)]
configure: Check if struct fsxattr is available from linux header

Fixes build failure with --enable-xfsctl and
new linux headers (>=4.5) and older xfsprogs(<4.5):
In file included from /usr/include/xfs/xfs.h:38:0,
                 from /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:97:
/usr/include/xfs/xfs_fs.h:42:8: error: redefinition of â€˜struct fsxattr’
 struct fsxattr {
        ^
In file included from /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:60:0:
/usr/include/linux/fs.h:155:8: note: originally defined here
 struct fsxattr {

This is really a bug in the system headers, but we can work around it
by defining HAVE_FSXATTR in the QEMU headers if linux/fs.h provides
the struct, so that xfs_fs.h doesn't try to define it as well.

CC: qemu-trivial@nongnu.org
CC: Markus Armbruster <armbru@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Stefan Weil <sw@weilnetz.de>
Tested-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
[PMM: adjusted commit message, comments]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoMerge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
Peter Maydell [Sun, 1 May 2016 21:52:47 +0000 (22:52 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

acpi: last minute fix for 2.6

Minor, obvious fix only affecting BE hosts.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sun 01 May 2016 13:43:28 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  acpi: fix bios linker loadder COMMAND_ALLOCATE on bigendian host

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoacpi: fix bios linker loadder COMMAND_ALLOCATE on bigendian host
Igor Mammedov [Fri, 29 Apr 2016 12:44:40 +0000 (14:44 +0200)]
acpi: fix bios linker loadder COMMAND_ALLOCATE on bigendian host

'make check' fails with:

ERROR:tests/bios-tables-test.c:493:load_expected_aml:
   assertion failed: (g_file_test(aml_file, G_FILE_TEST_EXISTS))

since commit:
caf50c7166a6ed96c462ab5db4b495e1234e4cc6
tests: pc: acpi: drop not needed 'expected SSDT' blobs

Assert happens because qemu-system-x86_64 generates
SSDT table and test looks for a corresponding expected
table to compare with.

However there is no expected SSDT blob anymore, since
QEMU souldn't generate one. As it happens BIOS is not
able to read ACPI tables from QEMU and fallbacks to
embeded legacy ACPI codepath, which generates SSDT.
That happens due to wrongly sized endiannes conversion
which makes
 uint8_t BiosLinkerLoaderEntry.alloc.zone
end up with 0 due to truncation of 32 bit integer
which on host is 1 or 2.

Fix it by dropping invalid cpu_to_le32() as uint8_t
doesn't require any conversion.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1330174

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
8 years agoMerge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Peter Maydell [Fri, 29 Apr 2016 11:12:33 +0000 (12:12 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

vvfat fixes for 2.6.0-rc4

# gpg: Signature made Fri 29 Apr 2016 10:52:13 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  vvfat: Fix default volume label
  vvfat: Fix volume name assertion

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoMerge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-04-29' into staging
Peter Maydell [Fri, 29 Apr 2016 10:26:10 +0000 (11:26 +0100)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-04-29' into staging

QAPI patches for 2016-04-29

# gpg: Signature made Fri 29 Apr 2016 10:13:08 BST using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-qapi-2016-04-29:
  qapi: Don't pass NULL to printf in string input visitor

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agovvfat: Fix default volume label
Kevin Wolf [Wed, 27 Apr 2016 12:18:16 +0000 (14:18 +0200)]
vvfat: Fix default volume label

Commit d5941dd documented that it leaves the default volume name as it
was ("QEMU VVFAT"), but it doesn't actually implement this. You get an
empty name (eleven space characters) instead.

This fixes the implementation to apply the advertised default.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8 years agovvfat: Fix volume name assertion
Kevin Wolf [Wed, 27 Apr 2016 12:11:38 +0000 (14:11 +0200)]
vvfat: Fix volume name assertion

Commit d5941dd made the volume name configurable, but it didn't consider
that the rw code compares the volume name string to assert that the
first directory entry is the volume name. This made vvfat crash in rw
mode.

This fixes the assertion to compare with the configured volume name
instead of a literal string.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8 years agoqapi: Don't pass NULL to printf in string input visitor
Eric Blake [Thu, 28 Apr 2016 21:45:28 +0000 (15:45 -0600)]
qapi: Don't pass NULL to printf in string input visitor

Make sure the error message for visit_type_uint64() gracefully
handles a NULL 'name' when called from the top level or a list
context, as not all the world behaves like glibc in allowing
NULL through a printf-family %s.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-21-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
8 years agoslirp: fix guest network access with darwin host
Samuel Thibault [Thu, 28 Apr 2016 16:53:08 +0000 (18:53 +0200)]
slirp: fix guest network access with darwin host

On Darwin, connect, sendto and friends want the exact size of the sockaddr,
not more (and in particular, not sizeof(struct sockaddr_storaget))

This commit adds the sockaddr_size helper to be used when passing a sockaddr
size to such function, and makes use of it int sendto and connect calls.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoMerge remote-tracking branch 'remotes/lalrae/tags/mips-20160428' into staging
Peter Maydell [Thu, 28 Apr 2016 10:48:12 +0000 (11:48 +0100)]
Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160428' into staging

MIPS patches 2016-04-28

Changes:
* fixed RDHWR exception host PC

# gpg: Signature made Thu 28 Apr 2016 10:11:18 BST using RSA key ID 0B29DA6B
# gpg: Good signature from "Leon Alrae <leon.alrae@imgtec.com>"

* remotes/lalrae/tags/mips-20160428:
  target-mips: Fix RDHWR exception host PC

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoMerge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-04-28' into staging
Peter Maydell [Thu, 28 Apr 2016 10:05:37 +0000 (11:05 +0100)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-04-28' into staging

Fix dangling pointers and error message regressions

# gpg: Signature made Thu 28 Apr 2016 07:25:51 BST using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-error-2016-04-28:
  qom: -object error messages lost location, restore it
  replay: Fix dangling location bug in replay_configure()
  QemuOpts: Fix qemu_opts_foreach() dangling location regression

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoMerge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160426' into staging
Peter Maydell [Thu, 28 Apr 2016 09:25:26 +0000 (10:25 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160426' into staging

ppc patch queue for 2016-04-26 (last minute qemu-2.6 fix)

This just has one, last-minute, fix for a serious regression of memory
hotplug.

Patch author's comment:
    Really sorry for the way last-minute fix, but without this memory
    hotplug is totally broken :( Hoping to get this in for Wednesday's
    RC4, which I think will be the final before release.

# gpg: Signature made Tue 26 Apr 2016 03:52:20 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.6-20160426:
  spapr_drc: fix aborts during DRC-count based hotplug

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agotarget-mips: Fix RDHWR exception host PC
James Hogan [Wed, 27 Apr 2016 22:21:06 +0000 (23:21 +0100)]
target-mips: Fix RDHWR exception host PC

Commit b00c72180c36 ("target-mips: add PC, XNP reg numbers to RDHWR")
changed the rdhwr helpers to use check_hwrena() to check the register
being accessed is enabled in CP0_HWREna when used from user mode. If
that check fails an EXCP_RI exception is raised at the host PC
calculated with GETPC().

However check_hwrena() may not be fully inlined as the
do_raise_exception() part of it is common regardless of the arguments.
This causes GETPC() to calculate the address in the call in the helper
instead of the generated code calling the helper. No TB will be found
and the EPC reported with the resulting guest RI exception points to the
beginning of the TB instead of the RDHWR instruction.

We can't reliably force check_hwrena() to be inlined, and converting it
to a macro would be ugly, so instead pass the host PC in as an argument,
with each rdhwr helper passing GETPC(). This should avoid any dependence
on compiler behaviour, and in practice seems to ensure the full inlining
of check_hwrena() on x86_64.

This issue causes failures when running a MIPS KVM (trap & emulate)
guest in a MIPS QEMU TCG guest, as the inner guest kernel will do a
RDHWR of counter, which is disabled in the outer guest's CP0_HWREna by
KVM so it can emulate the inner guest's counter. The emulation fails and
the RI exception is passed to the inner guest.

Fixes: b00c72180c36 ("target-mips: add PC, XNP reg numbers to RDHWR")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Yongbok Kim <yongbok.kim@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
8 years agoqom: -object error messages lost location, restore it
Markus Armbruster [Wed, 27 Apr 2016 14:29:09 +0000 (16:29 +0200)]
qom: -object error messages lost location, restore it

qemu_opts_foreach() runs its callback with the error location set to
the option's location.  Any errors the callback reports use the
option's location automatically.

Commit 90998d5 moved the actual error reporting from "inside"
qemu_opts_foreach() to after it.  Here's a typical hunk:

 if (qemu_opts_foreach(qemu_find_opts("object"),
    -                          object_create,
    -                          object_create_initial, NULL)) {
    +                          user_creatable_add_opts_foreach,
    +                          object_create_initial, &err)) {
    +        error_report_err(err);
     exit(1);
 }

Before, object_create() reports from within qemu_opts_foreach(), using
the option's location.  Afterwards, we do it after
qemu_opts_foreach(), using whatever location happens to be current
there.  Commonly a "none" location.

This is because Error objects don't have location information.
Problematic.

Reproducer:

    $ qemu-system-x86_64 -nodefaults -display none -object secret,id=foo,foo=bar
    qemu-system-x86_64: Property '.foo' not found

Note no location.  This commit restores it:

    qemu-system-x86_64: -object secret,id=foo,foo=bar: Property '.foo' not found

Note that the qemu_opts_foreach() bug just fixed could mask the bug
here: if the location it leaves dangling hasn't been clobbered, yet,
it's the correct one.

Reported-by: Eric Blake <eblake@redhat.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1461767349-15329-4-git-send-email-armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Paragraph on Error added to commit message]

8 years agoreplay: Fix dangling location bug in replay_configure()
Markus Armbruster [Wed, 27 Apr 2016 14:29:08 +0000 (16:29 +0200)]
replay: Fix dangling location bug in replay_configure()

replay_configure() pushes and pops a Location with automatic storage
duration.  Except it fails to pop when -icount parameter "rr" isn't
given.  cur_loc then points to unused stack space, and will most
likely get clobbered in short order.

Clobbered cur_loc can make loc_pop() and error_print_loc() crash or
report bogus locations.

Broken in commit 890ad55.

I didn't take the time to find a reproducer.

Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1461767349-15329-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
8 years agoQemuOpts: Fix qemu_opts_foreach() dangling location regression
Markus Armbruster [Wed, 27 Apr 2016 14:29:07 +0000 (16:29 +0200)]
QemuOpts: Fix qemu_opts_foreach() dangling location regression

qemu_opts_foreach() pushes and pops a Location with automatic storage
duration.  Except it fails to pop when @func() returns non-zero.
cur_loc then points to unused stack space, and will most likely get
clobbered in short order.

Clobbered cur_loc can make loc_pop() and error_print_loc() crash or
report bogus locations.

Affects several qemu command line options as well as qemu-img,
qemu-io, qemu-nbd -object, and blkdebug's configuration file.

Broken in commit a4c7367, v2.4.0.

Reproducer:
    $ qemu-system-x86_64 -nodefaults -display none -object secret,id=foo,foo=bar

main() reports "Property '.foo' not found" like this:

    if (qemu_opts_foreach(qemu_find_opts("object"),
                          user_creatable_add_opts_foreach,
                          object_create_delayed, &err)) {
        error_report_err(err);
        exit(1);
    }

cur_loc then points to where qemu_opts_foreach()'s Location used to
be, i.e. unused stack space.  With optimization, this Location doesn't
get clobbered for me, and also happens to be the correct location.
Without optimization, it does get clobbered in a way that makes
error_report_err() report no location.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1461767349-15329-2-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
8 years agospapr_drc: fix aborts during DRC-count based hotplug
Michael Roth [Mon, 25 Apr 2016 22:24:25 +0000 (17:24 -0500)]
spapr_drc: fix aborts during DRC-count based hotplug

CPU/memory resources can be signalled en-masse via
spapr_hotplug_req_add_by_count(), and when doing so, actually change
the meaning of the 'drc' parameter passed to
spapr_hotplug_req_event() to be a count rather than an index.

f40eb92 added a hook in spapr_hotplug_req_event() to record when a
device had been 'signalled' to the guest, but that code assumes that
drc is always an index. In cases where it's a count, such as memory
hotplug, the DRC lookup will fail, leading to an assert.

Fix this by only explicitly setting the signalled state for cases where
we are doing PCI hotplug.

For other resources types, since we cannot selectively track whether a
resource has been signalled in cases where we signal attach as a count,
set the 'signalled' state to true immediately upon making the
resource available via drck->attach().

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: david@gibson.dropbear.id.au
Cc: qemu-ppc@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agousb/uhci: move pid check
Gerd Hoffmann [Fri, 22 Apr 2016 10:44:53 +0000 (12:44 +0200)]
usb/uhci: move pid check

commit "5f77e06 usb: add pid check at the first of uhci_handle_td()"
moved the pid verification to the start of the uhci_handle_td function,
to simplify the error handling (we don't have to free stuff which we
didn't allocate in the first place ...).

Problem is now the check fires too often, it raises error IRQs even for
TDs which we are not going to process because they are not set active.

So, lets move down the check a bit, so it is done only for active TDs,
but still before we are going to allocate stuff to process the requested
transfer.

Reported-by: Joe Clifford <joe@thunderbug.co.uk>
Tested-by: Joe Clifford <joe@thunderbug.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1461321893-15811-1-git-send-email-kraxel@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoMerge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160423' into staging
Peter Maydell [Mon, 25 Apr 2016 10:15:53 +0000 (11:15 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160423' into staging

ppc patch queue for 2016-03-23

A single fix for a bug in parameter handling for the spapr PCI host
bridge.

# gpg: Signature made Sat 23 Apr 2016 07:55:29 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.6-20160423:
  hw/ppc/spapr: Fix crash when specifying bad parameters to spapr-pci-host-bridge

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agohw/ppc/spapr: Fix crash when specifying bad parameters to spapr-pci-host-bridge
Thomas Huth [Thu, 21 Apr 2016 10:08:58 +0000 (12:08 +0200)]
hw/ppc/spapr: Fix crash when specifying bad parameters to spapr-pci-host-bridge

QEMU currently crashes when using bad parameters for the
spapr-pci-host-bridge device:

$ qemu-system-ppc64 -device spapr-pci-host-bridge,buid=0x123,liobn=0x321,mem_win_addr=0x1,io_win_addr=0x10
Segmentation fault

The problem is that spapr_tce_find_by_liobn() might return NULL, but
the code in spapr_populate_pci_dt() does not check for this condition
and then tries to dereference this NULL pointer.
Apart from that, the return value of spapr_populate_pci_dt() also
has to be checked for all PCI buses, not only for the last one, to
make sure we catch all errors.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agoMerge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Peter Maydell [Fri, 22 Apr 2016 15:17:12 +0000 (16:17 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Mirror block job fixes for 2.6.0-rc4

# gpg: Signature made Fri 22 Apr 2016 15:46:41 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  mirror: Workaround for unexpected iohandler events during completion
  aio-posix: Skip external nodes in aio_dispatch
  virtio: Mark host notifiers as external
  event-notifier: Add "is_external" parameter
  iohandler: Introduce iohandler_get_aio_context

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agomirror: Workaround for unexpected iohandler events during completion
Fam Zheng [Fri, 22 Apr 2016 13:53:56 +0000 (21:53 +0800)]
mirror: Workaround for unexpected iohandler events during completion

Commit 5a7e7a0ba moved mirror_exit to a BH handler but didn't add any
protection against new requests that could sneak in just before the
BH is dispatched. For example (assuming a code base at that commit):

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch
                aio_dispatch
                  ...
                    mirror_run
                      bdrv_drain
    (a)               block_job_defer_to_main_loop
          qemu_iohandler_poll
            virtio_queue_host_notifier_read
              ...
                virtio_submit_multiwrite
    (b)           blk_aio_multiwrite

        main_loop_wait # 2
          <snip>
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

At (a) we know the BDS has no pending request. However, the same
main_loop_wait call is going to dispatch iohandlers (EventNotifier
events), which may lead to a new I/O from guest. So the invariant is
already broken at (c). Data loss.

Commit f3926945c8 made iohandler to use aio API.  The order of
virtio_queue_host_notifier_read and block_job_defer_to_main_loop within
a main_loop_wait becomes unpredictable, and even worse, if the host
notifier event arrives at the next main_loop_wait call, the
unpredictable order between mirror_exit and
virtio_queue_host_notifier_read is also a trouble. As shown below, this
commit made the bug easier to trigger:

    - Bug case 1:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop
              aio_ctx_dispatch (iohandler_ctx)
                virtio_queue_host_notifier_read
                  ...
                    virtio_submit_multiwrite
    (b)               blk_aio_multiwrite

        main_loop_wait # 2
          ...
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

    - Bug case 2:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop

        main_loop_wait # 2
          ...
            aio_ctx_dispatch (iohandler_ctx)
              virtio_queue_host_notifier_read
                ...
                  virtio_submit_multiwrite
    (b)             blk_aio_multiwrite
              aio_dispatch
                aio_bh_poll
    (c)           mirror_exit

In both cases, (b) breaks the invariant wanted by (a) and (c).

Until then, the request loss has been silent. Later, 3f09bfbc7be added
asserts at (c) to check the invariant (in
bdrv_replace_in_backing_chain), and Max reported an assertion failure
first visible there, by doing active committing while the guest is
running bonnie++.

2.5 added bdrv_drained_begin at (a) to protect the dataplane case from
similar problems, but we never realize the main loop bug until now.

As a bandage, this patch disables iohandler's external events
temporarily together with bs->ctx.

Launchpad Bug: 1570134

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>