Paolo Bonzini [Wed, 9 Jul 2014 09:53:05 +0000 (11:53 +0200)]
AioContext: export and use aio_dispatch
So far, aio_poll's scheme was dispatch/poll/dispatch, where
the first dispatch phase was used only in the GSource case in
order to avoid a blocking poll. Earlier patches changed it to
dispatch/prepare/poll/dispatch, where prepare is aio_compute_timeout.
By making aio_dispatch public, we can remove the first dispatch
phase altogether, so that both aio_poll and the GSource use the same
prepare/poll/dispatch scheme.
This patch breaks the invariant that aio_poll(..., true) will not block
the first time it returns false. This used to be fundamental for
qemu_aio_flush's implementation as "while (qemu_aio_wait()) {}" but
no code in QEMU relies on this invariant anymore. The return value
of aio_poll() is now comparable with that of g_main_context_iteration.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Wed, 9 Jul 2014 09:53:04 +0000 (11:53 +0200)]
AioContext: run bottom halves after polling
Make the dispatching phase the same before blocking and afterwards.
The next patch will make aio_dispatch public and use it directly
for the GSource case, instead of aio_poll. aio_poll can then be
simplified heavily.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Wed, 9 Jul 2014 09:53:03 +0000 (11:53 +0200)]
aio-win32: Factor out duplicate code into aio_dispatch_handlers
Later, the call to aio_dispatch will move int the GSource wrapper, while the
standalone case will still be call the component functions aio_bh_poll,
aio_dispatch_handlers and timerlistgroup_run_timers.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Wed, 9 Jul 2014 09:53:02 +0000 (11:53 +0200)]
aio-win32: Evaluate timers after handles
This is similar to what aio_poll does in the stand-alone case.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Wed, 9 Jul 2014 09:53:01 +0000 (11:53 +0200)]
AioContext: take bottom halves into account when computing aio_poll timeout
Right now, QEMU invokes aio_bh_poll before the "poll" phase
of aio_poll. It is simpler to do it afterwards and skip the
"poll" phase altogether when the OS-dependent parts of AioContext
are invoked from GSource. This way, AioContext behaves more
similarly when used as a GSource vs. when used as stand-alone.
As a start, take bottom halves into account when computing the
poll timeout. If a bottom half is ready, do a non-blocking
poll. As a side effect, this makes idle bottom halves work
with aio_poll; an improvement, but not really an important
one since they are deprecated.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Wed, 27 Aug 2014 13:29:59 +0000 (14:29 +0100)]
blockdev: fix drive-mirror 'granularity' error message
Name the 'granularity' parameter and give its expected value range.
Previously the device name was mistakenly reported as the parameter
name.
Note that the error class is unchanged from ERROR_CLASS_GENERIC_ERROR.
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Fam Zheng [Tue, 26 Aug 2014 07:15:43 +0000 (15:15 +0800)]
coroutine: Drop co_sleep_ns
block_job_sleep_ns is the only user. Since we are moving towards
AioContext aware code, it's better to use the explicit version and drop
the old one.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Liu Yuan [Mon, 18 Aug 2014 09:41:05 +0000 (17:41 +0800)]
block/quorum: add simple read pattern support
This patch adds single read pattern to quorum driver and quorum vote is default
pattern.
For now we do a quorum vote on all the reads, it is designed for unreliable
underlying storage such as non-redundant NFS to make sure data integrity at the
cost of the read performance.
For some use cases as following:
VM
--------------
| |
v v
A B
Both A and B has hardware raid storage to justify the data integrity on its own.
So it would help performance if we do a single read instead of on all the nodes.
Further, if we run VM on either of the storage node, we can make a local read
request for better performance.
This patch generalize the above 2 nodes case in the N nodes. That is,
vm -> write to all the N nodes, read just one of them. If single read fails, we
try to read next node in FIFO order specified by the startup command.
The 2 nodes case is very similar to DRBD[1] though lack of auto-sync
functionality in the single device/node failure for now. But compared with DRBD
we still have some advantages over it:
- Suppose we have 20 VMs running on one(assume A) of 2 nodes' DRBD backed
storage. And if A crashes, we need to restart all the VMs on node B. But for
practice case, we can't because B might not have enough resources to setup 20 VMs
at once. So if we run our 20 VMs with quorum driver, and scatter the replicated
images over the data center, we can very likely restart 20 VMs without any
resource problem.
After all, I think we can build a more powerful replicated image functionality
on quorum and block jobs(block mirror) to meet various High Availibility needs.
E.g, Enable single read pattern on 2 children,
-drive driver=quorum,children.0.file.filename=0.qcow2,\
children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1
[1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device
[Dropped \n from an error_setg() error message
--Stefan]
Cc: Benoit Canet <benoit@irqsave.net>
Cc: Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Liu Yuan [Mon, 18 Aug 2014 09:41:04 +0000 (17:41 +0800)]
qapi: add read-pattern enum for quorum
Cc: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Hitoshi Mitake [Mon, 11 Aug 2014 05:43:46 +0000 (14:43 +0900)]
sheepdog: improve error handling for a case of failed lock
Recently, sheepdog revived its VDI locking functionality. This patch
updates sheepdog driver of QEMU for this feature. It changes an error
code for a case of failed locking. -EBUSY is a suitable one.
Reported-by: Valerio Pachera <sirio81@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Liu Yuan <namei.unix@gmail.com>
Cc: MORITA Kazutaka <morita.kazutaka@gmail.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Hitoshi Mitake [Mon, 11 Aug 2014 05:43:45 +0000 (14:43 +0900)]
sheepdog: adopting protocol update for VDI locking
The update is required for supporting iSCSI multipath. It doesn't
affect behavior of QEMU driver but adding a new field to vdi request
struct is required.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Liu Yuan <namei.unix@gmail.com>
Cc: MORITA Kazutaka <morita.kazutaka@gmail.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Tue, 26 Aug 2014 18:17:56 +0000 (19:17 +0100)]
qemu-img: always goto out in img_snapshot() error paths
The out label has the qemu_progress_end() and other cleanup calls.
Always goto out in error paths so the cleanup happens. These error
paths now return 1 instead of -1.
Note that bdrv_unref(NULL) is safe. We just need to initialize bs to
NULL at the top of the function.
We can now remove the obsolete bs_old_backing = NULL and bs_new_backing
= NULL for safe mode. Originally it was necessary in commit
3e85c6fd
("qemu-img rebase") but became useless in commit c2abcce ("qemu-img:
avoid calling exit(1) to release resources properly") because the
variables are already initialized during declaration.
Reported-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Stefan Hajnoczi [Tue, 26 Aug 2014 18:17:55 +0000 (19:17 +0100)]
qemu-img: fix img_compare() flags error path
If img_compare() fails to parse the cache flags the goto out3 code path
will call qemu_progress_end(). Make sure we actually call
qemu_progress_init() first.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Stefan Hajnoczi [Tue, 26 Aug 2014 18:17:54 +0000 (19:17 +0100)]
qemu-img: fix img_commit() error return value
The img_commit() return value is a process exit code. Use 1 for failure
instead of -1. The other failure paths in this function already use 1.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Daniel Henrique Barboza [Wed, 13 Aug 2014 15:44:27 +0000 (12:44 -0300)]
block.curl: adding 'timeout' option
The curl hardcoded timeout (5 seconds) sometimes is not long
enough depending on the remote server configuration and network
traffic. The user should be able to set how much long he is
willing to wait for the connection.
Adding a new option to set this timeout gives the user this
flexibility. The previous default timeout of 5 seconds will be
used if this option is not present.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Benoit Canet <benoit.canet@nodalink.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Markus Armbruster [Fri, 15 Aug 2014 11:32:37 +0000 (13:32 +0200)]
ide: Fix bootindex for bus_id > 9
We identify devices by their Open Firmware device paths. The encoding
of bus numbers is incorrect: idebus_get_fw_dev_path() formats them in
decimal, while SeaBIOS uses hexadecimal. With bus number > 9, SeaBIOS
will miss the bootindex (lucky case), or apply it to another device
(unlucky case).
Bug can't bite right now: ich9-ahci has six ports, and the sysbus-ahci
created by Calxeda Highbank has just one.
Fix it anyway, by changing %d to %x.
I couldn't find an Open Firmware spec covering this. For what it's
worth, OVMF agrees with SeaBIOS.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Peter Maydell [Thu, 28 Aug 2014 16:08:13 +0000 (17:08 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
SCSI patches include bug fixes from Fam and Peter, improved error
reporting from Fam and a fix for DPRINTF bitrot. Memory patches try
again to initialize name from the QOM name.
# gpg: Signature made Thu 28 Aug 2014 15:10:31 BST using RSA key ID
9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: aka "Paolo Bonzini <bonzini@gnu.org>"
* remotes/bonzini/tags/for-upstream:
memory: Lazy init name from QOM name as needed
xen: hvm: Abstract away memory region name ref
xen-hvm: Constify string
virtio-scsi: Report error if num_queues is 0 or too large
scsi-generic: remove superfluous DPRINTF avoid to break compiling
block/iscsi: fix memory corruption on iscsi resize
scsi-bus: Convert DeviceClass init to realize
block: Pass errp in blkconf_geometry
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Thu, 28 Aug 2014 15:07:23 +0000 (16:07 +0100)]
Merge remote-tracking branch 'remotes/kvm/tags/for-upstream' into staging
Mostly bugfixes + Alexey's interface-based implementation
of the NMI monitor command.
# gpg: Signature made Thu 28 Aug 2014 15:07:22 BST using RSA key ID
9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: aka "Paolo Bonzini <bonzini@gnu.org>"
* remotes/kvm/tags/for-upstream:
mc146818rtc: reinitialize irq_reinject_on_ack_count on reset
target-i386: Add "tsc_adjust" CPU feature name
target-i386: Add "mpx" CPU feature name
vl: process -object after other backend options
checkpatch.pl: adjust typedef definition to QEMU coding style
x86: Clear MTRRs on vCPU reset
x86: kvm: Add MTRR support for kvm_get|put_msrs()
x86: Use common variable range MTRR counts
target-i386: Don't forbid NX bit on PAE PDEs and PTEs
spapr: Add support for new NMI interface
s390x: Migrate to new NMI interface
s390x: Convert QEMUMachine to MachineClass
cpus: Define callback for QEMU "nmi" command
kvm: run cpu state synchronization on target vcpu thread
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Crosthwaite [Tue, 26 Aug 2014 03:10:24 +0000 (20:10 -0700)]
memory: Lazy init name from QOM name as needed
To support name retrieval of MemoryRegions that were created
dynamically (that is, not via memory_region_init and friends). We
cache the name in MemoryRegion's state as
object_get_canonical_path_component mallocs the returned value
so it's not suitable for direct return to callers. Memory already
frees the name field, so this will be garbage collected along with
the MR object.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Crosthwaite [Tue, 26 Aug 2014 03:09:48 +0000 (20:09 -0700)]
xen: hvm: Abstract away memory region name ref
The mr->name field is removed. This slipped through compile testing.
Fix.
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Crosthwaite [Tue, 26 Aug 2014 03:09:13 +0000 (20:09 -0700)]
xen-hvm: Constify string
It's constant, and sourced from existing const strings. Avoid dodgy
casts by converting to const.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Maydell [Thu, 28 Aug 2014 13:51:12 +0000 (14:51 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/fix-buildbot-
12082014-pull-request' into staging
Pull request
# gpg: Signature made Thu 28 Aug 2014 13:43:00 BST using RSA key ID
81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
* remotes/stefanha/tags/fix-buildbot-
12082014-pull-request:
Revert "qemu-img: sort block formats in help message"
block: sort formats alphabetically in bdrv_iterate_format()
mirror: fix uninitialized variable delay_ns warnings
trace: avoid Python 2.5 all() in tracetool
libqtest: launch QEMU with QEMU_AUDIO_DRV=none
qapi.py: avoid Python 2.5+ any() function
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Stefan Hajnoczi [Wed, 27 Aug 2014 11:08:56 +0000 (12:08 +0100)]
Revert "qemu-img: sort block formats in help message"
This reverts commit
1a443c1b8b4314d365e82bddeb1de5b4b1c15fb3 and the
later commit
395071a76328189f50c778f4dee6dabb90503dd9.
GSequence was introduced in glib 2.14. RHEL 5 fails to compile since it
uses glib 2.12.3.
Now that bdrv_iterate_format() invokes the iteration callback in sorted
order these commits are unnecessary.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Stefan Hajnoczi [Wed, 27 Aug 2014 11:08:55 +0000 (12:08 +0100)]
block: sort formats alphabetically in bdrv_iterate_format()
Format names are best consumed in alphabetical order. This makes
human-readable output easy to produce.
bdrv_iterate_format() already has an array of format strings. Sort them
before invoking the iteration callback.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Stefan Hajnoczi [Wed, 27 Aug 2014 11:08:54 +0000 (12:08 +0100)]
mirror: fix uninitialized variable delay_ns warnings
The gcc 4.1.2 compiler warns that delay_ns may be uninitialized in
mirror_iteration().
There are two break statements in the do ... while loop that skip over
the delay_ns assignment. These are probably the cause of the warning.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Stefan Hajnoczi [Wed, 27 Aug 2014 11:08:53 +0000 (12:08 +0100)]
trace: avoid Python 2.5 all() in tracetool
Red Hat Enterprise Linux 5 ships Python 2.4.3. The all() function was
added in Python 2.5 so we cannot use it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Stefan Hajnoczi [Wed, 27 Aug 2014 11:08:52 +0000 (12:08 +0100)]
libqtest: launch QEMU with QEMU_AUDIO_DRV=none
No test case actually uses the audio backend. Disable audio to prevent
warnings on hosts with no sound hardware present:
GTESTER check-qtest-aarch64
sdl: SDL_OpenAudio failed
sdl: Reason: No available audio device
sdl: SDL_OpenAudio failed
sdl: Reason: No available audio device
audio: Failed to create voice `lm4549.out'
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Stefan Hajnoczi [Wed, 27 Aug 2014 11:08:51 +0000 (12:08 +0100)]
qapi.py: avoid Python 2.5+ any() function
There is one instance of any() in qapi.py that breaks builds on older
distros that ship Python 2.4 (like RHEL5):
GEN qmp-commands.h
Traceback (most recent call last):
File "build/scripts/qapi-commands.py", line 445, in ?
exprs = parse_schema(input_file)
File "build/scripts/qapi.py", line 329, in parse_schema
schema = QAPISchema(open(input_file, "r"))
File "build/scripts/qapi.py", line 110, in __init__
if any(include_path == elem[1]
NameError: global name 'any' is not defined
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Paolo Bonzini [Wed, 27 Aug 2014 15:54:52 +0000 (17:54 +0200)]
mc146818rtc: reinitialize irq_reinject_on_ack_count on reset
This field was forgotten, and it makes the state after reset
non-deterministic.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Maydell [Tue, 26 Aug 2014 13:18:40 +0000 (14:18 +0100)]
Merge remote-tracking branch 'remotes/mcayland/qemu-openbios' into staging
* remotes/mcayland/qemu-openbios:
Update OpenBIOS images
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Eduardo Habkost [Mon, 25 Aug 2014 20:02:13 +0000 (17:02 -0300)]
target-i386: Add "tsc_adjust" CPU feature name
tsc_adjust migration support is already implemented (commit
f28558d3d37ad3bc4e35e8ac93f7bf81a0d5622c), so we can add it to the list
of known feature names.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Eduardo Habkost [Mon, 25 Aug 2014 20:02:12 +0000 (17:02 -0300)]
target-i386: Add "mpx" CPU feature name
Migration support for MPX is already implemented (commit
79e9ebebbf2a00c46fcedb6dc7dd5e12bbd30216), so we can add it to the list
of known feature names.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Mark Cave-Ayland [Tue, 26 Aug 2014 12:52:15 +0000 (13:52 +0100)]
Update OpenBIOS images
Update OpenBIOS images to SVN r1316 built from submodule.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Paolo Bonzini [Mon, 25 Aug 2014 11:47:00 +0000 (13:47 +0200)]
vl: process -object after other backend options
QOM backends can refer to chardevs, but not vice versa. So
process -chardev and -fsdev options before -object
This fixes the rng-egd backend to virtio-rng.
Reported-by: Amos Kong <akong@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 10 Jun 2014 08:52:02 +0000 (10:52 +0200)]
checkpatch.pl: adjust typedef definition to QEMU coding style
Most QEMU typedefs are camelcase, starting with one uppercase letter
and containing at least one lowercase letter. There are a few
all-uppercase types, add the most common too.
This fixes recognition of types in lines such as
static __attribute__((unused)) inline void tcg_out8(TCGContext *s, uint8_t v)
(Example provided by Peter Maydell).
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fam Zheng [Tue, 26 Aug 2014 06:30:30 +0000 (14:30 +0800)]
virtio-scsi: Report error if num_queues is 0 or too large
No cmd vq surprises guest (Linux panics in virtscsi_probe), too many
queues abort qemu (in the following virtio_add_queue).
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Gonglei [Fri, 22 Aug 2014 02:01:50 +0000 (10:01 +0800)]
scsi-generic: remove superfluous DPRINTF avoid to break compiling
variables lun and tag had been eliminated, break compiling
when enable debug switch. Meanwhile traces provide the same
information with this DPRINTF, so remove it.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Lieven [Fri, 22 Aug 2014 08:08:49 +0000 (10:08 +0200)]
block/iscsi: fix memory corruption on iscsi resize
bs->total_sectors is not yet updated at this point. resulting
in memory corruption if the volume has grown and data is written
to the newly availble areas.
CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fam Zheng [Tue, 12 Aug 2014 02:12:55 +0000 (10:12 +0800)]
scsi-bus: Convert DeviceClass init to realize
Replace "init/destroy" with "realize/unrealize" in SCSIDeviceClass,
which has errp as a parameter. So all the implementations now use
error_setg instead of error_report for reporting error.
Also in scsi_bus_legacy_handle_cmdline, report the error when
initializing the if=scsi devices, before returning it, because in the
callee, error_report is changed to error_setg. And the callers don't
have the right locations (e.g. "-drive if=scsi").
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fam Zheng [Tue, 12 Aug 2014 02:12:54 +0000 (10:12 +0800)]
block: Pass errp in blkconf_geometry
This allows us to pass error information to caller.
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Maydell [Tue, 26 Aug 2014 09:42:06 +0000 (10:42 +0100)]
Merge remote-tracking branch 'remotes/awilliam/tags/vfio-pci-for-qemu-
20140825.0' into staging
VFIO: Enable primary NVIDIA quirk regardless of VGA support
# gpg: Signature made Mon 25 Aug 2014 20:29:37 BST using RSA key ID
3BB08B22
# gpg: Can't check signature: public key not found
* remotes/awilliam/tags/vfio-pci-for-qemu-
20140825.0:
vfio: Enable NVIDIA 88000 region quirk regardless of VGA
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Alex Williamson [Mon, 25 Aug 2014 18:10:15 +0000 (12:10 -0600)]
vfio: Enable NVIDIA 88000 region quirk regardless of VGA
If we make use of OVMF for the BIOS then we can use GPUs without VGA
space access, but we still need this quirk. Disassociate it from the
x-vga option and enable it on all NVIDIA VGA display class devices.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Peter Maydell [Mon, 25 Aug 2014 17:49:25 +0000 (18:49 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc fixes, features
A bunch of bugfixes - these will make sense for 2.1.1
ACPI support for TPM and partial ARI support for PCIE.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sun 24 Aug 2014 23:16:35 BST using RSA key ID
D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
* remotes/mst/tags/for_upstream:
pcie: fix trailing whitespace
ioh3420: Enable ARI forwarding
ioh3420: Remove obsoleted, unused ioh3420_init function
pcie: Rename the pcie_cap_ari_* functions to pcie_cap_arifwd_*
pcie: Fix incorrect write to the ari capability next function field
ssdt-tpm: add generated hex file to git
Add ACPI tables for TPM
pc: reserve more memory for ACPI for new machine types
pcihp: fix possible array out of bounds
pci_bridge: manually destroy memory regions within PCIBridgeWindows
hostmem: set MPOL_MF_MOVE
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Alex Williamson [Thu, 14 Aug 2014 21:39:39 +0000 (15:39 -0600)]
x86: Clear MTRRs on vCPU reset
The SDM specifies (June 2014 Vol3 11.11.5):
On a hardware reset, the P6 and more recent processors clear the
valid flags in variable-range MTRRs and clear the E flag in the
IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the
MTRRs are undefined.
We currently do none of that, so whatever MTRR settings you had prior
to reset is what you have after reset. Usually this doesn't matter
because KVM often ignores the guest mappings and uses write-back
anyway. However, if you have an assigned device and an IOMMU that
allows NoSnoop for that device, KVM defers to the guest memory
mappings which are now stale after reset. The result is that OVMF
rebooting on such a configuration takes a full minute to LZMA
decompress the firmware volume, a process that is nearly instant on
the initial boot.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Alex Williamson [Thu, 14 Aug 2014 21:39:33 +0000 (15:39 -0600)]
x86: kvm: Add MTRR support for kvm_get|put_msrs()
The MTRR state in KVM currently runs completely independent of the
QEMU state in CPUX86State.mtrr_*. This means that on migration, the
target loses MTRR state from the source. Generally that's ok though
because KVM ignores it and maps everything as write-back anyway. The
exception to this rule is when we have an assigned device and an IOMMU
that doesn't promote NoSnoop transactions from that device to be cache
coherent. In that case KVM trusts the guest mapping of memory as
configured in the MTRR.
This patch updates kvm_get|put_msrs() so that we retrieve the actual
vCPU MTRR settings and therefore keep CPUX86State synchronized for
migration. kvm_put_msrs() is also used on vCPU reset and therefore
allows future modificaitons of MTRR state at reset to be realized.
Note that the entries array used by both functions was already
slightly undersized for holding every possible MSR, so this patch
increases it beyond the 28 new entries necessary for MTRR state.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Alex Williamson [Thu, 14 Aug 2014 21:39:27 +0000 (15:39 -0600)]
x86: Use common variable range MTRR counts
We currently define the number of variable range MTRR registers as 8
in the CPUX86State structure and vmstate, but use MSR_MTRRcap_VCNT
(also 8) to report to guests the number available. Change this to
use MSR_MTRRcap_VCNT consistently.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
William Grant [Sun, 24 Aug 2014 05:13:48 +0000 (15:13 +1000)]
target-i386: Don't forbid NX bit on PAE PDEs and PTEs
Commit
e8f6d00c30ed88910d0d985f4b2bf41654172ceb ("target-i386: raise
page fault for reserved physical address bits") added a check that the
NX bit is not set on PAE PDPEs, but it also added it to rsvd_mask for
the rest of the function. This caused any PDEs or PTEs with NX set to be
erroneously rejected, making PAE guests with NX support unusable.
Signed-off-by: William Grant <wgrant@ubuntu.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Maydell [Mon, 25 Aug 2014 16:34:30 +0000 (17:34 +0100)]
Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-2014-08-24' into staging
trivial patches for 2014-08-24
# gpg: Signature made Sun 24 Aug 2014 14:28:49 BST using RSA key ID
A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg: aka "Michael Tokarev <mjt@corpit.ru>"
# gpg: aka "Michael Tokarev <mjt@debian.org>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5
# Subkey fingerprint: 6F67 E18E 7C91 C5B1 5514 66A7 BEE5 9D74 A4C3 D7DB
* remotes/mjt/tags/trivial-patches-2014-08-24:
vmxnet3: Pad short frames to minimum size (60 bytes)
libdecnumber: Fix warnings from smatch (missing static, boolean operations)
linux-user: fix file descriptor leaks
po: Fix Makefile rules for in-tree builds without configuration
slirp/misc: Use the GLib memory allocation APIs
configure: no need to mkdir QMP
dma: axidma: Variablise repeated s->streams[i] sub-expr
microblaze: ml605: Get rid of ddr_base variable
tests/bios-tables-test: check the value returned by fopen()
tcg: dump op count into qemu log
util/path: Use the GLib memory allocation routines
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Alexey Kardashevskiy [Wed, 20 Aug 2014 12:16:36 +0000 (22:16 +1000)]
spapr: Add support for new NMI interface
This implements an NMI interface POWERPC SPAPR machine.
This enables an "nmi" HMP/QMP command supported on SPAPR.
This calls POWERPC_EXCP_RESET (vector 0x100) in the guest to deliver NMI
to every CPU. The expected result is XMON (in-kernel debugger) invocation.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Alexey Kardashevskiy [Wed, 20 Aug 2014 12:16:35 +0000 (22:16 +1000)]
s390x: Migrate to new NMI interface
This implements an NMI interface for s390 and s390-ccw machines.
This removes #ifdef s390 branch in qmp_inject_nmi so new s390's
nmi_monitor_handler() callback is going to be used for NMI.
Since nmi_monitor_handler()-calling code is platform independent,
CPUState::cpu_index is used instead of S390CPU::env.cpu_num.
There should not be any change in behaviour as both @cpu_index and
@cpu_num are global CPU numbers.
Note that s390_cpu_restart() already takes care of the specified cpu,
so we don't need to schedule via async_run_on_cpu().
Since the only error s390_cpu_restart() can return is ENOSYS, convert
it to QERR_UNSUPPORTED.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Alexey Kardashevskiy [Wed, 20 Aug 2014 12:16:34 +0000 (22:16 +1000)]
s390x: Convert QEMUMachine to MachineClass
This converts s390-virtio and s390-ccw-virtio machines to QOM MachineClass.
This brings ability to add interfaces to the machine classes. The first
interface for addition will be NMI.
The patch is mechanical so no change in behavior is expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Alexey Kardashevskiy [Wed, 20 Aug 2014 12:16:33 +0000 (22:16 +1000)]
cpus: Define callback for QEMU "nmi" command
This introduces an NMI (Non Maskable Interrupt) interface with
a single nmi_monitor_handler() method. A machine or a device can
implement it. This searches for an QOM object with this interface
and if it is implemented, calls it. The callback implements an action
required to cause debug crash dump on in-kernel debugger invocation.
The callback returns Error**.
This adds a nmi_monitor_handle() helper which walks through
all objects to find the interface. The interface method is called
for all found instances.
This adds support for it in qmp_inject_nmi(). Since no architecture
supports it at the moment, there is no change in behaviour.
This changes inject-nmi command description for HMP and QMP.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Michael S. Tsirkin [Sun, 24 Aug 2014 20:45:29 +0000 (22:45 +0200)]
pcie: fix trailing whitespace
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Knut Omang [Sun, 24 Aug 2014 13:32:20 +0000 (15:32 +0200)]
ioh3420: Enable ARI forwarding
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Knut Omang [Sun, 24 Aug 2014 13:32:19 +0000 (15:32 +0200)]
ioh3420: Remove obsoleted, unused ioh3420_init function
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Knut Omang [Sun, 24 Aug 2014 13:32:18 +0000 (15:32 +0200)]
pcie: Rename the pcie_cap_ari_* functions to pcie_cap_arifwd_*
Rename helper functions to make a clearer distinction between
the PCIe capability/control register feature ARI forwarding and a
device that supports the ARI feature via an ARI extended PCIe capability.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Knut Omang [Sun, 24 Aug 2014 13:32:17 +0000 (15:32 +0200)]
pcie: Fix incorrect write to the ari capability next function field
PCI_ARI_CAP_NFN, a macro for reading next function was used instead of
the intended write.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Michael S. Tsirkin [Wed, 20 Aug 2014 21:20:13 +0000 (23:20 +0200)]
ssdt-tpm: add generated hex file to git
Needed for systems without IASL.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefan Berger [Mon, 11 Aug 2014 20:33:36 +0000 (16:33 -0400)]
Add ACPI tables for TPM
Add an SSDT ACPI table for the TPM device.
Add a TCPA table for BIOS logging area when a TPM is being used.
The latter follows this spec here:
http://www.trustedcomputinggroup.org/files/static_page_files/
DCD4188E-1A4B-B294-
D050A155FB6F7385/TCG_ACPIGeneralSpecification_PublicReview.pdf
This patch has Michael Tsirkin's patches folded in.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Michael S. Tsirkin [Wed, 20 Aug 2014 19:58:12 +0000 (21:58 +0200)]
pc: reserve more memory for ACPI for new machine types
commit
868270f23d8db2cce83e4f082fe75e8625a5fbf9
acpi-build: tweak acpi migration limits
broke kernel loading with -kernel/-initrd: it doubled
the size of ACPI tables but did not reserve
enough memory.
As a result, issues on boot and halt are observed.
Fix this up by doubling reserved memory for new machine types.
Cc: qemu-stable@nongnu.org
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Gonglei [Wed, 20 Aug 2014 05:52:30 +0000 (13:52 +0800)]
pcihp: fix possible array out of bounds
Prevent out-of-bounds array access on
acpi_pcihp_pci_status.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Paolo Bonzini [Wed, 20 Aug 2014 15:50:05 +0000 (17:50 +0200)]
pci_bridge: manually destroy memory regions within PCIBridgeWindows
The regions are destroyed and recreated on configuration space accesses.
We need to destroy them before the containing PCIBridgeWindows object
is freed.
Reported-by: Gonglei <arei.gonglei@huawei.com>
Reported-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Ben Draper [Wed, 20 Aug 2014 12:27:14 +0000 (13:27 +0100)]
vmxnet3: Pad short frames to minimum size (60 bytes)
When running VMware ESXi under qemu-kvm the guest discards frames
that are too short. Short ARP Requests will be dropped, this prevents
guests on the same bridge as VMware ESXi from communicating. This patch
simply adds the padding on the network device itself.
Signed-off-by: Ben Draper <ben@xrsa.net>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Stefan Weil [Wed, 20 Aug 2014 09:02:10 +0000 (11:02 +0200)]
libdecnumber: Fix warnings from smatch (missing static, boolean operations)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
zhanghailiang [Fri, 22 Aug 2014 08:23:51 +0000 (16:23 +0800)]
linux-user: fix file descriptor leaks
Handle variable "fd_orig" going out of scope leaks the handle.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Stefan Weil [Fri, 18 Jul 2014 14:52:29 +0000 (16:52 +0200)]
po: Fix Makefile rules for in-tree builds without configuration
Adding 'update' to the phony targets fixes this error:
$ LANG=C make -C po update
make: Entering directory `/qemu/po'
LINK update
/qemu/po/de_DE.po: file not recognized: File format not recognized
collect2: error: ld returned 1 exit status
make: *** [update] Error 1
make: Leaving directory `/qemu/po'
Some other phony targets (build, install) were also added, and the
existing .PHONY statement was moved to a more prominent position at
the beginning of the Makefile.
The patch also fixes a 2nd bug. The default target should be 'all',
but instead 'modules' (from rules.mak) was the default. Fix this by
adding 'all' as a target before any include statement.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
zhanghailiang [Tue, 19 Aug 2014 08:30:17 +0000 (16:30 +0800)]
slirp/misc: Use the GLib memory allocation APIs
Here we don't check the return value of malloc() which may fail.
Use the g_new() instead, which will abort the program when
there is not enough memory.
Also, use g_strdup instead of strdup and remove the unnecessary
strdup function.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Liming Wang [Tue, 19 Aug 2014 02:24:05 +0000 (10:24 +0800)]
configure: no need to mkdir QMP
commit
7537fe04 QMP: QMP/ -> docs/qmp/
Above commit has moved last QMP files to docs/qmp and it's not necessary
to create QMP directory. So remove it from configure.
Signed-off-by: Liming Wang <liming.wang@canonical.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Peter Crosthwaite [Mon, 18 Aug 2014 00:53:12 +0000 (17:53 -0700)]
dma: axidma: Variablise repeated s->streams[i] sub-expr
This have 6 inline usages. Make it a bit more readable by using a local
variable.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Peter Crosthwaite [Mon, 18 Aug 2014 00:52:38 +0000 (17:52 -0700)]
microblaze: ml605: Get rid of ddr_base variable
It's a constant based on a macro. Just use the macro in place.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
zhanghailiang [Mon, 18 Aug 2014 07:54:33 +0000 (15:54 +0800)]
tests/bios-tables-test: check the value returned by fopen()
The function fopen() may fail, so check its return value.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Liu <john.liuli@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
zhanghailiang [Mon, 18 Aug 2014 07:58:08 +0000 (15:58 +0800)]
tcg: dump op count into qemu log
fopen() may fail and it does not check its return vaule here,
it is better to dump op count to the normal log file.
Signed-off-by: Li Liu <john.liuli@huawei.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
zhanghailiang [Mon, 18 Aug 2014 07:49:22 +0000 (15:49 +0800)]
util/path: Use the GLib memory allocation routines
In this file, we don't check the return value of malloc/strdup/realloc which may fail.
Instead of using these routines, we use the GLib memory APIs g_malloc/g_strdup/g_realloc.
They will exit on allocation failure, so there is no need to test for failure,
which would be fine for setup.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Peter Maydell [Fri, 22 Aug 2014 15:12:51 +0000 (16:12 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches
# gpg: Signature made Fri 22 Aug 2014 14:47:53 BST using RSA key ID
C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream: (29 commits)
qemu-img: Allow cache mode specification for amend
qemu-img: Allow source cache mode specification
vmdk: Use bdrv_nb_sectors() where sectors, not bytes are wanted
blkdebug: Delete BH in bdrv_aio_cancel
qemu-iotests: add test case 101 for short file I/O
raw-posix: fix O_DIRECT short reads
block/iscsi: fix memory corruption on iscsi resize
block/vvfat.c: remove debugging code to reinit stderr if NULL
iotests: Add test for image filename construction
quorum: Implement bdrv_refresh_filename()
nbd: Implement bdrv_refresh_filename()
blkverify: Implement bdrv_refresh_filename()
blkdebug: Implement bdrv_refresh_filename()
block: Add bdrv_refresh_filename()
virtio-blk: fix reference a pointer which might be freed
virtio-blk: allow block_resize with dataplane
block: acquire AioContext in qmp_block_resize()
qemu-iotests: Fix 028 reference output for qed
test-coroutine: test cost introduced by coroutine
iotests: Add test for qcow2's cache options
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Fri, 22 Aug 2014 13:39:53 +0000 (14:39 +0100)]
Merge remote-tracking branch 'remotes/riku/linux-user-for-upstream' into staging
* remotes/riku/linux-user-for-upstream: (22 commits)
linux-user: check return value of malloc()
linux-user: writev Partial Writes
linux-user: Support target-to-host translation of mlockall argument
linux-user: clock_nanosleep errno Handling on PPC
linux-user: Minimum Sig Handler Stack Size for PPC64 ELF V2
linux-user: Move get_ppc64_abi
linux-user: Detect fault in sched_rr_get_interval
linux-user: Handle NULL sched_param argument to sched_*
linux-user: Detect Negative Message Sizes in msgsnd System Call
linux-user: Conditionally Pass Attribute Pointer to mq_open()
linux-user: Make ipc syscall's third argument an abi_long
linux-user: Properly Handle semun Structure In Cross-Endian Situations
linux-user: Dereference Pointer Argument to ipc/semctl Sys Call
linux-user: PPC64 semid_ds Doesnt Include _unused1 and _unused2
linux-user: add setns and unshare
linux-user: support ioprio_{get, set} syscalls
linux-user: support timerfd_{create, gettime, settime} syscalls
linux-user: fix readlink handling with magic exe symlink
linux-user: Fix conversion of sigevent argument to timer_create
linux-user: Fix syscall instruction usermode emulation on X86_64
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Max Reitz [Tue, 22 Jul 2014 20:58:43 +0000 (22:58 +0200)]
qemu-img: Allow cache mode specification for amend
qemu-img amend may extensively modify the target image, depending on the
options to be amended (e.g. conversion to qcow2 compat level 0.10 from
1.1 for an image with many unallocated zero clusters). Therefore it
makes sense to allow the user to specify the cache mode to be used.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Tue, 22 Jul 2014 20:58:42 +0000 (22:58 +0200)]
qemu-img: Allow source cache mode specification
Many qemu-img subcommands only read the source file(s) once. For these
use cases, a full write-back cache is unnecessary and mainly clutters
host cache memory. Though this is generally no concern as cache memory
is freely available and can be scaled by the host OS, it may become a
concern with thin provisioning.
For these cases, it makes sense to allow users to freely specify the
source cache mode (e.g. use no cache at all).
This commit adds a new switch (-T) for the qemu-img subcommands check,
compare, convert and rebase to specify the cache to be used for source
images (the backing file in case of rebase).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
zhanghailiang [Thu, 14 Aug 2014 07:29:18 +0000 (15:29 +0800)]
linux-user: check return value of malloc()
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Acked-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:43 +0000 (13:53 -0500)]
linux-user: writev Partial Writes
Although not technically not required by POSIX, the writev system call will
typically write out its buffers individually. That is, if the first buffer
is written successfully, but the second buffer pointer is invalid, then
the first chuck will be written and its size is returned.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:42 +0000 (13:53 -0500)]
linux-user: Support target-to-host translation of mlockall argument
The argument to the mlockall system call is not necessarily the same on
all platforms and thus may require translation prior to passing to the
host.
For example, PowerPC 64 bit platforms define values for MCL_CURRENT
(0x2000) and MCL_FUTURE (0x4000) which are different from Intel platforms
(0x1 and 0x2, respectively)
Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:41 +0000 (13:53 -0500)]
linux-user: clock_nanosleep errno Handling on PPC
The clock_nanosleep syscall is unusual in that it returns positive
numbers in error handling situations, versus returning -1 and setting
errno, or returning a negative errno value. On POWER, the kernel will
set the SO bit of CR0 to indicate failure in a syscall. QEMU has
generic handling to do this for syscalls with standard return values.
Add special case code for clock_nanosleep to handle CR0 properly.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:40 +0000 (13:53 -0500)]
linux-user: Minimum Sig Handler Stack Size for PPC64 ELF V2
The ELF V2 ABI for PPC64 defines MINSIGSTKSZ as 4096 bytes whereas it was
2048 previously.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Wed, 13 Aug 2014 19:04:44 +0000 (14:04 -0500)]
linux-user: Move get_ppc64_abi
The get_ppc64_abi is used to determine the ELF ABI (i.e. V1 or V2). This
routine is currently implemented in the linux-user/elfload.c file but
is useful in other scenarios. Move the routine to a more generally
available location (linux-user/ppc/target_cpu.h).
Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:39 +0000 (13:53 -0500)]
linux-user: Detect fault in sched_rr_get_interval
Properly detect a fault when attempting to store into an invalid
struct timespec pointer.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:38 +0000 (13:53 -0500)]
linux-user: Handle NULL sched_param argument to sched_*
The sched_getparam, sched_setparam and sched_setscheduler system
calls take a pointer argument to a sched_param structure. When
this pointer is null, errno should be set to EINVAL.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:37 +0000 (13:53 -0500)]
linux-user: Detect Negative Message Sizes in msgsnd System Call
The msgsnd system call takes an argument that describes the message
size (msgsz) and is of type size_t. The system call should set
errno to EINVAL in the event that a negative message size is passed.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:36 +0000 (13:53 -0500)]
linux-user: Conditionally Pass Attribute Pointer to mq_open()
The mq_open system call takes an optional struct mq_attr pointer
argument in the fourth position. This pointer is used when O_CREAT
is specified in the flags (second) argument. It may be NULL, in
which case the queue is created with implementation defined attributes.
Change the code to properly handle the case when NULL is passed in the
arg4 position.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:35 +0000 (13:53 -0500)]
linux-user: Make ipc syscall's third argument an abi_long
For those target ABIs that use the ipc system call (e.g. POWER),
the third argument is used in the shmat path as a pointer. It
therefore must be declared as an abi_long (versus int) so that
the address bits are not lost in truncation. In fact, all arguments
to do_ipc should be declared as abit_long.
In fact, it makes more sense for all of the arguments to be declaried
as abi_long (except call).
Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:34 +0000 (13:53 -0500)]
linux-user: Properly Handle semun Structure In Cross-Endian Situations
The semun union used in the semctl system call contains both an int (val) and
pointers. In cross-endian situations on 64 bit targets, the value passed to
semctl is an 8 byte (abi_long) value and thus does not have the 4-byte val
field in the correct location. In order to rectify this, the other half
of the union must be accessed. This is achieved in code by performing
a byte swap on the entire 8 byte union, followed by a 4-byte swap of the
first half.
Also, eliminate an extraneous (dead) line of code that sets target_su.val in
the IPC_SET/IPC_GET case.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:33 +0000 (13:53 -0500)]
linux-user: Dereference Pointer Argument to ipc/semctl Sys Call
When the ipc system call is used to wrap a semctl system call,
the ptr argument to ipc needs to be dereferenced prior to passing
it to the semctl handler. This is because the fourth argument to
semctl is a union and not a pointer to a union.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:32 +0000 (13:53 -0500)]
linux-user: PPC64 semid_ds Doesnt Include _unused1 and _unused2
The 64 bit PowerPC platforms eliminate the _unused1 and _unused2
elements of the semid_ds structure from <sys/sem.h>. So eliminate
these from the target_semid_ds structure.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Riku Voipio [Tue, 12 Aug 2014 12:58:57 +0000 (15:58 +0300)]
linux-user: add setns and unshare
Add support for the setns and unshare syscalls, trivially passed through to
the host. Based on patches by Paul Burton, added configure check.
Signed-off-by: Paul Burton <paul@archlinuxmips.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Paul Burton [Sun, 22 Jun 2014 10:25:43 +0000 (11:25 +0100)]
linux-user: support ioprio_{get, set} syscalls
Add support for the ioprio_get & ioprio_set syscalls, allowing their
use by target programs.
Signed-off-by: Paul Burton <paul@archlinuxmips.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Riku Voipio [Sun, 22 Jun 2014 10:25:42 +0000 (11:25 +0100)]
linux-user: support timerfd_{create, gettime, settime} syscalls
Adds support for the timerfd_create, timerfd_gettime & timerfd_settime
syscalls, allowing use of timerfds by target programs.
v2: By Riku - added configure check for timerfd and ifdefs
for benefit of old distributions like RHEL5.
Signed-off-by: Paul Burton <paul@archlinuxmips.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Mike Frysinger [Fri, 8 Aug 2014 00:40:25 +0000 (09:40 +0900)]
linux-user: fix readlink handling with magic exe symlink
The current code always returns the length of the path when it should
be returning the number of bytes it wrote to the output string.
Further, readlink is not supposed to append a NUL byte, but the current
snprintf logic will always do just that.
Even further, if you pass in a length of 0, you're suppoesd to get back
an error (EINVAL), but the current logic just returns 0.
Further still, if there was an error reading the symlink, we should not
go ahead and try to read the target buffer as it is garbage.
Simple test for the first two issues:
$ cat test.c
int main() {
char buf[50];
size_t len;
for (len = 0; len < 10; ++len) {
memset(buf, '!', sizeof(buf));
ssize_t ret = readlink("/proc/self/exe", buf, len);
buf[20] = '\0';
printf("readlink(/proc/self/exe, {%s}, %zu) = %zi\n", buf, len, ret);
}
return 0;
}
Now compare the output of the native:
$ gcc test.c -o /tmp/x
$ /tmp/x
$ strace /tmp/x
With what qemu does:
$ armv7a-cros-linux-gnueabi-gcc test.c -o /tmp/x -static
$ qemu-arm /tmp/x
$ qemu-arm -strace /tmp/x
Signed-off-by: Mike Frysinger <vapier@chromium.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Peter Maydell [Sat, 9 Aug 2014 14:42:32 +0000 (15:42 +0100)]
linux-user: Fix conversion of sigevent argument to timer_create
There were a number of bugs in the conversion of the sigevent
argument to timer_create from target to host format:
* signal number not converted from target to host
* thread ID not copied across
* sigev_value not copied across
* we never unlocked the struct when we were done
Between them, these problems meant that SIGEV_THREAD_ID
timers (and the glibc-implemented SIGEV_THREAD timers which
depend on them) didn't work.
Fix these problems and clean up the code a little by pulling
the struct conversion out into its own function, in line with
how we convert various other structs. This allows the test
program in bug LP:1042388 to run.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Jincheng Miao [Fri, 8 Aug 2014 03:56:54 +0000 (11:56 +0800)]
linux-user: Fix syscall instruction usermode emulation on X86_64
Currently syscall instruction is buggy on user mode X86_64,
the EIP is updated after do_syscall(), that is too late for
clone(). Because clone() will create a thread at the env->EIP
(the address of syscall insn), and then child thread enters
do_syscall() again, that is not expected. Sometimes it is tragic.
User mode syscall insn emulation is not used MSR, so the
action should be same to INT 0x80. INT 0x80 will update EIP in
do_interrupt(), ditto for syscall() for consistency.
Signed-off-by: Jincheng Miao <jmiao@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Riku Voipio [Wed, 6 Aug 2014 07:36:37 +0000 (10:36 +0300)]
linux-user: redirect openat calls
While Mikhail fixed /proc/self/maps, it was noticed openat calls are
not redirected currently. Some archs don't have open at all, so
openat needs to be redirected.
Fix this by consolidating open/openat code to do_openat - open
is implemented using openat(AT_FDCWD, ... ), which according
to open(2) man page is identical.
Since all targets now have openat, remove the ifdef around sys_openat
and openat: case in do_syscall.
Cc: Mikhail Ilin <m.ilin@samsung.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Mikhail Ilyin [Tue, 5 Aug 2014 13:33:51 +0000 (17:33 +0400)]
linux-user: /proc/self/maps content
Build /proc/self/maps doing a match against guest memory translation table.
Output only that map records which are valid for guest memory layout.
Signed-off-by: Mikhail Ilyin <m.ilin@samsung.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Markus Armbruster [Thu, 21 Aug 2014 12:36:19 +0000 (14:36 +0200)]
vmdk: Use bdrv_nb_sectors() where sectors, not bytes are wanted
Instead of bdrv_getlength().
Commit 57322b7 did this all over block, but one more bdrv_getlength()
has crept in since.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>