sdk/emulator/qemu.git
8 years agoscsi-disk: introduce dma_readv and dma_writev
Paolo Bonzini [Tue, 10 May 2016 08:13:00 +0000 (10:13 +0200)]
scsi-disk: introduce dma_readv and dma_writev

These are replacements for blk_aio_readv and blk_aio_writev that allow
customization of the data path.  They reuse the DMA helpers' DMAIOFunc
callback type, so that the same function can be used in either the
QEMUSGList or the bounce-buffered case.

This customization will be needed in the next patch to do zero-copy
SG_IO on scsi-block.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoscsi-disk: introduce a common base class
Paolo Bonzini [Tue, 10 May 2016 08:10:49 +0000 (10:10 +0200)]
scsi-disk: introduce a common base class

This will be the place to add DMAIOFuncs in the next patch.  There
are also a couple DeviceClass members that can be moved to the
abstract class's initialization function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoxen-hvm: ignore background I/O sections
Paul Durrant [Mon, 9 May 2016 16:31:20 +0000 (17:31 +0100)]
xen-hvm: ignore background I/O sections

Since Xen will correctly handle accesses to unimplemented I/O ports (by
returning all 1's for reads and ignoring writes) there is no need for
QEMU to register backgroud I/O sections.

This patch therefore adds checks to xen_io_add/del so that sections with
memory-region ops pointing at 'unassigned_io_ops' are ignored.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1462811480-16295-1-git-send-email-paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agodocs/atomics: update comparison with Linux
Paolo Bonzini [Wed, 25 May 2016 12:23:27 +0000 (14:23 +0200)]
docs/atomics: update comparison with Linux

Over time, some differences between QEMU and Linux atomics are getting
smoothed.  In particular, Linux grew atomic_fetch_or (and in general
the differences regarding RMW operations were not described accurately)
and smp_load_acquire/smp_store_release.  Also, set_mb was renamed to
smp_store_mb().  Include these changes in the documentation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoatomics: do not emit consume barrier for atomic_rcu_read
Emilio G. Cota [Tue, 24 May 2016 20:06:14 +0000 (16:06 -0400)]
atomics: do not emit consume barrier for atomic_rcu_read

Currently we emit a consume-load in atomic_rcu_read.  Because of
limitations in current compilers, this is overkill for non-Alpha hosts
and it is only useful to make Thread Sanitizer work.

This patch leaves the consume-load in atomic_rcu_read when
compiling with Thread Sanitizer enabled, and resorts to a
relaxed load + smp_read_barrier_depends otherwise.

On an RMO host architecture, such as aarch64, the performance
improvement of this change is easily measurable. For instance,
qht-bench performs an atomic_rcu_read on every lookup. Performance
before and after applying this patch:

$ tests/qht-bench -d 5 -n 1
Before: 9.78 MT/s
After:  10.96 MT/s

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1464120374-8950-4-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoatomics: emit an smp_read_barrier_depends() barrier only for Alpha and Thread Sanitizer
Emilio G. Cota [Tue, 24 May 2016 20:06:13 +0000 (16:06 -0400)]
atomics: emit an smp_read_barrier_depends() barrier only for Alpha and Thread Sanitizer

For correctness, smp_read_barrier_depends() is only required to
emit a barrier on Alpha hosts. However, we are currently emitting
a consume fence unconditionally, and most compilers currently treat
consume and acquire fences as equivalent.

Fix it by keeping the consume fence if we're compiling with Thread
Sanitizer, since this might help prevent false warnings. Otherwise,
only emit the barrier for Alpha hosts. Note that we still guarantee
that smp_read_barrier_depends() is a compiler barrier.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1464120374-8950-3-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agodocs/atomics: update atomic_read/set comparison with Linux
Emilio G. Cota [Tue, 24 May 2016 20:06:12 +0000 (16:06 -0400)]
docs/atomics: update atomic_read/set comparison with Linux

Recently Linux did a mass conversion of its atomic_read/set calls
so that they at least are READ/WRITE_ONCE. See Linux's commit
62e8a325 ("atomic, arch: Audit atomic_{read,set}()"). It seems though
that their documentation hasn't been updated to reflect this.

The appended updates our documentation to reflect the change, which
means there is effectively no difference between our atomic_read/set
and the current Linux implementation.

While at it, fix the statement that a barrier is implied by
atomic_read/set, which is incorrect. Volatile/atomic semantics prevent
transformations pertaining the variable they apply to; this, however,
has no effect on surrounding statements like barriers do. For more
details on this, see:
  https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1464120374-8950-2-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agobt: rewrite csrhci_write to avoid out-of-bounds writes
Paolo Bonzini [Fri, 20 May 2016 08:35:15 +0000 (10:35 +0200)]
bt: rewrite csrhci_write to avoid out-of-bounds writes

The usage of INT_MAX in this function confuses Coverity.  I think
the defect is bogus, however there is no protection against
getting more than sizeof(s->inpkt) bytes from the character device
backend.

Rewrite the function to only fill in as much data as needed from
buf into s->inpkt.  The plen variable is replaced by a simple
state machine and there is no need anymore to shift contents to
the beginning of s->inpkt.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoblock/iscsi: avoid potential overflow of acb->task->cdb
Peter Lieven [Tue, 24 May 2016 08:59:28 +0000 (10:59 +0200)]
block/iscsi: avoid potential overflow of acb->task->cdb

at least in the path via virtio-blk the maximum size is not
restricted.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1464080368-29584-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoscsi: megasas: check 'read_queue_head' index value
Prasad J Pandit [Wed, 25 May 2016 12:25:10 +0000 (17:55 +0530)]
scsi: megasas: check 'read_queue_head' index value

While doing MegaRAID SAS controller command frame lookup, routine
'megasas_lookup_frame' uses 'read_queue_head' value as an index
into 'frames[MEGASAS_MAX_FRAMES=2048]' array. Limit its value
within array bounds to avoid any OOB access.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464179110-18593-1-git-send-email-ppandit@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoscsi: megasas: initialise local configuration data buffer
Prasad J Pandit [Wed, 25 May 2016 12:11:44 +0000 (17:41 +0530)]
scsi: megasas: initialise local configuration data buffer

When reading MegaRAID SAS controller configuration via MegaRAID
Firmware Interface(MFI) commands, routine megasas_dcmd_cfg_read
uses an uninitialised local data buffer. Initialise this buffer
to avoid stack information leakage.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464178304-12831-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoscsi: megasas: use appropriate property buffer size
Prasad J Pandit [Wed, 25 May 2016 10:31:29 +0000 (16:01 +0530)]
scsi: megasas: use appropriate property buffer size

When setting MegaRAID SAS controller properties via MegaRAID
Firmware Interface(MFI) commands, a user supplied size parameter
is used to set property value. Use appropriate size value to avoid
OOB access issues.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464172291-2856-2-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoscsi: mptsas: infinite loop while fetching requests
Prasad J Pandit [Tue, 24 May 2016 08:07:44 +0000 (13:37 +0530)]
scsi: mptsas: infinite loop while fetching requests

The LSI SAS1068 Host Bus Adapter emulator in Qemu, periodically
looks for requests and fetches them. A loop doing that in
mptsas_fetch_requests() could run infinitely if 's->state' was
not operational. Move check to avoid such a loop.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Message-Id: <1464077264-25473-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoscsi: pvscsi: check command descriptor ring buffer size (CVE-2016-4952)
Prasad J Pandit [Mon, 23 May 2016 10:48:05 +0000 (16:18 +0530)]
scsi: pvscsi: check command descriptor ring buffer size (CVE-2016-4952)

Vmware Paravirtual SCSI emulation uses command descriptors to
process SCSI commands. These descriptors come with their ring
buffers. A guest could set the ring buffer size to an arbitrary
value leading to OOB access issue. Add check to avoid it.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Message-Id: <1464000485-27041-1-git-send-email-ppandit@redhat.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agokvm_stat: Remove
Paolo Bonzini [Tue, 24 May 2016 08:54:42 +0000 (10:54 +0200)]
kvm_stat: Remove

The source has moved to the Linux kernel tree.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agonbd: Don't trim unrequested bytes
Eric Blake [Wed, 25 May 2016 10:59:25 +0000 (04:59 -0600)]
nbd: Don't trim unrequested bytes

Similar to commit df7b97ff, we are mishandling clients that
give an unaligned NBD_CMD_TRIM request, and potentially
trimming bytes that occur before their request; which in turn
can cause potential unintended data loss (unlikely in
practice, since most clients are sane and issue aligned trim
requests).  However, while we fixed read and write by switching
to the byte interfaces of blk_, we don't yet have a byte
interface for discard.  On the other hand, trim is advisory, so
rounding the user's request to simply ignore the first and last
unaligned sectors (or the entire request, if it is sub-sector
in length) is just fine.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1464173965-9694-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agohw/char: QOM'ify milkymist-uart.c
xiaoqiang zhao [Wed, 25 May 2016 06:39:04 +0000 (14:39 +0800)]
hw/char: QOM'ify milkymist-uart.c

drop the qemu_char_get_next_serial and use chardev prop instead

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-6-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agohw/char: QOM'ify lm32_uart.c
xiaoqiang zhao [Wed, 25 May 2016 06:39:03 +0000 (14:39 +0800)]
hw/char: QOM'ify lm32_uart.c

* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial
* Add lm32_uart_create function to create lm32 uart device

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-5-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agohw/char: QOM'ify lm32_juart.c
xiaoqiang zhao [Wed, 25 May 2016 06:39:02 +0000 (14:39 +0800)]
hw/char: QOM'ify lm32_juart.c

* Drop the old SysBus init function
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-4-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agohw/char: QOM'ify etraxfs_ser.c
xiaoqiang zhao [Wed, 25 May 2016 06:39:01 +0000 (14:39 +0800)]
hw/char: QOM'ify etraxfs_ser.c

* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial
* Add etraxfs_ser_create function to create etraxfs serial device

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-3-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agohw/char: QOM'ify escc.c
xiaoqiang zhao [Wed, 25 May 2016 06:39:00 +0000 (14:39 +0800)]
hw/char: QOM'ify escc.c

* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-2-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoRevert "memory: Drop FlatRange.romd_mode"
Paolo Bonzini [Tue, 24 May 2016 19:26:28 +0000 (21:26 +0200)]
Revert "memory: Drop FlatRange.romd_mode"

This reverts commit 5b5660adf1fdb61db14ec681b10463b8cba633f1,
as it breaks the UEFI guest firmware (known as ArmVirtPkg or AAVMF)
running in the "virt" machine type of "qemu-system-aarch64":

Contrary to the commit message, (a->mr == b->mr) does *not* imply
that (a->romd_mode == b->romd_mode): the pflash device model calls
memory_region_rom_device_set_romd() -- for switching between the above
modes --, and that function changes mr->romd_mode but the current
AddressSpaceDispatch's FlatRange keeps the old value.  Therefore
region_del/region_add are not called on the KVM MemoryListener.

Reported-by: Drew Jones <drjones@redhat.com>
Tested-by: Drew Jones <drjones@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8 years agoMerge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20160527' into staging
Peter Maydell [Fri, 27 May 2016 13:05:48 +0000 (14:05 +0100)]
Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20160527' into staging

linux-user pull request v2 for may 2016

# gpg: Signature made Fri 27 May 2016 12:51:10 BST using RSA key ID DE3C9BC0
# gpg: Good signature from "Riku Voipio <riku.voipio@iki.fi>"
# gpg:                 aka "Riku Voipio <riku.voipio@linaro.org>"

* remotes/riku/tags/pull-linux-user-20160527: (38 commits)
  linux-user,target-ppc: fix use of MSR_LE
  linux-user/signal.c: Use s390 target space address instead of host space
  linux-user/signal.c: Use target address instead of host address for microblaze restorer
  linux-user/signal.c: Generate opcode data for restorer in setup_rt_frame
  linux-user: arm: Remove ARM_cpsr and similar #defines
  linux-user: Use direct syscalls for setuid(), etc
  linux-user: x86_64: Don't use 16-bit UIDs
  linux-user: Use g_try_malloc() in do_msgrcv()
  linux-user: Handle msgrcv error case correctly
  linux-user: Handle negative values in timespec conversion
  linux-user: Use safe_syscall for futex syscall
  linux-user: Use safe_syscall for pselect, select syscalls
  linux-user: Use safe_syscall for execve syscall
  linux-user: Use safe_syscall for wait system calls
  linux-user: Use safe_syscall for open and openat system calls
  linux-user: Use safe_syscall for read and write system calls
  linux-user: Provide safe_syscall for fixing races between signals and syscalls
  linux-user: Add debug code to exercise restarting system calls
  linux-user: Support for restarting system calls for Microblaze targets
  linux-user: Set r14 on exit from microblaze syscall
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agolinux-user,target-ppc: fix use of MSR_LE
Laurent Vivier [Wed, 30 Mar 2016 16:36:51 +0000 (18:36 +0200)]
linux-user,target-ppc: fix use of MSR_LE

setup_frame()/setup_rt_frame()/restore_user_regs() are using
MSR_LE as the similar kernel functions do: as a bitmask.

But in QEMU, MSR_LE is a bit position, so change this
accordingly.

The previous code was doing nothing as MSR_LE is 0,
and "env->msr &= ~MSR_LE" doesn't change the value of msr.

And yes, a user process can change its endianness,
see linux kernel commit:

    fab5db9 [PATCH] powerpc: Implement support for setting little-endian mode via prctl

and prctl(2): PR_SET_ENDIAN, PR_GET_ENDIAN

Reviewed-by: Thomas Huth <huth@tuxfamily.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user/signal.c: Use s390 target space address instead of host space
Chen Gang [Tue, 24 May 2016 11:54:32 +0000 (14:54 +0300)]
linux-user/signal.c: Use s390 target space address instead of host space

The return address is in target space, so the restorer address needs to
be target space, too.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
8 years agolinux-user/signal.c: Use target address instead of host address for microblaze restorer
Chen Gang [Tue, 29 Mar 2016 14:13:45 +0000 (22:13 +0800)]
linux-user/signal.c: Use target address instead of host address for microblaze restorer

The return address is in target space, so the restorer address needs to
be target space, too.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user/signal.c: Generate opcode data for restorer in setup_rt_frame
Chen Gang [Tue, 29 Mar 2016 13:53:49 +0000 (21:53 +0800)]
linux-user/signal.c: Generate opcode data for restorer in setup_rt_frame

Original implementation uses do_rt_sigreturn directly in host space,
when a guest program is in unwind procedure in guest space, it will get
an incorrect restore address, then causes unwind failure.

Also cleanup the original incorrect indentation.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: arm: Remove ARM_cpsr and similar #defines
Peter Maydell [Thu, 3 Mar 2016 12:11:18 +0000 (12:11 +0000)]
linux-user: arm: Remove ARM_cpsr and similar #defines

The #defines of ARM_cpsr and friends in linux-user/arm/target-syscall.h
can clash with versions in the system headers if building on an
ARM or AArch64 build (though this seems to be dependent on the version
of the system headers). The QEMU defines are not very useful (it's
not clear that they're intended for use with the target_pt_regs struct
rather than (say) the CPUARMState structure) and we only use them in one
function in elfload.c anyway. So just remove the #defines and directly
access regs->uregs[].

Reported-by: Christopher Covington <cov@codeaurora.org>
Tested-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Use direct syscalls for setuid(), etc
Peter Maydell [Tue, 1 Mar 2016 16:33:02 +0000 (16:33 +0000)]
linux-user: Use direct syscalls for setuid(), etc

On Linux the setuid(), setgid(), etc system calls have different semantics
from the libc functions. The libc functions follow POSIX and update the
credentials for all threads in the process; the system calls update only
the thread which makes the call. (This impedance mismatch is worked around
in libc by signalling all threads to tell them to do a syscall, in a
byzantine and fragile way; see http://ewontfix.com/17/.)

Since in linux-user we are trying to emulate the system call semantics,
we must implement all these syscalls to directly call the underlying
host syscall, rather than calling the host libc function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: x86_64: Don't use 16-bit UIDs
Peter Maydell [Tue, 1 Mar 2016 16:25:17 +0000 (16:25 +0000)]
linux-user: x86_64: Don't use 16-bit UIDs

The 64-bit x86 syscall ABI uses 32-bit UIDs; only define
USE_UID16 for 32-bit x86.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Use g_try_malloc() in do_msgrcv()
Peter Maydell [Fri, 20 May 2016 18:00:57 +0000 (19:00 +0100)]
linux-user: Use g_try_malloc() in do_msgrcv()

In do_msgrcv() we want to allocate a message buffer, whose size
is passed to us by the guest. That means we could legitimately
fail, so use g_try_malloc() and handle the error case, in the same
way that do_msgsnd() does.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Handle msgrcv error case correctly
Peter Maydell [Fri, 20 May 2016 18:00:56 +0000 (19:00 +0100)]
linux-user: Handle msgrcv error case correctly

The msgrcv ABI is a bit odd -- the msgsz argument is a size_t, which is
unsigned, but it must fail EINVAL if the value is negative when cast
to a long. We were incorrectly passing the value through an
"unsigned int", which meant that if the guest was 32-bit longs and
the host was 64-bit longs an input of 0xffffffff (which should trigger
EINVAL) would simply be passed to the host msgrcv() as 0xffffffff,
where it does not cause the host kernel to reject it.
Follow the same approach as do_msgsnd() in using a ssize_t and
doing the check for negative values by hand, so we correctly fail
in this corner case.

This fixes the msgrcv03 Linux Test Project test case, which otherwise
hangs.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Handle negative values in timespec conversion
Peter Maydell [Thu, 19 May 2016 11:01:40 +0000 (12:01 +0100)]
linux-user: Handle negative values in timespec conversion

In a struct timespec, both fields are signed longs. Converting
them from guest to host with code like
    host_ts->tv_sec = tswapal(target_ts->tv_sec);
mishandles negative values if the guest has 32-bit longs and
the host has 64-bit longs because tswapal()'s return type is
abi_ulong: the assignment will zero-extend into the host long
type rather than sign-extending it.

Make the conversion routines use __get_user() and __set_user()
instead: this automatically picks up the signedness of the
field type and does the correct kind of sign or zero extension.
It also handles the possibility that the target struct is not
sufficiently aligned for the host's requirements.

In particular, this fixes a hang when running the Linux Test Project
mq_timedsend01 and mq_timedreceive01 tests: one of the test cases
sets the timeout to -1 and expects an EINVAL failure, but we were
setting a very long timeout instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Use safe_syscall for futex syscall
Peter Maydell [Thu, 12 May 2016 17:47:52 +0000 (18:47 +0100)]
linux-user: Use safe_syscall for futex syscall

Use the safe_syscall wrapper for the futex syscall.

In particular, this fixes hangs when using programs that link
against the Boehm garbage collector, including the Mono runtime.

(We don't change the sys_futex() call in the implementation of
the exit syscall, because as the FIXME comment there notes
that should be handled by disabling signals, since we can't
easily back out if the futex were to return ERESTARTSYS.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Use safe_syscall for pselect, select syscalls
Peter Maydell [Thu, 12 May 2016 17:47:51 +0000 (18:47 +0100)]
linux-user: Use safe_syscall for pselect, select syscalls

Use the safe_syscall wrapper for the pselect and select syscalls.
Since not every architecture has the select syscall, we now
have to implement select in terms of pselect, which means doing
timeval<->timespec conversion.

(Five years on from the initial patch that added pselect support
to QEMU and a decade after pselect6 went into the kernel, it seems
safe to not try to support hosts with header files which don't
define __NR_pselect6.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Use safe_syscall for execve syscall
Timothy E Baldwin [Thu, 12 May 2016 17:47:50 +0000 (18:47 +0100)]
linux-user: Use safe_syscall for execve syscall

Wrap execve() in the safe-syscall handling. Although execve() is not
an interruptible syscall, it is a special case: if we allow a signal
to happen before we make the host$ syscall then we will 'lose' it,
because at the point of execve the process leaves QEMU's control.  So
we use the safe syscall wrapper to ensure that we either take the
signal as a guest signal, or else it does not happen before the
execve completes and makes it the other program's problem.

The practical upshot is that without this SIGTERM could fail to
terminate the process.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-25-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: expanded commit message to explain in more detail why this is
 needed, and add comment about it too]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Use safe_syscall for wait system calls
Timothy E Baldwin [Thu, 12 May 2016 17:47:49 +0000 (18:47 +0100)]
linux-user: Use safe_syscall for wait system calls

Use safe_syscall for waitpid, waitid and wait4 syscalls. Note that this
change allows us to implement support for waitid's fifth (rusage) argument
in future; for the moment we ignore it as we have done up til now.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-18-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Adjust to new safe_syscall convention. Add fifth waitid syscall argument
 (which isn't present in the libc interface but is in the syscall ABI)]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Use safe_syscall for open and openat system calls
Timothy E Baldwin [Thu, 12 May 2016 17:47:48 +0000 (18:47 +0100)]
linux-user: Use safe_syscall for open and openat system calls

Restart open() and openat() if signals occur before,
or during with SA_RESTART.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-17-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Adjusted to follow new -1-and-set-errno safe_syscall convention]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Use safe_syscall for read and write system calls
Timothy E Baldwin [Thu, 12 May 2016 17:47:47 +0000 (18:47 +0100)]
linux-user: Use safe_syscall for read and write system calls

Restart read() and write() if signals occur before, or during with SA_RESTART

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-15-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Update to new safe_syscall() convention of setting errno]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Provide safe_syscall for fixing races between signals and syscalls
Timothy E Baldwin [Thu, 12 May 2016 17:47:46 +0000 (18:47 +0100)]
linux-user: Provide safe_syscall for fixing races between signals and syscalls

If a signal is delivered immediately before a blocking system call the
handler will only be called after the system call returns, which may be a
long time later or never.

This is fixed by using a function (safe_syscall) that checks if a guest
signal is pending prior to making a system call, and if so does not call the
system call and returns -TARGET_ERESTARTSYS. If a signal is received between
the check and the system call host_signal_handler() rewinds execution to
before the check. This rewinding has the effect of closing the race window
so that safe_syscall will reliably either (a) go into the host syscall
with no unprocessed guest signals pending or or (b) return
-TARGET_ERESTARTSYS so that the caller can deal with the signals.
Implementing this requires a per-host-architecture assembly language
fragment.

This will also resolve the mishandling of the SA_RESTART flag where
we would restart a host system call and not call the guest signal handler
until the syscall finally completed -- syscall restarting now always
happens at the guest syscall level so the guest signal handler will run.
(The host syscall will never be restarted because if the host kernel
rewinds the PC to point at the syscall insn for a restart then our
host_signal_handler() will see this and arrange the guest PC rewind.)

This commit contains the infrastructure for implementing safe_syscall
and the assembly language fragment for x86-64, but does not change any
syscalls to use it.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-14-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM:
 * Avoid having an architecture if-ladder in configure by putting
   linux-user/host/$(ARCH) on the include path and including
   safe-syscall.inc.S from it
 * Avoid ifdef ladder in signal.c by creating new hostdep.h to hold
   host-architecture-specific things
 * Added copyright/license header to safe-syscall.inc.S
 * Rewrote commit message
 * Added comments to safe-syscall.inc.S
 * Changed calling convention of safe_syscall() to match syscall()
   (returns -1 and host error in errno on failure)
 * Added a long comment in qemu.h about how to use safe_syscall()
   to implement guest syscalls.
]
RV: squashed Peters "fixup! linux-user: compile on non-x86-64 hosts"
patch
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agolinux-user: Add debug code to exercise restarting system calls
Timothy E Baldwin [Thu, 12 May 2016 17:47:45 +0000 (18:47 +0100)]
linux-user: Add debug code to exercise restarting system calls

If DEBUG_ERESTARTSYS is set restart all system calls once. This
is pure debug code for exercising the syscall restart code paths
in the per-architecture cpu main loops.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-10-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Add comment and a commented-out #define next to the commented-out
 generic DEBUG #define; remove the check on TARGET_USE_ERESTARTSYS;
 tweak comment message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for Microblaze targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:44 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for Microblaze targets

Update the Microblaze main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Note that this in passing fixes a bug where we were corrupting
the guest r[3] on sigreturn with the guest's r[10] because
do_sigreturn() was returning env->regs[10] but the register for
syscall return values is env->regs[3].

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-11-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: Commit message tweaks; drop TARGET_USE_ERESTARTSYS define;
 drop whitespace changes]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Set r14 on exit from microblaze syscall
Peter Maydell [Thu, 12 May 2016 17:47:43 +0000 (18:47 +0100)]
linux-user: Set r14 on exit from microblaze syscall

All syscall exits on microblaze result in r14 being equal to the
PC we return to, because the kernel syscall exit instruction "rtbd"
does this. (This is true even for sigreturn(); note that r14 is
not a userspace-usable register as the kernel may clobber it at
any point.)

Emulate the setting of r14 on exit; this isn't really a guest
visible change for valid guest code because r14 isn't reliably
observable anyway. However having the code and the comment helps
to explain why it's ok for the ERESTARTSYS handling not to undo
the changes to r14 that happen on syscall entry.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for tilegx targets
Peter Maydell [Thu, 12 May 2016 17:47:42 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for tilegx targets

Update the tilegx main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * return -TARGET_QEMU_ESIGRETURN from sigreturn rather than current R_RE
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Note that this fixes a bug where a sigreturn which happened to have
an errno value in TILEGX_R_RE would incorrectly cause TILEGX_R_ERR
to get set.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for CRIS targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:41 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for CRIS targets

Update the CRIS main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-34-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for S390 targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:40 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for S390 targets

Update the S390 main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-33-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; remove stray double semicolon; drop
 TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for M68K targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:39 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for M68K targets

Update the M68K main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-32-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for OpenRISC targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:38 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for OpenRISC targets

Update the OpenRISC main loop code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

(We don't implement sigreturn on this target so there is no
code there to update.)

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-31-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for UniCore32 targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:37 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for UniCore32 targets

Update the UniCore32 main loop code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

(We don't support signals on this target so there is no sigreturn code
to update.)

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-30-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for Alpha targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:36 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for Alpha targets

Update the Alpha main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-13-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define;
 PC is env->pc, not env->ir[IR_PV]]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for SH4 targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:35 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for SH4 targets

Update the SH4 main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-12-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for SPARC targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:34 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for SPARC targets

Update the SPARC main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-9-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Commit message tweaks; drop TARGET_USE_ERESTARTSYS define]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for PPC targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:33 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for PPC targets

Update the PPC main loop code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn

(We already handle TARGET_QEMU_ESIGRETURN.)

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-8-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for MIPS targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:32 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for MIPS targets

Update the MIPS main loop code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn

(We already handle TARGET_QEMU_ESIGRETURN.)

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-7-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for ARM targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:31 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for ARM targets

Update the 32-bit and 64-bit ARM main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-6-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Support for restarting system calls for x86 targets
Timothy E Baldwin [Thu, 12 May 2016 17:47:30 +0000 (18:47 +0100)]
linux-user: Support for restarting system calls for x86 targets

Update the x86 main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code rather than passing it
   back out as the "return code" from do_sigreturn()
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch EAX

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-5-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: Commit message tweaks; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Renumber TARGET_QEMU_ESIGRETURN, make it not arch-specific
Timothy E Baldwin [Thu, 12 May 2016 17:47:29 +0000 (18:47 +0100)]
linux-user: Renumber TARGET_QEMU_ESIGRETURN, make it not arch-specific

Currently we define a QEMU-internal errno TARGET_QEMU_ESIGRETURN
only on the MIPS and PPC targets; move this to errno_defs.h
so it is available for all architectures, and renumber it to 513.
We pick 513 because this is safe from future use as a system call return
value: Linux uses it as ERESTART_NOINTR internally and never allows that
errno to escape to userspace.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-4-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: TARGET_ERESTARTSYS split out into preceding patch, add comment]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Define TARGET_ERESTART* errno values
Timothy E Baldwin [Thu, 12 May 2016 17:47:28 +0000 (18:47 +0100)]
linux-user: Define TARGET_ERESTART* errno values

Define TARGET_ERESTARTSYS; like the kernel, we will use this to
indicate that a guest system call should be restarted. We use
the same value the kernel does for this, 512.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
[PMM: split out from the patch which moves and renumbers
 TARGET_QEMU_ESIGRETURN, add comment on usage]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Reindent signal handling
Timothy E Baldwin [Thu, 12 May 2016 17:47:27 +0000 (18:47 +0100)]
linux-user: Reindent signal handling

Some of the signal handling was a mess with a mixture of tabs and 8 space
indents.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-3-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: just rebased]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
8 years agolinux-user: Consistently return host errnos from do_openat()
Peter Maydell [Thu, 12 May 2016 17:47:26 +0000 (18:47 +0100)]
linux-user: Consistently return host errnos from do_openat()

The function do_openat() is not consistent about whether it is
returning a host errno or a guest errno in case of failure.
Standardise on returning -1 with errno set (ie caller has
to call get_errno()).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reported-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
8 years agolinux-user: Check array bounds in errno conversion
Timothy E Baldwin [Thu, 12 May 2016 17:47:25 +0000 (18:47 +0100)]
linux-user: Check array bounds in errno conversion

Check array bounds in host_to_target_errno() and target_to_host_errno().

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-2-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Add a lower-bound check, use braces on if(), tweak commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
8 years agoMerge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160527' into staging
Peter Maydell [Fri, 27 May 2016 09:11:11 +0000 (10:11 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160527' into staging

ppc patch queue for 2016-05-27 (first pull for qemu-2.7)

I'm back from holidays now, and have re-collated the ppc patch queue.
This is a first pull request against the qemu-2.7 branch, mostly
consisting of patches which were posted before the 2.6 freeze, but
weren't suitable for late inclusion in the 2.6 branch.

 * Assorted bugfixes and cleanups
 * Some preliminary patches towards dynamic DMA windows and CPU hotplug
 * Significant performance impovement for the spapr-llan device
 * Added myself to MAINTAINERS for ppc (overdue)

# gpg: Signature made Fri 27 May 2016 04:04:15 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160527:
  MAINTAINERS: Add David Gibson as ppc maintainer
  spapr_iommu: Move table allocation to helpers
  spapr_iommu: Finish renaming vfio_accel to need_vfio
  spapr_pci: Use correct DMA LIOBN when composing the device tree
  spapr: ensure device trees are always associated with DRC
  PPC/KVM: early validation of vcpu id
  Added negative check for get_image_size()
  hw/net/spapr_llan: Provide counter with dropped rx frames to the guest
  hw/net/spapr_llan: Delay flushing of the RX queue while adding new RX buffers
  target-ppc: Cleanups to rldinm, rldnm, rldimi
  target-ppc: Use 32-bit rotate instead of deposit + 64-bit rotate
  target-ppc: Use movcond in isel
  target-ppc: Correct KVM synchronization for ppc_hash64_set_external_hpt()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoMAINTAINERS: Add David Gibson as ppc maintainer
David Gibson [Thu, 26 May 2016 06:14:57 +0000 (16:14 +1000)]
MAINTAINERS: Add David Gibson as ppc maintainer

I've been de facto co-maintainer of all ppc target related code for some
time.  Alex Graf isworking on other things and doesn't have a whole lot of
time for qemu ppc maintainership.  So, update the MAINTAINERS file to
reflect this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
8 years agospapr_iommu: Move table allocation to helpers
Alexey Kardashevskiy [Wed, 4 May 2016 06:52:19 +0000 (16:52 +1000)]
spapr_iommu: Move table allocation to helpers

At the moment presence of vfio-pci devices on a bus affect the way
the guest view table is allocated. If there is no vfio-pci on a PHB
and the host kernel supports KVM acceleration of H_PUT_TCE, a table
is allocated in KVM. However, if there is vfio-pci and we do yet not
KVM acceleration for these, the table has to be allocated by
the userspace. At the moment the table is allocated once at boot time
but next patches will reallocate it.

This moves kvmppc_create_spapr_tce/g_malloc0 and their counterparts
to helpers.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agospapr_iommu: Finish renaming vfio_accel to need_vfio
Alexey Kardashevskiy [Wed, 4 May 2016 06:52:21 +0000 (16:52 +1000)]
spapr_iommu: Finish renaming vfio_accel to need_vfio

6a81dd17 "spapr_iommu: Rename vfio_accel parameter" renamed vfio_accel
flag everywhere but one spot was missed.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agospapr_pci: Use correct DMA LIOBN when composing the device tree
Alexey Kardashevskiy [Wed, 4 May 2016 06:52:18 +0000 (16:52 +1000)]
spapr_pci: Use correct DMA LIOBN when composing the device tree

The user could have picked LIOBN via the CLI but the device tree
rendering code would still use the value derived from the PHB index
(which is the default fallback if LIOBN is not set in the CLI).

This replaces SPAPR_PCI_LIOBN() with the actual DMA LIOBN value.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agospapr: ensure device trees are always associated with DRC
Jianjun Duan [Tue, 24 May 2016 17:55:04 +0000 (10:55 -0700)]
spapr: ensure device trees are always associated with DRC

There are possible racing situations involving hotplug events and
guest migration. For cases where a hotplug event is migrated, or
the guest is in the process of fetching device tree at the time of
migration, we need to ensure the device tree is created and
associated with the corresponding DRC for devices that were
hotplugged on the source, but 'coldplugged' on the target.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agoPPC/KVM: early validation of vcpu id
Greg Kurz [Tue, 26 Apr 2016 13:41:04 +0000 (15:41 +0200)]
PPC/KVM: early validation of vcpu id

The KVM API restricts vcpu ids to be < KVM_CAP_MAX_VCPUS. On PowerPC
targets, depending on the number of threads per core in the host and
in the guest, some topologies do generate higher vcpu ids actually.
When this happens, QEMU bails out with the following error:

kvm_init_vcpu failed: Invalid argument

The KVM_CREATE_VCPU ioctl has several EINVAL return paths, so it is
not possible to fully disambiguate.

This patch adds a check in the code that computes vcpu ids, so that
we can detect the error earlier, and print a friendlier message instead
of calling KVM_CREATE_VCPU with an obviously bogus vcpu id.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agoAdded negative check for get_image_size()
Zhou Jie [Mon, 25 Apr 2016 15:36:06 +0000 (11:36 -0400)]
Added negative check for get_image_size()

This patch adds check for negative return value from get_image_size(),
where it is missing. It avoids unnecessary two function calls.

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agohw/net/spapr_llan: Provide counter with dropped rx frames to the guest
Thomas Huth [Mon, 4 Apr 2016 10:13:10 +0000 (12:13 +0200)]
hw/net/spapr_llan: Provide counter with dropped rx frames to the guest

The last 8 bytes of the receive buffer list page (that has been supplied
by the guest with the H_REGISTER_LOGICAL_LAN call) contain a counter
for frames that have been dropped because there was no suitable receive
buffer available. This patch introduces code to use this field to
provide the information about dropped rx packets to the guest.
There it can be queried with "ethtool -S eth0 | grep rx_no_buffer".

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agohw/net/spapr_llan: Delay flushing of the RX queue while adding new RX buffers
Thomas Huth [Thu, 31 Mar 2016 11:47:05 +0000 (13:47 +0200)]
hw/net/spapr_llan: Delay flushing of the RX queue while adding new RX buffers

Currently, the spapr-vlan device is trying to flush the RX queue
after each RX buffer that has been added by the guest via the
H_ADD_LOGICAL_LAN_BUFFER hypercall. In case the receive buffer pool
was empty before, we only pass single packets to the guest this
way. This can cause very bad performance if a sender is trying
to stream fragmented UDP packets to the guest. For example when
using the UDP_STREAM test from netperf with UDP packets that are
much bigger than the MTU size, almost all UDP packets are dropped
in the guest since the chances are quite high that at least one of
the fragments got lost on the way.

When flushing the receive queue, it's much better if we'd have
a bunch of receive buffers available already, so that fragmented
packets can be passed to the guest in one go. To do this, the
spapr_vlan_receive() function should return 0 instead of -1 if there
are no more receive buffers available, so that receive_disabled = 1
gets temporarily set for the receive queue, and we have to delay
the queue flushing at the end of h_add_logical_lan_buffer() a little
bit by using a timer, so that the guest gets a chance to add multiple
RX buffers before we flush the queue again.

This improves the UDP_STREAM test with the spapr-vlan device a lot:
Running
 netserver -p 44444 -L <guestip> -f -D -4
in the guest, and
 netperf -p 44444 -L <hostip> -H <guestip> -t UDP_STREAM -l 60 -- -m 16384
in the host, I get the following values _without_ this patch:

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

229376   16384   60.00     1738970      0    3798.83
229376           60.00          23              0.05

That "0.05" means that almost all UDP packets got lost/discarded
at the receiving side.
With this patch applied, the value look much better:

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

229376   16384   60.00     1789104      0    3908.35
229376           60.00       22818             49.85

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agotarget-ppc: Cleanups to rldinm, rldnm, rldimi
Richard Henderson [Wed, 24 Feb 2016 01:18:36 +0000 (17:18 -0800)]
target-ppc: Cleanups to rldinm, rldnm, rldimi

Mirror the cleanups just done to rlwinm, rlwnm and rlwimi.
This adds use of deposit to rldimi.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agotarget-ppc: Use 32-bit rotate instead of deposit + 64-bit rotate
Richard Henderson [Wed, 24 Feb 2016 01:18:35 +0000 (17:18 -0800)]
target-ppc: Use 32-bit rotate instead of deposit + 64-bit rotate

A 32-bit rotate insn is more common on hosts than a deposit insn,
and if the host has neither the result is truely horrific.

At the same time, tidy up the temporaries within these functions,
drop the over-use of "likely", drop some checks for identity that
will also be checked by tcg-op.c functions, and special case mask
without rotate within rlwinm.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agotarget-ppc: Use movcond in isel
Richard Henderson [Wed, 24 Feb 2016 01:18:34 +0000 (17:18 -0800)]
target-ppc: Use movcond in isel

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agotarget-ppc: Correct KVM synchronization for ppc_hash64_set_external_hpt()
David Gibson [Tue, 29 Mar 2016 06:27:10 +0000 (17:27 +1100)]
target-ppc: Correct KVM synchronization for ppc_hash64_set_external_hpt()

ppc_hash64_set_external_hpt() was added in e5c0d3c "target-ppc: Add helpers
for updating a CPU's SDR1 and external HPT".  This helper contains a
cpu_synchronize_state() since it may need to push state back to KVM
afterwards.

This turns out to break things when it is used in the reset path, which is
the only current user.  It appears that kvm_vcpu_dirty is not being set
early in the reset path, so the cpu_synchronize_state() is clobbering state
set up by the early part of the cpu reset path with stale state from KVM.

This may require some changes to the generic cpu reset path to fix
properly, but as a short term fix we can just remove the
cpu_synchronize_state() from ppc_hash64_set_external_hpt(), and require any
non-reset path callers to do that manually.

Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
8 years agoMerge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160526.1' into...
Peter Maydell [Thu, 26 May 2016 18:18:08 +0000 (19:18 +0100)]
Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160526.1' into staging

VFIO updates 2016-05-26

 - Infrastructure and quirks to support IGD assignment (Alex Williamson)
 - Fixes to 128bit handling, IOMMU replay, IOMMU translation sanity
   checking (Alexey Kardashevskiy)

# gpg: Signature made Thu 26 May 2016 18:50:29 BST using RSA key ID 3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"

* remotes/awilliam/tags/vfio-update-20160526.1:
  vfio: Check that IOMMU MR translates to system address space
  memory: Fix IOMMU replay base address
  vfio: Fix 128 bit handling when deleting region
  vfio/pci: Add IGD documentation
  vfio/pci: Add a separate option for IGD OpRegion support
  vfio/pci: Intel graphics legacy mode assignment
  vfio/pci: Setup BAR quirks after capabilities probing
  vfio/pci: Consolidate VGA setup
  vfio/pci: Fix return of vfio_populate_vga()
  vfio: Create device specific region info helper
  vfio: Enable sparse mmap capability

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agovfio: Check that IOMMU MR translates to system address space
Alexey Kardashevskiy [Thu, 26 May 2016 15:43:23 +0000 (09:43 -0600)]
vfio: Check that IOMMU MR translates to system address space

At the moment IOMMU MR only translate to the system memory.
However if some new code changes this, we will need clear indication why
it is not working so here is the check.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
8 years agomemory: Fix IOMMU replay base address
Alexey Kardashevskiy [Thu, 26 May 2016 15:43:23 +0000 (09:43 -0600)]
memory: Fix IOMMU replay base address

Since a788f227 "memory: Allow replay of IOMMU mapping notifications"
when new VFIO listener is added, all existing IOMMU mappings are
replayed. However there is a problem that the base address of
an IOMMU memory region (IOMMU MR) is ignored which is not a problem
for the existing user (which is pseries) with its default 32bit DMA
window starting at 0 but it is if there is another DMA window.

This stores the IOMMU's offset_within_address_space and adjusts
the IOVA before calling vfio_dma_map/vfio_dma_unmap.

As the IOMMU notifier expects IOVA offset rather than the absolute
address, this also adjusts IOVA in sPAPR H_PUT_TCE handler before
calling notifier(s).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
8 years agovfio: Fix 128 bit handling when deleting region
Alexey Kardashevskiy [Thu, 26 May 2016 15:43:22 +0000 (09:43 -0600)]
vfio: Fix 128 bit handling when deleting region

7532d3cbf "vfio: Fix 128 bit handling" added support for 64bit IOMMU
memory regions when those are added to VFIO address space; however
removing code cannot cope with these as int128_get64() will fail on
1<<64.

This copies 128bit handling from region_add() to region_del().

Since the only machine type which is actually going to use 64bit IOMMU
is pseries and it never really removes them (instead it will dynamically
add/remove subregions), this should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
8 years agovfio/pci: Add IGD documentation
Alex Williamson [Thu, 26 May 2016 15:43:22 +0000 (09:43 -0600)]
vfio/pci: Add IGD documentation

Document the usage modes, host primary graphics considerations, usage,
and fw_cfg ABI required for IGD assignment with vfio.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovfio/pci: Add a separate option for IGD OpRegion support
Alex Williamson [Thu, 26 May 2016 15:43:22 +0000 (09:43 -0600)]
vfio/pci: Add a separate option for IGD OpRegion support

The IGD OpRegion is enabled automatically when running in legacy mode,
but it can sometimes be useful in universal passthrough mode as well.
Without an OpRegion, output spigots don't work, and even though Intel
doesn't officially support physical outputs in UPT mode, it's a
useful feature.  Note that if an OpRegion is enabled but a monitor is
not connected, some graphics features will be disabled in the guest
versus a headless system without an OpRegion, where they would work.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovfio/pci: Intel graphics legacy mode assignment
Alex Williamson [Thu, 26 May 2016 15:43:21 +0000 (09:43 -0600)]
vfio/pci: Intel graphics legacy mode assignment

Enable quirks to support SandyBridge and newer IGD devices as primary
VM graphics.  This requires new vfio-pci device specific regions added
in kernel v4.6 to expose the IGD OpRegion, the shadow ROM, and config
space access to the PCI host bridge and LPC/ISA bridge.  VM firmware
support, SeaBIOS only so far, is also required for reserving memory
regions for IGD specific use.  In order to enable this mode, IGD must
be assigned to the VM at PCI bus address 00:02.0, it must have a ROM,
it must be able to enable VGA, it must have or be able to create on
its own an LPC/ISA bridge of the proper type at PCI bus address
00:1f.0 (sorry, not compatible with Q35 yet), and it must have the
above noted vfio-pci kernel features and BIOS.  The intention is that
to enable this mode, a user simply needs to assign 00:02.0 from the
host to 00:02.0 in the VM:

  -device vfio-pci,host=0000:00:02.0,bus=pci.0,addr=02.0

and everything either happens automatically or it doesn't.  In the
case that it doesn't, we leave error reports, but assume the device
will operate in universal passthrough mode (UPT), which doesn't
require any of this, but has a much more narrow window of supported
devices, supported use cases, and supported guest drivers.

When using IGD in this mode, the VM firmware is required to reserve
some VM RAM for the OpRegion (on the order or several 4k pages) and
stolen memory for the GTT (up to 8MB for the latest GPUs).  An
additional option, x-igd-gms allows the user to specify some amount
of additional memory (value is number of 32MB chunks up to 512MB) that
is pre-allocated for graphics use.  TBH, I don't know of anything that
requires this or makes use of this memory, which is why we don't
allocate any by default, but the specification suggests this is not
actually a valid combination, so the option exists as a workaround.
Please report if it's actually necessary in some environment.

See code comments for further discussion about the actual operation
of the quirks necessary to assign these devices.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovfio/pci: Setup BAR quirks after capabilities probing
Alex Williamson [Thu, 26 May 2016 15:43:21 +0000 (09:43 -0600)]
vfio/pci: Setup BAR quirks after capabilities probing

Capability probing modifies wmask, which quirks may be interested in
changing themselves.  Apply our BAR quirks after the capability scan
to make this possible.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovfio/pci: Consolidate VGA setup
Alex Williamson [Thu, 26 May 2016 15:43:21 +0000 (09:43 -0600)]
vfio/pci: Consolidate VGA setup

Combine VGA discovery and registration.  Quirks can have dependencies
on BARs, so the quirks push out until after we've scanned the BARs.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovfio/pci: Fix return of vfio_populate_vga()
Alex Williamson [Thu, 26 May 2016 15:43:20 +0000 (09:43 -0600)]
vfio/pci: Fix return of vfio_populate_vga()

This function returns success if either we setup the VGA region or
the host vfio doesn't return enough regions to support the VGA index.
This latter case doesn't make any sense.  If we're asked to populate
VGA, fail if it doesn't exist and let the caller decide if that's
important.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovfio: Create device specific region info helper
Alex Williamson [Thu, 26 May 2016 15:43:20 +0000 (09:43 -0600)]
vfio: Create device specific region info helper

Given a device specific region type and sub-type, find it.  Also
cleanup return point on error in vfio_get_region_info() so that we
always return 0 with a valid pointer or -errno and NULL.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agovfio: Enable sparse mmap capability
Alex Williamson [Thu, 26 May 2016 15:43:20 +0000 (09:43 -0600)]
vfio: Enable sparse mmap capability

The sparse mmap capability in a vfio region info allows vfio to tell
us which sub-areas of a region may be mmap'd.  Thus rather than
assuming a single mmap covers the entire region and later frobbing it
ourselves for things like the PCI MSI-X vector table, we can read that
directly from vfio.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
8 years agoMerge remote-tracking branch 'remotes/amit-migration/tags/migration-2.7-2' into staging
Peter Maydell [Thu, 26 May 2016 15:09:26 +0000 (16:09 +0100)]
Merge remote-tracking branch 'remotes/amit-migration/tags/migration-2.7-2' into staging

migration: add TLS support to the migration data channel

This is a big refactoring of the migration backend code - moving away from
QEMUFile to the new QIOChannel framework introduced here.  This brings a
good level of abstraction and reduction of many lines of code.

This series also adds the ability for many backends (all except RDMA) to
use TLS for encrypting the migration data between the endpoints.

# gpg: Signature made Thu 26 May 2016 07:07:08 BST using RSA key ID 657EF670
# gpg: Good signature from "Amit Shah <amit@amitshah.net>"
# gpg:                 aka "Amit Shah <amit@kernel.org>"
# gpg:                 aka "Amit Shah <amitshah@gmx.net>"

* remotes/amit-migration/tags/migration-2.7-2: (28 commits)
  migration: remove qemu_get_fd method from QEMUFile
  migration: remove support for non-iovec based write handlers
  migration: add support for encrypting data with TLS
  migration: define 'tls-creds' and 'tls-hostname' migration parameters
  migration: don't use an array for storing migrate parameters
  migration: move definition of struct QEMUFile back into qemu-file.c
  migration: delete QEMUFile stdio implementation
  migration: delete QEMUFile sockets implementation
  migration: delete QEMUSizedBuffer struct
  migration: delete QEMUFile buffer implementation
  migration: convert savevm to use QIOChannel for writing to files
  migration: convert RDMA to use QIOChannel interface
  migration: convert exec socket protocol to use QIOChannel
  migration: convert fd socket protocol to use QIOChannel
  migration: convert tcp socket protocol to use QIOChannel
  migration: rename unix.c to socket.c
  migration: convert unix socket protocol to use QIOChannel
  migration: convert post-copy to use QIOChannelBuffer
  migration: add reporting of errors for outgoing migration
  migration: add helpers for creating QEMUFile from a QIOChannel
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoMerge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Peter Maydell [Thu, 26 May 2016 13:29:29 +0000 (14:29 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Wed 25 May 2016 18:32:40 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (31 commits)
  blockjob: Remove BlockJob.bs
  commit: Use BlockBackend for I/O
  backup: Use BlockBackend for I/O
  backup: Remove bs parameter from backup_do_cow()
  backup: Pack Notifier within BackupBlockJob
  backup: Don't leak BackupBlockJob in error path
  mirror: Use BlockBackend for I/O
  mirror: Allow target that already has a BlockBackend
  stream: Use BlockBackend for I/O
  block: Make blk_co_preadv/pwritev() public
  block: Convert block job core to BlockBackend
  block: Default to enabled write cache in blk_new()
  block: Cancel jobs first in bdrv_close_all()
  block: keep a list of block jobs
  block: Rename blk_write_zeroes()
  dma-helpers: change BlockBackend to opaque value in DMAIOFunc
  dma-helpers: change interface to byte-based
  block: Propagate .drained_begin/end callbacks
  block: Fix reconfiguring graph with drained nodes
  block: Make bdrv_drain() use bdrv_drained_begin/end()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoqdev: Start disentangling bus from device
Andreas Färber [Mon, 13 Jul 2015 17:35:41 +0000 (19:35 +0200)]
qdev: Start disentangling bus from device

Move bus type and related APIs to a separate file bus.c.
This is a first step in breaking up qdev.c into more manageable chunks.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[AF: Rebased onto osdep.h]
Signed-off-by: Andreas Färber <afaerber@suse.de>
[PMM: added bus.o to link line for test-qdev-global-props]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agocpu-exec: Fix direct jump to TB spanning page
Sergey Fedorov [Mon, 16 May 2016 13:13:00 +0000 (16:13 +0300)]
cpu-exec: Fix direct jump to TB spanning page

It is not safe to make a direct jump to a TB spanning two pages in
system emulation because the mapping for the second page can get changed
but we don't take care of direct jumps in this case.

However in user mode emulation, this is not the case because there's
only static address translation and TBs are always invalidated properly.

Fixes: 5b053a4a2827 ("tcg: Clean up direct block chaining safety checks")
Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Message-id: 1463404380-29302-1-git-send-email-sergey.fedorov@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agoMerge remote-tracking branch 'remotes/afaerber/tags/maintainers-for-peter' into staging
Peter Maydell [Thu, 26 May 2016 11:41:12 +0000 (12:41 +0100)]
Merge remote-tracking branch 'remotes/afaerber/tags/maintainers-for-peter' into staging

Andreas stepping down from most maintainer positions

# gpg: Signature made Wed 25 May 2016 16:53:45 BST using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <afaerber@suse.de>"
# gpg:                 aka "Andreas Färber <afaerber@suse.com>"

* remotes/afaerber/tags/maintainers-for-peter:
  MAINTAINERS: Drop Andreas as CPU maintainer
  MAINTAINERS: Drop Andreas as 0.15 maintainer
  MAINTAINERS: Drop Andreas as PReP maintainer
  MAINTAINERS: Drop Andreas as Cocoa maintainer

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 years agomigration: remove qemu_get_fd method from QEMUFile
Daniel P. Berrange [Wed, 27 Apr 2016 10:05:18 +0000 (11:05 +0100)]
migration: remove qemu_get_fd method from QEMUFile

Now that there is a set_blocking callback in QEMUFileOps,
and all users needing non-blocking support have been
converted to QIOChannel, there is no longer any codepath
requiring the qemu_get_fd() method for QEMUFile. Remove it
to avoid further code being introduced with an expectation
of direct file handle access.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-29-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
8 years agomigration: remove support for non-iovec based write handlers
Daniel P. Berrange [Wed, 27 Apr 2016 10:05:17 +0000 (11:05 +0100)]
migration: remove support for non-iovec based write handlers

All the remaining QEMUFile implementations provide an iovec
based write handler, so the put_buffer callback can be removed
to simplify the code.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-28-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
8 years agomigration: add support for encrypting data with TLS
Daniel P. Berrange [Wed, 27 Apr 2016 10:05:16 +0000 (11:05 +0100)]
migration: add support for encrypting data with TLS

This extends the migration_set_incoming_channel and
migration_set_outgoing_channel methods so that they
will automatically wrap the QIOChannel in a
QIOChannelTLS instance if TLS credentials are configured
in the migration parameters.

This allows TLS to work for tcp, unix, fd and exec
migration protocols. It does not (currently) work for
RDMA since it does not use these APIs, but it is
unlikely that TLS would be desired with RDMA anyway
since it would degrade the performance to that seen
with TCP defeating the purpose of using RDMA.

On the target host, QEMU would be launched with a set
of TLS credentials for a server endpoint

 $ qemu-system-x86_64 -monitor stdio -incoming defer \
    -object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=server,id=tls0 \
    ...other args...

To enable incoming TLS migration 2 monitor commands are
then used

  (qemu) migrate_set_str_parameter tls-creds tls0
  (qemu) migrate_incoming tcp:myhostname:9000

On the source host, QEMU is launched in a similar
manner but using client endpoint credentials

 $ qemu-system-x86_64 -monitor stdio \
    -object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=client,id=tls0 \
    ...other args...

To enable outgoing TLS migration 2 monitor commands are
then used

  (qemu) migrate_set_str_parameter tls-creds tls0
  (qemu) migrate tcp:otherhostname:9000

Thanks to earlier improvements to error reporting,
TLS errors can be seen 'info migrate' when doing a
detached migration. For example:

  (qemu) info migrate
  capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
  Migration status: failed
  total time: 0 milliseconds
  error description: TLS handshake failed: The TLS connection was non-properly terminated.

Or

  (qemu) info migrate
  capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
  Migration status: failed
  total time: 0 milliseconds
  error description: Certificate does not match the hostname localhost

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-27-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
8 years agomigration: define 'tls-creds' and 'tls-hostname' migration parameters
Daniel P. Berrange [Wed, 27 Apr 2016 10:05:15 +0000 (11:05 +0100)]
migration: define 'tls-creds' and 'tls-hostname' migration parameters

Define two new migration parameters to be used with TLS encryption.
The 'tls-creds' parameter provides the ID of an instance of the
'tls-creds' object type, or rather a subclass such as 'tls-creds-x509'.
Providing these credentials will enable use of TLS on the migration
data stream.

If using x509 certificates, together with a migration URI that does
not include a hostname, the 'tls-hostname' parameter provides the
hostname to use when verifying the server's x509 certificate. This
allows TLS to be used in combination with fd: and exec: protocols
where a TCP connection is established by a 3rd party outside of
QEMU.

NB, this requires changing the migrate_set_parameter method in the
HMP to accept a 's' (string) value instead of 'i' (integer). This
is backwards compatible, because the parsing of strings allows the
quotes to be optional, thus any integer is also a valid string.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-26-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
8 years agomigration: don't use an array for storing migrate parameters
Daniel P. Berrange [Wed, 27 Apr 2016 10:05:14 +0000 (11:05 +0100)]
migration: don't use an array for storing migrate parameters

The MigrateState struct uses an array for storing migration
parameters. This presumes that all future parameters will
be integers too, which is not going to be the case. There
is no functional reason why an array is used, if anything
it makes the code less clear. The QAPI schema already
defines a struct - MigrationParameters - capable of storing
all the individual parameters, so just use that instead of
an array.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-25-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
8 years agomigration: move definition of struct QEMUFile back into qemu-file.c
Daniel P. Berrange [Wed, 27 Apr 2016 10:05:13 +0000 (11:05 +0100)]
migration: move definition of struct QEMUFile back into qemu-file.c

Now that the memory buffer based QEMUFile impl is gone, there
is no need for any backend to be accessing internals of the
QEMUFile struct, so it can be moved back into qemu-file.c

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-24-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
8 years agomigration: delete QEMUFile stdio implementation
Daniel P. Berrange [Wed, 27 Apr 2016 10:05:12 +0000 (11:05 +0100)]
migration: delete QEMUFile stdio implementation

Now that the exec migration backend and savevm have converted
to use the QIOChannel based QEMUFile, there is no user remaining
for the stdio based QEMUFile impl and it can be deleted.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-23-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
8 years agomigration: delete QEMUFile sockets implementation
Daniel P. Berrange [Wed, 27 Apr 2016 10:05:11 +0000 (11:05 +0100)]
migration: delete QEMUFile sockets implementation

Now that the tcp, unix and fd migration backends have converted
to use the QIOChannel based QEMUFile, there is no user remaining
for the sockets based QEMUFile impl and it can be deleted.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-22-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>