sdk/emulator/qemu.git
7 years agoMerge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
Peter Maydell [Tue, 19 Jul 2016 12:00:35 +0000 (13:00 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Tue 19 Jul 2016 03:33:40 BST
# gpg:                using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  e1000e: fix building without CONFIG_VMXNET3_PCI
  MAINTAINERS: release Scott from being a rocker maintainer
  tap: fix memory leak on failure to create a multiqueue tap device
  net: fix incorrect argument to iov_to_buf
  net: fix incorrect access to pointer
  e1000e: fix incorrect access to pointer

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoMerge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging
Peter Maydell [Tue, 19 Jul 2016 10:47:06 +0000 (11:47 +0100)]
Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging

# gpg: Signature made Mon 18 Jul 2016 23:53:15 BST
# gpg:                using RSA key 0x7DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/ide-pull-request:
  block: ignore flush requests when storage is clean
  tests: in IDE and AHCI tests perform DMA write before flushing
  ide: set retry_unit for PIO and FLUSH requests
  ide: refactor retry_unit set and clear into separate function

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoMerge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
Peter Maydell [Tue, 19 Jul 2016 09:54:49 +0000 (10:54 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging

# gpg: Signature made Mon 18 Jul 2016 22:59:55 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/tracing-pull-request:
  trace: Add QAPI/QMP interfaces to query and control per-vCPU tracing state
  trace: Allow event name pattern in "info trace-events"
  trace: Conditionally trace events based on their per-vCPU state
  trace: Add per-vCPU tracing states for events with the 'vcpu' property
  trace: Cosmetic changes on fast-path tracing
  disas: Remove unused macro '_'
  trace: Identify events with the 'vcpu' property
  trace: [bsd-user] Commandline arguments to control tracing
  trace: [linux-user] Commandline arguments to control tracing

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoMerge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160718.0' into...
Peter Maydell [Tue, 19 Jul 2016 08:02:05 +0000 (09:02 +0100)]
Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160718.0' into staging

VFIO update 2016-07-18

One fix for 2.7-rc0 which hides the ARI extended capability, fixing
multifunction support in PCIe configurations where the assigned device
function topology does not match the host (Alex Williamson)

# gpg: Signature made Mon 18 Jul 2016 18:02:27 BST
# gpg:                using RSA key 0x239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-update-20160718.0:
  vfio/pci: Hide ARI capability

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoblock: ignore flush requests when storage is clean
Evgeny Yakovlev [Mon, 18 Jul 2016 19:39:52 +0000 (22:39 +0300)]
block: ignore flush requests when storage is clean

Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a write generation scheme in BlockDriverState.
Current write generation is checked against last flushed generation to
avoid unnessesary flushes.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

This affected some blkdebug testcases that were expecting error logs from
failure-injected flushes which are now skipped entirely
(tests 026 071 089).

This also affects the performance of block jobs and thus BLOCK_JOB_READY
events for driver-mirror and active block-commit commands now arrives
faster, before QMP send successfully returns to caller (tests 141 144).

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-5-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
7 years agotests: in IDE and AHCI tests perform DMA write before flushing
Evgeny Yakovlev [Mon, 18 Jul 2016 19:39:51 +0000 (22:39 +0300)]
tests: in IDE and AHCI tests perform DMA write before flushing

Due to changes in flush behaviour clean disks stopped generating
flush_to_disk events and IDE and AHCI tests that test flush commands
started to fail.

This change adds additional DMA writes to affected tests before sending
flush commands so that bdrv_flush actually generates flush_to_disk event.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-4-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
7 years agoide: set retry_unit for PIO and FLUSH requests
Evgeny Yakovlev [Mon, 18 Jul 2016 19:39:50 +0000 (22:39 +0300)]
ide: set retry_unit for PIO and FLUSH requests

The following sequence of tests discovered a problem in IDE emulation:
1. Send DMA write to IDE device 0
2. Send CMD_FLUSH_CACHE to same IDE device which will be failed by block
layer using blkdebug script in tests/ide-test:test_retry_flush

When doing DMA request ide/core.c will set s->retry_unit to s->unit in
ide_start_dma. When dma completes ide_set_inactive sets retry_unit to -1.
After that ide_flush_cache runs and fails thanks to blkdebug.
ide_flush_cb calls ide_handle_rw_error which asserts that s->retry_unit
== s->unit. But s->retry_unit is still -1 after previous DMA completion
and flush does not use anything related to retry.

This patch restricts retry unit assertion only to ops that actually use
retry logic.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-3-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
7 years agoide: refactor retry_unit set and clear into separate function
Evgeny Yakovlev [Mon, 18 Jul 2016 19:39:49 +0000 (22:39 +0300)]
ide: refactor retry_unit set and clear into separate function

Code to set and clear state associated with retry in moved into
ide_set_retry and ide_clear_retry to make adding retry setups easier.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-2-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
7 years agotrace: Add QAPI/QMP interfaces to query and control per-vCPU tracing state
Lluís Vilanova [Mon, 11 Jul 2016 10:53:57 +0000 (12:53 +0200)]
trace: Add QAPI/QMP interfaces to query and control per-vCPU tracing state

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agotrace: Allow event name pattern in "info trace-events"
Lluís Vilanova [Mon, 11 Jul 2016 10:53:51 +0000 (12:53 +0200)]
trace: Allow event name pattern in "info trace-events"

Homogenizes the command capabilities with QMP.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agotrace: Conditionally trace events based on their per-vCPU state
Lluís Vilanova [Mon, 11 Jul 2016 10:53:46 +0000 (12:53 +0200)]
trace: Conditionally trace events based on their per-vCPU state

Events with the 'vcpu' property are conditionally emitted according to
their per-vCPU state. Other events are emitted normally based on their
global tracing state.

Note that the per-vCPU condition check applies to all tracing backends.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agotrace: Add per-vCPU tracing states for events with the 'vcpu' property
Lluís Vilanova [Mon, 11 Jul 2016 10:53:41 +0000 (12:53 +0200)]
trace: Add per-vCPU tracing states for events with the 'vcpu' property

Each vCPU gets a 'trace_dstate' bitmap to control the per-vCPU dynamic
tracing state of events with the 'vcpu' property.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agotrace: Cosmetic changes on fast-path tracing
Lluís Vilanova [Mon, 11 Jul 2016 10:53:35 +0000 (12:53 +0200)]
trace: Cosmetic changes on fast-path tracing

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agodisas: Remove unused macro '_'
Lluís Vilanova [Mon, 11 Jul 2016 10:53:30 +0000 (12:53 +0200)]
disas: Remove unused macro '_'

Eliminates a future compilation error when UI code includes the tracing
headers (indirectly pulling "disas/bfd.h" through "qom/cpu.h") and
GLib's i18n '_' macro.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agotrace: Identify events with the 'vcpu' property
Lluís Vilanova [Mon, 11 Jul 2016 10:53:24 +0000 (12:53 +0200)]
trace: Identify events with the 'vcpu' property

A new event attribute 'cpu_id' is added to have a separate ID
space ('TRACE_VCPU_*') for all events with the 'vcpu' property.

These are later used to identify which events are enabled on each vCPU.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agotrace: [bsd-user] Commandline arguments to control tracing
Lluís Vilanova [Fri, 15 Jul 2016 17:08:43 +0000 (19:08 +0200)]
trace: [bsd-user] Commandline arguments to control tracing

[Changed const char *trace_file to char *trace_file since it's a
heap-allocated string that needs to be freed.  This type is also
returned by trace_opt_parse() and used in vl.c.

Also fixed coding style on for(;;) and else statement as suggested by
Eric Blake <eblake@redhat.com> since the patch modifies these lines or
close enough.
--Stefan]

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 146860252322.30668.18276041739086338328.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agotrace: [linux-user] Commandline arguments to control tracing
Lluís Vilanova [Fri, 15 Jul 2016 17:08:38 +0000 (19:08 +0200)]
trace: [linux-user] Commandline arguments to control tracing

[Changed const char *trace_file to char *trace_file since it's a
heap-allocated string that needs to be freed.  This type is also
returned by trace_opt_parse() and used in vl.c.
--Stefan]

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 146860251784.30668.17339867835129075077.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agoMerge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Peter Maydell [Mon, 18 Jul 2016 17:13:01 +0000 (18:13 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Mon 18 Jul 2016 17:58:27 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  MAINTAINERS: Add include/block/aio.h to block I/O path section
  virtio-blk: dataplane cleanup
  checkpatch: consider git extended headers valid patches
  aio-posix: remove useless parameter
  linux-aio: prevent submitting more than MAX_EVENTS
  aio_ctx_check: follow CODING_STYLE
  linux-aio: share one LinuxAioState within an AioContext
  spec/parallels: fix a mistake

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agovfio/pci: Hide ARI capability
Alex Williamson [Mon, 18 Jul 2016 16:55:17 +0000 (10:55 -0600)]
vfio/pci: Hide ARI capability

QEMU supports ARI on downstream ports and assigned devices may support
ARI in their extended capabilities.  The endpoint ARI capability
specifies the next function, such that the OS doesn't need to walk
each possible function, however this next function is relative to the
host, not the guest.  This leads to device discovery issues when we
combine separate functions into virtual multi-function packages in a
guest.  For example, SR-IOV VFs are not enumerated by simply probing
the function address space, therefore the ARI next-function field is
zero.  When we combine multiple VFs together as a multi-function
device in the guest, the guest OS identifies ARI is enabled, relies on
this next-function field, and stops looking for additional function
after the first is found.

Long term we should expose the ARI capability to the guest to enable
configurations with more than 8 functions per slot, but this requires
additional QEMU PCI infrastructure to manage the next-function field
for multiple, otherwise independent devices.  In the short term,
hiding this capability allows equivalent functionality to what we
currently have on non-express chipsets.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
7 years agoMAINTAINERS: Add include/block/aio.h to block I/O path section
Fam Zheng [Mon, 18 Jul 2016 07:19:47 +0000 (15:19 +0800)]
MAINTAINERS: Add include/block/aio.h to block I/O path section

This file is actually the header for async.c and aio-*.c., so add it to
the same section.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1468826387-10473-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agovirtio-blk: dataplane cleanup
Cao jin [Mon, 18 Jul 2016 04:05:49 +0000 (12:05 +0800)]
virtio-blk: dataplane cleanup

No need duplicate the judgment, there is one in function entry.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468814749-14510-1-git-send-email-caoj.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agocheckpatch: consider git extended headers valid patches
Stefan Hajnoczi [Fri, 15 Jul 2016 09:46:54 +0000 (10:46 +0100)]
checkpatch: consider git extended headers valid patches

Renames look like this with git-diff(1) when diff.renames = true is set:

  diff --git a/a b/b
  similarity index 100%
  rename from a
  rename to b

This raises the "Does not appear to be a unified-diff format patch"
error because checkpatch.pl only considers a diff valid if it contains
at least one "@@" hunk.

This patch accepts renames and copies too so that checkpatch.pl exits
successfully when a diff only renames/copies files.  The git diff
extended header format is described on the git-diff(1) man page.

Reported-by: Colin Lord <clord@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468576014-28788-1-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agoaio-posix: remove useless parameter
Cao jin [Fri, 15 Jul 2016 10:28:44 +0000 (18:28 +0800)]
aio-posix: remove useless parameter

Parameter **errp of aio_context_setup() is useless, remove it
and clean up the related code.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468578524-23433-1-git-send-email-caoj.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agolinux-aio: prevent submitting more than MAX_EVENTS
Roman Pen [Wed, 13 Jul 2016 13:03:24 +0000 (15:03 +0200)]
linux-aio: prevent submitting more than MAX_EVENTS

Invoking io_setup(MAX_EVENTS) we ask kernel to create ring buffer for us
with specified number of events.  But kernel ring buffer allocation logic
is a bit tricky (ring buffer is page size aligned + some percpu allocation
are required) so eventually more than requested events number is allocated.

From a userspace side we have to follow the convention and should not try
to io_submit() more or logic, which consumes completed events, should be
changed accordingly.  The pitfall is in the following sequence:

    MAX_EVENTS = 128
    io_setup(MAX_EVENTS)

    io_submit(MAX_EVENTS)
    io_submit(MAX_EVENTS)

    /* now 256 events are in-flight */

    io_getevents(MAX_EVENTS) = 128

    /* we can handle only 128 events at once, to be sure
     * that nothing is pended the io_getevents(MAX_EVENTS)
     * call must be invoked once more or hang will happen. */

To prevent the hang or reiteration of io_getevents() call this patch
restricts the number of in-flights, which is now limited to MAX_EVENTS.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468415004-31755-1-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agoaio_ctx_check: follow CODING_STYLE
Cao jin [Thu, 14 Jul 2016 13:10:43 +0000 (21:10 +0800)]
aio_ctx_check: follow CODING_STYLE

replace tab with spaces

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-id: 1468501843-14927-1-git-send-email-caoj.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agolinux-aio: share one LinuxAioState within an AioContext
Paolo Bonzini [Mon, 4 Jul 2016 16:33:20 +0000 (18:33 +0200)]
linux-aio: share one LinuxAioState within an AioContext

This has better performance because it executes fewer system calls
and does not use a bottom half per disk.

Originally proposed by Ming Lei.

[Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix
build error as reported by Peter Maydell <peter.maydell@linaro.org>.
--Stefan]

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
squash! linux-aio: share one LinuxAioState within an AioContext

7 years agospec/parallels: fix a mistake
Vladimir Sementsov-Ogievskiy [Thu, 30 Jun 2016 08:19:30 +0000 (11:19 +0300)]
spec/parallels: fix a mistake

We have only one flag for now - Empty Image flag. The patch fixes unused
bits specification and marks bit 1 as usused.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agoMerge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160718' into staging
Peter Maydell [Mon, 18 Jul 2016 10:24:15 +0000 (11:24 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160718' into staging

ppc patch queue 2016-07-18

Here's what ought to be the final ppc pull request before the 2.7 hard
freeze.  This set contains a rework of the DBDMA device for Mac
platforms, and some assorted cleanups and bugfixes.

# gpg: Signature made Mon 18 Jul 2016 05:35:27 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160718:
  ppc: Yet another fix for the huge page support detection mechanism
  target-ppc: fix left shift overflow in hpte_page_shift
  ppc/mmu-hash64: Remove duplicated #include statement
  ppc: abort if compat property contains an unknown value
  spapr: Ensure CPU cores are added contiguously and removed in LIFO order
  vfio/spapr: Remove stale ioctl() call
  ppc: Fix support for odd MSR combinations
  dbdma: reset io->processing flag for unassigned DBDMA channel rw accesses
  dbdma: set FLUSH bit upon reception of flush command for unassigned DBDMA channels
  dbdma: fix load_word/store_word value endianness
  dbdma: fix endian of DBDMA_CMDPTR_LO during branch
  dbdma: add per-channel debugging enabled via DEBUG_DBDMA_CHANMASK
  dbdma: always define DBDMA_DPRINTF and enable debug with DEBUG_DBDMA
  spapr: fix core unplug crash

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoe1000e: fix building without CONFIG_VMXNET3_PCI
Jason Wang [Tue, 12 Jul 2016 08:28:23 +0000 (16:28 +0800)]
e1000e: fix building without CONFIG_VMXNET3_PCI

e1000e needs net_tx_pkt.o and net_rx_pkt.o too.

Cc: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Cc: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
7 years agoMAINTAINERS: release Scott from being a rocker maintainer
Jiri Pirko [Mon, 11 Jul 2016 07:49:42 +0000 (09:49 +0200)]
MAINTAINERS: release Scott from being a rocker maintainer

As requested by Scott, removing him.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
7 years agotap: fix memory leak on failure to create a multiqueue tap device
Paolo Bonzini [Fri, 15 Jul 2016 08:56:07 +0000 (10:56 +0200)]
tap: fix memory leak on failure to create a multiqueue tap device

Reported by Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
7 years agonet: fix incorrect argument to iov_to_buf
Paolo Bonzini [Fri, 15 Jul 2016 08:41:47 +0000 (10:41 +0200)]
net: fix incorrect argument to iov_to_buf

Coverity reports a "suspicious sizeof" which is indeed wrong.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
7 years agonet: fix incorrect access to pointer
Paolo Bonzini [Fri, 15 Jul 2016 08:43:32 +0000 (10:43 +0200)]
net: fix incorrect access to pointer

This is not dereferencing the pointer, and instead checking only
the value of the pointer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
7 years agoe1000e: fix incorrect access to pointer
Paolo Bonzini [Fri, 15 Jul 2016 08:44:38 +0000 (10:44 +0200)]
e1000e: fix incorrect access to pointer

This is not dereferencing the pointer, and instead checking only
the value of the pointer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
7 years agoppc: Yet another fix for the huge page support detection mechanism
Thomas Huth [Fri, 15 Jul 2016 08:10:25 +0000 (10:10 +0200)]
ppc: Yet another fix for the huge page support detection mechanism

Commit 86b50f2e1bef ("Disable huge page support if it is not available
for main RAM") already made sure that huge page support is not announced
to the guest if the normal RAM of non-NUMA configurations is not backed
by a huge page filesystem. However, there is one more case that can go
wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
with huge page support (and only the memory of a DIMM is configured with
it). When QEMU is started with the following command line for example,
the Linux guest currently crashes because it is trying to use huge pages
on a memory region that does not support huge pages:

 qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
   memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
   -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
   -numa node,nodeid=0 -numa node,nodeid=1

To fix this issue, we've got to make sure to disable huge page support,
too, when there is a NUMA node that is not using a memory backend with
huge page support.

Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agotarget-ppc: fix left shift overflow in hpte_page_shift
Paolo Bonzini [Fri, 15 Jul 2016 15:22:10 +0000 (17:22 +0200)]
target-ppc: fix left shift overflow in hpte_page_shift

ps->pte_enc is a 32-bit value, which is shifted left and then compared
to a 64-bit value.  It needs a cast before the shift.

Reported by Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agoppc/mmu-hash64: Remove duplicated #include statement
Thomas Huth [Thu, 14 Jul 2016 08:14:18 +0000 (10:14 +0200)]
ppc/mmu-hash64: Remove duplicated #include statement

No need to include error-report.h twice here.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agoppc: abort if compat property contains an unknown value
Greg Kurz [Wed, 13 Jul 2016 10:00:17 +0000 (12:00 +0200)]
ppc: abort if compat property contains an unknown value

It is not possible to set the compat property to an unknown value with
powerpc_set_compat(). Something must have gone terribly wrong in QEMU,
if we detect an "Internal error" in powerpc_get_compat(). Let's abort then.

This patch also drops the "max_compat ? *max_compat : -1" construct. It is
useless since max_compat is dereferenced a few lines above.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agospapr: Ensure CPU cores are added contiguously and removed in LIFO order
Bharata B Rao [Wed, 13 Jul 2016 06:50:20 +0000 (12:20 +0530)]
spapr: Ensure CPU cores are added contiguously and removed in LIFO order

If CPU core addition or removal is allowed in random order leading to
holes in the core id range (and hence in the cpu_index range), migration
can fail as migration with holes in cpu_index range isn't yet handled
correctly.

Prevent this situation by enforcing the addition in contiguous order
and removal in LIFO order so that we never end up with holes in
cpu_index range.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agovfio/spapr: Remove stale ioctl() call
David Gibson [Tue, 12 Jul 2016 06:54:03 +0000 (16:54 +1000)]
vfio/spapr: Remove stale ioctl() call

This ioctl() call to VFIO_IOMMU_SPAPR_TCE_REMOVE was left over from an
earlier version of the code and has since been folded into
vfio_spapr_remove_window().

It wasn't caught because although the argument structure has been removed,
the libc function remove() means this didn't trigger a compile failure.
The ioctl() was also almost certain to fail silently and harmlessly with
the bogus argument, so this wasn't caught in testing.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
7 years agoppc: Fix support for odd MSR combinations
Benjamin Herrenschmidt [Sat, 9 Jul 2016 03:41:31 +0000 (13:41 +1000)]
ppc: Fix support for odd MSR combinations

MacOS uses an architecturally illegal MSR combination that
seems nonetheless supported by 32-bit processors, which is
to have MSR[PR]=1 and one or more of MSR[DR/IR/EE]=0.

This adds support for it. To work properly we need to also
properly include support for PR=1,{I,D}R=0 to the MMU index
used by the qemu TLB.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agodbdma: reset io->processing flag for unassigned DBDMA channel rw accesses
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:58 +0000 (19:08 +0100)]
dbdma: reset io->processing flag for unassigned DBDMA channel rw accesses

Otherwise MacOS 9 hangs upon shutdown.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agodbdma: set FLUSH bit upon reception of flush command for unassigned DBDMA channels
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:57 +0000 (19:08 +0100)]
dbdma: set FLUSH bit upon reception of flush command for unassigned DBDMA channels

This fixes MacOS 9 whereby it continually flushes and polls the status bits
until they are set to indicate a successful flush.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agodbdma: fix load_word/store_word value endianness
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:56 +0000 (19:08 +0100)]
dbdma: fix load_word/store_word value endianness

The values to read/write to/from physical memory are copied directly to the
physical address with no endian swapping required.

Also add some extra information to debugging output while we are here.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agodbdma: fix endian of DBDMA_CMDPTR_LO during branch
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:55 +0000 (19:08 +0100)]
dbdma: fix endian of DBDMA_CMDPTR_LO during branch

The current DBDMA command is stored in little-endian format, so make sure
we convert it to match our CPU when updating the DBDMA_CMDPTR_LO register.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agodbdma: add per-channel debugging enabled via DEBUG_DBDMA_CHANMASK
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:54 +0000 (19:08 +0100)]
dbdma: add per-channel debugging enabled via DEBUG_DBDMA_CHANMASK

By default large amounts of DBDMA debugging are produced when often it is just
1 or 2 channels that are of interest. Introduce DEBUG_DBDMA_CHANMASK to allow
the developer to select the channels of interest at compile time, and then
further add the extra channel information to each debug statement where
possible.

Also clearly mark the start/end of DBDMA_run_bh to allow tracking the bottom
half execution.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agodbdma: always define DBDMA_DPRINTF and enable debug with DEBUG_DBDMA
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:53 +0000 (19:08 +0100)]
dbdma: always define DBDMA_DPRINTF and enable debug with DEBUG_DBDMA

Enabling DBDMA_DPRINTF unconditionally ensures that any errors in debug
statements are picked up immediately.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agospapr: fix core unplug crash
Greg Kurz [Fri, 8 Jul 2016 13:12:07 +0000 (15:12 +0200)]
spapr: fix core unplug crash

If the host has 8 threads/core and the guest is started with:

-smp cores=1,threads=4,maxcpus=12

It is possible to crash QEMU by doing:

(qemu) device_add host-spapr-cpu-core,core-id=16,id=foo
(qemu) device_del foo
Segmentation fault

This happens because spapr_core_unplug() assumes cpu_dt_id == core_id.
As long as cpu_dt_id is derived from the non-table cpu_index, this is
only true when you plug cores with contiguous ids.

It is safer to be consistent: the DR connector was created with an
index that is immediately written to cc->core_id, and spapr_core_plug()
also relies on cc->core_id.

Let's use it also in spapr_core_unplug().

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years agoMerge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' into staging
Peter Maydell [Fri, 15 Jul 2016 15:56:08 +0000 (16:56 +0100)]
Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' into staging

Update OpenBIOS images

# gpg: Signature made Fri 15 Jul 2016 15:22:36 BST
# gpg:                using RSA key 0x5BC2C56FAE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C  C9C4 5BC2 C56F AE0F 321F

* remotes/mcayland/tags/qemu-openbios-signed:
  Update OpenBIOS images to b747b6a built from submodule.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoUpdate OpenBIOS images to b747b6a built from submodule.
Mark Cave-Ayland [Fri, 15 Jul 2016 14:14:35 +0000 (15:14 +0100)]
Update OpenBIOS images to b747b6a built from submodule.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
7 years agoMerge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160714' into...
Peter Maydell [Thu, 14 Jul 2016 16:32:53 +0000 (17:32 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160714' into staging

target-arm queue:
 * add virtio-mmio transport base address to device path
   (avoid an assertion failure with multiple virtio-scsi-devices)
 * revert hw/ptimer commit 5a50307 which causes regressions on
   SPARC guests
 * use Neon to accelerate zero-page checking on AArch64 hosts
 * set the MPIDR for TCG to match how KVM does it (and fit with
   GICv2/GICv3 restrictions on SGI target lists)
 * add some missing AArch32 TLBI hypervisor TLB operations
 * m25p80: Fix QIOR/DIOR handling for Winbond
 * hw/misc: fix typo in Aspeed SCU hw-strap2 property name
 * ast2400: pretend DMAs are done for U-boot
 * ast2400: some minor code cleanups

# gpg: Signature made Thu 14 Jul 2016 17:21:30 BST
# gpg:                using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20160714:
  ast2400: externalize revision numbers
  ast2400: pretend DMAs are done for U-boot
  ast2400: replace aspeed_smc_is_implemented()
  hw/misc: fix typo in Aspeed SCU hw-strap2 property name
  m25p80: Fix QIOR/DIOR handling for Winbond
  target-arm: Add missed AArch32 TLBI sytem registers
  hw/arm/virt: tcg: adjust MPIDR like KVM
  gic: provide defines for v2/v3 targetlist sizes
  target-arm: Use Neon for zero checking
  Revert "hw/ptimer: Perform counter wrap around if timer already expired"
  virtio-mmio: format transport base address in BusClass.get_dev_path

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoast2400: externalize revision numbers
Cédric Le Goater [Thu, 14 Jul 2016 15:51:39 +0000 (16:51 +0100)]
ast2400: externalize revision numbers

AST2400_A0_SILICON_REV is defined twice. Fix this by including the
definition in the header file as well as the routine to check if a
silicon revision is supported. It will useful to reuse in other
controllers.

Let's add also AST2500_A0_SILICON_REV for future use.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467994016-11678-5-git-send-email-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoast2400: pretend DMAs are done for U-boot
Cédric Le Goater [Thu, 14 Jul 2016 15:51:38 +0000 (16:51 +0100)]
ast2400: pretend DMAs are done for U-boot

U-boot does SPI timing calibration using DMA tranfers. To let the
initialization continue, we fake success by setting the DMA status of
the Interrupt Control Register.

For the moment, DMA support is not required as it is not used in
normal operation.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467994016-11678-4-git-send-email-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoast2400: replace aspeed_smc_is_implemented()
Cédric Le Goater [Thu, 14 Jul 2016 15:51:38 +0000 (16:51 +0100)]
ast2400: replace aspeed_smc_is_implemented()

aspeed_smc_is_implemented() filters invalid registers in a peculiar
way. Let's remove it and open code the if conditions. It serves the
same purpose, the aesthetic is better, and new registers can easily be
added.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467994016-11678-3-git-send-email-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agohw/misc: fix typo in Aspeed SCU hw-strap2 property name
Cédric Le Goater [Thu, 14 Jul 2016 15:51:38 +0000 (16:51 +0100)]
hw/misc: fix typo in Aspeed SCU hw-strap2 property name

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467994016-11678-2-git-send-email-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agom25p80: Fix QIOR/DIOR handling for Winbond
Marcin Krzeminski [Thu, 14 Jul 2016 15:51:38 +0000 (16:51 +0100)]
m25p80: Fix QIOR/DIOR handling for Winbond

Winbond also support continuous read mode, but as an opposite for other
flash type read mode clock cycles are included to dummy cycles number.
This path add proper handling of read mode byte and update needed
dummy cycles. QPI mode and dummy cycles configuration are not supported.

Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467809036-6986-1-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agotarget-arm: Add missed AArch32 TLBI sytem registers
Sergey Sorokin [Thu, 14 Jul 2016 15:51:37 +0000 (16:51 +0100)]
target-arm: Add missed AArch32 TLBI sytem registers

Some PL2 related TLBI system registers are missed in AArch32
implementation. The patch fixes it.

Signed-off-by: Sergey Sorokin <afarallax@yandex.ru>
Message-id: 1468328885-3217862-1-git-send-email-afarallax@yandex.ru
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agohw/arm/virt: tcg: adjust MPIDR like KVM
Andrew Jones [Thu, 14 Jul 2016 15:51:37 +0000 (16:51 +0100)]
hw/arm/virt: tcg: adjust MPIDR like KVM

KVM adjusts the MPIDR of guest vcpus based on the architecture of
the host, 32-bit vs. 64-bit, and, for 64-bit, also on the type of
GIC the guest is using. To be consistent and improve SGI efficiency
we make the same adjustments for TCG as 64-bit KVM hosts. We neglect
to add consistency with 32-bit KVM hosts, as that would reduce SGI
efficiency and KVM is expected to change.

As MPIDR is a system register, and thus guest visible, we only make
adjustments for current and later versioned machines.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-id: 1467378129-23302-3-git-send-email-drjones@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agogic: provide defines for v2/v3 targetlist sizes
Andrew Jones [Thu, 14 Jul 2016 15:51:37 +0000 (16:51 +0100)]
gic: provide defines for v2/v3 targetlist sizes

Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-id: 1467378129-23302-2-git-send-email-drjones@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agotarget-arm: Use Neon for zero checking
Vijay [Thu, 14 Jul 2016 15:51:36 +0000 (16:51 +0100)]
target-arm: Use Neon for zero checking

Use Neon instructions to perform zero checking of
buffer. This is helps in reducing total migration time.

Use case: Idle VM live migration with 4 VCPUS and 8GB ram
running CentOS 7.

Without Neon, the Total migration time is 3.5 Sec

Migration status: completed
total time: 3560 milliseconds
downtime: 33 milliseconds
setup: 5 milliseconds
transferred ram: 297907 kbytes
throughput: 685.76 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2062760 pages
skipped: 0 pages
normal: 69808 pages
normal bytes: 279232 kbytes
dirty sync count: 3

With Neon, the total migration time is 2.9 Sec

Migration status: completed
total time: 2960 milliseconds
downtime: 65 milliseconds
setup: 4 milliseconds
transferred ram: 299869 kbytes
throughput: 830.19 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2064313 pages
skipped: 0 pages
normal: 70294 pages
normal bytes: 281176 kbytes
dirty sync count: 3

Signed-off-by: Vijaya Kumar K <vijayak@cavium.com>
Signed-off-by: Suresh <ksuresh@cavium.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1467190029-694-2-git-send-email-vijayak@cavium.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoRevert "hw/ptimer: Perform counter wrap around if timer already expired"
Dmitry Osipenko [Thu, 14 Jul 2016 15:51:36 +0000 (16:51 +0100)]
Revert "hw/ptimer: Perform counter wrap around if timer already expired"

Software should see timer counter wraparound only after IRQ being triggered.
This fixes regression introduced by the commit 5a50307 ("hw/ptimer: Perform
counter wrap around if timer already expired"), resulting in monotonic timer
jumping backwards on SPARC emulated machine running NetBSD guest OS, as
reported by Mark Cave-Ayland.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Message-id: 20160708132206.2080-1-digetx@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agovirtio-mmio: format transport base address in BusClass.get_dev_path
Laszlo Ersek [Thu, 14 Jul 2016 15:51:36 +0000 (16:51 +0100)]
virtio-mmio: format transport base address in BusClass.get_dev_path

At the moment the following QEMU command line triggers an assertion
failure (minimal reproducer by Cole):

  qemu-system-aarch64 \
    -machine virt-2.6,accel=tcg \
    -nodefaults \
    -no-user-config \
    -nographic -monitor stdio \
    -device virtio-scsi-device,id=scsi0 \
    -device virtio-scsi-device,id=scsi1 \
    -drive file=foo.img,format=raw,if=none,id=d0 \
    -device scsi-hd,bus=scsi0.0,drive=d0 \
    -drive file=foo.img,format=raw,if=none,id=d1 \
    -device scsi-hd,bus=scsi1.0,drive=d1

  qemu-system-aarch64: migration/savevm.c:615:
  vmstate_register_with_alias_id:
  Assertion `!se->compat || se->instance_id == 0' failed.

The reason is that the vmstate sections for the two scsi-hd devices are
not uniquely identifiable by name.

The direct parent buses of the scsi-hd devices -- scsi0.0 and scsi1.0 --
support the BusClass.get_dev_path member function. scsibus_get_dev_path()
formats a device path prefix with the help of its topologically parent
bus, and then appends the chan:id:lun triplet to it. For both scsi-hd
devices, this triplet is 0:0:0.

(Here we use "device path" in the QEMU migration sense, for vmstate
section identification, not in the OFW or UEFI device path senses.)

The virtio-scsi HBA is plugged into the virtio-mmio bus (implemented by
the internal VirtIOMMIOProxy device). This bus class
(TYPE_VIRTIO_MMIO_BUS) inherits, as its get_dev_path() member function,
the virtio_bus_get_dev_path() method from its parent class
(TYPE_VIRTIO_BUS).

virtio_bus_get_dev_path() does not format any kind of device address on
its own; "virtio addresses" are transport-specific. Therefore
virtio_bus_get_dev_path() asks the topologically parent bus of the proxy
object (implementing the specific virtio transport) to format the address
of the proxy object.

(For virtio-pci devices (where the proxy is an instance of VirtIOPCIProxy,
plugged into a PCI bus), this ends up in pcibus_get_dev_path().)

However, VirtIOMMIOProxy is usually (in practice: always) plugged into
"main-system-bus", the singleton TYPE_SYSTEM_BUS object. This BusClass
does not support formatting QEMU vmstate device paths at all (as
SysBusDevice objects can have zero or more IO ports and zero or more MMIO
regions). Hence the formatting request delegated from
virtio_bus_get_dev_path() gets answered with NULL.

The end result is that the two scsi-hd devices end up with the same device
path "0:0:0", which triggers the assert.

We can solve this by recognizing that virtio-mmio transports are
distinguished from each other by their base addresses in MMIO address
space. Implement virtio_mmio_bus_get_dev_path() as follows:

(1) The virtio device whose devpath is to be formatted resides on a
    virtio-mmio bus that is implemented by a VirtIOMMIOProxy object. Ask
    the parent bus of VirtIOMMIOProxy to format the device path of
    VirtIOMMIOProxy, as a path prefix. (This is identical to what
    virtio_bus_get_dev_path() does.)

(2) Append the base address of VirtIOMMIOProxy to the device path, such
    as:
    - virtio-mmio@000000000a003e00,
    - virtio-mmio@000000000a003c00.

Given that these device paths are placed in the migration stream, step (2)
above, if done unconditionally, would break migration. So make that step
conditional on a new VirtIOMMIOProxy property, which is enabled for 2.7
machine types and later.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Cole Robinson <crobinso@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Kevin Zhao <kevin.zhao@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Tom Hanson <thomas.hanson@linaro.org>
Reported-by: Kevin Zhao <kevin.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-id: 1467739394-28357-1-git-send-email-lersek@redhat.com
Fixes: https://bugs.launchpad.net/qemu/+bug/1594239
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoMerge remote-tracking branch 'remotes/bonzini/tags/for-upstream-fwcfg' into staging
Peter Maydell [Thu, 14 Jul 2016 15:49:18 +0000 (16:49 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-fwcfg' into staging

* Updated fw_cfg option ROM to include DMA support

# gpg: Signature made Thu 14 Jul 2016 14:51:06 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream-fwcfg:
  Add optionrom compatible with fw_cfg DMA version

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoMerge remote-tracking branch 'remotes/xtensa/tags/20160714-xtensa' into staging
Peter Maydell [Thu, 14 Jul 2016 14:57:28 +0000 (15:57 +0100)]
Merge remote-tracking branch 'remotes/xtensa/tags/20160714-xtensa' into staging

Xtensa-related fixes:

- fix FLASH interface width for XTFPGA boards.

# gpg: Signature made Thu 14 Jul 2016 12:00:05 BST
# gpg:                using RSA key 0x51F9CC91F83FA044
# gpg: Good signature from "Max Filippov <max.filippov@cogentembedded.com>"
# gpg:                 aka "Max Filippov <jcmvbkbc@gmail.com>"
# Primary key fingerprint: 2B67 854B 98E5 327D CDEB  17D8 51F9 CC91 F83F A044

* remotes/xtensa/tags/20160714-xtensa:
  target-xtensa: xtfpga: fix FLASH interface width

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoAdd optionrom compatible with fw_cfg DMA version
Marc Marí [Mon, 23 May 2016 18:11:33 +0000 (19:11 +0100)]
Add optionrom compatible with fw_cfg DMA version

This optionrom is based on linuxboot.S.

Signed-off-by: Marc Marí <markmb@redhat.com>
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <1464027093-24073-2-git-send-email-rjones@redhat.com>
[Add -fno-toplevel-reorder, support clang without -m16. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoMerge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
Peter Maydell [Thu, 14 Jul 2016 12:44:06 +0000 (13:44 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* SCSI scanner support
* fixes to qemu-char and net exit
* FreeBSD fixes
* Other small bugfixes

# gpg: Signature made Wed 13 Jul 2016 12:30:11 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  hostmem: detect host backend memory is being used properly
  hostmem: fix QEMU crash by 'info memdev'
  char: do not use atexit cleanup handler
  net: do not use atexit for cleanup
  slirp: use exit notifier for slirp_smb_cleanup
  tap: use an exit notifier to call down_script
  util: Fix MIN_NON_ZERO
  qemu-sockets: use qapi_free_SocketAddress in cleanup
  disas: avoid including everything in headers compiled from C++
  json-streamer: fix double-free on exiting during a parse
  main-loop: check return value before using pointer
  Use "-s" instead of "--quiet" to resolve non-fatal build error on FreeBSD.
  scsi-bus: Use longer sense buffer with scanners
  scsi-bus: Add SCSI scanner support

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agotarget-xtensa: xtfpga: fix FLASH interface width
Max Filippov [Wed, 6 Jul 2016 06:31:32 +0000 (09:31 +0300)]
target-xtensa: xtfpga: fix FLASH interface width

FLASH chip on XTFPGA boards is connected with 16-bit-wide interface.
Latest U-Boot can see the difference and does not work correctly with
32-bit-wide interface.
Set FLASH chip 'width' property to 2.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
7 years agoMerge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Peter Maydell [Thu, 14 Jul 2016 10:48:46 +0000 (11:48 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Wed 13 Jul 2016 12:46:17 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (34 commits)
  iotests: Make 157 actually format-agnostic
  vvfat: Fix qcow write target driver specification
  hmp: show all of snapshot info on every block dev in output of 'info snapshots'
  hmp: use snapshot name to determine whether a snapshot is 'fully available'
  qemu-iotests: Test naming of throttling groups
  blockdev: Fix regression with the default naming of throttling groups
  vmdk: fix metadata write regression
  Improve block job rate limiting for small bandwidth values
  qcow2: Fix qcow2_get_cluster_offset()
  qemu-io: Use correct range limitations
  qcow2: Avoid making the L1 table too big
  qemu-img: Use strerror() for generic resize error
  block: Remove BB options from blockdev-add
  qemu-iotests: Test setting WCE with qdev
  block/qdev: Allow configuring rerror/werror with qdev properties
  commit: Fix use of error handling policy
  block/qdev: Allow configuring WCE with qdev properties
  block/qdev: Allow node name for drive properties
  coroutine: move entry argument to qemu_coroutine_create
  test-coroutine: prepare for the next patch
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoMerge remote-tracking branch 'remotes/rth/tags/pull-rth-20160712' into staging
Peter Maydell [Thu, 14 Jul 2016 09:36:27 +0000 (10:36 +0100)]
Merge remote-tracking branch 'remotes/rth/tags/pull-rth-20160712' into staging

target-sparc improvements, v4

# gpg: Signature made Tue 12 Jul 2016 19:04:33 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-rth-20160712: (24 commits)
  target-sparc: Elide duplicate updates to fprs
  target-sparc: Use cpu_loop_exit_restore from helper_check_ieee_exceptions
  target-sparc: Use cpu_fsr in stfsr
  target-sparc: Use explicit writes to cpu_fsr
  target-sparc: Remove helper_ldf_asi, helper_stf_asi
  target-sparc: Directly implement block and short ldf/stf asis
  target-sparc: Directly implement easy ldf/stf asis
  target-sparc: Pass TCGMemOp constants to helper_ld/st_asi
  target-sparc: Fix obvious error in ASI_M_BFILL
  target-sparc: Directly implement easy ldd/std asis
  target-sparc: Introduce gen_check_align
  target-sparc: Use QT0 to return results from ldda
  target-sparc: Directly implement easy ld/st asis
  target-sparc: Use defines from asi.h
  target-sparc: Add UA2005 defines to asi.h
  target-sparc: Import linux/arch/sparc/include/uapi/asm/asi.h
  target-sparc: Pass TCGMemOp to gen_ld/st_asi
  target-sparc: Introduce get_asi
  target-sparc: Store %asi in TB flags
  target-sparc: Unify asi handling between 32 and 64-bit
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoMerge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-07-13' into queue...
Kevin Wolf [Wed, 13 Jul 2016 11:45:55 +0000 (13:45 +0200)]
Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-07-13' into queue-block

Block patches (v2) for the block queue.

# gpg: Signature made Wed Jul 13 13:41:53 2016 CEST
# gpg:                using RSA key 0x3BB14202E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40
#      Subkey fingerprint: 58B3 81CE 2DC8 9CF9 9730  EE64 3BB1 4202 E838 ACAD

* mreitz/tags/pull-block-for-kevin-2016-07-13:
  iotests: Make 157 actually format-agnostic
  vvfat: Fix qcow write target driver specification
  hmp: show all of snapshot info on every block dev in output of 'info snapshots'
  hmp: use snapshot name to determine whether a snapshot is 'fully available'
  qemu-iotests: Test naming of throttling groups
  blockdev: Fix regression with the default naming of throttling groups
  vmdk: fix metadata write regression
  Improve block job rate limiting for small bandwidth values
  qcow2: Fix qcow2_get_cluster_offset()
  qemu-io: Use correct range limitations
  qcow2: Avoid making the L1 table too big
  qemu-img: Use strerror() for generic resize error

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoiotests: Make 157 actually format-agnostic
Max Reitz [Mon, 11 Jul 2016 13:22:46 +0000 (15:22 +0200)]
iotests: Make 157 actually format-agnostic

iotest 157 pretends not to care about the image format used, but in fact
it does due to the format name not being filtered in its output. This
patch adds filtering and changes the reference output accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160711132246.3152-1-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agovvfat: Fix qcow write target driver specification
Max Reitz [Mon, 11 Jul 2016 13:54:52 +0000 (15:54 +0200)]
vvfat: Fix qcow write target driver specification

First, bdrv_open_child() expects all options for the child to be
prefixed by the child's name (and a separating dot). Second,
bdrv_open_child() does not take ownership of the QDict passed to it but
only extracts all options for the child, so if a QDict is created for
the sole purpose of passing it to bdrv_open_child(), it needs to be
freed afterwards.

This patch makes vvfat adhere to both of these rules.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160711135452.11304-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agohmp: show all of snapshot info on every block dev in output of 'info snapshots'
Lin Ma [Thu, 7 Jul 2016 05:26:04 +0000 (13:26 +0800)]
hmp: show all of snapshot info on every block dev in output of 'info snapshots'

Currently, the output of 'info snapshots' shows fully available snapshots.
It's opaque, hides some snapshot information to users. It's not convenient
if users want to know more about all of snapshot information on every block
device via monitor.

Follow Kevin's and Max's proposals, The patch makes the output more detailed:
(qemu) info snapshots
List of snapshots present on all disks:
 ID        TAG                 VM SIZE                DATE       VM CLOCK
 --        checkpoint-1           165M 2016-05-22 16:58:07   00:02:06.813

List of partial (non-loadable) snapshots on 'drive_image1':
 ID        TAG                 VM SIZE                DATE       VM CLOCK
 1         snap1                     0 2016-05-22 16:57:31   00:01:30.567

Signed-off-by: Lin Ma <lma@suse.com>
Message-id: 1467869164-26688-3-git-send-email-lma@suse.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agohmp: use snapshot name to determine whether a snapshot is 'fully available'
Lin Ma [Thu, 7 Jul 2016 05:26:03 +0000 (13:26 +0800)]
hmp: use snapshot name to determine whether a snapshot is 'fully available'

Currently qemu uses snapshot id to determine whether a snapshot is fully
available, It causes incorrect output in some scenario.

For instance:
(qemu) info block
drive_image1 (#block113): /opt/vms/SLES12-SP1-JeOS-x86_64-GM/disk0.qcow2
(qcow2)
    Cache mode:       writeback

drive_image2 (#block349): /opt/vms/SLES12-SP1-JeOS-x86_64-GM/disk1.qcow2
(qcow2)
    Cache mode:       writeback
(qemu)
(qemu) info snapshots
There is no snapshot available.
(qemu)
(qemu) snapshot_blkdev_internal drive_image1 snap1
(qemu)
(qemu) info snapshots
There is no suitable snapshot available
(qemu)
(qemu) savevm checkpoint-1
(qemu)
(qemu) info snapshots
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         snap1                     0 2016-05-22 16:57:31   00:01:30.567
(qemu)

$ qemu-img snapshot -l disk0.qcow2
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         snap1                     0 2016-05-22 16:57:31   00:01:30.567
2         checkpoint-1           165M 2016-05-22 16:58:07   00:02:06.813

$ qemu-img snapshot -l disk1.qcow2
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         checkpoint-1              0 2016-05-22 16:58:07   00:02:06.813

The patch uses snapshot name instead of snapshot id to determine whether a
snapshot is fully available and uses '--' instead of snapshot id in output
because the snapshot id is not guaranteed to be the same on all images.
For instance:
(qemu) info snapshots
List of snapshots present on all disks:
 ID        TAG                 VM SIZE                DATE       VM CLOCK
 --        checkpoint-1           165M 2016-05-22 16:58:07   00:02:06.813

Signed-off-by: Lin Ma <lma@suse.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1467869164-26688-2-git-send-email-lma@suse.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoqemu-iotests: Test naming of throttling groups
Alberto Garcia [Fri, 8 Jul 2016 14:03:01 +0000 (17:03 +0300)]
qemu-iotests: Test naming of throttling groups

Throttling groups are named using the 'group' parameter of the
block_set_io_throttle command and the throttling.group command-line
option. If that parameter is unspecified the groups get the name of
the block device.

This patch adds a new test to check the naming of throttling groups.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: d87d02823a6b91609509d8bb18e2f5dbd9a6102c.1467986342.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoblockdev: Fix regression with the default naming of throttling groups
Alberto Garcia [Fri, 8 Jul 2016 14:03:00 +0000 (17:03 +0300)]
blockdev: Fix regression with the default naming of throttling groups

When I/O limits are set for a block device, the name of the throttling
group is taken from the BlockBackend if the user doesn't specify one.

Commit efaa7c4eeb7490c6f37f3 moved the naming of the BlockBackend in
blockdev_init() to the end of the function, after I/O limits are set.
The consequence is that the throttling group gets an empty name.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: qemu-stable@nongnu.org
Message-id: af5cd58bd2c4b9f6c57f260d9cfe586b9fb7d34d.1467986342.git.berto@igalia.com
[mreitz: Use existing "id" variable instead of new "blk_id"]
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agovmdk: fix metadata write regression
Reda Sallahi [Thu, 7 Jul 2016 08:42:49 +0000 (10:42 +0200)]
vmdk: fix metadata write regression

Commit "cdeaf1f vmdk: add bdrv_co_write_zeroes" causes a regression on
writes. It writes metadata after every write instead of doing it only once
for each cluster.

vmdk_pwritev() writes metadata whenever m_data is set as valid so this patch
sets m_data as valid only when we have a new cluster which hasn't been
allocated before or a zero grain.

Signed-off-by: Reda Sallahi <fullmanet@gmail.com>
Message-id: 20160707084249.29084-1-fullmanet@gmail.com
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoImprove block job rate limiting for small bandwidth values
Sascha Silbe [Tue, 28 Jun 2016 15:28:41 +0000 (17:28 +0200)]
Improve block job rate limiting for small bandwidth values

ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:

1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.

   Not sure if there are real-world use cases where this would be a
   problem. Mirroring and backup over a slow link (e.g. DSL) would
   come to mind, though.

2. Tests for block job operations (e.g. cancel) were rather racy

   All block jobs currently use a time slice of 100ms. That's a
   reasonable value to get smooth output during regular
   operation. However this also meant that the state of block jobs
   changed every 100ms, no matter how low the configured limit was. On
   busy hosts, qemu often transferred additional chunks until the test
   case had a chance to cancel the job.

Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.

Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.

The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoqcow2: Fix qcow2_get_cluster_offset()
Max Reitz [Mon, 20 Jun 2016 14:26:23 +0000 (16:26 +0200)]
qcow2: Fix qcow2_get_cluster_offset()

Recently, qcow2_get_cluster_offset() has been changed to work with bytes
instead of sectors. This invalidated some assertions and introduced a
possible integer multiplication overflow.

This could be reproduced using e.g.

$ qemu-img create -f qcow2 -o cluster_size=1M blub.qcow2 8G
Formatting 'foo.qcow2', fmt=qcow2 size=8589934592 encryption=off
cluster_size=1048576 lazy_refcounts=off refcount_bits=16
$ qemu-io -c map blub.qcow2
qemu-io: qemu/block/qcow2-cluster.c:504: qcow2_get_cluster_offset:
Assertion `bytes_needed <= INT_MAX' failed.
[1]    20775 abort (core dumped)  qemu-io -c map foo.qcow2

This patch removes the now wrong assertion, adding comments and more
assertions to prove its correctness (and fixing the overflow which would
become apparent with the original assertion removed).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160620142623.24471-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoqemu-io: Use correct range limitations
Max Reitz [Mon, 20 Jun 2016 14:26:22 +0000 (16:26 +0200)]
qemu-io: Use correct range limitations

create_iovec() has a comment lamenting the lack of SIZE_T_MAX. Since
there actually is a SIZE_MAX, use it.

Two places use INT_MAX for checking the upper bound of a sector count
that is used as an argument for a blk_*() function (blk_discard() and
blk_write_compressed(), respectively). BDRV_REQUEST_MAX_SECTORS should
be used instead.

And finally, do_co_pwrite_zeroes() used to similarly check that the
sector count does not exceed INT_MAX. However, this function is now
backed by blk_co_pwrite_zeroes() which takes bytes as an argument
instead of sectors. Therefore, it should be the byte count that does not
exceed INT_MAX, not the sector count.

Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoqcow2: Avoid making the L1 table too big
Max Reitz [Wed, 15 Jun 2016 15:36:30 +0000 (17:36 +0200)]
qcow2: Avoid making the L1 table too big

We refuse to open images whose L1 table we deem "too big". Consequently,
we should not produce such images ourselves.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160615153630.2116-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
[mreitz: Added QEMU_BUILD_BUG_ON()]
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoqemu-img: Use strerror() for generic resize error
Max Reitz [Wed, 15 Jun 2016 15:36:29 +0000 (17:36 +0200)]
qemu-img: Use strerror() for generic resize error

Emitting the plain error number is not very helpful. Use strerror()
instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160615153630.2116-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoblock: Remove BB options from blockdev-add
Kevin Wolf [Thu, 30 Jun 2016 13:52:37 +0000 (15:52 +0200)]
block: Remove BB options from blockdev-add

werror/rerror are now available as qdev options. The stats-* options are
removed without an existing replacement; they should probably be
configurable with a separate QMP command like I/O throttling settings.

Removing id is left for another day because this involves updating
qemu-iotests cases to use node-name for everything. Before we can do
that, however, all QMP commands must support node-name.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
7 years agoqemu-iotests: Test setting WCE with qdev
Kevin Wolf [Thu, 30 Jun 2016 13:29:35 +0000 (15:29 +0200)]
qemu-iotests: Test setting WCE with qdev

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
7 years agoblock/qdev: Allow configuring rerror/werror with qdev properties
Kevin Wolf [Wed, 29 Jun 2016 15:41:35 +0000 (17:41 +0200)]
block/qdev: Allow configuring rerror/werror with qdev properties

The rerror/werror policies are implemented in the devices, so that's
where they should be configured. In comparison to the old options in
-drive, the qdev properties are only added to those devices that
actually support them.

If the option isn't given (or "auto" is specified), the setting of the
BlockBackend is used for compatibility with the old options. For block
jobs, "auto" is the same as "enospc".

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
7 years agocommit: Fix use of error handling policy
Kevin Wolf [Wed, 29 Jun 2016 15:38:57 +0000 (17:38 +0200)]
commit: Fix use of error handling policy

Commit implemented the 'enospc' policy as 'ignore' if the error was not
ENOSPC. The QAPI documentation promises that it's treated as 'stop'.
Using the common block job error handling function fixes this and also
adds the missing QMP event.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
7 years agoblock/qdev: Allow configuring WCE with qdev properties
Kevin Wolf [Thu, 23 Jun 2016 13:12:35 +0000 (15:12 +0200)]
block/qdev: Allow configuring WCE with qdev properties

As cache.writeback is a BlockBackend property and as such more related
to the guest device than the BlockDriverState, we already removed it
from the blockdev-add interface. This patch adds the new way to set it,
as a qdev property of the corresponding guest device.

For example: -drive if=none,file=test.img,node-name=img
             -device ide-hd,drive=img,write-cache=off

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
7 years agohostmem: detect host backend memory is being used properly
Xiao Guangrong [Wed, 13 Jul 2016 04:18:06 +0000 (12:18 +0800)]
hostmem: detect host backend memory is being used properly

Currently, we use memory_region_is_mapped() to detect if the host
backend memory is being used. This works if the memory is directly
mapped into guest's address space, however, it is not true for
nvdimm as it uses aliased memory region to map the memory. This is
why this bug can happen:
   https://bugzilla.redhat.com/show_bug.cgi?id=1352769

Fix it by introduce a new filed, is_mapped, to HostMemoryBackend,
we set/clear this filed accordingly when the device link/unlink to
host backend memory

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agohostmem: fix QEMU crash by 'info memdev'
Xiao Guangrong [Wed, 13 Jul 2016 04:18:05 +0000 (12:18 +0800)]
hostmem: fix QEMU crash by 'info memdev'

'info memdev' crashes QEMU:
   (qemu) info memdev
   Unexpected error in parse_str() at qapi/string-input-visitor.c:111:
   Parameter 'null' expects an int64 value or range
It is caused by null uint16List is returned if 'host-nodes' is the default
value

Return MAX_NODES under this case to fix this bug

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agochar: do not use atexit cleanup handler
Marc-André Lureau [Mon, 4 Jul 2016 15:38:23 +0000 (17:38 +0200)]
char: do not use atexit cleanup handler

It turns out qemu is calling exit() in various places from various
threads without taking much care of resources state. The atexit()
cleanup handlers cannot easily destroy resources that are in use (by
the same thread or other).

Since c1111a24a3, TCG arm guests run into the following abort() when
running tests, the chardev mutex is locked during the write, so
qemu_mutex_destroy() returns an error:

 #0  0x00007fffdbb806f5 in raise () at /lib64/libc.so.6
 #1  0x00007fffdbb822fa in abort () at /lib64/libc.so.6
 #2  0x00005555557616fe in error_exit (err=<optimized out>, msg=msg@entry=0x555555c38c30 <__func__.14622> "qemu_mutex_destroy")
     at /home/drjones/code/qemu/util/qemu-thread-posix.c:39
 #3  0x0000555555b0be20 in qemu_mutex_destroy (mutex=mutex@entry=0x5555566aa0e0) at /home/drjones/code/qemu/util/qemu-thread-posix.c:57
 #4  0x00005555558aab00 in qemu_chr_free_common (chr=0x5555566aa0e0) at /home/drjones/code/qemu/qemu-char.c:4029
 #5  0x00005555558b05f9 in qemu_chr_delete (chr=<optimized out>) at /home/drjones/code/qemu/qemu-char.c:4038
 #6  0x00005555558b05f9 in qemu_chr_delete (chr=<optimized out>) at /home/drjones/code/qemu/qemu-char.c:4044
 #7  0x00005555558b062c in qemu_chr_cleanup () at /home/drjones/code/qemu/qemu-char.c:4557
 #8  0x00007fffdbb851e8 in __run_exit_handlers () at /lib64/libc.so.6
 #9  0x00007fffdbb85235 in  () at /lib64/libc.so.6
 #10 0x00005555558d1b39 in testdev_write (testdev=0x5555566aa0a0) at /home/drjones/code/qemu/backends/testdev.c:71
 #11 0x00005555558d1b39 in testdev_write (chr=<optimized out>, buf=0x7fffc343fd9a "", len=0) at /home/drjones/code/qemu/backends/testdev.c:95
 #12 0x00005555558adced in qemu_chr_fe_write (s=0x5555566aa0e0, buf=buf@entry=0x7fffc343fd98 "0q", len=len@entry=2) at /home/drjones/code/qemu/qemu-char.c:282

Instead of using a atexit() handler, only run the chardev cleanup as
initially proposed at the end of main(), where there are less chances
(hic) of conflicts or other races.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reported-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20160704153823.16879-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonet: do not use atexit for cleanup
Paolo Bonzini [Fri, 8 Jul 2016 15:28:34 +0000 (17:28 +0200)]
net: do not use atexit for cleanup

This will be necessary in the next patch, which stops using atexit for
character devices; without it, vhost-user and the redirector filter
will cause a use-after-free.  Relying on the ordering of atexit calls
is also brittle, even now that both the network and chardev
subsystems are using atexit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoslirp: use exit notifier for slirp_smb_cleanup
Paolo Bonzini [Tue, 12 Jul 2016 07:57:12 +0000 (09:57 +0200)]
slirp: use exit notifier for slirp_smb_cleanup

We would like to move back net_cleanup() at the end of main function,
like it used to be until f30dbae63a46f23116715dff8d130c, but minimum
cleanup is needed regardless at exit() time for slirp's SMB
functionality.  Use an exit notifier to call slirp_smb_cleanup.
If net_cleanup() is called first, then remove the exit notifier as it
will become a dangling pointer otherwise.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agotap: use an exit notifier to call down_script
Marc-André Lureau [Mon, 11 Jul 2016 14:48:47 +0000 (16:48 +0200)]
tap: use an exit notifier to call down_script

We would like to move back net_cleanup() at the end of main function,
like it used to be until f30dbae63a46f23116715dff8d130c, but minimum
tap cleanup is necessary regarless at exit() time. Use an exit notifier
to call TAP down_script. If net_cleanup() is called first, then remove
the exit notifier as it will become a dangling pointer otherwise.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20160711144847.16651-1-marcandre.lureau@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoblock/qdev: Allow node name for drive properties
Kevin Wolf [Tue, 21 Jun 2016 18:46:05 +0000 (20:46 +0200)]
block/qdev: Allow node name for drive properties

If a node name instead of a BlockBackend name is specified as the driver
for a guest device, an anonymous BlockBackend is created now.

The order of operations in release_drive() must be reversed in order to
avoid a use-after-free bug because now blk_detach_dev() frees the last
reference if an anonymous BlockBackend is used.

usb-storage uses a hack where it forwards its BlockBackend as a property
to another device that it internally creates. This hack must be updated
so that it doesn't drop its original BB before it can be passed to the
other device. This used to work because we always had the monitor
reference around, but with node-names the device reference is the only
one now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
7 years agocoroutine: move entry argument to qemu_coroutine_create
Paolo Bonzini [Mon, 4 Jul 2016 17:10:01 +0000 (19:10 +0200)]
coroutine: move entry argument to qemu_coroutine_create

In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c).  So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.

Mostly done with the following semantic patch:

@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
  ...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);

@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
  ...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);

@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));

@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);

except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agotest-coroutine: prepare for the next patch
Paolo Bonzini [Mon, 4 Jul 2016 17:10:00 +0000 (19:10 +0200)]
test-coroutine: prepare for the next patch

The next patch moves the coroutine argument from first-enter to
creation time.  In this case, coroutine has not been initialized
yet when the coroutine is created, so change to a pointer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agocoroutine: use QSIMPLEQ instead of QTAILQ
Paolo Bonzini [Mon, 4 Jul 2016 17:09:59 +0000 (19:09 +0200)]
coroutine: use QSIMPLEQ instead of QTAILQ

CoQueue do not need to remove any element but the head of the list;
processing is always strictly FIFO.  Therefore, the simpler singly-linked
QSIMPLEQ can be used instead.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoraw-posix: Use qemu_dup
Fam Zheng [Wed, 22 Jun 2016 12:53:20 +0000 (20:53 +0800)]
raw-posix: Use qemu_dup

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoosdep: Introduce qemu_dup
Fam Zheng [Wed, 22 Jun 2016 12:53:19 +0000 (20:53 +0800)]
osdep: Introduce qemu_dup

And use it in qemu_dup_flags.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoblockjob: Update description of the 'device' field in the QMP API
Alberto Garcia [Tue, 5 Jul 2016 14:29:02 +0000 (17:29 +0300)]
blockjob: Update description of the 'device' field in the QMP API

The 'device' field in all BLOCK_JOB_* events and 'block-job-*' command
is no longer the device name, but the ID of the job. This patch
updates the documentation to clarify that.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>