review.tizen.org Git - sdk/emulator/qemu.git/log

cputlb: Fix for self-modifying writes across page boundaries

As it currently stands, QEMU does not properly handle self-modifying code
when the write is unaligned and crosses a page boundary. The procedure
for handling a write to the current translation block is to write-protect
the current translation block, catch the write, split up the translation
block into the current instruction (which remains write-protected so that
the current instruction is not modified) and the remaining instructions
in the translation block, and then restore the CPU state to before the
write occurred so the write will be retried and successfully executed.
However, since unaligned writes across pages are split into one-byte
writes for simplicity, writes to the second page (which is not the
current TB) may succeed before a write to the current TB is attempted,
and since these writes are not invalidated before resuming state after
splitting the TB, these writes will be performed a second time, thus
corrupting the second page. Credit goes to Patrick Hulin for
discovering this.

In recent 64-bit versions of Windows running in emulated mode, this
results in either being very unstable (a BSOD after a couple minutes of
uptime), or being entirely unable to boot. Windows performs one or more
8-byte unaligned self-modifying writes (xors) which intersect the end
of the current TB and the beginning of the next TB, which runs into the
aforementioned issue. This commit fixes that issue by making the
unaligned write loop perform the writes in forwards order, instead of
reverse order. This way, QEMU immediately tries to write to the current
TB, and splits the TB before any write to the second page is executed.
The write then proceeds as intended. With this patch applied, I am able
to boot and use Windows 7 64-bit and Windows 10 64-bit in QEMU without
KVM.

Per Richard Henderson's input, this patch also ensures the second page
is in the TLB before executing the write loop, to ensure the second
page is mapped.

The original discussion of the issue is located at
http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg02161.html.

Signed-off-by: Samuel Damashek <samuel.damashek@invincea.com>
Message-Id: <20160706182652.16190-1-samuel.damashek@invincea.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>

cputlb: Add address parameter to VICTIM_TLB_HIT

[rth: Split out from the original patch.]

Signed-off-by: Samuel Damashek <samuel.damashek@invincea.com>
Message-Id: <20160706182652.16190-1-samuel.damashek@invincea.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>

cputlb: Move VICTIM_TLB_HIT out of line

There are currently 22 invocations of this function,
and we're about to increase that number.

Signed-off-by: Richard Henderson <rth@twiddle.net>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160707' into staging

target-arm queue:
* fix a wrong variable type for A64 SYS_HEAPINFO semihosting call
* xlnx_dp: fix iffy xlnx_dp_aux_push_tx_fifo
* aux: fix break that wanted to break two levels out
* aux: Rename aux.[ch] to auxbus.[ch] for the benefit of Windows
* hw/block/m25p80: fix resource leak
* i.MX: split the GPT timer implementation into per SOC definitions

# gpg: Signature made Thu 07 Jul 2016 14:48:09 BST
# gpg:                using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20160707:
  i.MX: split the GPT timer implementation into per SOC definitions
  hw/block/m25p80: fix resource leak
  aux: Rename aux.[ch] to auxbus.[ch] for the benefit of Windows
  aux: fix break that wanted to break two levels out
  xlnx_dp: fix iffy xlnx_dp_aux_push_tx_fifo
  target-arm/arm-semi.c: In SYS_HEAPINFO use correct type for 'limit'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

i.MX: split the GPT timer implementation into per SOC definitions

In various Freescale SOCs, the GPT timers can be configured to select
its input clock.

Depending on the SOC the set of available input clocks may vary.

The actual single GPT definition was no good enough and because of it
booting the sabrelite board with a i.MX6DL device tree would fail
because of an incorrect input clock definition for the i.MX6DL SOC.

This patch fixes the i.MX6DL boot failure by adding the ability to
define a different set of input clocks depending on the considered SOC.

A different class has been defined for i.MX25, i.MX31 and i.MX6 each with
its specific set of input clocks.

The patch has been tested by booting KZM, i.MX25 PDK, i.MX6Q sabrelite
and i.MX6DL sabrelite.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 1467325619-8374-1-git-send-email-jcd@tribudubois.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: fixed spacing round '/' operator]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/block/m25p80: fix resource leak

These two are spot by Coverity 1357232 and 1357233.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1467684998-12076-1-git-send-email-zhaoshenglong@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

aux: Rename aux.[ch] to auxbus.[ch] for the benefit of Windows

On Windows 'aux.*' is a reserved name and cannot be used for
filenames; see
https://msdn.microsoft.com/en-gb/library/windows/desktop/aa365247(v=vs.85).aspx

This prevents cloning the QEMU git repo on Windows:

C:\Java\sources\kvm> git clone https://github.com/qemu/qemu.git
Cloning into 'qemu'...
remote: Counting objects: 279563, done.
remote: Total 279563 (delta 0), reused 0 (delta 0), pack-reused 279563R
Receiving objects: 100% (279563/279563), 122.45 MiB | 3.52 MiB/s, done.
Resolving deltas: 100% (221942/221942), done.
Checking connectivity... done.
error: unable to create file hw/misc/aux.c (No such file or directory)
error: unable to create file include/hw/misc/aux.h (No such file or directory)
Checking out files: 100% (4795/4795), done.
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

(bug https://bugs.launchpad.net/bugs/1595240)

Rename the offending files for the benefit of Windows.

Reported-by: Алексей Курган <akurgan@yandex.ru>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Wei Huang <wei@redhat.com>
Tested-by: KONRAD Frederic <fred.konrad@greensocs.com>
Message-id: 1467377145-32385-1-git-send-email-peter.maydell@linaro.org

aux: fix break that wanted to break two levels out

The last "ret = AUX_I2C_NACK;" is dead, because it is always overridden
by AUX_I2C_ACK. What really the code wants is to jump out of the switch
statement, and a "return" will not cut it because it would omit a debug
printf.

Change the logic so that we can break out of the while loop. For clarity,
hoist the bus->last_* assignments up, right after i2c_start_transfer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

xlnx_dp: fix iffy xlnx_dp_aux_push_tx_fifo

xlnx_dp_aux_push_tx_fifo takes an immediate uint8_t and a buffer length,
which must be 1 because that is how many uint8_t's fit in a uint8_t.
Sure enough, that is what xlnx_dp_write passes to it, but the function
is just weird. Therefore, make xlnx_dp_aux_push_tx_fifo look like
xlnx_dp_aux_push_rx_fifo, taking a pointer to the buffer.

Reported by Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target-arm/arm-semi.c: In SYS_HEAPINFO use correct type for 'limit'

In commit f5666418c4 most of the SYS_HEAPINFO implementation was
fixed to use target_ulong rather than uint32_t, but the 'limit'
variable was not changed.

Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1467650942-28706-1-git-send-email-peter.maydell@linaro.org

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Thu 07 Jul 2016 07:29:44 BST
# gpg:                using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  tap: vhost busy polling support

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tap: vhost busy polling support

This patch add the capability of basic vhost net busy polling which is
supported by recent kernel. User could configure the maximum number of
us that could be spent on busy polling through a new property of tap
"poll-us".

Cc: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20160706' into staging

misc updates

# gpg: Signature made Wed 06 Jul 2016 17:17:02 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-tcg-20160706:
  tcg: Improve the alignment check infrastructure
  tcg: Optimize spills of constants
  tcg: Fix name for high-half register
  build: Use $(CCAS) for compiling .S files

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/spice/tags/pull-spice-20160706-1' into staging

spice and qxl bugfixes.

# gpg: Signature made Wed 06 Jul 2016 10:44:10 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/spice/tags/pull-spice-20160706-1:
  virgl: pass whole GL scanout dimensions
  spice: use the right head for multi-monitor
  virgl: count the calls to gl_block
  spice: avoid .set_mm_time on >= 0.12.6
  qxl: fix surface migration
  qxl: store memory region and offset instead of pointer for guest slots
  qxl: factor out qxl_get_check_slot_offset
  qxl: handle no updates in interface_update_area_complete
  qxl: use uint64_t for vram size

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-07-06' into staging

QAPI patches for 2016-07-06

# gpg: Signature made Wed 06 Jul 2016 10:00:51 BST
# gpg:                using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2016-07-06:
  replay: Use new QAPI cloning
  sockets: Use new QAPI cloning
  qapi: Add new clone visitor
  qapi: Add new visit_complete() function
  tests: Factor out common code in qapi output tests
  tests: Clean up test-string-output-visitor
  qmp-output-visitor: Favor new visit_free() function
  string-output-visitor: Favor new visit_free() function
  qmp-input-visitor: Favor new visit_free() function
  string-input-visitor: Favor new visit_free() function
  opts-visitor: Favor new visit_free() function
  qapi: Add new visit_free() function
  qapi: Add parameter to visit_end_*
  qemu-img: Don't leak errors when outputting JSON
  qapi: Improve use of qmp/types.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/afaerber/tags/qom-devices-for-peter' into staging

QOM infrastructure fixes and device conversions

* Documentation fix

# gpg: Signature made Wed 06 Jul 2016 08:26:49 BST
# gpg:                using RSA key 0xFA2ED12D3E7E013F
# gpg: Good signature from "Andreas Färber <afaerber@suse.de>"
# gpg:                 aka "Andreas Färber <afaerber@suse.com>"
# Primary key fingerprint: 174F 0347 1BCC 221A 6175  6F96 FA2E D12D 3E7E 013F

* remotes/afaerber/tags/qom-devices-for-peter:
  qom: Fix comment typo

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

replay: Use new QAPI cloning

Rather than rolling our own clone via an expensive conversion
in and back out of QObject, use the new clone visitor.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-16-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

sockets: Use new QAPI cloning

Rather than rolling our own clone via an expensive conversion
in and back out of QObject, use the new clone visitor.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-15-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi: Add new clone visitor

We have a couple places in the code base that want to deep-clone
one QAPI object into another, and they were resorting to serializing
the struct out to QObject then reparsing it.  A much more efficient
version can be done by adding a new clone visitor.

Since cloning is still relatively uncommon, expose the use of the
new visitor via a QAPI_CLONE() macro that takes care of type-punning
the underlying function pointer, rather than generating lots of
unused functions for types that won't be cloned.  And yes, we're
relying on the compiler treating all pointers equally, even though
a strict C program cannot portably do so - but we're not the first
one in the qemu code base to expect it to work (hello, glib!).

The choice of adding a fourth visitor type deserves some explanation.
On the surface, the clone visitor is mostly an input visitor (it
takes arbitrary input - in this case, another QAPI object - and
creates a new QAPI object during the course of the visit).  But
ever since commit da72ab0 consolidated enum visits based on the
visitor type, using VISITOR_INPUT would cause us to run
visit_type_str(), even though for cloning there is nothing to do
(we just copy the enum value across, without regards to its mapping
to strings).   Also, since our input happens to be a QAPI object,
we can also satisfy the internal checks for VISITOR_OUTPUT.  So in
the end, I settled with a new VISITOR_CLONE, and chose its value
such that many internal checks can use 'v->type & mask', sticking
to 'v->type == value' where the difference matters.

Note that we can only clone objects (including alternates) and lists,
not built-ins or enums.  The visitor core hides integer width from
the actual visitor (since commit 04e070d), and as long as that's the
case, we can't clone top-level integers.  Then again, those can
always be cloned by direct copy, since they are not objects with
deep pointers, so it's no real loss.  And restricting cloning to
just objects and lists is cleaner than restricting it to non-integers.
As such, I documented that the clone visitor is for direct use only
by code internal to QAPI, and should not be used on incomplete objects
(other than a hack to work around the fact that we allow NULL in place
of "" in visit_type_str() in other output visitors).  Note that as
written, the clone visitor will never fail on a complete object.

Scalars (including enums) not at the root of the clone copy just fine
with no additional effort while visiting the scalar, by virtue of a
g_memdup() each time we push another struct onto the stack.  Cloning
a string requires deduplication of a pointer, which means it can also
provide the guarantee of an input visitor of never producing NULL
even when still accepting NULL in place of "" the way the QMP output
visitor does.

Cloning an 'any' type could be possible by incrementing the QObject
refcnt, but it's not obvious whether that is better than implementing
a QObject deep clone.  So for now, we document it as unsupported,
and intentionally omit the .type_any() callback to let a developer
know their usage needs implementation.

Add testsuite coverage for several different clone situations, to
ensure that the code is working.  I also tested that valgrind was
happy with the test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-14-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi: Add new visit_complete() function

Making each output visitor provide its own output collection
function was the only remaining reason for exposing visitor
sub-types to the rest of the code base.  Add a polymorphic
visit_complete() function which is a no-op for input visitors,
and which populates an opaque pointer for output visitors.  For
maximum type-safety, also add a parameter to the output visitor
constructors with a type-correct version of the output pointer,
and assert that the two uses match.

This approach was considered superior to either passing the
output parameter only during construction (action at a distance
during visit_free() feels awkward) or only during visit_complete()
(defeating type safety makes it easier to use incorrectly).

Most callers were function-local, and therefore a mechanical
conversion; the testsuite was a bit trickier, but the previous
cleanup patch minimized the churn here.

The visit_complete() function may be called at most once; doing
so lets us use transfer semantics rather than duplication or
ref-count semantics to get the just-built output back to the
caller, even though it means our behavior is not idempotent.

Generated code is simplified as follows for events:

|@@ -26,7 +26,7 @@ void qapi_event_send_acpi_device_ost(ACP
|     QDict *qmp;
|     Error *err = NULL;
|     QMPEventFuncEmit emit;
|-    QmpOutputVisitor *qov;
|+    QObject *obj;
|     Visitor *v;
|     q_obj_ACPI_DEVICE_OST_arg param = {
|         info
|@@ -39,8 +39,7 @@ void qapi_event_send_acpi_device_ost(ACP
|
|     qmp = qmp_event_build_dict("ACPI_DEVICE_OST");
|
|-    qov = qmp_output_visitor_new();
|-    v = qmp_output_get_visitor(qov);
|+    v = qmp_output_visitor_new(&obj);
|
|     visit_start_struct(v, "ACPI_DEVICE_OST", NULL, 0, &err);
|     if (err) {
|@@ -55,7 +54,8 @@ void qapi_event_send_acpi_device_ost(ACP
|         goto out;
|     }
|
|-    qdict_put_obj(qmp, "data", qmp_output_get_qobject(qov));
|+    visit_complete(v, &obj);
|+    qdict_put_obj(qmp, "data", obj);
|     emit(QAPI_EVENT_ACPI_DEVICE_OST, qmp, &err);

and for commands:

| {
|     Error *err = NULL;
|-    QmpOutputVisitor *qov = qmp_output_visitor_new();
|     Visitor *v;
|
|-    v = qmp_output_get_visitor(qov);
|+    v = qmp_output_visitor_new(ret_out);
|     visit_type_AddfdInfo(v, "unused", &ret_in, &err);
|-    if (err) {
|-        goto out;
|+    if (!err) {
|+        visit_complete(v, ret_out);
|     }
|-    *ret_out = qmp_output_get_qobject(qov);
|-
|-out:
|     error_propagate(errp, err);

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-13-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

tests: Factor out common code in qapi output tests

Create a new visitor_get() function to capture common
actions taken in collecting output from an output visitor,
to make it easier to refactor the output visitors in a
later patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-12-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

tests: Clean up test-string-output-visitor

Use &error_abort and error_free_or_abort() in more places, use
the generated qapi_free_intList() instead of open-coding it,
reduce the scope of some variables, avoid code duplication
during test setup with visitor_output_setup_internal(), and
copy the visitor_reset() concept from the qmp-output test to
the string-output test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-11-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qmp-output-visitor: Favor new visit_free() function

Now that we have a polymorphic visit_free(), we no longer need
qmp_output_visitor_cleanup(); however, we still need to
expose the subtype for qmp_output_get_qobject().

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-10-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

string-output-visitor: Favor new visit_free() function

Now that we have a polymorphic visit_free(), we no longer need
string_output_visitor_cleanup(); however, we still need to
expose the subtype for string_output_get_string().

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-9-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qmp-input-visitor: Favor new visit_free() function

Now that we have a polymorphic visit_free(), we no longer need
qmp_input_visitor_cleanup(); which in turn means we no longer
need to return a subtype from qmp_input_visitor_new() nor a
public upcast function.

Generated code changes to qmp-marshal.c look like:

|@@ -52,11 +52,10 @@ void qmp_marshal_add_fd(QDict *args, QOb
| {
|     Error *err = NULL;
|     AddfdInfo *retval;
|-    QmpInputVisitor *qiv = qmp_input_visitor_new(QOBJECT(args), true);
|     Visitor *v;
|     q_obj_add_fd_arg arg = {0};
|
|-    v = qmp_input_get_visitor(qiv);
|+    v = qmp_input_visitor_new(QOBJECT(args), true);
|     visit_start_struct(v, NULL, NULL, 0, &err);
|     if (err) {
|         goto out;

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-8-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

string-input-visitor: Favor new visit_free() function

Now that we have a polymorphic visit_free(), we no longer need
string_input_visitor_cleanup(); which in turn means we no longer
need to return a subtype from string_input_visitor_new() nor a
public upcast function.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-7-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

opts-visitor: Favor new visit_free() function

Now that we have a polymorphic visit_free(), we no longer need
opts_visitor_cleanup(); which in turn means we no longer need
to return a subtype from opts_visitor_new() nor a public upcast
function.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-6-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi: Add new visit_free() function

Making each visitor provide its own (awkwardly-named) FOO_cleanup()
is unusual, when we can instead have a polymorphic visit_free()
interface.  Over the next few patches, we can use the polymorphic
functions to eliminate the need for a FOO_get_visitor() function
for accessing specific visitor functionality, once everything can
be accessed directly through the Visitor* interfaces.

The dealloc visitor is the first one converted to completely use
the new entry point, since qapi_dealloc_visitor_cleanup() was the
only reason that qapi_dealloc_get_visitor() existed, and only
generated and testsuite code was even using it.  With the new
visit_free() entry point in place, we no longer need to expose
the QapiDeallocVisitor subtype through qapi_dealloc_visitor_new(),
and can get by with less generated code, with diffs that look like:

| void qapi_free_ACPIOSTInfo(ACPIOSTInfo *obj)
| {
|-    QapiDeallocVisitor *qdv;
|     Visitor *v;
|
|     if (!obj) {
|         return;
|     }
|
|-    qdv = qapi_dealloc_visitor_new();
|-    v = qapi_dealloc_get_visitor(qdv);
|+    v = qapi_dealloc_visitor_new();
|     visit_type_ACPIOSTInfo(v, NULL, &obj, NULL);
|-    qapi_dealloc_visitor_cleanup(qdv);
|+    visit_free(v);
|}

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-5-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi: Add parameter to visit_end_*

Rather than making the dealloc visitor track of stack of pointers
remembered during visit_start_* in order to free them during
visit_end_*, it's a lot easier to just make all callers pass the
same pointer to visit_end_*.  The generated code has access to the
same pointer, while all other users are doing virtual walks and
can pass NULL.  The dealloc visitor is then greatly simplified.

All three visit_end_*() functions intentionally take a void**,
even though the visit_start_*() functions differ between void**,
GenericList**, and GenericAlternate**.  This is done for several
reasons: when doing a virtual walk, passing NULL doesn't care
what the type is, but when doing a generated walk, we already
have to cast the caller's specific FOO* to call visit_start,
while using void** lets us use visit_end without a cast. Also,
an upcoming patch will add a clone visitor that wants to use
the same implementation for all three visit_end callbacks,
which is made easier if all three share the same signature.

For visitors with already track per-object state (the QMP visitors
via a stack, and the string visitors which do not allow nesting),
add an assertion that the caller is indeed passing the same
pointer to paired calls.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-4-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qemu-img: Don't leak errors when outputting JSON

If our JSON output ever encounters an error, we would just silently
leak the error object. Instead, assert that our usage won't fail.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-3-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

qapi: Improve use of qmp/types.h

'qjson.h' is not a QObject subtype; include this file directly in
.c files that are using it, rather than abusing qmp/types.h for
that purpose.

Meanwhile, for files that include a list of individual QObject
subtypes, it's easier to just use qmp/types.h for that purpose.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-2-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

virgl: pass whole GL scanout dimensions

Spice client needs the whole GL texture dimension to be able to show a
scanout with a monitor offset (different than +0+0).

Furthermore, this fixes a crash when calling surface_{width,height}()
after dpy_gfx_replace_surface(con, NULL) was called in
virgl_cmd_set_scanout()

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1465911849-30423-4-git-send-email-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

spice: use the right head for multi-monitor

Look up the associated head monitor config.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1465911849-30423-3-git-send-email-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

virgl: count the calls to gl_block

In virgl_cmd_resource_flush(), when several consoles are updated, it
needs to keep blocking until all spice gl draws are done. This fixes an
assert() in spice when using multiple monitors with virgl.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1465911849-30423-2-git-send-email-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

spice: avoid .set_mm_time on >= 0.12.6

Spice deprecated this callback in 0.12.6.
It's not a problem yet, but it will cause Clang to fail in a -Werror
build due to the deprecated tag.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1467240095-12507-2-git-send-email-jsnow@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

qxl: fix surface migration

Create a helper function qxl_dirty_one_surface() to mark a single qxl
surface as dirty.  Use the new qxl_get_check_slot_offset function and
lookup the memory region from the slot instead of assuming the surface
is stored in vram.

Use the new helper function in qxl_dirty_surfaces, for both primary and
off-screen surfaces.  For off-screen surfaces this is no functional
change.  For primary surfaces this will dirty only the memory actually
used instead of the whole surface0 region.  It will also work correctly
in case the guest places the primary surface in vram instead of the
surface0 region (linux kms driver does that).

https://bugzilla.redhat.com/show_bug.cgi?id=1235732

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1466597244-5938-3-git-send-email-kraxel@redhat.com

qxl: store memory region and offset instead of pointer for guest slots

Store MemoryRegion and offset instead of a pointer for each qxl memory
slot, so we can easily figure in which memory region an qxl object
stored.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1466597244-5938-2-git-send-email-kraxel@redhat.com

qxl: factor out qxl_get_check_slot_offset

New helper function which translates a qxl physical address into
memory slot and offset. Also applies sanity checks. Factored out
from qxl_phys2virt. No functional change.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1466597244-5938-1-git-send-email-kraxel@redhat.com

qxl: handle no updates in interface_update_area_complete

Simply return early in case there are no updated rects.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1465395101-13580-1-git-send-email-kraxel@redhat.com

qxl: use uint64_t for vram size

This allows for the 64bit vram bar to become larger than 2G
(try -device qxl-vga,vram64_size_mb=8192).

https://bugzilla.redhat.com/show_bug.cgi?id=1340439

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1465389648-5179-1-git-send-email-kraxel@redhat.com

qom: Fix comment typo

It's qom_unref, not qdef_unref.

Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>

tcg: Improve the alignment check infrastructure

Some architectures (e.g. ARMv8) need the address which is aligned
to a size more than the size of the memory access.
To support such check it's enough the current costless alignment
check implementation in QEMU, but we need to support
an alignment size specifying.

Signed-off-by: Sergey Sorokin <afarallax@yandex.ru>
Message-Id: <1466705806-679898-1-git-send-email-afarallax@yandex.ru>
Signed-off-by: Richard Henderson <rth@twiddle.net>
[rth: Assert in tcg_canonicalize_memop. Leave get_alignment_bits
available for, though unused by, user-mode. Retain logging difference
based on ALIGNED_ONLY.]

tcg: Optimize spills of constants

While we can store constants via constrants on INDEX_op_st_i32 et al,
we weren't able to spill constants to backing store.

Add a new backend interface, tcg_out_sti, which may store the constant
(and is allowed to fail). Rearrange the temp_* helpers so that we only
attempt to directly store a constant when the temp is becoming dead/free.

Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg: Fix name for high-half register

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <rth@twiddle.net>

build: Use $(CCAS) for compiling .S files

We fail to pass to $(AS) all of the different flags that may be required
for a given set of CFLAGS. Rather than figuring out the host-specific
mapping, it's better to allow the compiler driver to do that.

However, simply using $(CC) runs afoul of clang trying to build the
option roms. C.f. 3dd46c78525a30e98c68, wherein we changed from
using $(CC) to using $(AS) in the first place.

Work around this by passing -fno-integrated-as to clang, so that we use
the external assembler, and the clang driver still passes along all of
the options that the assembler might require.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1466703558-7723-1-git-send-email-rth@twiddle.net>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Tue 05 Jul 2016 16:46:14 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (43 commits)
  block/qcow2: Don't use cpu_to_*w()
  block: Convert bdrv_co_preadv/pwritev to BdrvChild
  block: Convert bdrv_prwv_co() to BdrvChild
  block: Convert bdrv_pwrite_zeroes() to BdrvChild
  block: Convert bdrv_pwrite(v/_sync) to BdrvChild
  block: Convert bdrv_pread(v) to BdrvChild
  block: Convert bdrv_write() to BdrvChild
  block: Convert bdrv_read() to BdrvChild
  block: Use BlockBackend for I/O in bdrv_commit()
  block: Move bdrv_commit() to block/commit.c
  block: Convert bdrv_co_do_readv/writev to BdrvChild
  block: Convert bdrv_aio_writev() to BdrvChild
  block: Convert bdrv_aio_readv() to BdrvChild
  block: Convert bdrv_co_writev() to BdrvChild
  block: Convert bdrv_co_readv() to BdrvChild
  vhdx: Some more BlockBackend use in vhdx_create()
  blkreplay: Convert to byte-based I/O
  vvfat: Use BdrvChild for s->qcow
  block/qdev: Fix NULL access when using BB twice
  block: fix return code for partial write for Linux AIO
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: new features, cleanups, fixes

iommus can not be added with -device.
cleanups and fixes all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Tue 05 Jul 2016 11:18:32 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (30 commits)
  vmw_pvscsi: remove unnecessary internal msi state flag
  e1000e: remove unnecessary internal msi state flag
  vmxnet3: remove unnecessary internal msi state flag
  mptsas: remove unnecessary internal msi state flag
  megasas: remove unnecessary megasas_use_msi()
  pci: Convert msi_init() to Error and fix callers to check it
  pci bridge dev: change msi property type
  megasas: change msi/msix property type
  mptsas: change msi property type
  intel-hda: change msi property type
  usb xhci: change msi/msix property type
  change pvscsi_init_msi() type to void
  tests: add APIC.cphp and DSDT.cphp blobs
  tests: acpi: add CPU hotplug testcase
  log: Permit -dfilter 0..0xffffffffffffffff
  range: Replace internal representation of Range
  range: Eliminate direct Range member access
  log: Clean up misuse of Range for -dfilter
  pci_register_bar: cleanup
  Revert "virtio-net: unbreak self announcement and guest offloads after migration"
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-07-05-v2' into queue-block

A block patch for the block queue

# gpg: Signature made Tue Jul  5 16:54:22 2016 CEST
# gpg:                using RSA key 0x3BB14202E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40
#      Subkey fingerprint: 58B3 81CE 2DC8 9CF9 9730  EE64 3BB1 4202 E838 ACAD

* mreitz/tags/pull-block-for-kevin-2016-07-05-v2:
  block/qcow2: Don't use cpu_to_*w()

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/qcow2: Don't use cpu_to_*w()

Don't use the cpu_to_*w() functions, which we are trying to deprecate.
Instead either just use cpu_to_*() to do the byteswap, or use
st*_be_p() if we need to do the store somewhere other than to a
variable that's already the correct type.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1466093177-17890-1-git-send-email-peter.maydell@linaro.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>

block: Convert bdrv_co_preadv/pwritev to BdrvChild

This is the final patch for converting the common I/O path to take
a BdrvChild parameter instead of BlockDriverState.

The completion of this conversion means that all users that perform I/O
on an image need to actually hold a reference (in the form of BdrvChild,
possible as part of a BlockBackend) to that image. This also protects
against inconsistent use of BlockBackend vs. BlockDriverState functions
because direct use of a BlockDriverState isn't possible any more and
blk->root is private for block-backends.c.

In addition, we can now distinguish different users in the I/O path,
and the future op blockers work is going to add assertions based on
permissions stored in BdrvChild.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_prwv_co() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_pwrite_zeroes() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_pwrite(v/_sync) to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_pread(v) to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_write() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_read() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Use BlockBackend for I/O in bdrv_commit()

Just like block jobs, the HMP commit command should use its own
BlockBackend for doing I/O on BlockDriverStates.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Move bdrv_commit() to block/commit.c

No code changes, just moved from one file to another.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_co_do_readv/writev to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_aio_writev() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_aio_readv() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_co_writev() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_co_readv() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

vhdx: Some more BlockBackend use in vhdx_create()

This does some easy conversions from bdrv_* to blk_* functions in
vhdx_create(). We should avoid bypassing the BlockBackend layer whenever
possible.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

blkreplay: Convert to byte-based I/O

The blkreplay driver only forwards the requests it gets, so converting
it to byte granularity is trivial.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

vvfat: Use BdrvChild for s->qcow

vvfat uses a temporary qcow file to cache written data in read-write
mode. In order to do things properly, this should show up in the BDS
graph and I/O should go through BdrvChild like for every other node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block/qdev: Fix NULL access when using BB twice

BlockBackend has only a single pointer to its guest device, so it makes
sure that only a single guest device is attached to it. device-add
returns an error if you try to attach a second device to a BB. In order
to make the error message nicer, -device that manually connects to a
if=none block device get a different message than -drive that implicitly
creates a guest device. The if=... option is stored in DriveInfo.

However, since blockdev-add exists, not every BlockBackend has a
DriveInfo any more. Check that it exists before we dereference it.

QMP reproducer resulting in a segfault:

{"execute":"blockdev-add","arguments":{"options":{"id":"disk","driver":"file","filename":"/tmp/test.img"}}}
{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"disk"}}
{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"disk"}}

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

block: fix return code for partial write for Linux AIO

Partial write most likely means that there is not space rather than
"something wrong happens". Thus it would be more natural to return
ENOSPC rather than EINVAL.

The problem actually happens with NBD server, which has reported EINVAL
rather then ENOSPC on the first error using its protocol, which makes
report to the user wrong.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Use bool as appropriate for BDS members

Using int for values that are only used as booleans is confusing.
While at it, rearrange a couple of members so that all the bools
are contiguous.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Fix error message style

error_setg() is not supposed to be used for multi-sentence
messages; tweak the message to append a hint instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Move request_alignment into BlockLimit

It makes more sense to have ALL block size limit constraints
in the same struct. Improve the documentation while at it.

Simplify a couple of conditionals, now that we have audited and
documented that request_alignment is always non-zero.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Split bdrv_merge_limits() from bdrv_refresh_limits()

During bdrv_merge_limits(), we were computing initial limits
based on another BDS in two places. At first glance, the two
computations are not identical (one is doing straight copying,
the other is doing merging towards or away from zero) - but
when you realize that the first round is starting with all-0
memory, all of the merging happens to work. Factoring out the
merging makes it easier to track how two BDS limits are merged,
in case we have future reasons to merge in even more limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Drop raw_refresh_limits()

The raw block driver was blindly copying all limits from bs->file,
even though: 1. the main bdrv_refresh_limits() already does this
for many of the limits, and 2. blindly copying from the children
can weaken any stricter limits that were already inherited from
the backing chain during the main bdrv_refresh_limits(). Also,
a future patch is about to move .request_alignment into
BlockLimits, and that is a limit that should NOT be copied from
other layers in the BDS chain.

Thus, we can completely drop raw_refresh_limits(), and rely on
the block layer setting up the proper limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Switch discard length bounds to byte-based

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_discard and
discard_alignment.  Rename them, using 'pdiscard' as an aid to
track which remaining discard interfaces need conversion, and so
that the compiler will help us catch the change in semantics
across any rebased code.  The BlockLimits type is now completely
byte-based; and in iscsi.c, sector_limits_lun2qemu() is no
longer needed.

pdiscard_alignment is made unsigned (we use power-of-2 alignments
as bitmasks, where unsigned is easier to think about) while
leaving max_pdiscard signed (since we still have an 'int'
interface); this is comparable to what commit cf081fc did for
write zeroes limits.  We may later want to make everything an
unsigned 64-bit limit - but that requires a bigger code audit.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Wording tweaks to write zeroes limits

Improve the documentation of the write zeroes limits, to mention
additional constraints that drivers should observe. Worth squashing
into commit cf081fca, if that hadn't been pushed already :)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Switch transfer length bounds to byte-based

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_transfer_length
and opt_transfer_length. Rename them (dropping the _length suffix)
so that the compiler will help us catch the change in semantics
across any rebased code, and improve the documentation. Use unsigned
values, so that we don't have to worry about negative values and
so that bit-twiddling is easier; however, we are still constrained
by 2^31 of signed int in most APIs.

When a value comes from an external source (iscsi and raw-posix),
sanitize the results to ensure that opt_transfer is a power of 2.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Set default request_alignment during bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Now that all drivers have been updated to supply an override
of request_alignment during their .bdrv_refresh_limits(), as
needed, the block layer itself can defer setting the default
alignment until part of the overall bdrv_refresh_limits().

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Set request_alignment during .bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Add a .bdrv_refresh_limits() to all four of our legacy devices
that will always be sector-only (bochs, cloop, dmg, vvfat), in
spite of their recent conversion to expose a byte interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

raw-win32: Set request_alignment during .bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

In this case, raw_probe_alignment() already did what we needed,
so just fix its signature and wire it in correctly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Set request_alignment during .bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iscsi: Set request_alignment during .bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blkdebug: Set request_alignment during .bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Note that when the user does not provide "align", then we were
defaulting to bs->request_alignment - but at this stage in the
initialization, that was always 512. We were also rejecting an
explicit "align":0 from the user; this patch now allows that,
as an explicit request for the default alignment (which may not
always be 512 in the future).

qemu-iotests 77 is particularly sensitive to the fact that we
can specify an artificial alignment override in blkdebug, and
that override must continue to work even when limits are
refreshed on an already open device.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Give nonzero result to blk_get_max_transfer_length()

Making all callers special-case 0 as unlimited is awkward,
and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given
our current block layer API limits.

In the case of scsi, this means that we now always advertise a
limit to the guest, even in cases where the underlying layers
previously use 0 for no inherent limit beyond the block layer.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

scsi: Advertise limits by blocksize, not 512

s->blocksize may be larger than 512, in which case our
tweaks to max_xfer_len and opt_xfer_len must be scaled
appropriately.

CC: qemu-stable@nongnu.org
Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iscsi: Advertise realistic limits to block layer

The function sector_limits_lun2qemu() returns a value in units of
the block layer's 512-byte sector, and can be as large as
0x40000000, which is much larger than the block layer's inherent
limit of BDRV_REQUEST_MAX_SECTORS. The block layer already
handles '0' as a synonym to the inherent limit, and it is nicer
to return this value than it is to calculate an arbitrary
maximum, for two reasons: we want to ensure that the block layer
continues to special-case '0' as 'no limit beyond the inherent
limits'; and we want to be able to someday expand the block
layer to allow 64-bit limits, where auditing for uses of
BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
artificially constraining iscsi to old block layer limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

nbd: Advertise realistic limits to block layer

We were basing the advertisement of maximum discard and transfer
length off of UINT32_MAX, but since the rest of the block layer
has signed int limits on a transaction, nothing could ever reach
that maximum, and we risk overflowing an int once things are
converted to byte-based rather than sector-based limits. What's
more, we DO have a much smaller limit: both the current kernel
and qemu-nbd have a hard limit of 32M on a read or write
transaction, and while they may also permit up to a full 32 bits
on a discard transaction, the upstream NBD protocol is proposing
wording that without any explicit advertisement otherwise,
clients should limit ALL requests to the same limits as read and
write, even though the other requests do not actually require as
many bytes across the wire. So the better limit to tell the
block layer is 32M for both values.

Behavior doesn't actually change with this patch (the block layer
is currently ignoring the max_transfer advertisements); but when
that problem is fixed in a later series, this patch will prevent
the exposure of a latent bug.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

nbd: Allow larger requests

The NBD layer was breaking up request at a limit of 2040 sectors
(just under 1M) to cater to old qemu-nbd. But the server limit
was raised to 32M in commit 2d8214885 to match the kernel, more
than three years ago; and the upstream NBD Protocol is proposing
documentation that without any explicit communication to state
otherwise, a client should be able to safely assume that a 32M
transaction will work. It is time to rely on the larger sizing,
and any downstream distro that cares about maximum
interoperability to older qemu-nbd servers can just tweak the
value of #define NBD_MAX_SECTORS.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Fix harmless off-by-one in bdrv_aligned_preadv()

If the amount of data to read ends exactly on the total size
of the bs, then we were wasting time creating a local qiov
to read the data in preparation for what would normally be
appending zeroes beyond the end, even though this corner case
has nothing further to do.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Document supported flags during bdrv_aligned_preadv()

We don't pass any flags on to drivers to handle. Tighten an
assert to explain why we pass 0 to bdrv_driver_preadv(), and add
some comments on things to be aware of if we want to turn on
per-BDS BDRV_REQ_FUA support during reads in the future. Also,
document that we may want to consider using unmap during
copy-on-read operations where the read is all zeroes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Tighter assertions on bdrv_aligned_pwritev()

For symmetry with bdrv_aligned_preadv(), assert that the caller
really has aligned things properly. This requires adding an align
parameter, which is used now only in the new asserts, but will
come in handy in a later patch that adds auto-fragmentation to the
max transfer size, since that value need not always be a multiple
of the alignment, and therefore must be rounded down.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-img: fix failed autotests

There are 9 iotests failed on Ubuntu 15.10 at the moment.
The problem is that options parsing in qemu-img is broken by the
following commit:
    commit 10985131e337a0c52c5bd1e191fd7867a6ff8d02
    Author: Denis V. Lunev <den@openvz.org>
    Date:   Fri Jun 17 17:44:13 2016 +0300
    qemu-img: move common options parsing before commands processing

This strange command line reports error
  ./qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1024
  qemu-img: Invalid image size specified!
while original code parses it successfully.

The problem is that getopt_long state should be reset. This could be done
using this assignment according to the manual:
    optind = 0

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Eric Blake <eblake@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-ipxe-20160704-1' into staging

ipxe: update submodule from 4e03af8ec to 041863191
e1000e+vmxnet3: add boot rom

# gpg: Signature made Mon 04 Jul 2016 07:25:46 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-ipxe-20160704-1:
  build: add pc-bios to config-host.mak deps
  ipxe: add new roms to BLOBS
  ipxe: update prebuilt binaries
  vmxnet3: add boot rom
  e1000e: add boot rom
  ipxe: add vmxnet3 rom
  ipxe: add e1000e rom
  ipxe: update submodule from 4e03af8ec to 041863191

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

vmw_pvscsi: remove unnecessary internal msi state flag

Internal flag msi_used is uncesessary, msi_uninit() could be called
directly, msi_enabled() is enough to check device msi state.

But for migration compatibility, keep the field in structure.

cc: Paolo Bonzini <pbonzini@redhat.com>
cc: Dmitry Fleytman <dmitry@daynix.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Michael S. Tsirkin <mst@redhat.com>

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

e1000e: remove unnecessary internal msi state flag

Internal big flag E1000E_USE_MSI is unnecessary, also is the helper
function: e1000e_init_msi(), e1000e_cleanup_msi(), so, remove them all.

cc: Dmitry Fleytman <dmitry@daynix.com>
cc: Jason Wang <jasowang@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Michael S. Tsirkin <mst@redhat.com>

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

vmxnet3: remove unnecessary internal msi state flag

Internal flag msi_used is unnecessary, it has the same effect as msi_enabled().
msi_uninit() could be called directly without risk.

cc: Paolo Bonzini <pbonzini@redhat.com>
cc: Dmitry Fleytman <dmitry@daynix.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Michael S. Tsirkin <mst@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

mptsas: remove unnecessary internal msi state flag

internal flag msi_in_use in unnecessary, msi_uninit() could be called
directly, and msi_enabled() is enough to check device msi state.

cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Paolo Bonzini <pbonzini@redhat.com>
cc: Michael S. Tsirkin <mst@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

megasas: remove unnecessary megasas_use_msi()

megasas overwrites user configuration when msi_init fail to flag internal msi
state, which is unsuitable. megasa_use_msi() is unnecessary, we can call
msi_uninit() directly when unrealize, even no need to call msi_enabled() first.

cc: Hannes Reinecke <hare@suse.de>
cc: Paolo Bonzini <pbonzini@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Michael S. Tsirkin <mst@redhat.com>

Acked-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

pci: Convert msi_init() to Error and fix callers to check it

msi_init() reports errors with error_report(), which is wrong
when it's used in realize().

Fix by converting it to Error.

Fix its callers to handle failure instead of ignoring it.

For those callers who don't handle the failure, it might happen:
when user want msi on, but he doesn't get what he want because of
msi_init fails silently.

cc: Gerd Hoffmann <kraxel@redhat.com>
cc: John Snow <jsnow@redhat.com>
cc: Dmitry Fleytman <dmitry@daynix.com>
cc: Jason Wang <jasowang@redhat.com>
cc: Michael S. Tsirkin <mst@redhat.com>
cc: Hannes Reinecke <hare@suse.de>
cc: Paolo Bonzini <pbonzini@redhat.com>
cc: Alex Williamson <alex.williamson@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>

pci bridge dev: change msi property type

>From bit to enum OnOffAuto.

cc: Michael S. Tsirkin <mst@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

megasas: change msi/msix property type

>From bit to enum OnOffAuto.

cc: Hannes Reinecke <hare@suse.de>
cc: Paolo Bonzini <pbonzini@redhat.com>
cc: Michael S. Tsirkin <mst@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>