sdk/emulator/qemu.git
9 years agomigration: Add migration events on target side
Juan Quintela [Wed, 20 May 2015 15:15:42 +0000 (17:15 +0200)]
migration: Add migration events on target side

We reuse the migration events from the source side, sending them on the
appropiate place.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agomigration: Make events a capability
Juan Quintela [Tue, 7 Jul 2015 12:44:05 +0000 (14:44 +0200)]
migration: Make events a capability

Make check fails with events.  THis is due to the parser/lexer that it
uses.  Just in case that they are more broken parsers, just only send
events when there are capabilities.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agomigration: create migration event
Juan Quintela [Wed, 20 May 2015 10:16:15 +0000 (12:16 +0200)]
migration: create migration event

We have one argument that tells us what event has happened.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
9 years agomigration: No need to call trace_migrate_set_state()
Juan Quintela [Tue, 16 Jun 2015 23:38:25 +0000 (01:38 +0200)]
migration: No need to call trace_migrate_set_state()

We now use the helper everywhere, so no need to call this on this two
places.  See on previous commit that there were a place where we missed
to mark the trace.  Now all tracing is done in migrate_set_state().

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agomigration: Use always helper to set state
Juan Quintela [Tue, 16 Jun 2015 23:36:40 +0000 (01:36 +0200)]
migration: Use always helper to set state

There were three places that were not using the migrate_set_state()
helper, just fix that.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agomigration: ensure we start in NONE state
Juan Quintela [Wed, 1 Jul 2015 07:32:29 +0000 (09:32 +0200)]
migration: ensure we start in NONE state

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agomigration: Use cmpxchg correctly
Juan Quintela [Wed, 17 Jun 2015 00:06:20 +0000 (02:06 +0200)]
migration: Use cmpxchg correctly

cmpxchg returns the old value

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agomigration: Add configuration section
Juan Quintela [Wed, 13 May 2015 16:17:43 +0000 (18:17 +0200)]
migration: Add configuration section

It needs to be the first one and it is not optional, that is the reason
why it is opencoded.  For new machine types, it is required that machine
type name is the same in both sides.

It is just done right now for pc's.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agovmstate: Create optional sections
Juan Quintela [Wed, 15 Oct 2014 07:39:14 +0000 (09:39 +0200)]
vmstate: Create optional sections

To make sections optional, we need to do it at the beggining of the code.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agoglobal_state: Make section optional
Juan Quintela [Wed, 8 Oct 2014 11:58:24 +0000 (13:58 +0200)]
global_state: Make section optional

This section would be sent:

a- for all new machine types
b- for old machine types if section state is different form {running,paused}
   that were the only giving us troubles.

So, in new qemus: it is alwasy there.  In old qemus: they are only
there if it an error has happened, basically stoping on target.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agomigration: create new section to store global state
Juan Quintela [Wed, 8 Oct 2014 08:58:10 +0000 (10:58 +0200)]
migration: create new section to store global state

This includes a new section that for now just stores the current qemu state.

Right now, there are only one way to control what is the state of the
target after migration.

- If you run the target qemu with -S, it would start stopped.
- If you run the target qemu without -S, it would run just after migration finishes.

The problem here is what happens if we start the target without -S and
there happens one error during migration that puts current state as
-EIO.  Migration would ends (notice that the error happend doing block
IO, network IO, i.e. nothing related with migration), and when
migration finish, we would just "continue" running on destination,
probably hanging the guest/corruption data, whatever.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agorunstate: migration allows more transitions now
Juan Quintela [Wed, 8 Oct 2014 10:47:08 +0000 (12:47 +0200)]
runstate: migration allows more transitions now

Next commit would allow to move from incoming migration to error happening on source.

Should we add more states to this transition?  Luiz?

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agorunstate: Add runstate store
Juan Quintela [Wed, 8 Oct 2014 09:53:22 +0000 (11:53 +0200)]
runstate: Add runstate store

This allows us to store the current state to send it through migration.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
9 years agoFix older machine type compatibility on power with section footers
Dr. David Alan Gilbert [Fri, 12 Jun 2015 17:37:52 +0000 (18:37 +0100)]
Fix older machine type compatibility on power with section footers

I forgot to add compatibility for Power when adding section footers.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: 37fb569c0198cba58e3e
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agoFail more cleanly in mismatched RAM cases
Dr. David Alan Gilbert [Thu, 11 Jun 2015 17:17:28 +0000 (18:17 +0100)]
Fail more cleanly in mismatched RAM cases

If the number of RAMBlocks was different on the source from the
destination, QEMU would hang waiting for a disconnect on the source
and wouldn't release from that hang until the destination was manually
killed.

Mark the stream as being in error, this causes the destination to die
and the source to carry on.

(It still gets a whole bunch of warnings on the destination, and I've
not managed to complete another migration after the 1st one, still
progress).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agoSanity check RDMA remote data
Dr. David Alan Gilbert [Thu, 11 Jun 2015 17:17:27 +0000 (18:17 +0100)]
Sanity check RDMA remote data

Perform some basic (but probably not complete) sanity checking on
requests from the RDMA source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agoSort destination RAMBlocks to be the same as the source
Dr. David Alan Gilbert [Thu, 11 Jun 2015 17:17:26 +0000 (18:17 +0100)]
Sort destination RAMBlocks to be the same as the source

Use the order of incoming RAMBlocks from the source to record
an index number; that then allows us to sort the destination
local RAMBlock list to match the source.

Now that the RAMBlocks are known to be in the same order, this
simplifies the RDMA Registration step which previously tried to
match RAMBlocks based on offset (which isn't guaranteed to match).

Looking at the existing compress code, I think it was erroneously
relying on an assumption of matching ordering, which this fixes.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agoRework ram block hash
Dr. David Alan Gilbert [Thu, 11 Jun 2015 17:17:25 +0000 (18:17 +0100)]
Rework ram block hash

RDMA uses a hash from block offset->RAM Block; this isn't needed
on the destination, and it becomes harder to maintain after the next
patch in the series that sorts the block list.

Split the hash so that it's only generated on the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agoAllow rdma_delete_block to work without the hash
Dr. David Alan Gilbert [Thu, 11 Jun 2015 17:17:24 +0000 (18:17 +0100)]
Allow rdma_delete_block to work without the hash

In the next patch we remove the hash on the destination,
rdma_delete_block does two things with the hash which can be avoided:
  a) The caller passes the offset and rdma_delete_block looks it up
     in the hash; fixed by getting the caller to pass the block
  b) The hash gets recreated after deletion; fixed by making that
     conditional on the hash being initialised.

While this function is currently only used during cleanup, Michael
asked that we keep it general for future dynamic block registration
work.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agoRework ram_control_load_hook to hook during block load
Dr. David Alan Gilbert [Thu, 11 Jun 2015 17:17:23 +0000 (18:17 +0100)]
Rework ram_control_load_hook to hook during block load

We need the names of RAMBlocks as they're loaded for RDMA,
reuse a slightly modified ram_control_load_hook:
  a) Pass a 'data' parameter to use for the name in the block-reg
     case
  b) Only some hook types now require the presence of a hook function.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agoTranslate offsets to destination address space
Dr. David Alan Gilbert [Thu, 11 Jun 2015 17:17:22 +0000 (18:17 +0100)]
Translate offsets to destination address space

The 'offset' field in RDMACompress and 'current_addr' field
in RDMARegister are commented as being offsets within a particular
RAMBlock, however they appear to actually be offsets within the
ram_addr_t space.

The code currently assumes that the offsets on the source/destination
match, this change removes the need for the assumption for these
structures by translating the addresses into the ram_addr_t space of
the destination host.

Note: An alternative would be to change the fields to actually
take the data they're commented for; this would potentially be
simpler but would break stream compatibility for those cases
that currently work.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agoStore block name in local blocks structure
Dr. David Alan Gilbert [Thu, 11 Jun 2015 17:17:21 +0000 (18:17 +0100)]
Store block name in local blocks structure

In a later patch the block name will be used to match up two views
of the block list.  Keep a copy of the block name with the local block
list.

(At some point it could be argued that it would be best just to let
migration see the innards of RAMBlock and avoid the need to use
foreach).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agordma typos
Dr. David Alan Gilbert [Thu, 11 Jun 2015 17:17:20 +0000 (18:17 +0100)]
rdma typos

A couple of typo fixes.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agoOnly try and read a VMDescription if it should be there
Dr. David Alan Gilbert [Tue, 23 Jun 2015 16:34:35 +0000 (17:34 +0100)]
Only try and read a VMDescription if it should be there

The VMDescription section maybe after the EOF mark, the current code
does a 'qemu_get_byte' and either gets the header byte identifying the
description or an error (which it ignores).  Doing the 'get' upsets
RDMA which hangs on old machine types without the VMDescription.

Just avoid reading the VMDescription if we wouldn't send it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agordma: fix memory leak
Gonglei [Tue, 23 Jun 2015 07:56:38 +0000 (15:56 +0800)]
rdma: fix memory leak

Variable "r" going out of scope leaks the storage
it points to in line 3268.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
9 years agoMerge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20150706.0' into...
Peter Maydell [Tue, 7 Jul 2015 08:22:40 +0000 (09:22 +0100)]
Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20150706.0' into staging

VFIO updates for 2.4-rc0
- "real" host page size API (Peter Crosthwaite)
- platform device irqfd support (Eric Auger)
- spapr container disconnect fix (Alexey Kardashevskiy)
- quirk for broken Chelsio hardware (Gabriel Laupre)
- coverity fix (Paolo Bonzini)

# gpg: Signature made Mon Jul  6 19:23:49 2015 BST using RSA key ID 3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"

* remotes/awilliam/tags/vfio-update-20150706.0:
  vfio/pci : Add pba_offset PCI quirk for Chelsio T5 devices
  vfio: Unregister IOMMU notifiers when container is destroyed
  hw/vfio/platform: add irqfd support
  kvm: some fixes to kvm_resamplefds_allowed
  sysbus: add irq_routing_notifier
  intc: arm_gic_kvm: set the qemu_irq/gsi mapping
  kvm-all.c: add qemu_irq/gsi hash table and utility routines
  kvm: rename kvm_irqchip_[add,remove]_irqfd_notifier with gsi suffix
  vfio: cpu: Use "real" page size API
  cpu-all: complete "real" host page size API
  vfio: fix return type of pread

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Conflicts:
kvm-all.c

9 years agoMerge remote-tracking branch 'remotes/bonzini/tags/for-upstream-smm' into staging
Peter Maydell [Mon, 6 Jul 2015 22:37:53 +0000 (23:37 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-smm' into staging

This series implements KVM support for SMM, and lets you enable/disable
it through the "smm" property of x86 machine types.

# gpg: Signature made Mon Jul  6 17:41:05 2015 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream-smm:
  pc: add SMM property
  ich9: add smm_enabled field and arguments
  pc_piix: rename kvm_enabled to smm_enabled
  target-i386: register a separate KVM address space including SMRAM regions
  kvm-all: kvm_irqchip_create is not expected to fail
  kvm-all: add support for multiple address spaces
  kvm-all: make KVM's memory listener more generic
  kvm-all: move internal types to kvm_int.h
  kvm-all: remove useless typedef
  kvm-all: put kvm_mem_flags to more work
  target-i386: add support for SMBASE MSR and SMIs
  piix4/ich9: do not raise SMI on ACPI enable/disable commands
  linux-headers: Update to 4.2-rc1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9 years agovfio/pci : Add pba_offset PCI quirk for Chelsio T5 devices
Gabriel Laupre [Mon, 6 Jul 2015 18:15:15 +0000 (12:15 -0600)]
vfio/pci : Add pba_offset PCI quirk for Chelsio T5 devices

Fix pba_offset initialization value for Chelsio T5 Virtual Function
device. The T5 hardware has a bug in it where it reports a Pending Interrupt
Bit Array Offset of 0x8000 for its SR-IOV Virtual Functions instead
of the 0x1000 that the hardware actually uses internally. As the hardware
doesn't return the correct pba_offset value, add a quirk to instead
return a hardcoded value of 0x1000 when a Chelsio T5 VF device is
detected.

This bug has been fixed in the Chelsio's next chip series T6 but there are
no plans to respin the T5 ASIC for this bug. It is just documented in the
T5 Errata and left it at that.

Signed-off-by: Gabriel Laupre <glaupre@chelsio.com>
Reviewed-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agovfio: Unregister IOMMU notifiers when container is destroyed
Alexey Kardashevskiy [Mon, 6 Jul 2015 18:15:15 +0000 (12:15 -0600)]
vfio: Unregister IOMMU notifiers when container is destroyed

On systems with guest visible IOMMU, adding a new memory region onto
PCI bus calls vfio_listener_region_add() for every DMA window. This
installs a notifier for IOMMU memory regions. The notifier is supposed
to be removed vfio_listener_region_del(), however in the case of mixed
PHB (emulated + VFIO devices) when last VFIO device is unplugged and
container gets destroyed, all existing DMA windows stay alive altogether
with the notifiers which are on the linked list which head was in
the destroyed container.

This unregisters IOMMU memory region notifier when a container is
destroyed.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agohw/vfio/platform: add irqfd support
Eric Auger [Mon, 6 Jul 2015 18:15:14 +0000 (12:15 -0600)]
hw/vfio/platform: add irqfd support

This patch aims at optimizing IRQ handling using irqfd framework.

Instead of handling the eventfds on user-side they are handled on
kernel side using
- the KVM irqfd framework,
- the VFIO driver virqfd framework.

the virtual IRQ completion is trapped at interrupt controller
This removes the need for fast/slow path swap.

Overall this brings significant performance improvements.

Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agokvm: some fixes to kvm_resamplefds_allowed
Eric Auger [Mon, 6 Jul 2015 18:15:14 +0000 (12:15 -0600)]
kvm: some fixes to kvm_resamplefds_allowed

Commit f41389ae3c54b introduced kvm_resamplefds_enabled() and
associated kvm_resamplefds_allowed boolean. This patch adds
non-KVM version for kvm_resamplefds_enabled and also declares
kvm_resamplefds_allowed in kvm-stub as it is done for fellow
kvm_irqfds_allowed.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agosysbus: add irq_routing_notifier
Eric Auger [Mon, 6 Jul 2015 18:15:14 +0000 (12:15 -0600)]
sysbus: add irq_routing_notifier

Add a new connect_irq_notifier notifier in the SysBusDeviceClass. This
notifier, if populated, is called after sysbus_connect_irq.

This mechanism is used to setup VFIO signaling once VFIO platform
devices get attached to their platform bus, on a machine init done
notifier.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agointc: arm_gic_kvm: set the qemu_irq/gsi mapping
Eric Auger [Mon, 6 Jul 2015 18:15:13 +0000 (12:15 -0600)]
intc: arm_gic_kvm: set the qemu_irq/gsi mapping

The arm_gic_kvm now calls kvm_irqchip_set_qemuirq_gsi to build
the hash table storing qemu_irq/gsi mappings. From that point on
irqfd can be setup directly from the qemu_irq using
kvm_irqchip_add_irqfd_notifier.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agokvm-all.c: add qemu_irq/gsi hash table and utility routines
Eric Auger [Mon, 6 Jul 2015 18:15:13 +0000 (12:15 -0600)]
kvm-all.c: add qemu_irq/gsi hash table and utility routines

VFIO platform device needs to setup irqfd but it does not know the
gsi corresponding to the device qemu_irq. This patch proposes to
store a hash table in kvm_state using the qemu_irq as key and the gsi
as a value.

kvm_irqchip_set_qemuirq_gsi allows to insert such a pair. The interrupt
controller is supposed to use it.

kvm_irqchip_[add, remove]_irqfd_notifier allows to setup/tear down
irqfd directly from the qemu_irq.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agokvm: rename kvm_irqchip_[add,remove]_irqfd_notifier with gsi suffix
Eric Auger [Mon, 6 Jul 2015 18:15:13 +0000 (12:15 -0600)]
kvm: rename kvm_irqchip_[add,remove]_irqfd_notifier with gsi suffix

Anticipating for the introduction of new add/remove functions taking
a qemu_irq parameter, let's rename existing ones with a gsi suffix.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agovfio: cpu: Use "real" page size API
Peter Crosthwaite [Mon, 6 Jul 2015 18:15:12 +0000 (12:15 -0600)]
vfio: cpu: Use "real" page size API

This is system level code, and should only depend on the host page
size, not the target page size.

Note that HOST_PAGE_SIZE is misleadingly lead and is really aligning
to both host and target page size. Hence it's replacement with
REAL_HOST_PAGE_SIZE.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agocpu-all: complete "real" host page size API
Peter Crosthwaite [Mon, 6 Jul 2015 18:15:12 +0000 (12:15 -0600)]
cpu-all: complete "real" host page size API

Currently the "host" page size alignment API is really aligning to both
host and target page sizes. There is the qemu_real_page_size which can
be used for the actual host page size but it's missing a mask and ALIGN
macro as provided for qemu_page_size. Complete the API. This allows
system level code that cares about the host page size to use a
consistent alignment interface without having to un-needingly align to
the target page size. This also reduces system level code dependency
on the cpu specific TARGET_PAGE_SIZE.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agovfio: fix return type of pread
Paolo Bonzini [Mon, 6 Jul 2015 18:15:12 +0000 (12:15 -0600)]
vfio: fix return type of pread

size_t is an unsigned type, thus the error case is never reached in
the below call to pread.  If bytes is negative, it will be seen as
a very high positive value.

Spotted by Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
9 years agopc: add SMM property
Paolo Bonzini [Thu, 18 Jun 2015 16:30:52 +0000 (18:30 +0200)]
pc: add SMM property

The property can take values on, off or auto.  The default is "off"
for KVM and pre-2.4 machines, otherwise "auto" (which makes it
available on TCG or on new-enough kernels).

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoich9: add smm_enabled field and arguments
Paolo Bonzini [Thu, 18 Jun 2015 16:30:51 +0000 (18:30 +0200)]
ich9: add smm_enabled field and arguments

Q35's ACPI device is hard-coding SMM availability to KVM.  Place the
logic where the board is created instead, so that it will be possible
to override it.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agopc_piix: rename kvm_enabled to smm_enabled
Paolo Bonzini [Thu, 18 Jun 2015 16:30:17 +0000 (18:30 +0200)]
pc_piix: rename kvm_enabled to smm_enabled

We will enable SMM even if KVM is in use.  Rename the field and
arguments.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agotarget-i386: register a separate KVM address space including SMRAM regions
Paolo Bonzini [Thu, 18 Jun 2015 16:30:16 +0000 (18:30 +0200)]
target-i386: register a separate KVM address space including SMRAM regions

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm-all: kvm_irqchip_create is not expected to fail
Paolo Bonzini [Thu, 18 Jun 2015 16:30:15 +0000 (18:30 +0200)]
kvm-all: kvm_irqchip_create is not expected to fail

KVM_CREATE_IRQCHIP should never fail, and so should its userspace
wrapper kvm_irqchip_create.  The function does not do anything
if the irqchip capability is not available, as is the case for PPC.

With this patch, kvm_arch_init can allocate memory and it will not
be leaked.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm-all: add support for multiple address spaces
Paolo Bonzini [Thu, 18 Jun 2015 16:30:14 +0000 (18:30 +0200)]
kvm-all: add support for multiple address spaces

Make kvm_memory_listener_register public, and assign a kernel
address space id to each KVMMemoryListener.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm-all: make KVM's memory listener more generic
Paolo Bonzini [Thu, 18 Jun 2015 16:30:13 +0000 (18:30 +0200)]
kvm-all: make KVM's memory listener more generic

No semantic change, but s->slots moves into a new struct
KVMMemoryListener.  KVM's memory listener becomes a member of struct
KVMState, and becomes of type KVMMemoryListener.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm-all: move internal types to kvm_int.h
Paolo Bonzini [Thu, 18 Jun 2015 16:28:45 +0000 (18:28 +0200)]
kvm-all: move internal types to kvm_int.h

i386 code will have to define a different KVMMemoryListener.  Create
an internal header so that KVMSlot is not exposed outside.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm-all: remove useless typedef
Paolo Bonzini [Thu, 18 Jun 2015 16:28:44 +0000 (18:28 +0200)]
kvm-all: remove useless typedef

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agokvm-all: put kvm_mem_flags to more work
Andrew Jones [Thu, 18 Jun 2015 16:28:43 +0000 (18:28 +0200)]
kvm-all: put kvm_mem_flags to more work

Currently kvm_mem_flags just translates bools to bits, let's
make it also determine the bools first. This avoids its parameter
list growing each time we add a flag.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agotarget-i386: add support for SMBASE MSR and SMIs
Paolo Bonzini [Thu, 18 Jun 2015 16:28:42 +0000 (18:28 +0200)]
target-i386: add support for SMBASE MSR and SMIs

Apart from the MSR, the smi field of struct kvm_vcpu_events has to be
translated into the corresponding CPUX86State fields.  Also,
memory transaction flags depend on SMM state, so pull it from struct
kvm_run on every exit from KVM to userspace.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agopiix4/ich9: do not raise SMI on ACPI enable/disable commands
Paolo Bonzini [Thu, 18 Jun 2015 16:28:41 +0000 (18:28 +0200)]
piix4/ich9: do not raise SMI on ACPI enable/disable commands

These commands are handled entirely by QEMU.  Do not raise an SMI
when they happen, because Windows (at least 2008r2) expects these
commands to work and (depending on the value of APMC_EN at
startup) the firmware might not have installed an SMI handler.

When this happens (e.g. the kernel supports SMIs, or you are using
TCG, but you have used "-machine smm=off") RIP is moved to 0x38000
where there is no code to execute.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agolinux-headers: Update to 4.2-rc1
Alexey Kardashevskiy [Mon, 6 Jul 2015 02:10:57 +0000 (12:10 +1000)]
linux-headers: Update to 4.2-rc1

This updates linux-headers against master 4.2-rc1 (commit
d770e558e21961ad6cfdf0ff7df0eb5d7d4f0754). This is the result of
./scripts/update-linux-headers.sh work.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMerge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
Peter Maydell [Mon, 6 Jul 2015 13:03:44 +0000 (14:03 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* more of Peter Crosthwaite's multiarch preparation patches
* unlocked MMIO support in KVM
* support for compilation with ICC

# gpg: Signature made Mon Jul  6 13:59:20 2015 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  exec: skip MMIO regions correctly in cpu_physical_memory_write_rom_internal
  Stop including qemu-common.h in memory.h
  kvm: Switch to unlocked MMIO
  acpi: mark PMTIMER as unlocked
  kvm: Switch to unlocked PIO
  kvm: First step to push iothread lock out of inner run loop
  memory: let address_space_rw/ld*/st* run outside the BQL
  exec: pull qemu_flush_coalesced_mmio_buffer() into address_space_rw/ld*/st*
  memory: Add global-locking property to memory regions
  main-loop: introduce qemu_mutex_iothread_locked
  main-loop: use qemu_mutex_lock_iothread consistently
  Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES
  cpu-defs: Move out TB_JMP defines
  include/exec: Move tb hash functions out
  include/exec: Move standard exceptions to cpu-all.h
  cpu-defs: Move CPU_TEMP_BUF_NLONGS to tcg
  memory_mapping: Rework cpu related includes
  cutils: allow compilation with icc
  qemu-common: add VEC_OR macro

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9 years agoexec: skip MMIO regions correctly in cpu_physical_memory_write_rom_internal
Paolo Bonzini [Fri, 3 Jul 2015 22:24:51 +0000 (00:24 +0200)]
exec: skip MMIO regions correctly in cpu_physical_memory_write_rom_internal

Loading the BIOS in the mac99 machine is interesting, because there is a
PROM in the middle of the BIOS region (from 16K to 32K).  Before memory
region accesses were clamped, when QEMU was asked to load a BIOS from
0xfff00000 to 0xffffffff it would put even those 16K from the BIOS file
into the region.  This is weird because those 16K were not actually
visible between 0xfff04000 and 0xfff07fff.  However, it worked.

After clamping was added, this also worked.  In this case, the
cpu_physical_memory_write_rom_internal function split the write in
three parts: the first 16K were copied, the PROM area (second 16K) were
ignored, then the rest was copied.

Problems then started with commit 965eb2f (exec: do not clamp accesses
to MMIO regions, 2015-06-17).  Clamping accesses is not done for MMIO
regions because they can overlap wildly, and MMIO registers can be
expected to perform full-width accesses based only on their address
(with no respect for adjacent registers that could decode to completely
different MemoryRegions).  However, this lack of clamping also applied
to the PROM area!  cpu_physical_memory_write_rom_internal thus failed
to copy the third range above, i.e. only copied the first 16K of the BIOS.

In effect, address_space_translate is expecting _something else_ to do
the clamping for MMIO regions if the incoming length is large.  This
"something else" is memory_access_size in the case of address_space_rw,
so use the same logic in cpu_physical_memory_write_rom_internal.

Reported-by: Alexander Graf <agraf@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Fixes: 965eb2f
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoStop including qemu-common.h in memory.h
Peter Maydell [Fri, 3 Jul 2015 14:18:24 +0000 (15:18 +0100)]
Stop including qemu-common.h in memory.h

Including qemu-common.h from other header files is generally a bad
idea, because it means it's very easy to end up with a circular
dependency. For instance, if we wanted to include memory.h from
qom/cpu.h we'd end up with this loop:
 memory.h -> qemu-common.h -> cpu.h -> cpu-qom.h -> qom/cpu.h -> memory.h

Remove the include from memory.h. This requires us to fix up a few
other files which were inadvertently getting declarations indirectly
through memory.h.

The biggest change is splitting the fprintf_function typedef out
into its own header so other headers can get at it without having
to include qemu-common.h.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1435933104-15216-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMerge remote-tracking branch 'remotes/xtensa/tags/20150706-xtensa' into staging
Peter Maydell [Mon, 6 Jul 2015 11:51:51 +0000 (12:51 +0100)]
Merge remote-tracking branch 'remotes/xtensa/tags/20150706-xtensa' into staging

Xtensa fixes:

- add 64-bit floating point registers;
- fix gdb register map construction.

# gpg: Signature made Mon Jul  6 11:27:45 2015 BST using RSA key ID F83FA044
# gpg: Good signature from "Max Filippov <max.filippov@cogentembedded.com>"
# gpg:                 aka "Max Filippov <jcmvbkbc@gmail.com>"

* remotes/xtensa/tags/20150706-xtensa:
  target-xtensa: fix gdb register map construction
  target-xtensa: add 64-bit floating point registers

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9 years agotarget-xtensa: fix gdb register map construction
Max Filippov [Wed, 1 Jul 2015 10:00:29 +0000 (13:00 +0300)]
target-xtensa: fix gdb register map construction

Due to different gdb overlay organization between windowed/call0
configurations core import script doesn't always work correctly.
Simplify the script: always copy complete gdb register map from overlay,
count registers at core registerstion time. Update existing cores.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
9 years agotarget-xtensa: add 64-bit floating point registers
Max Filippov [Mon, 29 Jun 2015 07:50:03 +0000 (10:50 +0300)]
target-xtensa: add 64-bit floating point registers

Xtensa ISA got specification for 64-bit floating point registers and
opcodes, see ISA, 4.3.11 "Floating point coprocessor option".

Add 64-bit FP registers.

Although 64-bit floating point is currently not supported by xtensa
translator, these registers need to be reported to gdb with proper size,
otherwise it wouldn't find other registers.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
9 years agoMerge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150706' into...
Peter Maydell [Mon, 6 Jul 2015 10:04:54 +0000 (11:04 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150706' into staging

target-arm queue:
 * TLBI ALLEI1IS should operate on all CPUs, not just this one
 * Fix interval interrupt of cadence ttc in decrement mode
 * Implement YIELD insn to yield in ARM and Thumb translators
 * ARM GIC: reset all registers
 * arm_mptimer: fix timer shutdown and mode change
 * arm_mptimer: respect IT bit state

# gpg: Signature made Mon Jul  6 10:58:27 2015 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"

* remotes/pmaydell/tags/pull-target-arm-20150706:
  arm_mptimer: Respect IT bit state
  arm_mptimer: Fix timer shutdown and mode change
  hw/intc/arm_gic_common.c: Reset all registers
  target-arm: Implement YIELD insn to yield in ARM and Thumb translators
  target-arm: Split DISAS_YIELD from DISAS_WFE
  Fix interval interrupt of cadence ttc when timer is in decrement mode
  target-arm: fix write helper for TLBI ALLE1IS

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9 years agoarm_mptimer: Respect IT bit state
Dmitry Osipenko [Mon, 6 Jul 2015 01:27:12 +0000 (04:27 +0300)]
arm_mptimer: Respect IT bit state

The timer should fire the interrupt only if the IT (interrupt enable) bit
state of the control register is enabled.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9 years agoarm_mptimer: Fix timer shutdown and mode change
Dmitry Osipenko [Sun, 5 Jul 2015 22:47:47 +0000 (01:47 +0300)]
arm_mptimer: Fix timer shutdown and mode change

The running timer can't be stopped because timer control code just
doesn't handle disabling the timer. Fix it by deleting the timer if
the enable bit is cleared.

The timer won't start periodic ticking if a ONE-SHOT -> PERIODIC mode
change happens after a one-shot tick was completed. Fix it by
re-starting ticking if the timer isn't ticking right now.

To avoid code churning, these two fixes are squashed in one commit.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9 years agohw/intc/arm_gic_common.c: Reset all registers
Peter Maydell [Mon, 29 Jun 2015 18:25:45 +0000 (19:25 +0100)]
hw/intc/arm_gic_common.c: Reset all registers

The arm_gic_common reset function was missing reset code for
several of the GIC's state fields:
 * bpr[]
 * abpr[]
 * priority1[]
 * priority2[]
 * sgi_pending[]
 * irq_target[] (SMP configurations only)

These probably went unnoticed because most guests will either
never touch them, or will write to them in the process of
configuring the GIC before enabling interrupts.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1435602345-32210-1-git-send-email-peter.maydell@linaro.org
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
9 years agotarget-arm: Implement YIELD insn to yield in ARM and Thumb translators
Peter Maydell [Mon, 6 Jul 2015 09:05:44 +0000 (10:05 +0100)]
target-arm: Implement YIELD insn to yield in ARM and Thumb translators

Implement the YIELD instruction in the ARM and Thumb translators to
actually yield control back to the top level loop rather than being
a simple no-op. (We already do this for A64.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-id: 1435672316-3311-3-git-send-email-peter.maydell@linaro.org

9 years agotarget-arm: Split DISAS_YIELD from DISAS_WFE
Peter Maydell [Mon, 6 Jul 2015 09:05:44 +0000 (10:05 +0100)]
target-arm: Split DISAS_YIELD from DISAS_WFE

Currently we use DISAS_WFE for both WFE and YIELD instructions.
This is functionally correct because at the moment both of them
are implemented as "yield this CPU back to the top level loop so
another CPU has a chance to run". However it's rather confusing
that YIELD ends up calling HELPER(wfe), and if we ever want to
implement real behaviour for WFE and SEV it's likely to trip us up.

Split out the yield codepath to use DISAS_YIELD and a new
HELPER(yield) function, and have HELPER(wfe) call HELPER(yield).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1435672316-3311-2-git-send-email-peter.maydell@linaro.org
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
9 years agoFix interval interrupt of cadence ttc when timer is in decrement mode
Johannes Schlatow [Mon, 6 Jul 2015 09:05:44 +0000 (10:05 +0100)]
Fix interval interrupt of cadence ttc when timer is in decrement mode

The interval interrupt is not set if the timer is in decrement mode.
This is because x >=0 and x < interval after leaving the while-loop.

Signed-off-by: Johannes Schlatow <schlatow@ida.ing.tu-bs.de>
Message-id: 20150630135821.51f3b4fd@johanness-latitude
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9 years agotarget-arm: fix write helper for TLBI ALLE1IS
Sergey Fedorov [Mon, 6 Jul 2015 09:05:43 +0000 (10:05 +0100)]
target-arm: fix write helper for TLBI ALLE1IS

TLBI ALLE1IS is an operation that does invalidate TLB entries on all PEs
in the same Inner Sharable domain, not just on the current CPU. So we
must use tlbiall_is_write() here.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1435676538-31345-1-git-send-email-serge.fdrv@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9 years agoMerge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging
Peter Maydell [Sun, 5 Jul 2015 19:35:47 +0000 (20:35 +0100)]
Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging

# gpg: Signature made Sat Jul  4 07:06:08 2015 BST using RSA key ID AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/ide-pull-request: (35 commits)
  ahci: fix sdb fis semantics
  qtest/ahci: halted ncq migration test
  ahci: Do not map cmd_fis to generate response
  ahci: ncq migration
  ahci: add get_cmd_header helper
  ahci: add cmd header to ncq transfer state
  qtest/ahci: halted NCQ test
  ahci: correct ncq sector count
  ahci: correct types in NCQTransferState
  ahci: add rwerror=stop support for ncq
  ahci: factor ncq_finish out of ncq_cb
  ahci: refactor process_ncq_command
  ahci: assert is_ncq for process_ncq
  ahci: stash ncq command
  ide: add limit to .prepare_buf()
  qtest/ahci: ncq migration test
  qtest/ahci: simple ncq data test
  libqos/ahci: Force all NCQ commands to be LBA48
  libqos/ahci: set the NCQ tag on command_commit
  libqos/ahci: adjust expected NCQ interrupts
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9 years agoMerge remote-tracking branch 'remotes/ehabkost/tags/numa-pull-request' into staging
Peter Maydell [Sun, 5 Jul 2015 18:33:51 +0000 (19:33 +0100)]
Merge remote-tracking branch 'remotes/ehabkost/tags/numa-pull-request' into staging

NUMA queue, 2015-07-03

# gpg: Signature made Fri Jul  3 21:49:58 2015 BST using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/numa-pull-request:
  numa: API to lookup NUMA node by address
  numa: Store boot memory address range in node_info
  numa,pc-dimm: Store pc-dimm memory information in numa_info
  pc: Abort if HotplugHandlerClass::plug() fails
  pc,pc-dimm: Factor out reusable parts in pc_dimm_plug to a separate routine
  pc,pc-dimm: Extract hotplug related fields in PCMachineState to a structure

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9 years agoahci: fix sdb fis semantics
John Snow [Sat, 4 Jul 2015 06:06:05 +0000 (02:06 -0400)]
ahci: fix sdb fis semantics

There are two things to fix here:

The first one is subtle: the PxSACT register in the AHCI HBA has different
semantics from the field it is shadowing, the ACT field in the
Set Device Bits FIS.

In the HBA register, PxSACT acts as a bitfield indicating outstanding
NCQ commands where a set bit indicates a pending NCQ operation. The FIS
field however operates as an RWC register update to PxSACT, where a set
bit indicates a *successfully* completed command.

Correct the FIS semantics. At the same time, move the "clear finished"
action to the SDB FIS generation instead of the register read to mimick
how the other shadow registers work, which always just report the last
reported value from a FIS, and not the most current values which may
not have been reported by a FIS yet.

Lastly and more simply, SATA 3.2 section 13.6.4.2 (and later sections)
all specify that the Interrupt bit for the SDB FIS should always be set
to one for NCQ commands. That's currently the only time we generate this
FIS, so set it on all the time.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-16-git-send-email-jsnow@redhat.com

9 years agoqtest/ahci: halted ncq migration test
John Snow [Sat, 4 Jul 2015 06:06:05 +0000 (02:06 -0400)]
qtest/ahci: halted ncq migration test

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-15-git-send-email-jsnow@redhat.com

9 years agoahci: Do not map cmd_fis to generate response
John Snow [Sat, 4 Jul 2015 06:06:05 +0000 (02:06 -0400)]
ahci: Do not map cmd_fis to generate response

The Register D2H FIS should copy the current values of
the registers instead of just parroting back the same
values the guest sent back to it.

In this case, the SECTOR COUNT variables are actually
not generally meaningful in terms of standard commands
(See ATA8-AC3 Section 9.2 Normal Outputs), so it actually
probably doesn't matter what we put in here.

Meanwhile, we do need to use the Register update FIS from
the NCQ pathways (in error cases), so getting rid of
references to cur_cmd here is a win for AHCI concurrency.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-14-git-send-email-jsnow@redhat.com

9 years agoahci: ncq migration
John Snow [Sat, 4 Jul 2015 06:06:05 +0000 (02:06 -0400)]
ahci: ncq migration

Migrate the NCQ queue. This is solely for the benefit of halted commands,
since anything else should have completed and had any relevant status
flushed to the HBA registers already.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-13-git-send-email-jsnow@redhat.com

9 years agoahci: add get_cmd_header helper
John Snow [Sat, 4 Jul 2015 06:06:05 +0000 (02:06 -0400)]
ahci: add get_cmd_header helper

cur_cmd is an internal bookmark that points to the
current AHCI Command Header being processed by the
AHCI state machine. With NCQ needing to occasionally
rely on some of the same AHCI helpers, we cannot use
cur_cmd and will need to grab explicit pointers instead.

In an attempt to begin relying on the cur_cmd pointer
less, add a helper to let us specifically get the pointer
to the command header of particular interest.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-12-git-send-email-jsnow@redhat.com

9 years agoahci: add cmd header to ncq transfer state
John Snow [Sat, 4 Jul 2015 06:06:05 +0000 (02:06 -0400)]
ahci: add cmd header to ncq transfer state

While the rest of the AHCI device can rely on a single bookmarked
pointer for the AHCI Command Header currently being processed, NCQ
is asynchronous and may have many commands in flight simultaneously.

Add a cmdh pointer to the ncq_tfs object and make the sglist prepare
function take an AHCICmdHeader pointer so we can be explicit about
where we'd like to build SGlists from.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-11-git-send-email-jsnow@redhat.com

9 years agoqtest/ahci: halted NCQ test
John Snow [Sat, 4 Jul 2015 06:06:05 +0000 (02:06 -0400)]
qtest/ahci: halted NCQ test

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-10-git-send-email-jsnow@redhat.com

9 years agoahci: correct ncq sector count
John Snow [Sat, 4 Jul 2015 06:06:05 +0000 (02:06 -0400)]
ahci: correct ncq sector count

uint16_t isn't enough to hold the real sector count, since a value of
zero implies a full 64K sectors, so we need a uint32_t here.

We *could* cheat and pretend that this value is 0-based and fit it in
a uint16_t, but I'd rather waste 2 bytes instead of a future dev's
10 minutes when they forget to +1/-1 accordingly somewhere.

See SATA 3.2, section 13.6.4.1 "READ FPDMA QUEUED".

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-9-git-send-email-jsnow@redhat.com

9 years agoahci: correct types in NCQTransferState
John Snow [Sat, 4 Jul 2015 06:06:05 +0000 (02:06 -0400)]
ahci: correct types in NCQTransferState

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-8-git-send-email-jsnow@redhat.com

9 years agoahci: add rwerror=stop support for ncq
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
ahci: add rwerror=stop support for ncq

Handle NCQ failures for cases where we want to halt the VM on IO errors.
Upon a VM state change, retry the halted NCQ commands.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-7-git-send-email-jsnow@redhat.com

9 years agoahci: factor ncq_finish out of ncq_cb
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
ahci: factor ncq_finish out of ncq_cb

When we add werror=stop or rerror=stop support to NCQ,
we'll want to take a codepath where we don't actually
complete the command, so factor that out into a new routine.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-6-git-send-email-jsnow@redhat.com

9 years agoahci: refactor process_ncq_command
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
ahci: refactor process_ncq_command

Split off execute_ncq_command so that we can call
it separately later if we desire.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-5-git-send-email-jsnow@redhat.com

9 years agoahci: assert is_ncq for process_ncq
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
ahci: assert is_ncq for process_ncq

We already checked this in the handle_cmd phase, so just
change this to an assertion and simplify the error logic.

(Also, fix the switch indent, because checkpatch.pl yelled.)
((Sorry for churn.))

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-4-git-send-email-jsnow@redhat.com

9 years agoahci: stash ncq command
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
ahci: stash ncq command

For migration and werror=stop/rerror=stop resume purposes,
it will be convenient to have the command handy inside of
ncq_tfs.

Eventually, we'd like to avoid reading from the FIS entirely
after the initial read, so this is a byte (hah!) sized step
in that direction.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-3-git-send-email-jsnow@redhat.com

9 years agoide: add limit to .prepare_buf()
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
ide: add limit to .prepare_buf()

prepare_buf should not always grab as many descriptors
as it can, sometimes it should self-limit.

For example, an NCQ transfer of 1 sector with a PRDT that
describes 4GiB of data should not copy 4GiB of data, it
should just transfer that first 512 bytes.

PIO is not affected, because the dma_buf_rw dma helpers
already have a byte limit built-in to them, but DMA/NCQ
will exhaust the entire list regardless of requested size.

AHCI 1.3 specifies in section 6.1.6 Command List Underflow that
NCQ is not required to detect underflow conditions. Non-NCQ
pathways signal underflow by writing to the PRDBC field, which
will already occur by writing the actual transferred byte count
to the PRDBC, signaling the underflow.

Our NCQ pathways aren't required to detect underflow, but since our DMA
backend uses the size of the PRDT to determine the size of the transer,
if our PRDT is bigger than the transaction (the underflow condition) it
doesn't cost us anything to detect it and truncate the PRDT.

This is a recoverable error and is not signaled to the guest, in either
NCQ or normal DMA cases.

For BMDMA, the existing pathways should see no guest-visible difference,
but any bytes described in the overage will no longer be transferred
before indicating to the guest that there was an underflow.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435767578-32743-2-git-send-email-jsnow@redhat.com

9 years agoqtest/ahci: ncq migration test
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
qtest/ahci: ncq migration test

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-17-git-send-email-jsnow@redhat.com

9 years agoqtest/ahci: simple ncq data test
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
qtest/ahci: simple ncq data test

Test the NCQ pathways for a simple IO RW test.
Also, test that libqos doesn't explode when
running NCQ commands :)

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-16-git-send-email-jsnow@redhat.com

9 years agolibqos/ahci: Force all NCQ commands to be LBA48
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
libqos/ahci: Force all NCQ commands to be LBA48

NCQ commands are LBA48 by definition.

See SATA 3.2 13.6.4.1 "READ FPDMA QUEUED", or
    SATA 3.2 13.6.5.1 "WRITE FPDMA QUEUED."

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-15-git-send-email-jsnow@redhat.com

9 years agolibqos/ahci: set the NCQ tag on command_commit
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
libqos/ahci: set the NCQ tag on command_commit

NCQ commands have the concept of a "TAG" that they need to set,
but in the AHCI world, it is mandated that the TAG always match
the command slot that you executed the NCQ from.

See AHCI 9.3.1.1.5.2 "Native Queued Commands".

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-14-git-send-email-jsnow@redhat.com

9 years agolibqos/ahci: adjust expected NCQ interrupts
John Snow [Sat, 4 Jul 2015 06:06:04 +0000 (02:06 -0400)]
libqos/ahci: adjust expected NCQ interrupts

NCQ commands will expect the SDBS interrupt,
and in the normative case, do not expect to see
a D2H Register FIS unless something went wrong.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-13-git-send-email-jsnow@redhat.com

9 years agolibqos/ahci: edit wait to be ncq aware
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
libqos/ahci: edit wait to be ncq aware

The wait command should check to make sure SACT is clear as well
as the Command Issue register.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-12-git-send-email-jsnow@redhat.com

9 years agolibqos/ahci: add NCQ frame support
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
libqos/ahci: add NCQ frame support

NCQ frames are generated a little differently than
their non-NCQ cousins. Add support for them.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-11-git-send-email-jsnow@redhat.com

9 years agolibqos/ahci: fix cmd_sanity for ncq
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
libqos/ahci: fix cmd_sanity for ncq

NCQ commands should not / do not update the byte count
in the command header post command, so this field is
meaningless for NCQ tests.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-10-git-send-email-jsnow@redhat.com

9 years agoahci/qtest: Execute IDENTIFY prior to data commands
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
ahci/qtest: Execute IDENTIFY prior to data commands

If you try to execute an NCQ command before trying to engage with the
device by issuing an IDENTIFY command, the error bits that are part of
the signature will fool the test suite into thinking there was a failure.

Issue IDENTIFY first on "boot", which will clear the signature out of
the registers for us.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-9-git-send-email-jsnow@redhat.com

9 years agoahci: ncq sector count correction
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
ahci: ncq sector count correction

This value should not be size-corrected, 0 sectors does not imply
1 sector(s). This is just debug information, but it's misleading!

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-8-git-send-email-jsnow@redhat.com

9 years agoahci: add ncq debug checks
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
ahci: add ncq debug checks

Most of the time, these bits can be safely ignored. For the purposes
of debugging however, it's nice to know that they're not being used.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-7-git-send-email-jsnow@redhat.com

9 years agoahci: separate prdtl from opts
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
ahci: separate prdtl from opts

There's no real reason to have it bundled together, and this way
is a little nicer to follow if you have the AHCI spec pulled up.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-6-git-send-email-jsnow@redhat.com

9 years agoahci: check for ncq prdtl overflow
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
ahci: check for ncq prdtl overflow

Don't attempt the NCQ transfer if the PRDT we were given is not big
enough to perform the entire transfer.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-5-git-send-email-jsnow@redhat.com

9 years agoahci: add ncq_err helper
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
ahci: add ncq_err helper

Set some appropriate error bits for NCQ for us.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-4-git-send-email-jsnow@redhat.com

9 years agoahci: use shorter variables
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
ahci: use shorter variables

Trivial cleanup that I didn't want to tack-on to anything else.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-3-git-send-email-jsnow@redhat.com

9 years agoahci: Rename NCQFIS structure fields
John Snow [Sat, 4 Jul 2015 06:06:03 +0000 (02:06 -0400)]
ahci: Rename NCQFIS structure fields

Several fields of the NCQFIS structure are ambiguously named. This patch
clarifies the intended (if unsupported) usage of the NCQ fields to aid
in creating more meaningful debug messages through the NCQ codepaths.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435016308-6150-2-git-send-email-jsnow@redhat.com

9 years agoqtest/ahci: add port_reset test
John Snow [Sat, 4 Jul 2015 06:06:02 +0000 (02:06 -0400)]
qtest/ahci: add port_reset test

Test that we can survive a couple of cycles of running a basic identify
test, some IO, and resetting the HBA. Ensures that we can bring the HBA
back to compliant spec during the lifecycle of the VM.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1434470575-21625-5-git-send-email-jsnow@redhat.com

9 years agolibqos/ahci: fix memory management bugs
John Snow [Sat, 4 Jul 2015 06:06:02 +0000 (02:06 -0400)]
libqos/ahci: fix memory management bugs

There's a handful of trivial bugs in the libqos/ahci functions,
squish them together.

- Zero cached pointers after freeing them
- The Command List Buffer is an array of 32x 32 byte structures, not
  32x 8 byte pointers -- it's 1MiB, not 256 bytes. Zero it ALL.
- Free the correct command in ahci_pick_cmd.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1434470575-21625-4-git-send-email-jsnow@redhat.com