review.tizen.org Git - sdk/emulator/qemu.git/log

xen/pass-through: fold host PCI command register writes

The code introduced to address XSA-126 allows simplification of other
code in xen_pt_initfn(): All we need to do is update "cmd" suitably,
as it'll be written back to the host register near the end of the
function anyway.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

Merge remote-tracking branch 'remotes/agraf/tags/signed-s390-for-upstream' into staging

Patch queue for s390 - 2015-06-05

This time there are a lot of s390x TCG emulation bug fixes - almost all
of them from Aurelien, who returned from nirvana :).

# gpg: Signature made Fri Jun  5 00:39:27 2015 BST using RSA key ID 03FEDC60
# gpg: Good signature from "Alexander Graf <agraf@suse.de>"
# gpg:                 aka "Alexander Graf <alex@csgraf.de>"

* remotes/agraf/tags/signed-s390-for-upstream: (34 commits)
  target-s390x: Only access allocated storage keys
  target-s390x: fix MVC instruction when areas overlap
  target-s390x: use softmmu functions for mvcp/mvcs
  target-s390x: support non current ASC in s390_cpu_handle_mmu_fault
  target-s390x: add a cpu_mmu_idx_to_asc function
  target-s390x: implement high-word facility
  target-s390x: implement load-and-trap facility
  target-s390x: implement miscellaneous-instruction-extensions facility
  target-s390x: implement LPDFR and LNDFR instructions
  target-s390x: implement TRANSLATE EXTENDED instruction
  target-s390x: implement TRANSLATE AND TEST instruction
  target-s390x: implement LOAD FP INTEGER instructions
  target-s390x: move SET DFP ROUNDING MODE to the correct facility
  target-s390x: move STORE CLOCK FAST to the correct facility
  target-s390x: change CHRL and CGHRL format to RIL-b
  target-s390x: fix CLGIT instruction
  target-s390x: fix exception for invalid operation code
  target-s390x: implement LAY and LAEY instructions
  target-s390x: move a few instructions to the correct facility
  target-s390x: detect tininess before rounding for FP operations
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target-s390x: Only access allocated storage keys

We allocate ram_size / PAGE_SIZE storage keys, so we need to make sure that
we only access that many. Unfortunately the code can overrun this array by
one, potentially overwriting unrelated memory.

Fix it by limiting storage keys to their scope.

Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

target-s390x: fix MVC instruction when areas overlap

The MVC instruction and the memmove C funtion do not have the same
semantic when memory areas overlap:

MVC: When the operands overlap, the result is obtained as if the
operands were processed one byte at a time and each result byte were
stored immediately after fetching the necessary operand byte.

memmove: Copying takes place as though the bytes in src are first copied
into a temporary array that does not overlap src or dest, and the bytes
are then copied from the temporary array to dest.

The behaviour is therefore the same when the destination is at a lower
address than the source, but not in the other case. This is actually a
trick for propagating a value to an area. While the current code detects
that and call memset in that case, it only does for 1-byte value. This
trick can and is used for propagating two or more bytes to an area.

In the softmmu case, the call to mvc_fast_memmove is correct as the
above tests verify that source and destination are each within a page,
and both in a different page. The part doing the move 8 bytes by 8 bytes
is wrong and we need to check that if the source and destination
overlap, they do with a distance of minimum 8 bytes before copying 8
bytes at a time.

In the user code, we should check check that the destination is at a
lower address than source or than the end of the source is at a lower
address than the destination before calling memmove. In the opposite
case we fallback to the same code as the softmmu one. Note that l
represents (length - 1).

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: use softmmu functions for mvcp/mvcs

mvcp and mvcs helper get access to the physical memory by a call to
mmu_translate for the virtual to real conversion and then using ldb_phys
and stb_phys to physically access the data. In practice this is quite
slow because it bypasses the QEMU softmmu TLB and because stb_phys calls
try to invalidate the corresponding memory for each access.

Instead use cpu_ldb_{primary,secondary} for the loads and
cpu_stb_{primary,secondary} for the stores. Ideally this should be
further optimized by a call to memcpy, but that already improves the
boot time of a guest by a factor 1.8.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: support non current ASC in s390_cpu_handle_mmu_fault

s390_cpu_handle_mmu_fault currently looks at the current ASC mode
defined in PSW mask instead of the MMU index. This prevent emulating
easily instructions using a specific ASC mode. Fix that by using the
MMU index converted back to ASC using the just added cpu_mmu_idx_to_asc
function.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: add a cpu_mmu_idx_to_asc function

Use constants to define the MMU indexes, and add a function to do
the reverse conversion of cpu_mmu_index.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: implement high-word facility

Besides RISBHG and RISBLG, all high-word instructions are not
implemented. Fix that.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: implement load-and-trap facility

At the same time move the trap code from op_ct into gen_trap and use it
for all new functions. The value needs to be stored back to register
before the exception, but also before the brcond (as we don't use
temp locals). That's why we can't use wout helper.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: implement miscellaneous-instruction-extensions facility

RISBGN is the same as RISBG, but without setting the condition code.
CLT and CLGT are the same as CLRT and CLGRT, but using memory for the
second operand.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: implement LPDFR and LNDFR instructions

This complete the floating point support sign handling facility.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: implement TRANSLATE EXTENDED instruction

It is part of the basic zArchitecture instructions.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: implement TRANSLATE AND TEST instruction

It is part of the basic zArchitecture instructions. Allow it to be call
from EXECUTE.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: implement LOAD FP INTEGER instructions

This is needed to pass the gcc.c-torture/execute/ieee/20010114-2.c test
in the gcc testsuite.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: move SET DFP ROUNDING MODE to the correct facility

It belongs to the DFP rounding facility.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: move STORE CLOCK FAST to the correct facility

STORE CLOCK FAST should be in the SCF facility.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: change CHRL and CGHRL format to RIL-b

Change to match the PoP. In practice both format RIL-a and RIL-b have
the same fields. They differ on the way we decode the fields, and it's
done correctly in QEMU.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: fix CLGIT instruction

The COMPARE LOGICAL IMMEDIATE AND TRAP instruction should compare the
numbers as unsigned, as its name implies.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: fix exception for invalid operation code

When an operation code is not recognized (ie invalid instruction) an
operation exception should be generated instead of a specification
exception. The latter is for valid opcode, with invalid operands or
modifiers.

This give a very basic GDB support in the guest, as it uses the invalid
opcode 0x0001 to generate a trap.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: implement LAY and LAEY instructions

This complete the general-instructions-extension facility, enable it.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
[agraf: remove facility bit]
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: move a few instructions to the correct facility

LY is part of the long-displacement facility.
RISBHG and RISBLG are part of the high-word facility.
STCMH is part of the z/Architecture.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: detect tininess before rounding for FP operations

The s390x floating point unit detects tininess before rounding, so set
the softfloat fp_status up appropriately.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: silence NaNs for LOAD LENGTHENED and LOAD ROUNDED

LOAD LENGTHENED and LOAD ROUNDED are considered as FP operations and
thus need to convert input sNaN into corresponding qNaN.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: define default NaN values

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: fix MMU index computation

The cpu_mmu_index function wrongly looks at PSW P bit to determine the
MMU index, while this bit actually only control the use of priviledge
instructions. The addressing mode is detected by looking at the PSW ASC
bits instead.

This used to work more or less correctly up to kernel 3.6 as the kernel
was running in primary space and userland in secondary space. Since
kernel 3.7 the default is to run the kernel in home space and userland
in primary space. While the current QEMU code seems to work it open some
security issues, like accessing the lowcore memory in R/W mode from a
userspace process once it has been accessed by the kernel (it is then
cached by the QEMU TLB).

At the same time change the MMU_USER_IDX value so that it matches the
value used in recent kernels.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: fix PSW value on dynamical exception from helpers

runtime_exception computes the psw.addr value using the actual exception
address and the instruction length computed by calling the get_ilen
function. However as explained above the get_ilen code, it returns the
actual instruction length, and not the ILC. Therefore there is no need to
multiply the value by 2.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: fix LOAD MULTIPLE instruction on page boundary

When consecutive memory locations are on page boundary a page fault
might occur when using the LOAD MULTIPLE instruction. In that case real
hardware doesn't load any register.

This is an important detail in case the base register is in the list
of registers to be loaded. If a page fault occurs this register might be
overwritten and when the instruction is later restarted the wrong
base register value is useD.

Fix this by first loading the first and last value from memory, hence
triggering all possible page faults, and then the remaining registers.

This fixes random segmentation faults seen in the guest.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: implement STPT helper

Save the timer target value in the SPT helper, so that the STPT helper
can compute the remaining time.

This allow the Linux kernel to correctly do time accounting.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: implement STCKC helper

The STCKC instruction just returns the last written clock comparator
value and KVM already provides the corresponding variable.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: streamline STCK helper

Now that clock_value is only used in one place, we can inline it in
the STCK helper.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: simplify SCKC helper

The clock comparator and the QEMU timer work the same way, triggering
at a given time, they just differ by the origin and the scale. It is
therefore possible to go from one to another without using the current
clock value. This spares two calls to qemu_clock_get_ns, which probably
return slightly different values, possibly reducing the accuracy.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: add a tod2time function

Add a tod2time function similar to the time2tod one, instead of open
coding the conversion.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: remove unused helpers

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: optimize (negative-) abs computation

Now that movcond exists, it's easy to write (negative-) absolute value
using TCG code instead of an helper.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: fix CC computation for LOAD POSITIVE instructions

LOAD POSITIVE instructions (LPR, LPGR and LPGFR) set the following
condition code:
  0: Result zero; no overflow
  1: --
  2: Result greater than zero; no overflow
  3: Overflow

The current code wrongly returns 1 instead of 2 in case of a result
greater than 0. This patches fixes that. This fixes the marshalling of
the value '0L' in Python.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-s390x: fix CC computation for EX instruction

Commit 7a6c7067f optimized CC computation by only saving cc_op before
calling helpers as they either don't touch the CC or generate a new
static value. This however doesn't work for the EX instruction as the
helper changes or not the CC value depending on the actual executed
instruction (e.g. MVC vs CLC).

This patches force a CC computation before calling the helper. This
fixes random memory corruption occuring in guests.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
[agraf: remove set_cc_static in op_ex as suggested by rth]
Signed-off-by: Alexander Graf <agraf@suse.de>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, acpi, virtio, tpm

This includes pxb support by Marcel, as well as multiple enhancements all over
the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Thu Jun  4 11:51:02 2015 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream: (28 commits)
  vhost: logs sharing
  hw/acpi: piix4_pm_init(): take fw_cfg object no more
  hw/acpi: move "etc/system-states" fw_cfg file from PIIX4 to core
  hw/acpi: acpi_pm1_cnt_init(): take "disable_s3" and "disable_s4"
  pc-dimm: don't assert if pc-dimm alignment != hotpluggable mem range size
  docs: Add PXB documentation
  apci: fix PXB behaviour if used with unsupported BIOS
  hw/pxb: add numa_node parameter
  hw/pci: add support for NUMA nodes
  hw/pxb: add map_irq func
  hw/pci: inform bios if the system has extra pci root buses
  hw/pci: introduce PCI Expander Bridge (PXB)
  hw/pci: removed 'rootbus nr is 0' assumption from qmp_pci_query
  hw/acpi: remove from root bus 0 the crs resources used by other buses.
  hw/acpi: add _CRS method for extra root busses
  hw/apci: add _PRT method for extra PCI root busses
  hw/acpi: add support for i440fx 'snooping' root busses
  hw/pci: extend PCI config access to support devices behind PXB
  hw/i386: query only for q35/pc when looking for pci host bridge
  hw/pci: made pci_bus_num a PCIBusClass method
  ...

Conflicts:
hw/i386/pc_piix.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/agraf/tags/signed-ppc-for-upstream' into staging

Patch queue for ppc - 2015-06-03

Highlights this time around:

  - sPAPR: endian fixes, speedups, bug fixes, hotplug basics
  - add default ram size capability for machines (sPAPR defaults to 512MB now)

# gpg: Signature made Wed Jun  3 22:59:09 2015 BST using RSA key ID 03FEDC60
# gpg: Good signature from "Alexander Graf <agraf@suse.de>"
# gpg:                 aka "Alexander Graf <alex@csgraf.de>"

* remotes/agraf/tags/signed-ppc-for-upstream: (40 commits)
  softmmu: support up to 12 MMU modes
  tcg: add TCG_TARGET_TLB_DISPLACEMENT_BITS
  tci: do not use CPUArchState in tcg-target.h
  Add David Gibson for sPAPR in MAINTAINERS file
  pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations
  spapr: override default ram size to 512MB
  machine: add default_ram_size to machine class
  spapr_pci: emit hotplug add/remove events during hotplug
  spapr_pci: enable basic hotplug operations
  pci: make pci_bar useable outside pci.c
  spapr_pci: create DRConnectors for each PCI slot during PHB realize
  spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge
  spapr_drc: add spapr_drc_populate_dt()
  spapr_events: event-scan RTAS interface
  spapr_events: re-use EPOW event infrastructure for hotplug events
  spapr_rtas: add ibm, configure-connector RTAS interface
  spapr: add rtas_st_buffer_direct() helper
  spapr_rtas: add get-sensor-state RTAS interface
  spapr_rtas: add set-indicator RTAS interface
  spapr_rtas: add get/set-power-level RTAS interfaces
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2015-06-03' into staging

trivial patches for 2015-06-03

# gpg: Signature made Wed Jun  3 14:07:47 2015 BST using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"

* remotes/mjt/tags/pull-trivial-patches-2015-06-03: (30 commits)
  configure: postfix --extra-cflags to QEMU_CFLAGS
  cadence_gem: Fix Rx buffer size field mask
  slirp: use less predictable directory name in /tmp for smb config (CVE-2015-4037)
  translate-all: delete prototype for non-existent function
  Add -incoming help text
  hw/display/tc6393xb.c: Fix misusing qemu_allocate_irqs for single irq
  hw/arm/nseries.c: Fix misusing qemu_allocate_irqs for single irq
  hw/alpha/typhoon.c: Fix misusing qemu_allocate_irqs for single irq
  hw/unicore32/puv3.c: Fix misusing qemu_allocate_irqs for single irq
  hw/lm32/milkymist.c: Fix misusing qemu_allocate_irqs for single irq
  hw/lm32/lm32_boards.c: Fix misusing qemu_allocate_irqs for single irq
  hw/ppc/prep.c: Fix misusing qemu_allocate_irqs for single irq
  hw/sparc/sun4m.c: Fix misusing qemu_allocate_irqs for single irq
  hw/timer/arm_timer.c: Fix misusing qemu_allocate_irqs for single irq
  hw/isa/i82378.c: Fix misusing qemu_allocate_irqs for single irq
  hw/isa/lpc_ich9.c: Fix misusing qemu_allocate_irqs for single irq
  hw/i386/pc: Fix misusing qemu_allocate_irqs for single irq
  hw/intc/exynos4210_gic.c: Fix memory leak by adjusting order
  hw/arm/omap_sx1.c: Fix memory leak spotted by valgrind
  hw/ppc/e500.c: Fix memory leak
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

vhost: logs sharing

Currently we allocate one vhost log per vhost device. This is sub
optimal when:

- Guest has several device with vhost as backend
- Guest has multiqueue devices

In the above cases, we can avoid the memory allocation by sharing a
single vhost log among all the vhost devices. This is done through:

- Introducing a new vhost_log structure with refcnt inside.
- Using a global pointer to vhost_log structure that will be used. And
introduce helper to get the log with expected log size and helper to
- drop the refcnt to the old log.
- Each vhost device still keep track of a pointer to the log that was
used.

With above, if no resize happens, all vhost device will share a single
vhost log. During resize, a new vhost_log structure will be allocated
and made for the global pointer. And each vhost devices will drop the
refcnt to the old log.

Tested by doing scp during migration for a 2 queues virtio-net-pci.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging

X86 queue 2015-06-02

# gpg: Signature made Tue Jun  2 20:21:17 2015 BST using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-pull-request:
  arch_init: Drop target-x86_64.conf
  target-i386: Register QOM properties for feature flags
  apic: convert ->busdev.qdev casts to C casts
  target-i386: Fix signedness of MSR_IA32_APICBASE_BASE
  pc: Ensure non-zero CPU ref count after attaching to ICC bus

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/acpi: piix4_pm_init(): take fw_cfg object no more

This PIIX4 init function has no more reason to receive a pointer to the
FwCfg object. Remove the parameter from the prototype, and update callers.

As a result, the pc_init1() function no longer needs to save the return
value of pc_memory_init() and xen_load_linux(), which makes it more
similar to pc_q35_init().

The return type & value of pc_memory_init() and xen_load_linux() are not
changed themselves; maybe we'll need their return values sometime later.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1204696
Cc: Amit Shah <amit.shah@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>

hw/acpi: move "etc/system-states" fw_cfg file from PIIX4 to core

The acpi_pm1_cnt_init() core function is responsible for setting up the
register block that will ultimately react to S3 and S4 requests (see
acpi_pm1_cnt_write()). It makes sense to advertise this configuration to
the guest firmware via an easy to parse fw_cfg file (ACPI is too complex
for firmware to parse), and indeed PIIX4 does that. However, since
acpi_pm1_cnt_init() is not specific to PIIX4, neither should be the fw_cfg
file.

This patch makes "etc/system-states" appear on all chipsets modified in
the previous patch, not just PIIX4 (assuming they have fw_cfg at all).

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1204696
Cc: Amit Shah <amit.shah@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>

hw/acpi: acpi_pm1_cnt_init(): take "disable_s3" and "disable_s4"

This patch only modifies the function prototype and updates all chipset
code that calls acpi_pm1_cnt_init() to pass in their own disable_s3 and
disable_s4 settings. vt82c686 is assumed to be fixed "S3 and S4 enabled".

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1204696
Cc: Amit Shah <amit.shah@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150602' into staging

target-arm queue:
* more EL2 preparation patches
* revert a no-longer-necessary workaround for old glib versions
* add GICv2m support to virt board (MSI support)
* pl061: fix wrong calculation of GPIOMIS register
* support MSI via irqfd
* remove a confusing v8_ prefix from some variable names
* add dynamic sysbus device support to the virt board

# gpg: Signature made Tue Jun  2 17:30:38 2015 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"

* remotes/pmaydell/tags/pull-target-arm-20150602: (22 commits)
  hw/arm/virt: change indentation in a15memmap
  hw/arm/virt: add dynamic sysbus device support
  hw/arm/boot: arm_load_kernel implemented as a machine init done notifier
  hw/arm/sysbus-fdt: helpers for platform bus nodes addition
  target-arm: Remove v8_ prefix from names of non-v8-specific cpreg arrays
  arm_gicv2m: set kvm_gsi_direct_mapping and kvm_msi_via_irqfd_allowed
  kvm: introduce kvm_arch_msi_data_to_gsi
  pl061: fix wrong calculation of GPIOMIS register
  target-arm: Add the GICv2m to the virt board
  target-arm: Extend the gic node properties
  arm_gicv2m: Add GICv2m widget to support MSIs
  target-arm: Add GIC phandle to VirtBoardInfo
  Revert "target-arm: Avoid g_hash_table_get_keys()"
  target-arm: Add TLBI_VAE2{IS}
  target-arm: Add TLBI_ALLE2
  target-arm: Add TLBI_ALLE1{IS}
  target-arm: Add TTBR0_EL2
  target-arm: Add TPIDR_EL2
  target-arm: Add SCTLR_EL2
  target-arm: Add TCR_EL2
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

pc-dimm: don't assert if pc-dimm alignment != hotpluggable mem range size

Drop superfluous pc-dimm alignment on hot-pluggable mem
range size assert, since it causes QEMU crash during hotplug
when hotplugging pc-dimm with alignment bigger than
an alignment of hot-pluggable mem range size.

Instead allow pc_dimm_get_free_addr() find free address
and bail out gracefully later in that function during
checking if pc-dimm will fit in hot-pluggable mem range.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

softmmu: support up to 12 MMU modes

At 8k per TLB (for 64-bit host or target), 8 or more modes
make the TLBs bigger than 64k, and some RISC TCG backends do
not like that. On the affected hosts, cut the TLB size in
half---there is still a measurable speedup on PPC with the
next patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1424436345-37924-3-git-send-email-pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

tcg: add TCG_TARGET_TLB_DISPLACEMENT_BITS

This will be used to size the TLB when more than 8 MMU modes are
used by the target. Limitations come from the limited size of
the immediate fields (which sometimes, as in the case of Aarch64,
extend to instructions that shift the immediate).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1424436345-37924-2-git-send-email-pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

tci: do not use CPUArchState in tcg-target.h

tcg-target.h does not use any QEMU-specific symbols, save for tci's usage
of CPUArchState. Pull that up to tcg/tcg.h.

This will make it possible to include tcg-target.h in cpu-defs.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

Add David Gibson for sPAPR in MAINTAINERS file

At Alex Graf's request I'm now acting as sub-maintainer for the sPAPR
(-machine pseries) code. This updates MAINTAINERS accordingly.

While we're at it, change the label to mention pseries since that's the
actual name of the machine type, even if most of the C files use the sPAPR
name.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations

qemu currently implements the hypercalls H_LOGICAL_CI_LOAD and
H_LOGICAL_CI_STORE as PAPR extensions.  These are used by the SLOF firmware
for IO, because performing cache inhibited MMIO accesses with the MMU off
(real mode) is very awkward on POWER.

This approach breaks when SLOF needs to access IO devices implemented
within KVM instead of in qemu.  The simplest example would be virtio-blk
using an iothread, because the iothread / dataplane mechanism relies on
an in-kernel implementation of the virtio queue notification MMIO.

To fix this, an in-kernel implementation of these hypercalls has been made,
(kernel commit 99342cf "kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM"
however, the hypercalls still need to be enabled from qemu.  This performs
the necessary calls to do so.

It would be nice to provide some warning if we encounter a problematic
device with a kernel which doesn't support the new calls.  Unfortunately,
I can't see a way to detect this case which won't either warn in far too
many cases that will probably work, or which is horribly invasive.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr: override default ram size to 512MB

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

machine: add default_ram_size to machine class

Machines types can have different requirement for default ram
size. Introduce a member in the machine class and set the current
default_ram_size to 128MB.

For QEMUMachine types override the value during the registration of
the machine and for MachineClass introduce the generic class init
setting the default_ram_size.

Add helpers [K,M,G,T,P,E]_BYTE for better readability and easy usage

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_pci: emit hotplug add/remove events during hotplug

This uses extension of existing EPOW interrupt/event mechanism
to notify userspace tools like librtas/drmgr to handle
in-guest configuration/cleanup operations in response to
device_add/device_del.

Userspace tools that don't implement this extension will need
to be run manually in response/advance of device_add/device_del,
respectively.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_pci: enable basic hotplug operations

This enables hotplug of PCI devices to a PHB. Upon hotplug we
generate the OF-nodes required by PAPR specification and
IEEE 1275-1994 "PCI Bus Binding to Open Firmware" for the
device.

We associate the corresponding FDT for these nodes with the DRC
corresponding to the slot, which will be fetched via
ibm,configure-connector RTAS calls by the guest as described by PAPR
specification.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pci: make pci_bar useable outside pci.c

We need to work with PCI BARs to generate OF properties
during PCI hotplug for sPAPR guests.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_pci: create DRConnectors for each PCI slot during PHB realize

These will be used to support hotplug/unplug of PCI devices to the PCI
bus associated with a particular PHB.

We also set up device-tree properties in each PHBs initial FDT to
describe the DRCs associated with them. This advertises to guests that
each PHB is DR-capable device with physical hotpluggable slots, each
managed by the corresponding DRC. This is necessary for allowing
hotplugging of devices to it later via bus rescan or guest rpaphp
hotplug module.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge

This option enables/disables PCI hotplug for a particular PHB.

Also add machine compatibility code to disable it by default for machine
types prior to pseries-2.4.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: move commas for compat fields]
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_drc: add spapr_drc_populate_dt()

This function handles generation of ibm,drc-* array device tree
properties to describe DRC topology to guests. This will by used
by the guest to direct RTAS calls to manage any dynamic resources
we associate with a particular DR Connector as part of
hotplug/unplug.

Since general management of boot-time device trees are handled
outside of sPAPRDRConnector, we insert these values blindly given
an FDT and offset. A mask of sPAPRDRConnector types is given to
instruct us on what types of connectors entries should be generated
for, since descriptions for different connectors may live in
different parts of the device tree.

Based on code originally written by Nathan Fontenot.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_events: event-scan RTAS interface

We don't actually rely on this interface to surface hotplug events, and
instead rely on the similar-but-interrupt-driven check-exception RTAS
interface used for EPOW events. However, the existence of this interface
is needed to ensure guest kernels initialize the event-reporting
interfaces which will in turn be used by userspace tools to handle these
events, so we implement this interface here.

Since events surfaced by this call are mutually exclusive to those
surfaced via check-exception, we also update the RTAS event queue code
to accept a boolean to mark/filter for events accordingly.

Events of this sort are not currently generated by QEMU, but the interface
has been tested by surfacing hotplug events via event-scan in place
of check-exception.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_events: re-use EPOW event infrastructure for hotplug events

This extends the data structures currently used to report EPOW events to
guests via the check-exception RTAS interfaces to also include event types
for hotplug/unplug events.

This is currently undocumented and being finalized for inclusion in PAPR
specification, but we implement this here as an extension for guest
userspace tools to implement (existing guest kernels simply log these
events via a sysfs interface that's read by rtas_errd, and current
versions of rtas_errd/powerpc-utils already support the use of this
mechanism for initiating hotplug operations).

We also add support for queues of pending RTAS events, since in the
case of hotplug there's chance for multiple events being in-flight
at any point in time.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_rtas: add ibm, configure-connector RTAS interface

This interface is used to fetch an OF device-tree nodes that describes a
newly-attached device to guest. It is called multiple times to walk the
device-tree node and fetch individual properties into a 'workarea'/buffer
provided by the guest.

The device-tree is generated by QEMU and passed to an sPAPRDRConnector during
the initial hotplug operation, and the state of these RTAS calls is tracked by
the sPAPRDRConnector. When the last of these properties is successfully
fetched, we report as special return value to the guest and transition
the device to a 'configured' state on the QEMU/DRC side.

See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr: add rtas_st_buffer_direct() helper

This is similar to the existing rtas_st_buffer(), but for cases
where the guest is not expecting a length-encoded byte array.
Namely, for calls where a "work area" buffer is used to pass
around arbitrary fields/data.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_rtas: add get-sensor-state RTAS interface

This interface allows a guest to read various platform/device sensors.
initially, we only implement support necessary to support hotplug:
reading of the dr-entity-sense sensor, which communicates the state of
a hotplugged resource/device to the guest (EMPTY/PRESENT/UNUSABLE).

See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.

Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_rtas: add set-indicator RTAS interface

This interface allows a guest to control various platform/device
sensors. Initially, we only implement support necessary to control
sensors that are required for hotplug: DR connector indicators/LEDs,
resource allocation state, and resource isolation state.

See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.

Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_rtas: add get/set-power-level RTAS interfaces

These interfaces manage the power domains that guest devices are
assigned to and are used to power on/off devices. Currently we
only utilize 1 power domain, the 'live-insertion' domain, which
automates power management of plugged/unplugged devices, essentially
making these calls no-ops, but the RTAS interfaces are still required
by guest hotplug code and PAPR+.

See docs/specs/ppc-spapr-hotplug.txt for a complete description of
these interfaces.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_drc: initial implementation of sPAPRDRConnector device

This device emulates a firmware abstraction used by pSeries guests to
manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
memory, and CPUs. It is conceptually similar to an SHPC device,
complete with LED indicators to identify individual slots to physical
physical users and indicate when it is safe to remove a device. In
some cases it is also used to manage virtualized resources, such a
memory, CPUs, and physical-host bridges, which in the case of pSeries
guests are virtualized resources where the physical components are
managed by the host.

Guests communicate with these DR Connectors using RTAS calls,
generally by addressing the unique DRC index associated with a
particular connector for a particular resource. For introspection
purposes we expose this state initially as QOM properties, and
in subsequent patches will introduce the RTAS calls that make use of
it. This constitutes to the 'guest' interface.

On the QEMU side we provide an attach/detach interface to associate
or cleanup a DeviceState with a particular sPAPRDRConnector in
response to hotplug/unplug, respectively. This constitutes the
'physical' interface to the DR Connector.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

docs: add sPAPR hotplug/dynamic-reconfiguration documentation

This adds a general overview of hotplug/dynamic-reconfiguration
for sPAPR/pSeries guest.

As specified in PAPR+ v2.7.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

hw/ppc/spapr: Use error_report() instead of hw_error()

hw_error() is designed for printing CPU-related error messages
(e.g. it also prints a full CPU register dump). For error messages
that are not directly related to CPU problems, a function like
error_report() should be used instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

hw/ppc/spapr: Fix error message when firmware could not be loaded

When specifying a non-existing file with the "-bios" parameter, QEMU
complained that it "could not find LPAR rtas". That's obviously a
copy-n-paste bug from the code which loads the spapr-rtas.bin, it
should complain about a missing firmware file instead.
Additionally the error message was printed with hw_error() - which
also dumps the whole CPU state. However, this does not make much
sense here since the CPU is not running yet and thus the registers
only contain zeroes. So let's use error_report() here instead.
And while we're at it, let's also bail out if the firmware file
had zero length.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Add pseries-2.4 machine type

Now that 2.4 development has opened, create a new pseries machine type
variant. For now it is identical to the pseries-2.3 machine type, but
a number of new features are coming that will need to set backwards
compatibility options.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

hw/ppc/spapr_iommu: Fix the check for invalid upper bits in liobn

The check "liobn & 0xFFFFFFFF00000000ULL" in spapr_tce_find_by_liobn()
is completely useless since liobn is only declared as an uint32_t
parameter. Fix this by using target_ulong instead (this is what most
of the callers of this function are using, too).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_iommu: Give unique QOM name to TCE table

Useful for debugging.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_pci: Rework device-tree rendering

This replaces object_child_foreach() and callback with existing
SPAPR_PCI_LIOBN() and spapr_tce_find_by_liobn() to make the code easier
to read.

This is a mechanical patch so no behaviour change is expected.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_iommu: Make spapr_tce_find_by_liobn() public

At the moment spapr_tce_find_by_liobn() is used by H_PUT_TCE/...
handlers to find an IOMMU by LIOBN.

We are going to implement Dynamic DMA windows (DDW), new code
will go to a new file and we will use spapr_tce_find_by_liobn()
there too so let's make it public.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_pci: Make find_phb()/find_dev() public

This makes find_phb()/find_dev() public and changed its names
to spapr_pci_find_phb()/spapr_pci_find_dev() as they are going to
be used from other parts of QEMU such as VFIO DDW (dynamic DMA window)
or VFIO PCI error injection or VFIO EEH handling - in all these
cases there are RTAS calls which are addressed to BUID+config_addr
in IEEE1275 format.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_iommu: Add separate trace points for PCI DMA operations

This is to reduce VIO noise while debugging PCI DMA.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_pci: Define default DMA window size as a macro

This gets rid of a magic constant describing the default DMA window size
for an emulated PHB.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_vio: Introduce a liobn number generating macros

This introduces a macro which makes up a LIOBN from fixed prefix and
VIO device address (@reg property).

This is to keep LIOBN macros rendering consistent - the same macro for
PCI has been added by the previous patch.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_pci: Introduce a liobn number generating macros

We are going to have multiple DMA windows per PHB and we want them to
migrate so we need a predictable way of assigning LIOBNs.

This introduces a macro which makes up a LIOBN from fixed prefix,
PHB index (unique PHB id) and window number.

This introduces a SPAPR_PCI_DMA_WINDOW_NUM() to know the window number
from LIOBN. It is used to distinguish the default 32bit windows from
dynamic windows and avoid picking default DMA window properties from
a wrong TCE table.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_iommu: Make H_PUT_TCE_INDIRECT endian-safe

PAPR is defined as big endian so TCEs need an adjustment so
does this patch.

This changes code to have ldq_be_phys() in one place.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows

The existing KVM_CREATE_SPAPR_TCE ioctl only support 4G windows max as
the window size parameter to the kernel ioctl() is 32-bit so
there's no way of expressing a TCE window > 4GB.

We are going to add huge DMA windows support so this will create small
window and unexpectedly fail later.

This disables KVM_CREATE_SPAPR_TCE for windows bigger that 4GB.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

spapr_pci: Fix unsafe signed/unsigned comparisons

spapr_pci.c contains a number of expressions of the form (uval == -1) or
(uval != -1), where 'uval' is an unsigned value.

This mostly works in practice, because as long as the width of uval is
greater or equal than that of (int), the -1 will be promoted to the
unsigned type, which is the expected outcome.

However, at least for the cases where uval is uint32_t, this would break
on platforms where sizeof(int) > 4 (and a few such do exist), because then
the uint32_t value would be promoted to the larger int type, and never be
equal to -1.

This patch fixes these errors.  The fixes for the (uint32_t) cases are
necessary as described above.  I've made similar fixes to (uint64_t) and
(hwaddr) cases.  Those are strictly theoretical, since I don't know of any
platforms where sizeof(int) > 8, but hey, it's not that hard so we might
as well be strictly C standard compliant.

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

configure: Check for libfdt version 1.4.0

Some recent patches require a function from libfdt version 1.4.0,
so we should check for this version during the configure step
already. Unfortunately, there does not seem to be a proper #define
for the version number in the libfdt headers. So alternatively,
we check for the availability of the required function
fdt_get_property_by_offset() instead instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

dtc: Update dtc / libfdt submodule to version 1.4.0

Since some recent patches require libfdt version 1.4.0,
let's update the dtc submodule to this version.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

macio: Convert to realize()

Convert device models "macio-oldworld" and "macio-newworld".

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

docs: Add PXB documentation

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

apci: fix PXB behaviour if used with unsupported BIOS

PXB does not work with unsupported bioses, but should
not interfere with normal OS operation.
We don't ship them anymore, but it's reasonable
to keep the work-around until we update the bios in qemu.

Fix this by not adding PXB mem/IO chunks to _CRS
if they weren't configured by BIOS.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/pxb: add numa_node parameter

The pxb can be attach to and existing numa node by specifying
numa_node option that equals the desired numa nodeid.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/pci: add support for NUMA nodes

PCI root buses can be attached to a specific NUMA node.
PCI buses are not attached by default to a NUMA node.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/pxb: add map_irq func

The bios does not index the pxb slot number when
it computes the IRQ because it resides on bus 0
and not on the current bus.
However Qemu routes the irq through bus 0 and adds
the pxb slot to the IRQ computation of the PXB device.

Synchronize between bios and Qemu by canceling
pxb's effect.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/pci: inform bios if the system has extra pci root buses

The bios looks for 'etc/extra-pci-roots' to decide if
is going to scan further buses after bus 0 tree.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/pci: introduce PCI Expander Bridge (PXB)

PXB is a "light-weight" host bridge whose purpose is to enable
the main host bridge to support multiple PCI root buses
for pc machines.

As oposed to PCI-2-PCI bridge's secondary bus, PXB's bus
is a primary bus and can be associated with a NUMA node
(different from the main host bridge) allowing the guest OS
to recognize the proximity of a pass-through device to
other resources as RAM and CPUs.

The PXB is composed from:
- A primary PCI bus (can be associated with a NUMA node)
   Acts like a normal pci bus and from the functionality point
   of view is an "expansion" of the bus behind the
   main host bridge.
- A pci-2-pci bridge behind the primary PCI bus where the actual
   devices will be attached.
- A host-bridge PCI device
   Situated on the bus behind the main host bridge, allows
   the BIOS to configure the bus number and IO/mem resources.
   It does not have its own config/data register for configuration
   cycles, this being handled by the main host bridge.
-  A host-bridge sysbus to comply with QEMU current design.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/pci: removed 'rootbus nr is 0' assumption from qmp_pci_query

Use the newer pci_bus_num to correctly get the root bus number.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/acpi: remove from root bus 0 the crs resources used by other buses.

If multiple root buses are used, root bus 0 cannot use all the
pci holes ranges. Remove the IO/mem ranges used by the other
primary buses.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/acpi: add _CRS method for extra root busses

Save the IO/mem/bus numbers ranges assigned to the extra root busses
to be removed from the root bus 0 range.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/apci: add _PRT method for extra PCI root busses

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/acpi: add support for i440fx 'snooping' root busses

If the machine has extra root busses that are snooping to
the i440fx host bridge, we need to add them to
acpi in order to be properly detected by guests.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/pci: extend PCI config access to support devices behind PXB

PXB buses are assumed to be children of bus 0. Look for them
while scanning the buses.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>

hw/i386: query only for q35/pc when looking for pci host bridge

Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE anymore.
On i386 arch we only have two pci hosts, so we can look only for them.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>