review.tizen.org Git - sdk/emulator/qemu.git/log

COLO: Implement failover work for secondary VM

If users require SVM to takeover work, COLO incoming thread should
exit from loop while failover BH helps backing to migration incoming
coroutine.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: Implement the process of failover for primary VM

For primary side, if COLO gets failover request from users.
To be exact, gets 'x_colo_lost_heartbeat' command.
COLO thread will exit the loop while the failover BH does the
cleanup work and resumes VM.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: Introduce state to record failover process

When handling failover, COLO processes differently according to
the different stage of failover process, here we introduce a global
atomic variable to record the status of failover.

We add four failover status to indicate the different stage of failover process.
You should use the helpers to get and set the value.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: Add 'x-colo-lost-heartbeat' command to trigger failover

We leave users to choose whatever heartbeat solution they want,
if the heartbeat is lost, or other errors they detect, they can use
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
COLO will do operations accordingly.

For example, if the command is sent to the Primary side,
the Primary side will exit COLO mode, does cleanup work,
and then, PVM will take over the service work. If sent to the Secondary side,
the Secondary side will run failover work, then takes over PVM's service work.

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: Synchronize PVM's state to SVM periodically

Do checkpoint periodically, the default interval is 200ms.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: Add checkpoint-delay parameter for migrate-set-parameters

Add checkpoint-delay parameter for migrate-set-parameters, so that
we can control the checkpoint frequency when COLO is in periodic mode.

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: Load VMState into QIOChannelBuffer before restore it

We should not destroy the state of SVM (Secondary VM) until we receive
the complete data of PVM's state, in case the primary fails in the process
of sending the state, so we cache the VM's state in secondary side before
load it into SVM.

Besides, we should call qemu_system_reset() before load VM state,
which can ensure the data is intact.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: Send PVM state to secondary side when do checkpoint

VM checkpointing is to synchronize the state of PVM to SVM, just
like migration does, we re-use save helpers to achieve migrating
PVM's state to Secondary side.

COLO need to cache the data of VM's state in the secondary side before
synchronize it to SVM. COLO need the size of the data to determine
how much data should be read in the secondary side.
So here, we can get the size of the data by saving it into I/O channel
before send it to the secondary side.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: Add a new RunState RUN_STATE_COLO

Guest will enter this state when paused to save/restore VM state
under COLO checkpoint.

Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: Introduce checkpointing protocol

We need communications protocol of user-defined to control
the checkpointing process.

The new checkpointing request is started by Primary VM,
and the interactive process like below:

Checkpoint synchronizing points:

                   Primary               Secondary
                                            initial work
'checkpoint-ready'    <-------------------- @

'checkpoint-request'  @ -------------------->
                                            Suspend (Only in hybrid mode)
'checkpoint-reply'    <-------------------- @
                      Suspend&Save state
'vmstate-send'        @ -------------------->
                      Send state            Receive state
'vmstate-received'    <-------------------- @
                      Release packets       Load state
'vmstate-load'        <-------------------- @
                      Resume                Resume (Only in hybrid mode)

                      Start Comparing (Only in hybrid mode)
NOTE:
1) '@' who sends the message
2) Every sync-point is synchronized by two sides with only
    one handshake(single direction) for low-latency.
    If more strict synchronization is required, a opposite direction
    sync-point should be added.
3) Since sync-points are single direction, the remote side may
    go forward a lot when this side just receives the sync-point.
4) For now, we only support 'periodic' checkpoint, for which
   the Secondary VM is not running, later we will support 'hybrid' mode.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: Establish a new communicating path for COLO

This new communication path will be used for returning messages
from Secondary side to Primary side.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

migration: Switch to COLO process after finishing loadvm

Switch from normal migration loadvm process into COLO checkpoint process if
COLO mode is enabled.

We add three new members to struct MigrationIncomingState,
'have_colo_incoming_thread' and 'colo_incoming_thread' record the COLO
related thread for secondary VM, 'migration_incoming_co' records the
original migration incoming coroutine.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

migration: Enter into COLO mode after migration if COLO is enabled

Add a new migration state: MIGRATION_STATUS_COLO. Migration source side
enters this state after the first live migration successfully finished
if COLO is enabled by command 'migrate_set_capability x-colo on'.

We reuse migration thread, so the process of checkpointing will be handled
in migration thread.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

COLO: migrate COLO related info to secondary node

We can determine whether or not VM in destination should go into COLO mode
by referring to the info that was migrated.

We skip this section if COLO is not enabled (i.e.
migrate_set_capability colo off), so that, It doesn't break compatibility
with migration no matter whether users configure the --enable-colo/disable-colo
on the source/destination side or not;

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

migration: Introduce capability 'x-colo' to migration

We add helper function colo_supported() to indicate whether
colo is supported or not, with which we use to control whether or not
showing 'x-colo' string to users, they can use qmp command
'query-migrate-capabilities' or hmp command 'info migrate_capabilities'
to learn if colo is supported.

The default value for COLO (COarse-Grain LOck Stepping) is disabled.

Cc: Juan Quintela <quintela@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Amit Shah <amit@amitshah.net>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-ui-20161028-1' into staging

braille fixes and improvements.
curses fix, switch to cursesw.
gtk bugfixes.

# gpg: Signature made Fri 28 Oct 2016 13:05:12 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-ui-20161028-1:
  curses: Use cursesw instead of curses
  curses: fix left/right arrow translation
  ui/gtk: Fix non-working DELETE key
  gtk: fix compilation warning with gtk 3.22.2
  Defer BrlAPI tty acquisition to when guest starts using device
  Add dots keypresses support to the baum braille device

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/vivier/tags/m68k-part2-pull-request' into staging

# gpg: Signature made Fri 28 Oct 2016 09:44:23 BST
# gpg:                using RSA key 0xF30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>"
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>"
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier/tags/m68k-part2-pull-request:
  MAINTAINERS: update M68K entry
  target-m68k: immediate ops manage word and byte operands
  target-m68k: cmp manages word and bytes operands
  target-m68k: add/sub manage word and byte operands
  target-m68k: add addressing modes to neg
  target-m68k: introduce byte and word cc_ops
  target-m68k: some bit ops cleanup
  target-m68k: suba/adda can manage word operand
  target-m68k: and can manage word and byte operands
  target-m68k: or can manage word and byte operands
  target-m68k: eor can manage word and byte operands
  target-m68k: add addressing modes to not
  target-m68k: Inline addx, subx, negx
  target-m68k: add dbcc
  target-m68k: add addressing modes to scc
  target-m68k: add exg ops
  target-m68k: add linkl
  target-m68k: add bkpt instruction

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.8-20161028' into staging

ppc patch queue 2016-10-28

This pull request supersedes and extends the one from 2016-10-26
(which had a build bug).

Highlights:
  * SLOF (pseries guest firmware) update
  * Enable a number of extra testcases on ppc / pseries
  * Added the 'powernv' machine type
    - Almost enough to be minimally usable
    - But still missing necessary interrupt controller updates
  * Cleanup and consolidation of NVRAM handling on several platforms
    with related firmware
  * Substantial cleanup to device tree construction
  * Some more POWER9 instruction emulation
  * Cleanup to handling of pseries option vectors and CAS reboot
    handling (host/guest feature negotiation mechanism)
  * Significant cleanups to handling of PCI devices in test cases
  * New hotplug event infrastructure
  * Memory hot unplug support for pseries
  * Several bug fixes

The NVRAM cleanup affects some Sun sparc platforms as well as ppc
ones, but have been tested by the sparc maintainer (Mark Cave-Ayland).

The test additions also include substantial general changes to the
test framework that aren't strictly ppc related.  They don't seem to
break tests on other platforms, they're for the benefit of enabling
tests on ppc and there isn't a specific maintainer for them, so
they're included in this tree.

# gpg: Signature made Fri 28 Oct 2016 02:37:19 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.8-20161028: (73 commits)
  ppc: allow certain HV interrupts to be delivered to guests
  spapr: Memory hot-unplug support
  spapr: use count+index for memory hotplug
  spapr: Add DRC count indexed hotplug identifier type
  spapr: add hotplug interrupt machine options
  spapr_events: add support for dedicated hotplug event source
  spapr: update spapr hotplug documentation
  target-ppc: Add xvcmpnesp, xvcmpnedp instructions
  target-ppc: add xscmp[eq,gt,ge,ne]dp instructions
  tests: Add pseries machine to the prom-env-test, too
  spapr_nvram: Pre-initialize the NVRAM to support the -prom-env parameter
  libqos: Change PCI accessors to take opaque BAR handle
  tests: Don't assume structure of PCI IO base in ahci-test
  tests: Use qpci_mem{read,write} in ivshmem-test
  libqos: Add 64-bit PCI IO accessors
  tests: Clean up IO handling in ide-test
  libqos: Implement mmio accessors in terms of mem{read,write}
  libqos: Add streaming accessors for PCI MMIO
  tests: Adjust tco-test to use qpci_legacy_iomap()
  libqos: Better handling of PCI legacy IO
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/berrange/tags/pull-qio-2016-10-27-1' into staging

Merge qio 2016/10/27 v1

# gpg: Signature made Thu 27 Oct 2016 13:54:03 BST
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/pull-qio-2016-10-27-1:
  main: set names for main loop sources created
  vnc: set name for all I/O channels created
  migration: set name for all I/O channels created
  char: set name for all I/O channels created
  nbd: set name for all I/O channels created
  io: add ability to set a name for IO channels
  io: Add a QIOChannelSocket cleanup test
  io: set LISTEN flag explicitly for listen sockets
  io: Introduce a qio_channel_set_feature() helper
  io: Use qio_channel_has_feature() where applicable
  io: Fix double shift usages on QIOChannel features

Conflicts:
qemu-char.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging

# gpg: Signature made Thu 27 Oct 2016 22:15:57 BST
# gpg:                using RSA key 0x7DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/ide-pull-request:
  qemu-iotests: Test creating floppy drives
  fdc: Move qdev properties to FloppyDrive
  fdc: Add a floppy drive qdev
  fdc: Add a floppy qbus
  macio: switch over to new byte-aligned DMA helpers
  dma-helpers: explicitly pass alignment into DMA helpers

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Thu 27 Oct 2016 18:15:47 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (23 commits)
  iotests: Add test for NBD's blockdev-add interface
  iotests: Add assert_json_filename_equal() method
  socket_scm_helper: Accept fd directly
  iotests.py: Allow concurrent qemu instances
  iotests.py: Add qemu_nbd function
  qapi: Allow blockdev-add for NBD
  block/nbd: Use SocketAddress options
  block/nbd: Accept SocketAddress
  block/nbd: Add nbd_has_filename_options_conflict()
  block/nbd: Use qdict_put()
  block/nbd: Default port in nbd_refresh_filename()
  block/nbd: Reject port parameter without host
  block/nbd: Drop trailing "." in error messages
  qemu-iotests: Fix typo for NFS with IMGOPTSSYNTAX
  block: Remove bdrv_aio_ioctl()
  raw: Implement .bdrv_co_ioctl instead of .bdrv_aio_ioctl
  block: Introduce .bdrv_co_ioctl() driver callback
  block: Remove bdrv_ioctl()
  raw-posix: Don't use bdrv_ioctl()
  block: Use blk_co_ioctl() for all BB level ioctls
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-seabios-20161027-2' into staging

seabios: update to 1.10.0 release.

# gpg: Signature made Thu 27 Oct 2016 15:50:54 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-seabios-20161027-2:
  seabios: update to 1.10.0 release.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

curses: Use cursesw instead of curses

Use ncursesw package instead of curses on non-mingw, and check a few
functions.
Also take cflags from pkg-config, since cursesw headers may be in a
separate, non-default directory.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-id: 20161015195308.20473-3-samuel.thibault@ens-lyon.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

curses: fix left/right arrow translation

In default VGA font, left/right arrow are glyphs 0x1a and 0x1b, not 0x0a and
0x0b.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-id: 20161015195308.20473-2-samuel.thibault@ens-lyon.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

ui/gtk: Fix non-working DELETE key

GTK generates key events for the delete key with key->string[0] = 0x7f
... but this does not work right with the readline_handle_byte()
function in util/readline.c, since this treats the keycode 127 as
backspace. So let's add a special case for the GTK delete key to make
this key behave right in the monitor interface of the GTK ui.

Buglink: https://bugs.launchpad.net/qemu/+bug/1619438
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1477570647-7100-1-git-send-email-thuth@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

gtk: fix compilation warning with gtk 3.22.2

gdk_screen_get_width() is deprecated since gtk 3.22.2, use
gdk_monitor_get_geometry() instead if it's available.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20161026152108.12364-1-berto@igalia.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Defer BrlAPI tty acquisition to when guest starts using device

We do not want to catch the BrlAPI input/ouput immediately, but only
when the guest has started discussing withour virtual device.

This notably fixes input before the guest driver has started.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Add dots keypresses support to the baum braille device

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20161027-1' into staging

virtio-gpu: fix memory leak in virtio_gpu_resource_create_2d

# gpg: Signature made Thu 27 Oct 2016 15:32:38 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-vga-20161027-1:
  virtio-gpu: fix memory leak in virtio_gpu_resource_create_2d

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

MAINTAINERS: update M68K entry

Add myself to be the M68K maintainer.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Thomas Huth <huth@tuxfamily.org>

target-m68k: immediate ops manage word and byte operands

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: cmp manages word and bytes operands

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: add/sub manage word and byte operands

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: add addressing modes to neg

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: introduce byte and word cc_ops

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: some bit ops cleanup

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: suba/adda can manage word operand

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: and can manage word and byte operands

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: or can manage word and byte operands

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: eor can manage word and byte operands

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: add addressing modes to not

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: Inline addx, subx, negx

Signed-off-by: Richard Henderson <rth@twiddle.net>
And add opcodes for 680x0

Signed-off-by: Laurent Vivier <laurent@vivier.eu>

target-m68k: add dbcc

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: add addressing modes to scc

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: add exg ops

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: add linkl

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>

target-m68k: add bkpt instruction

Signed-off-by: Laurent Vivier <laurent@vivier.eu>

ppc: allow certain HV interrupts to be delivered to guests

ppc hypervisors have delivered system reset and machine check exception
interrupts to guests in some situations (e.g., see FWNMI feature of LoPAPR,
or NMI injection in QEMU).

These exceptions are architected to set the HV bit in hardware, however
when injected into a guest, the HV bit should be cleared. Current code
masks off the HV bit before setting the new MSR, however this happens after
the interrupt delivery model has calculated delivery mode for the exception.
This can result in the guest's MSR LE bit being lost.

Account for this in the exception handler and don't set HV bit for guest
delivery.

Also add another sanity check to ensure similar bugs get caught.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: Memory hot-unplug support

Add support to hot remove pc-dimm memory devices.

Since we're introducing a machine-level unplug_request hook, we also
had handling for CPU unplug there as well to ensure CPU unplug
continues to work as it did before.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
* add hooks to CAS/cmdline enablement of hotplug ACR support
* add hook for CPU unplug
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: use count+index for memory hotplug

Commit 0a417869:

spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type

dropped per-DRC/per-LMB hotplugs event in favor of a bulk add via a
single LMB count value. This was to avoid overrunning the guest EPOW
event queue with hotplug events. This works fine, but relies on the
guest exhaustively scanning for pluggable LMBs to satisfy the
requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls
until all the LMBs associated with the DIMM are identified.

With newer support for dedicated hotplug event source, this queue
exhaustion is no longer as much of an issue due to implementation
details on the guest side, but we still try to avoid excessive hotplug
events by now supporting both a count and a starting index to avoid
unecessary work. This patch makes use of that approach when the
capability is available.

Cc: bharata@linux.vnet.ibm.com
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: Add DRC count indexed hotplug identifier type

Add support for DRC count indexed hotplug ID type which is primarily
needed for memory hot unplug. This type allows for specifying the
number of DRs that should be plugged/unplugged starting from a given
DRC index.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
* updated rtas_event_log_v6_hp to reflect count/index field ordering
used in PAPR hotplug ACR
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: add hotplug interrupt machine options

This adds machine options of the form:

-machine pseries,modern-hotplug-events=true
-machine pseries,modern-hotplug-events=false

If false, QEMU will force the use of "legacy" style hotplug events,
which are surfaced through EPOW events instead of a dedicated
hot plug event source, and lack certain features necessary, mainly,
for memory unplug support.

If true, QEMU will enable support for "modern" dedicated hot plug
event source. Note that we will still default to "legacy" style unless
the guest advertises support for the "modern" hotplug events via
ibm,client-architecture-support hcall during early boot.

For pseries-2.7 and earlier we default to false, for newer machine
types we default to true.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr_events: add support for dedicated hotplug event source

Hotplug events were previously delivered using an EPOW interrupt
and were queued by linux guests into a circular buffer. For traditional
EPOW events like shutdown/resets, this isn't an issue, but for hotplug
events there are cases where this buffer can be exhausted, resulting
in the loss of hotplug events, resets, etc.

Newer-style hotplug event are delivered using a dedicated event source.
We enable this in supported guests by adding standard an additional
event source in the guest device-tree via /event-sources, and, if
the guest advertises support for the newer-style hotplug events,
using the corresponding interrupt to signal the available of
hotplug/unplug events.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: update spapr hotplug documentation

This updates the existing documentation to reflect recent updates to
the hotplug event structure, which are in draft form but slated
for inclusion in PAPR/LoPAPR.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target-ppc: Add xvcmpnesp, xvcmpnedp instructions

xvcmpnedp[.]: VSX Vector Compare Not Equal Double-Precision
xvcmpnesp[.]: VSX Vector Compare Not Equal Single-Precision

Signed-off-by: Swapnil Bokade <bokadeswapnil@gmail.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target-ppc: add xscmp[eq,gt,ge,ne]dp instructions

xscmpeqdp: VSX Scalar Compare Equal Double-Precision
xscmpgedp: VSX Scalar Compare Greater Than or Equal Double-Precision
xscmpgtdp: VSX Scalar Compare Greater Than Double-Precision
xscmpnedp: VSX Scalar Compare Not Equal Double-Precision

Signed-off-by: Sandipan Das <sandipandas1990@gmail.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

tests: Add pseries machine to the prom-env-test, too

Now that we also support the "-prom-env" parameter for the pseries
machine, we can enable this test for this machine, too. Since booting
with TCG is rather slow with the pseries machine, we also enable
the "-nodefaults" parameter for this test now, so that SLOF does not
have to check that much devices during boot and thus runs a little
bit faster.

Signed-off-by: Thomas Huth <thuth@redhat.com>
[dwg: Don't add -nodefaults to the command line, it causes extra warnings
for the sparc testcases]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr_nvram: Pre-initialize the NVRAM to support the -prom-env parameter

In case we do not load the NVRAM contents from a file and the user
specified the "-prom-env" parameter, use the new CHRP NVRAM helper
functions to pre-initialize the NVRAM partitions, so that the SLOF
firmware now can pick up the environment variables from the -prom-env
parameter, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

libqos: Change PCI accessors to take opaque BAR handle

The usual use model for the libqos PCI functions is to map a specific PCI
BAR using qpci_iomap() then pass the returned token into IO accessor
functions.  This, and the fact that iomap() returns a (void *) which
actually contains a PCI space address, kind of suggests that the return
value from iomap is supposed to be an opaque token.

..except that the callers expect to be able to add offsets to it.  Which
also assumes the compiler will support pointer arithmetic on a (void *),
and treat it as working with byte offsets.

To clarify this situation change iomap() and the IO accessors to take
a definitely opaque BAR handle (enforced with a wrapper struct) along with
an offset within the BAR.  This changes both the functions and all the
callers.

There were a number of places that checked if iomap() returned non-NULL,
and or initialized it to NULL before hand.  Since iomap() already assert()s
if it fails to map the BAR, these tests were mostly pointless and are
removed.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>

tests: Don't assume structure of PCI IO base in ahci-test

In a couple of places ahci-test makes assumptions about how the tokens
returned from qpci_iomap() are formatted in ways it probably shouldn't.

First in verify_state() it uses a non-NULL token to indicate that the AHCI
device has been enabled (part of enabling is to iomap()).  This changes it
to use an explicit 'enabled' flag instead.

Second, it uses the fact that the token contains a PCI address, stored when
the BAR is mapped during initialization to check that the BAR has the same
value after a migration.  This changes it to explicitly read the BAR
register before and after the migration and compare.

Together, these changes will  make the test more robust against changes to
the internals of the libqos PCI layer.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

tests: Use qpci_mem{read,write} in ivshmem-test

ivshmem implements a block of shared memory in a PCI BAR. Currently our
test case accesses this using qtest_mem{read,write}. However, deducing
the correct addresses for these requires making assumptions about the
internel format returned by qpci_iomap(), along with some ugly casts.

This patch changes the test to use the new qpci_mem{read,write} interfaces
which is neater.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

libqos: Add 64-bit PCI IO accessors

Currently the libqos PCI layer includes accessor helpers for 8, 16 and 32
bit reads and writes.  It's likely that we'll want 64-bit accesses in the
future (plenty of modern peripherals will have 64-bit reigsters).  This
adds them.

For PIO (not MMIO) accesses on the PC backend, this is implemented as two
32-bit ins or outs.  That's not ideal but AFAICT x86 doesn't have 64-bit
versions of in and out.

This patch also converts the single current user of 64-bit accesses -
virtio-pci.c to use the new mechanism, rather than a sequence of 8 byte
reads.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

tests: Clean up IO handling in ide-test

ide-test uses many explicit inb() / outb() operations for its IO, which
means it's not portable to non-x86 platforms. This cleans it up to use
the libqos PCI accessors instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>

libqos: Implement mmio accessors in terms of mem{read,write}

In the libqos PCI code we now have accessors both for registers (byte
significance preserving) and for streaming data (byte address order
preserving).  These exist in both the interface for qtest drivers and in
the machine specific backends.

However, the register-style accessors aren't actually necessary in the
backend.  They can be implemented in terms of the byte address order
preserving accessors by the libqos wrappers.  This works because PCI is
always little endian.

This does assume that the back end byte address order preserving accessors
will perform the equivalent of a single bus transaction for short lengths.
This is the case, and in fact they currently end up using the same
cpu_physical_memory_rw() implementation within the qtest accelerator.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

libqos: Add streaming accessors for PCI MMIO

Currently PCI memory (aka MMIO) space is accessed via a set of readb/writeb
style accessors.  This is what we want for accessing discrete registers of
a certain size.  However, there are a few cases where we instead need a
"bag of bytes" style streaming interface to PCI MMIO space.  This can be
either for streaming data style registers or when there's actual memory
rather than registers in PCI space, for example frame buffers or ivshmem.

This patch adds backend callbacks, and libqos wrappers for this type of
byte address order preserving accesses.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

tests: Adjust tco-test to use qpci_legacy_iomap()

Avoid tco-test making assumptions about the internal format of the address
tokens passed to PCI IO accessors, by using the new qpci_legacy_iomap()
function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

libqos: Better handling of PCI legacy IO

The usual model for PCI IO with libqos is to use qpci_iomap() to map a
specific BAR for a PCI device, then perform IOs within that BAR using
qpci_io_{read,write}*().

However, certain devices also have legacy PCI IO.  In this case, instead of
(or as well as) being accessed via PCI BARs, the device can be accessed
via certain well-known, fixed addresses in PCI IO space.

Two existing tests use legacy PCI IO, and take different flawed approaches
to it:
    * tco-test manually constructs a tco_io_base value instead of calling
      qpci_iomap(), which assumes internal knowledge of the structure of
      the value it shouldn't have
    * ide-test uses direct in*() and out*() calls instead of using
      qpci_io_*() accessors, meaning it's not portable to non-x86 machine
      types.

This patch implements a new qpci_iomap_legacy() interface which gets a
handle in the same format as qpci_iomap() but refers to a region in
the legacy PIO space.  For a device which has the same registers
available both in a BAR and in legacy space (quite common), this
allows the same test code to test both options with just a different
iomap() at the beginning.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

libqos: Move BAR assignment to common code

The PCI backends in libqos each supply an iomap() and iounmap() function
which is used to set up a specified PCI BAR.  But PCI BAR allocation takes
place entirely within PCI space, so doesn't really need per-backend
versions.  For example, Linux includes generic BAR allocation code used on
platforms where that isn't done by firmware.

This patch merges the BAR allocation from the two existing backends into a
single simplified copy.  The back ends just need to set up some parameters
describing the window of PCI IO and PCI memory addresses which are
available for allocation.  Like both the existing versions the new one uses
a simple bump allocator.

Note that (again like the existing versions) this doesn't really handle
64-bit memory BARs properly.  It is actually used for such a BAR by the
ivshmem test, and apparently the 32-bit MMIO BAR logic is close enough to
work, as long as the BAR isn't too big.  Fixing that to properly handle
64-bit BAR allocation is a problem for another time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

libqos: Handle PCI IO de-multiplexing in common code

The PCI IO space (aka PIO, aka legacy IO) and PCI memory space (aka MMIO)
are distinct address spaces by the PCI spec (although parts of one might be
aliased to parts of the other in some cases).

However, qpci_io_read*() and qpci_io_write*() can perform accesses to
either space depending on parameter.  That's convenient for test case
drivers, since there are a fair few devices which can be controlled via
either a PIO or MMIO BAR but with an otherwise identical driver.

This is implemented by having addresses below 64kiB treated as PIO, and
those above treated as MMIO.  This works because low addresses in memory
space are generally reserved for DMA rather than MMIO.

At the moment, this demultiplexing must be handled by each PCI backend
(pc and spapr, so far).  There's no real reason for this - the current
encoding is likely to work for all platforms, and even if it doesn't we
can still use a more complex common encoding since the value returned from
iomap are semi-opaque.

This patch moves the demultiplexing into the common part of the libqos PCI
code, with the backends having simpler, separate accessors for PIO and
MMIO space.  This also means we have a way of explicitly accessing either
space if it's necessary for some special case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

libqos: Give qvirtio_config_read*() consistent semantics

The 'addr' parameter to qvirtio_config_read*() doesn't have a consistent
meaning: when using the virtio-pci versions, it's a full PCI space address,
but for virtio-mmio, it's an offset from the device's base mmio address.

This means that the callers need to do different things to calculate the
addresses in the two cases, which rather defeats the purpose of function
pointer backends.

All the current users of these functions are using them to retrieve
variables from the device specific portion of the virtio config space.
So, this patch alters the semantics to always be an offset into that
device specific config area.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

adb: change handler only when recognized

ADB devices must take new handler into account only when they recognize it.
This lets operating systems probe for valid/invalid handles, to know device capabilities.

Add a FIXME in keyboard handler, which should use a different translation
table depending of the selected handler.

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: improve ibm,architecture-vec-5 property handling

ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
negotiated between platform/guest. Currently we hardcode this property
in the boot-time device tree to advertise a single negotiated
capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
has been invoked or that capability has actually been negotiated.

Improve this by generating ibm,architecture-vec-5 based on the full
set of option vector 5 capabilities negotiated via CAS.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: add option vector handling in CAS-generated resets

In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.

We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.

Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:

1) When ibm,client-architecture-support gets called, we
   call spapr_dt_cas_updates() with the set of capabilities
   added since the previous call to ibm,client-architecture-support.
   For the initial boot, or a system reset generated by something
   other than the CAS call itself, this set will consist of *all*
   options supported both the platform and the guest. For calls
   to ibm,client-architecture-support immediately after a CAS-induced
   reset, we call spapr_dt_cas_updates() with only the set
   of capabilities added since the previous call, since the other
   capabilities will have already been addressed by the boot-time
   device-tree this time around. In the unlikely event that
   capabilities are *removed* since the previous CAS, we will
   generate a CAS-induced reset. In the unlikely event that we
   cannot fit the device-tree updates into the buffer provided
   by the guest, well generate a CAS-induced reset.

2) When a CAS update results in the need to reset the machine and
   include the updates in the boot-time device tree, we call the
   spapr_dt_cas_updates() using the full set of negotiated
   capabilities as part of the reset path. At initial boot, or after
   a reset generated by something other than the CAS call itself,
   this set will be empty, resulting in what should be the same
   boot-time device-tree as we generated prior to this patch. For
   CAS-induced reset, this routine will be called with the full set of
   capabilities negotiated by the platform/guest in the previous
   CAS call, which should result in CAS updates from previous call
   being accounted for in the initial boot-time device tree.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr_hcall: use spapr_ovec_* interfaces for CAS options

Currently we access individual bytes of an option vector via
ldub_phys() to test for the presence of a particular capability
within that byte. Currently this is only done for the "dynamic
reconfiguration memory" capability bit. If that bit is present,
we pass a boolean value to spapr_h_cas_compose_response()
to generate a modified device tree segment with the additional
properties required to enable this functionality.

As more capability bits are added, will would need to modify the
code to add additional option vector accesses and extend the
param list for spapr_h_cas_compose_response() to include similar
boolean values for these parameters.

Avoid this by switching to spapr_ovec_* helpers so we can do all
the parsing in one shot and then test for these additional bits
within spapr_h_cas_compose_response() directly.

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr_ovec: initial implementation of option vector helpers

PAPR guests advertise their capabilities to the platform by passing
an ibm,architecture-vec structure via an
ibm,client-architecture-support hcall as described by LoPAPR v11,
B.6.2.3. during early boot.

Using this information, the platform enables the capabilities it
supports, then encodes a subset of those enabled capabilities (the
5th option vector of the ibm,architecture-vec structure passed to
ibm,client-architecture-support) into the guest device tree via
"/chosen/ibm,architecture-vec-5".

The logical format of these these option vectors is a bit-vector,
where individual bits are addressed/documented based on the byte-wise
offset from the beginning of the bit-vector, followed by the bit-wise
index starting from the byte-wise offset. Thus the bits of each of
these bytes are stored in reverse order. Additionally, the first
byte of each option vector is encodes the length of the option vector,
so byte offsets begin at 1, and bit offset at 0.

This is not very intuitive for the purposes of mapping these bits to
a particular documented capability, so this patch introduces a set
of abstractions that encapsulate the work of parsing/encoding these
options vectors and testing for individual capabilities.

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[dwg: Tweaked double-include protection to not trigger a checkpatch
false positive]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

pseries: Remove spapr_create_fdt_skel()

For historical reasons construction of the guest device tree in spapr is
divided between spapr_create_fdt_skel() which is called at init time, and
spapr_build_fdt() which runs at reset time. Over time, more and more
things have needed to be moved to reset time.

Previous cleanups mean the only things left in spapr_create_fdt_skel() are
the properties of the root node itself. Finish consolidating these two
parts of device tree construction, by moving this to the start of
spapr_build_fdt(), and removing spapr_create_fdt_skel() entirely.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Consolidate construction of /vdevice device tree node

Construction of the /vdevice node (and its children) is divided between
spapr_create_fdt_skel() (at init time), which creates the base node, and
spapr_populate_vdevice() (at reset time) which creates the nodes for each
individual virtual device.

This consolidates both into a single function called from
spapr_build_fdt().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Move /hypervisor node construction to fdt_build_fdt()

Currently the /hypervisor device tree node is constructed in
spapr_create_fdt_skel(). As part of consolidating device tree construction
to reset time, move it to a function called from spapr_build_fdt().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Move /event-sources construction to spapr_build_fdt()

The /event-sources device tree node is built from spapr_create_fdt_skel().
As part of consolidating device tree construction to reset time, this moves
it to spapr_build_fdt().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Consolidate construction of /rtas device tree node

For historical reasons construction of the /rtas node in the device
tree (amongst others) is split into several places. In particular
it's split between spapr_create_fdt_skel(), spapr_build_fdt() and
spapr_rtas_device_tree_setup().

In fact, as well as adding the actual RTAS tokens to the device tree,
spapr_rtas_device_tree_setup() just adds the ibm,lrdr-capacity
property, which despite going in the /rtas node, doesn't have a lot to
do with RTAS.

This patch consolidates the code constructing /rtas together into a new
spapr_dt_rtas() function. spapr_rtas_device_tree_setup() is renamed to
spapr_dt_rtas_tokens() and now only adds the token properties.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Consolidate construction of /chosen device tree node

For historical reasons, building the /chosen node in the guest device tree
is split across several places and includes both parts which write the DT
sequentially and others which use random access functions.

This patch consolidates construction of the node into one place, using
random access functions throughout.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Move construction of /interrupt-controller fdt node

Currently the device tree node for the XICS interrupt controller is in
spapr_create_fdt_skel(). As part of consolidating device tree construction
to reset time, this moves it to a function called from spapr_build_fdt().

In addition we move the actual code into hw/intc/xics_spapr.c with the
rest of the PAPR specific interrupt controller code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Consolidate RTAS loading

At each system reset, the pseries machine needs to load RTAS (the runtime
portion of the guest firmware) into the VM. This means copying
the actual RTAS code into guest memory, and also updating the device
tree so that the guest OS and boot firmware can locate it.

For historical reasons the copy and update to the device tree were in
different parts of the code. This cleanup brings them both together in
an spapr_load_rtas() function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Move adding of fdt reserve map entries

The flattened device tree passed to pseries guests contains a list of
reserved memory areas. Currently we construct this list early in
spapr_create_fdt_skel() as we sequentially write the fdt.

This will be inconvenient for upcoming cleanups, so this patch moves
the reserve map changes to the end of fdt construction. This changes
fdt_add_reservemap_entry() calls - which work when writing the fdt
sequentially to fdt_add_mem_rsv() calls used when altering the fdt in
random access mode.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Make spapr_create_fdt_skel() get information from machine state

Currently spapr_create_fdt_skel() takes a bunch of individual parameters
for various things it will put in the device tree. Some of these can
already be taken directly from sPAPRMachineState. This patch alters it so
that all of them can be taken from there, which will allow this code to
be moved away from its current caller in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Remove rtas_addr and fdt_addr fields from machinestate

These values are used only within ppc_spapr_reset(), so just change them
to local variables.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

pseries: Split device tree construction from device tree load

spapr_finalize_fdt() both finishes building the device tree for the guest
and loads it into guest memory. For future cleanups, it's going to be
more convenient to do these two things separately. The loading portion is
pretty trivial, so we move it inline into the caller, ppc_spapr_reset().

We also rename spapr_finalize_fdt(), because the current name is going to
become inaccurate.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

target-ppc: add vmul10[u,eu,cu,ecu]q instructions

vmul10uq : Vector Multiply-by-10 Unsigned Quadword VX-form
vmul10euq : Vector Multiply-by-10 Extended Unsigned Quadword VX-form
vmul10cuq : Vector Multiply-by-10 & write Carry Unsigned Quadword VX-form
vmul10ecuq: Vector Multiply-by-10 Extended & write Carry Unsigned Quadword VX-form

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[ Add GEN_VXFORM_DUAL_EXT with invalid bit mask ]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/pnv: add a ISA bus

As Qemu only supports a single instance of the ISA bus, we use the LPC
controller of chip 0 to create one and plug in a couple of useful
devices, like an UART and RTC. An IPMI BT device, which is also an ISA
device, can be defined on the command line to connect an external BMC.
That is for later.

The PowerNV machine now has a console. Skiboot should load a kernel
and jump into it but execution will stop quite early because we lack a
model for the native XICS controller for the moment :

    [    0.000000] NR_IRQS:512 nr_irqs:512 16
    [    0.000000] XICS: Cannot find a Presentation Controller !
    [    0.000000] ------------[ cut here ]------------
    [    0.000000] WARNING: at arch/powerpc/platforms/powernv/setup.c:81
    ...
    [    0.000000] NIP [c00000000079d65c] pnv_init_IRQ+0x30/0x44

You can still do a few things under xmon.

Based on previous work from :
      Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Trivial fix for a change in the serial_hds_isa_init() interface]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/pnv: add a LPC controller

The LPC (Low Pin Count) interface on a POWER8 is made accessible to
the system through the ADU (XSCOM interface). This interface is part
of set of units connected together via a local OPB (On-Chip Peripheral
Bus) which act as a bridge between the ADU and the off chip LPC
endpoints, like external flash modules.

The most important units of this OPB are :
- OPB Master: contains the ADU slave logic, a set of internal
   registers and the logic to control the OPB.
- LPCHC (LPC HOST Controller): which implements a OPB Slave, a set of
   internal registers and the LPC HOST Controller to control the LPC
   interface.

Four address spaces are provided to the ADU :
- LPC Bus Firmware Memory
- LPC Bus Memory
- LPC Bus I/O (ISA bus)
- and the registers for the OPB Master and the LPC Host Controller

On POWER8, an intermediate hop is necessary to reach the OPB, through
a unit called the ECCB. OPB commands are simply mangled in ECCB write
commands.

On POWER9, the OPB master address space can be accessed via MMIO. The
logic is same but the code will be simpler as the XSCOM and ECCB hops
are not necessary anymore.

This version of the LPC controller model doesn't yet implement support
for the SerIRQ deserializer present in the Naples version of the chip
though some preliminary work is there.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.7
      - ported on latest PowerNV patchset
      - changed the XSCOM interface to fit new model
      - QOMified the model
      - moved the ISA hunks in another patch
      - removed printf logging
      - added a couple of UNIMP logging
      - rewrote commit log ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/pnv: add XSCOM handlers to PnvCore

Now that we are using real HW ids for the cores in PowerNV chips, we
can route the XSCOM accesses to them. We just need to attach a
specific XSCOM memory region to each core in the appropriate window
for the core number.

To start with, let's install the DTS (Digital Thermal Sensor) handlers
which should return 38°C for each core.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/pnv: add XSCOM infrastructure

On a real POWER8 system, the Pervasive Interconnect Bus (PIB) serves
as a backbone to connect different units of the system. The host
firmware connects to the PIB through a bridge unit, the
Alter-Display-Unit (ADU), which gives him access to all the chiplets
on the PCB network (Pervasive Connect Bus), the PIB acting as the root
of this network.

XSCOM (serial communication) is the interface to the sideband bus
provided by the POWER8 pervasive unit to read and write to chiplets
resources. This is needed by the host firmware, OPAL and to a lesser
extent, Linux. This is among others how the PCI Host bridges get
configured at boot or how the LPC bus is accessed.

To represent the ADU of a real system, we introduce a specific
AddressSpace to dispatch XSCOM accesses to the targeted chiplets. The
translation of an XSCOM address into a PCB register address is
slightly different between the P9 and the P8. This is handled before
the dispatch using a 8byte alignment for all.

To customize the device tree, a QOM InterfaceClass, PnvXScomInterface,
is provided with a populate() handler. The chip populates the device
tree by simply looping on its children. Therefore, each model needing
custom nodes should not forget to declare itself as a child at
instantiation time.

Based on previous work done by :
Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Added cpu parameter to xscom_complete()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/pnv: add a PnvCore object

This is largy inspired by sPAPRCPUCore with some simplification, no
hotplug for instance. A set of PnvCore objects is added to the PnvChip
and the device tree is populated looping on these cores.

Real HW cpu ids are now generated depending on the chip cpu model, the
chip id and a core mask. The id is propagated to the CPU object, using
properties, to set the SPR_PIR (Processor Identification Register)

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/pnv: add a PIR handler to PnvChip

The Processor Identification Register (PIR) is a register that holds a
processor identifier which is used for bus transactions (XSCOM) and
for processor differentiation in multiprocessor systems. It also used
in the interrupt vector entries (IVE) to identify the thread serving
the interrupts.

P9 and P8 have some differences in the CPU PIR encoding.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/pnv: add a core mask to PnvChip

This will be used to build real HW ids for the cores and enforce some
limits on the available cores per chip.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/pnv: add a PnvChip object

This is is an abstraction of a POWER8 chip which is a set of cores
plus other 'units', like the pervasive unit, the interrupt controller,
the memory controller, the on-chip microcontroller, etc. The whole can
be seen as a socket. It depends on a cpu model and its characteristics:
max cores and specific inits are defined in a PnvChipClass.

We start with an near empty PnvChip with only a few cpu constants
which we will grow in the subsequent patches with the controllers
required to run the system.

The Chip CFAM (Common FRU Access Module) ID gives the model of the
chip and its version number. It is generally the first thing firmwares
fetch, available at XSCOM PCB address 0xf000f, to start initialization.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/pnv: add skeleton PowerNV platform

The goal is to emulate a PowerNV system at the level of the skiboot
firmware, which loads the OS and provides some runtime services. Power
Systems have a lower firmware (HostBoot) that does low level system
initialization, like DRAM training. This is beyond the scope of what
qemu will address in a PowerNV guest.

No devices yet, not even an interrupt controller. Just to get started,
some RAM to load the skiboot firmware, the kernel and initrd. The
device tree is fully created in the machine reset op.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.7
      - replaced fprintf by error_report
      - used a common definition of _FDT macro
      - removed VMStateDescription as migration is not yet supported
      - added IBM Copyright statements
      - reworked kernel_filename handling
      - merged PnvSystem and sPowerNVMachineState
      - removed PHANDLE_XICP
      - added ppc_create_page_sizes_prop helper
      - removed nmi support
      - removed kvm support
      - updated powernv machine to version 2.8
      - removed chips and cpus, They will be provided in another patches
      - added a machine reset routine to initialize the device tree (also)
      - french has a squelette and english a skeleton.
      - improved commit log.
      - reworked prototypes parameters
      - added a check on the ram size (thanks to Michael Ellerman)
      - fixed chip-id cell
      - changed MAX_CPUS to 2048
      - simplified memory node creation to one node only
      - removed machine version
      - rewrote the device tree creation with the fdt "rw" routines
      - s/sPowerNVMachineState/PnvMachineState/
      - etc.]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

configure, ppc64: Copy skiboot.lid to build directory when configuring

When configured to compile out of tree, the configure script
copies BIOS blobs to the build directory. However since the PPC64 powernv
machine ROM has .lid extension, it is ignored and "make check" fails
when trying the powernv machine.

This adds *.lid to the list of copied blobs.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc: add skiboot firmware for the pnv platform

This is the initial image of skiboot 5.3.7 (commit 762d0082) for
the PowerPC PowerNV (Non-Virtualized) platform. Built from
submodule.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc: Fix single step with gdb stub

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>