target-ppc: improve lxvw4x implementation
Load 8byte at a time and manipulate.
Big-Endian Storage
+-------------+-------------+-------------+-------------+
| 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
+-------------+-------------+-------------+-------------+
Little-Endian Storage
+-------------+-------------+-------------+-------------+
| 33 22 11 00 | 77 66 55 44 | BB AA 99 88 | FF EE DD CC |
+-------------+-------------+-------------+-------------+
Vector load results in (32-bit elements):
+----------+----------+----------+----------+
|
00112233 |
44556677 |
8899AABB |
CCDDEEFF |
+----------+----------+----------+----------+
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
[dwg: Slight tweak to commit description]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>