[S390] convert/optimize csum_fold() to C
authorHeiko Carstens <heiko.carstens@de.ibm.com>
Fri, 11 Sep 2009 08:28:32 +0000 (10:28 +0200)
committerMartin Schwidefsky <schwidefsky@de.ibm.com>
Fri, 11 Sep 2009 08:29:43 +0000 (10:29 +0200)
commit04efc3be767cfed0d348fd598eba8fe5f5bf5fe6
tree53a6509f836e2f85c34e5a3af66fa9fbff910f77
parent05e7ff7da78bad3edc1290ed86b4a37da72ced62
[S390] convert/optimize csum_fold() to C

In the meantime gcc generates better code than the old inline
assemblies do. Original inline assembly results in:

lr %r1,%r2
sr %r3,%r3
lr %r2,%r1
srdl %r2,16
alr %r2,%r3
alr %r1,%r2
srl %r1,16
xilf %r1,65535
llghr %r2,%r1
br %r14

Out of the C code gcc generates this:

rll %r1,%r2,16
ar %r1,%r2
srl %r1,16
xilf %r1,65535
llghr %r2,%r1
br %r14

In addition we don't have any static register allocations anymore and
gcc is free to shuffle instructions around for better pipeline usage.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
arch/s390/include/asm/checksum.h