Add a fast path for C rd/wrlock v2
One difference of the C versions to the assembler wr/rdlock
is that the C compiler saves some registers which are unnecessary
for the fast path in the prologue of the functions. Split the
uncontended fast path out into a separate function. Only when contention is
detected is the full featured function called. This makes
the fast path code (nearly) identical to the assembler version,
and gives uncontended performance within a few cycles.
v2: Rename some functions and add space.