string: Improve generic strncmp
It follows the strategy:
- Align the first input to word boundary using byte operations.
- If second input is also word aligned, read a word per time, check
for null (using has_zero), and check final words using byte
operation.
- If second input is not word aligned, loop by aligning the source,
and merge the result of two reads. Similar to aligned case, check
for null with has_zero, and check final words using byte operation.
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu,
and powerpc-linux-gnu by removing the arch-specific assembly
implementation and disabling multi-arch (it covers both LE and BE
for 64 and 32 bits).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>