[libc] Use different alignment for memcpy between ARM and x86.
Aligned copy used to be 'destination aligned' for x86 but this decision was reverted in D93457 where we noticed that it was better for ARM to be 'source aligned'.
More benchmarking confirmed that it can be up to 30% faster to align copy to destination for x86. This Patch offers both implementations and switches x86 back to destination aligned.
It also fixes alignment to 32 byte on x86.
Differential Revision: https://reviews.llvm.org/D101296