[libc] Simplifies multi implementations
This is a roll forward of D101895 with two additional fixes:
Original Patch description:
> This is a follow up on D101524 which:
>
> - simplifies cpu features detection and usage,
> - flattens target dependent optimizations so it's obvious which implementations are generated,
> - provides an implementation targeting the host (march/mtune=native) for the mem* functions,
> - makes sure all implementations are unittested (provided the host can run them).
Additional fixes:
- Fix uninitialized ALL_CPU_FEATURES
- Use non pseudo microarch as it is only supported from Clang 12 on
Differential Revision: https://reviews.llvm.org/D102233