[ARM] bottom-top mul support ARMParallelDSP
Originally committed in rL342210 but was reverted in rL342260 because
it was causing issues in vectorized code, because I had forgotten to
ensure that we're operating on scalar values.
Original commit message:
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 342870