[Hexagon] Further improve code generation for shuffles
* Concatenate partial shuffles into longer ones whenever possible:
In the selection DAG, a shuffle's operand types and result type must all
agree. LLVM IR does not require this, and non-conforming IR-level shuffles
are rewritten to match the DAG's requirements. The rewrite can turn a
shuffle that would match a single HVX instruction into shuffles that
require more complex handling. An example is any shuffle that takes two
single vectors and returns a pair (e.g. V6_vshuffvdd).
This is avoided by concatenating such shuffles into ones that take a
vector pair and an undef pair, and produce a vector pair.
* Recognize perfect shuffles when masks contain `undef` values.
* Use funnel shifts for contracting shuffles.
* Recognize rotations as a separate step.
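
To illustrate the first item, here is a minimal C++ sketch of the index
arithmetic only; it is not the code from this commit. It assumes the usual
mask convention (-1 for an undef lane, indices counted across both
operands) and assumes that one way to make operand and result types agree
is to pad each single vector with undef lanes. The helpers `applyShuffle`,
`padVector`, `padMask` and `concatVectors` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// -1 marks an undef lane. Lane values in this sketch are non-negative.
constexpr int Undef = -1;

// Two-operand shuffle: index M < A.size() selects A[M], larger indices
// select from B.
std::vector<int> applyShuffle(const std::vector<int> &A,
                              const std::vector<int> &B,
                              const std::vector<int> &Mask) {
  const int N = static_cast<int>(A.size());
  std::vector<int> R;
  for (int M : Mask) {
    if (M == Undef)
      R.push_back(Undef);
    else if (M < N)
      R.push_back(A[M]);
    else
      R.push_back(B[M - N]);
  }
  return R;
}

// Widen a vector to W lanes by appending undef lanes.
std::vector<int> padVector(std::vector<int> V, int W) {
  assert(W >= static_cast<int>(V.size()));
  V.resize(W, Undef);
  return V;
}

// Adjust a mask written for N-lane operands so that it addresses operands
// padded to W lanes: indices into the second operand shift by W - N.
std::vector<int> padMask(std::vector<int> Mask, int N, int W) {
  assert(W >= N);
  for (int &M : Mask)
    if (M != Undef && M >= N)
      M += W - N;
  return Mask;
}

// Concatenate two single vectors into one pair-sized vector.
std::vector<int> concatVectors(std::vector<int> A, const std::vector<int> &B) {
  A.insert(A.end(), B.begin(), B.end());
  return A;
}
```

With 4-lane operands and the interleaving mask {0,4,1,5,2,6,3,7}, padding
rewrites the mask to {0,8,1,9,2,10,3,11} and reads from both padded
operands. The concatenated form keeps the mask unchanged and reads only
from its first operand, with an all-undef second operand.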
These changes go into a single commit because each one on its own
introduced some regressions.
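
As a rough illustration of matching a mask that contains `undef` values,
the sketch below checks whether a single-operand mask is a rotation while
treating undef lanes as wildcards. It is not the matcher from this commit:
the function name `matchRotation` and the -1 encoding of undef are
assumptions made for the example.

```cpp
#include <optional>
#include <vector>

// -1 marks an undef lane (assumed encoding for this sketch).
constexpr int UndefLane = -1;

// Return R if every defined lane I satisfies Mask[I] == (I + R) % N, i.e.
// the mask rotates a single N-lane vector by R lanes. Undef lanes match any
// rotation. Returns std::nullopt if the defined lanes disagree, if an index
// is out of range, or if no lane is defined (the amount is unconstrained).
std::optional<int> matchRotation(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  std::optional<int> Amt;
  for (int I = 0; I < N; ++I) {
    const int M = Mask[I];
    if (M == UndefLane)
      continue;                    // wildcard lane, imposes no constraint
    if (M < 0 || M >= N)
      return std::nullopt;         // not a single-operand index
    const int R = (M - I + N) % N; // rotation implied by this lane
    if (Amt && *Amt != R)
      return std::nullopt;         // lanes imply different rotations
    Amt = R;
  }
  return Amt;
}
```

For example, {2,3,0,1} and {2,-1,0,-1} both match a rotation by 2, while
{0,2,1,3} does not match any rotation.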