[AArch64][GlobalISel] Add a simple cross-regclass copy optimization post-selection.
This does some trivial cross-regclass folding, where we can either do some extra
constraining to eliminate the copy or modify uses to use a smaller regclass.
There are minor code size improvements on average.
Program size.__text
before after diff
tramp3d-v4/tramp3d-v4 366000.00 366012.00 0.0%
mafft/pairlocalalign 248196.00 248188.00 -0.0%
7zip/7zip-benchmark 568612.00 568592.00 -0.0%
kimwitu++/kc 434704.00 434676.00 -0.0%
Bullet/bullet 456128.00 456096.00 -0.0%
sqlite3/sqlite3 284136.00 284100.00 -0.0%
ClamAV/clamscan 381492.00 381396.00 -0.0%
SPASS/SPASS 412052.00 411944.00 -0.0%
lencod/lencod 428060.00 427912.00 -0.0%
consumer-typeset/consumer-typeset 413148.00 411116.00 -0.5%
Geomean difference -0.1%
Differential Revision: https://reviews.llvm.org/D136793