Fix GenerateShuffleArray to support cyclic shuffles (dotnet/coreclr#26169)
* Fix GenerateShuffleArray to support cyclic shuffles
GenerateShuffleArray did not handle the case where the register / stack
slot shuffle contained a cycle, which resulted in an infinite loop in
this function. This issue is specific to the Unix Amd64 ABI.
To fix that, this change reworks the algorithm completely. Besides
fixing the issue, it also performs better in some cases.
Fixing the cyclic shuffling required an extra helper register. However,
no general purpose register was available, so I had to use xmm8 for
this purpose.
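For illustration only, here is a minimal sketch of how a cyclic shuffle
can be sequenced with one helper location. It is not the code from this
change: the names Loc, Move and SequenceShuffle are hypothetical,
locations are plain integers rather than registers or stack slots, and
the `scratch` argument merely stands in for the role xmm8 plays above.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical types for this sketch: a location is just an integer id.
using Loc = int;
struct Move { Loc src; Loc dst; };

// Turns a set of parallel moves (all sources read "at once") into an
// ordered list of moves that has the same effect. Assumes every dst is
// written by at most one move and that scratch is not used by any move.
std::vector<Move> SequenceShuffle(const std::vector<Move>& parallel, Loc scratch)
{
    std::map<Loc, Loc> pending;  // dst -> src for moves not yet emitted
    std::map<Loc, int> readers;  // location -> pending moves reading it
    for (const Move& m : parallel)
    {
        assert(m.src != scratch && m.dst != scratch);
        if (m.src == m.dst) continue;      // nothing to do
        assert(pending.count(m.dst) == 0); // each dst written once
        pending[m.dst] = m.src;
        ++readers[m.src];
    }

    std::vector<Move> out;
    std::vector<Loc> ready;  // destinations that no pending move reads
    for (const auto& kv : pending)
        if (readers[kv.first] == 0) ready.push_back(kv.first);

    while (!pending.empty())
    {
        if (ready.empty())
        {
            // Only cycles remain. Save one destination's current value
            // in the scratch location and redirect its reader there.
            Loc d = pending.begin()->first;
            out.push_back({d, scratch});
            for (auto& kv : pending)
                if (kv.second == d) kv.second = scratch;
            readers[scratch] = readers[d];
            readers[d] = 0;
            ready.push_back(d);
        }
        Loc dst = ready.back();
        ready.pop_back();
        Loc src = pending[dst];
        pending.erase(dst);
        out.push_back({src, dst});
        // Once nobody reads src any more, it is safe to overwrite it.
        if (--readers[src] == 0 && pending.count(src) != 0)
            ready.push_back(src);
    }
    return out;
}
```

A naive loop that waits for a destination nobody reads would never make
progress on a pure cycle, which is the kind of hang described above. In
this sketch the cycle is broken by a single extra move into the scratch
location, so a cycle of N moves is emitted as N + 1 sequential moves.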
* Remove special handling of the hang from ABI stress