[Unix x64|Arm64] Correct canFastTailCall decisions
This changes how the fastTailCall decision is made on Unix x64 and Arm64.
Before this change the decision was based on the number of the caller's incoming
and outgoing arguments, as on Windows. That was incorrect on Unix x64 and Arm64
because one argument does not translate to one register or one stack slot.
Before this change, structs on Arm64 and Unix x64 that were enregisterable
and took more than one register could pessimize the decision, blocking a
fastTailCall that was actually possible.
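
The difference between counting arguments and counting slots can be sketched
as follows. This is a simplified, hypothetical model and not the JIT's actual
code: the names ArgSlots, stackSlotsNeeded, canFastTailCallByCount and
canFastTailCallBySlots are invented for illustration, floating-point registers
are ignored, and an argument that does not fit entirely in the remaining
registers is assumed to go entirely on the stack.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: each argument is described only by the number of
// register-sized slots it occupies (1 for a scalar, 2 for a 16-byte
// enregisterable struct).
using ArgSlots = std::vector<std::size_t>;

// Stack slots a signature needs when numArgRegs argument registers exist.
// Assumption: an argument that does not fit entirely in the remaining
// registers is passed entirely on the stack.
std::size_t stackSlotsNeeded(const ArgSlots& args, std::size_t numArgRegs)
{
    std::size_t regsUsed   = 0;
    std::size_t stackSlots = 0;
    for (std::size_t slots : args)
    {
        if (regsUsed + slots <= numArgRegs)
        {
            regsUsed += slots;
        }
        else
        {
            stackSlots += slots;
        }
    }
    return stackSlots;
}

// Count-based check: treats every argument as exactly one register or slot.
bool canFastTailCallByCount(const ArgSlots& caller, const ArgSlots& callee, std::size_t numArgRegs)
{
    auto stackArgs = [numArgRegs](const ArgSlots& a) {
        return a.size() > numArgRegs ? a.size() - numArgRegs : std::size_t{0};
    };
    return stackArgs(callee) <= stackArgs(caller);
}

// Slot-based check: the callee's stack arguments must fit in the caller's
// incoming stack argument area.
bool canFastTailCallBySlots(const ArgSlots& caller, const ArgSlots& callee, std::size_t numArgRegs)
{
    return stackSlotsNeeded(callee, numArgRegs) <= stackSlotsNeeded(caller, numArgRegs);
}
```

With six argument registers, a caller taking six scalars and a callee taking
four two-register structs pass the count-based check (four arguments is no
more than six), but in this model the fourth struct needs two stack slots
that the caller's frame does not have, so the slot-based check rejects the
fast tail call.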
This change also fixes several cases in the fastTailCall decision logic. It fixes
dotnet/coreclr#12479 and causes a no-fastTailCall decision for the case in dotnet/coreclr#12468.
In addition, this change adds several regression cases for dotnet/coreclr#12479 and dotnet/coreclr#12468. It
includes more logging for fastTailCall decisions, including a new COMPlus
variable, COMPlus_JitReportFastTailCallDecisions, which is enabled by setting
COMPlus_JitReportFastTailCallDecisions=1.
Commit migrated from https://github.com/dotnet/coreclr/commit/ee95d7c5f552dcfc1b69f8ac2567c4afda40695e