JIT: refactor how we do late devirtualization (dotnet/coreclr#20553)

Change late devirtualization to run in a postorder callback during
the tree search to update GT_RET_EXPRs, instead of shoehorning it into
the preorder callback.

This allows the jit to reconsider all calls for late devirtualization,
not just calls that are parents of particular GT_RET_EXPRs. The jit will
take advantage of this in subsequent work that does more aggressive
bottom-up type sharpening.
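A minimal C++ sketch of the new shape, assuming a toy IR rather than the jit's real `GenTree` and tree walker. The preorder hook splices each RET_EXPR's value into its use, and the postorder hook runs on every call only after that call's operands have been updated. The `Node`, `Oper`, `Walk`, and `LateDevirtualize` names are hypothetical, as is the stand-in rule that a call devirtualizes when its first operand is a constant (playing the role of a known exact type).

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical toy IR; not the jit's GenTree.
enum class Oper { Const, RetExpr, Call, Add };

struct Node {
    Oper oper;
    std::vector<Node*> kids;     // operands
    Node* retVal = nullptr;      // RetExpr only: value produced by the inlinee
    bool devirtualized = false;  // Call only: set by the late-devirt check
};

// Generic walk: `pre` may replace the node in its parent's operand slot;
// `post` runs after all of the (possibly replaced) node's operands are done.
void Walk(Node*& slot, const std::function<void(Node*&)>& pre,
          const std::function<void(Node*&)>& post) {
    pre(slot);
    for (Node*& kid : slot->kids) {
        Walk(kid, pre, post);
    }
    post(slot);
}

void LateDevirtualize(Node*& root) {
    Walk(
        root,
        // Preorder: splice each RET_EXPR's value into its use.
        [](Node*& n) {
            if (n->oper == Oper::RetExpr) {
                n = n->retVal;
            }
        },
        // Postorder: reconsider every call, not just RET_EXPR parents.
        // Stand-in rule: a constant first operand means a known exact type.
        [](Node*& n) {
            if (n->oper == Oper::Call && !n->kids.empty() &&
                n->kids[0]->oper == Oper::Const) {
                n->devirtualized = true;
            }
        });
}
```

Running the check in postorder means the hook sees a call's operands in their final, post-splice form, and it also fires for calls that never had a RET_EXPR operand at all.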
Reconsidering all calls for late devirtualization, instead of just a select
subset, costs about 0.5% in throughput.
To mitigate this, short-circuit the tree walk when a subtree no longer has
the GTF_CALL flag set. This keeps us from walking through subtrees that
contain no CALL or RET_EXPR nodes.
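One way to picture the short-circuit, again on a hypothetical toy IR: each node carries a summary bit (`kFlagCall`, standing in for `GTF_CALL`) that is set when its subtree contains a call or RET_EXPR, and the preorder visit skips the children of any node without it. The names `kFlagCall`, `WalkResult`, `PreVisit`, and `CountVisits` are illustrative only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for GTF_CALL: "this subtree has a call or RET_EXPR".
constexpr unsigned kFlagCall = 0x1;

struct Node {
    unsigned flags;
    std::vector<Node*> kids;
};

enum class WalkResult { Continue, SkipSubtrees };

// Preorder decision: without the summary bit nothing below can be a
// CALL or RET_EXPR, so there is nothing for late devirt to do there.
WalkResult PreVisit(const Node* n) {
    return (n->flags & kFlagCall) != 0 ? WalkResult::Continue
                                       : WalkResult::SkipSubtrees;
}

// Walks the tree and returns how many nodes were visited.
int CountVisits(const Node* n) {
    int visited = 1;
    if (PreVisit(n) == WalkResult::SkipSubtrees) {
        return visited;  // the node itself is visited, its children are not
    }
    for (const Node* kid : n->kids) {
        visited += CountVisits(kid);
    }
    return visited;
}
```

The skip is only safe if the summary bit is maintained faithfully: a node that failed to inherit its operands' bits would hide a call from the walk.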
To make this work, we have to make sure every node type propagates flags
from its children. This required updating a few node constructors.
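A sketch of what propagating flags from children means for a constructor, assuming a toy binary node rather than the jit's real node classes: the constructor ORs its operands' summary bits into its own, so a parent's bit stays a conservative answer to "is there a call below me?". `kFlagCall`, `Node`, and `MakeCall` are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for GTF_CALL.
constexpr unsigned kFlagCall = 0x1;

struct Node {
    unsigned flags = 0;
    Node* op1 = nullptr;
    Node* op2 = nullptr;

    Node() = default;

    // Operator node: inherit the operands' summary bits so that a parent's
    // flags always describe everything beneath it.
    explicit Node(Node* a, Node* b = nullptr) : op1(a), op2(b) {
        if (a != nullptr) flags |= a->flags & kFlagCall;
        if (b != nullptr) flags |= b->flags & kFlagCall;
    }
};

// A call node sets the bit itself; its parents pick it up on construction.
Node MakeCall() {
    Node n;
    n.flags = kFlagCall;
    return n;
}
```

In this sketch the bits are computed only at construction time; code that later rewires operands would have to recompute them.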
There is also an odd quirk in the preorder callback: it may overwrite the
parent node (from GT_ASG to GT_BLK or GT_NOP). If the overwrite creates a
GT_NOP, the postorder callback may see a null tree. Tolerate this for now
by checking for a null tree in the postorder callback.
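The null check might look like this on a hypothetical toy IR. Here the preorder hook, on reaching a RET_EXPR under an assignment, rewrites the parent into a NOP and clears its operands. That nulls the very slot the walker is visiting, so the postorder hook has to skip a null tree. The walker, `WalkStats`, `RunWalk`, and the rewrite condition are assumptions made for illustration, not the jit's actual logic.

```cpp
#include <cassert>

// Hypothetical toy IR; not the jit's GenTree.
enum class Oper { Nop, Asg, RetExpr, Local };

struct Node {
    Oper oper;
    Node* op1 = nullptr;
    Node* op2 = nullptr;
};

struct WalkStats {
    int nonNull = 0;    // postorder visits of real nodes
    int nullSlots = 0;  // postorder visits that found a null tree
};

// Walk operand slots: preorder, then operands, then postorder on the same slot.
template <typename Pre, typename Post>
void Walk(Node*& slot, Node* parent, Pre& pre, Post& post) {
    if (slot == nullptr) return;
    pre(slot, parent);  // may null `slot` by rewriting `parent`
    if (Node* self = slot) {
        Walk(self->op1, self, pre, post);
        Walk(self->op2, self, pre, post);
    }
    post(slot);  // `slot` may be null here
}

WalkStats RunWalk(Node*& root) {
    WalkStats stats;
    auto pre = [](Node*& slot, Node* parent) {
        // Modeled quirk: overwrite the parent ASG with a NOP.
        if (slot->oper == Oper::RetExpr && parent != nullptr &&
            parent->oper == Oper::Asg) {
            parent->oper = Oper::Nop;
            parent->op1 = nullptr;
            parent->op2 = nullptr;
        }
    };
    auto post = [&stats](Node*& slot) {
        if (slot == nullptr) {  // tolerate the overwritten case
            ++stats.nullSlots;
            return;
        }
        ++stats.nonNull;
    };
    Walk(root, nullptr, pre, post);
    return stats;
}
```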
Also take advantage of `gtRetExprVal` to tunnel through GT_RET_EXPR chains
to the final tree, which avoids multiple overwrites when propagating the
return expression back into the IR.
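A sketch of the tunneling idea, assuming a toy node whose `inlineResult` field stands in for however the real jit links a RET_EXPR to the inlinee's result: follow the chain to the first non-RET_EXPR tree and write that into the use once, instead of replacing one link per pass. `RetExprVal` and `SpliceRetExpr` are hypothetical names that only mirror the idea behind `gtRetExprVal`.

```cpp
#include <cassert>

// Hypothetical toy IR; not the jit's GenTree.
enum class Oper { RetExpr, Const };

struct Node {
    Oper oper;
    Node* inlineResult = nullptr;  // RetExpr only: tree returned by the inlinee
};

// Follow RET_EXPR -> RET_EXPR -> ... to the final tree.
// Assumes every RET_EXPR in the chain has a non-null inlineResult.
Node* RetExprVal(Node* n) {
    while (n->oper == Oper::RetExpr) {
        n = n->inlineResult;
    }
    return n;
}

// One write per use, however long the chain is.
void SpliceRetExpr(Node*& use) {
    if (use->oper == Oper::RetExpr) {
        use = RetExprVal(use);
    }
}
```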
With these mitigations, this change has minimal throughput impact.

Commit migrated from https://github.com/dotnet/coreclr/commit/dfd0f4f8032e8297ba8d5b5dc92e4f08d5204983