Fix x86 stack probing (#23881)
* Fix x86 stack probing
On x86, structs are passed by value on the stack. We copy structs
to the stack in various ways, but one way is to first subtract the
size of the struct and then use a "rep movsb" instruction. If the
struct we are passing is sufficiently large, this can cause us to
miss the stack guard page.
So, introduce stack probes for these struct copies.
It turns out the stack pointer after prolog probing can be sitting
near the very end of the guard page (one `STACK_ALIGN` slot before
the end, which allows a "call" instruction which pushes its
return address to touch the guard page with the return address push).
We don't want to probe with every argument push, though. So change
the prolog probing to insert an "extra" touch at the final SP location
if the previous touch was "too far" away, leaving at least some
buffer zone for un-probed SP adjustments. I chose this to be the
size of the largest SIMD register, which also can get copied to the
argument stack with a "SUB;MOV" sequence.
Added several test case variations showing different large stack
probe situations.
Fixes #23796
* Increase the argument size probe buffer
* Formatting