cut out/inline wrapper calls of sv_*catpvf*
These wrappers go through 1 or 2 intermediate calls to reach the guts of
Perl's format string engine. Compilers for Win64 and ARM (and, I assume,
other platforms whose calling convention passes only 4 args in registers)
can't easily tail-call or otherwise optimize these calls, because
sv_vcatpvfn_flags takes more than 4 args and the wrappers call strlen.
Each layer allocates a lot of C stack, sets up a new frame and copies the
args over. So just call the core format string function directly.
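
A minimal sketch of the before/after call shape, in plain C. This is not
the Perl source: core_vformat, mid_vcat, catf_layered and catf_direct are
hypothetical stand-ins, and vsnprintf stands in for the real format
engine. It only illustrates dropping the middle layer while keeping the
same result.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the core formatter. It takes 6 args, so with
 * a 4-register calling convention the last two are passed on the stack. */
static int core_vformat(char *buf, size_t bufsize, const char *pat,
                        size_t patlen, va_list *args, unsigned flags)
{
    size_t used = strlen(buf);

    (void)patlen;               /* unused in this sketch */
    (void)flags;                /* unused in this sketch */
    assert(used < bufsize);     /* caller must pass a terminated buffer */
    /* Append the formatted text after what is already in buf. */
    return vsnprintf(buf + used, bufsize - used, pat, *args);
}

/* Hypothetical middle wrapper: measures the pattern, then forwards.
 * The extra args it has to add make a plain tail call hard. */
static int mid_vcat(char *buf, size_t bufsize, const char *pat,
                    va_list *args)
{
    return core_vformat(buf, bufsize, pat, strlen(pat), args, 0u);
}

/* Before: the varargs entry point reaches the core via the wrapper. */
static int catf_layered(char *buf, size_t bufsize, const char *pat, ...)
{
    va_list args;
    int n;

    va_start(args, pat);
    n = mid_vcat(buf, bufsize, pat, &args);
    va_end(args);
    return n;
}

/* After: the varargs entry point calls the core directly. */
static int catf_direct(char *buf, size_t bufsize, const char *pat, ...)
{
    va_list args;
    int n;

    va_start(args, pat);
    n = core_vformat(buf, bufsize, pat, strlen(pat), &args, 0u);
    va_end(args);
    return n;
}
```

Both entry points append the same text; the direct one just skips a frame
and one round of arg copying.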
Not all paths to sv_vcatpvfn_flags were optimized, since some have no
usage in CORE or on CPAN according to grep. sv_vcatpvf has usage; the
sv_catpvf_mg* functions have none, but were done anyway because they sound
like they would be commonly used.