Add some fast paths that are missing outside Windows
- Allocation fast path for arrays of object elements
  - On Linux, a microbenchmark is 25% faster with the portable fast path
  - On Windows with the asm fast path, the microbenchmark was 52% faster than on Linux before this change, and is now 22% faster than on Linux with the portable fast path
  - On Windows, the portable fast path is at most 4% slower than the asm fast path
- Allocation fast path for objects
  - On Linux, a microbenchmark is 200% faster with the portable fast path
  - On Windows with the asm fast path, the microbenchmark was 325% faster than on Linux before this change, and is now 43% faster than on Linux with the portable fast path
  - On Windows, the portable fast path is at most 1% slower than the asm fast path
- Skipped the Box fast path, since boxing seems to be inlined into jitted code that uses the new object fast path. As a result of adding the new object fast path, boxing perf has also improved outside Windows, similarly to the object allocation numbers above.
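
For readers unfamiliar with the term, the general shape of a portable allocation fast path can be sketched as a bump-pointer allocation out of a per-thread window, with a fallback to a slow path when the window is exhausted. This is an illustrative sketch only, not the code in this change: `AllocContext`, `ObjectHeader`, `ArrayHeader`, `AllocateSlow`, `kAlignment`, and `kMaxFastArrayLength` are all hypothetical names and values, the slow path is a stub that returns `nullptr`, and the window is assumed to be pre-zeroed by the allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// All names and constants here are hypothetical, chosen for illustration.
constexpr std::size_t kAlignment = 8;              // assumed object alignment
constexpr std::size_t kMaxFastArrayLength = 1024;  // assumed fast-path length cap

// Per-thread allocation window: [alloc_ptr, alloc_limit) is free memory.
struct AllocContext {
    std::uint8_t* alloc_ptr;
    std::uint8_t* alloc_limit;
};

struct ObjectHeader { const void* type; };
struct ArrayHeader  { const void* type; std::size_t length; };

inline std::size_t AlignUp(std::size_t n) {
    return (n + kAlignment - 1) & ~(kAlignment - 1);
}

// Stub for the slow path (a real runtime would refill the window or trigger a GC).
inline void* AllocateSlow(AllocContext&, std::size_t) { return nullptr; }

// Common fast path: one bounds check, one pointer bump.
inline void* BumpAllocate(AllocContext& ctx, std::size_t size) {
    assert(ctx.alloc_ptr <= ctx.alloc_limit);
    size = AlignUp(size);
    std::size_t available = static_cast<std::size_t>(ctx.alloc_limit - ctx.alloc_ptr);
    if (size > available) return AllocateSlow(ctx, size);
    void* result = ctx.alloc_ptr;
    ctx.alloc_ptr += size;
    return result;
}

// Object fast path: bump, then write the type pointer.
inline ObjectHeader* AllocateObjectFast(AllocContext& ctx, const void* type, std::size_t size) {
    assert(size >= sizeof(ObjectHeader));
    void* mem = BumpAllocate(ctx, size);
    return mem ? new (mem) ObjectHeader{type} : nullptr;
}

// Array-of-references fast path: cap the length so the size computation
// cannot overflow, then bump and write the type pointer and length.
inline ArrayHeader* AllocateObjectArrayFast(AllocContext& ctx, const void* type, std::size_t length) {
    if (length > kMaxFastArrayLength) return nullptr;  // stands in for the slow path
    void* mem = BumpAllocate(ctx, sizeof(ArrayHeader) + length * sizeof(void*));
    return mem ? new (mem) ArrayHeader{type, length} : nullptr;
}
```

In a sketch like this, the common case costs one comparison, one addition, and a header store, which is why an asm version has little room to be faster than a portable one.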