iris: Use MI_COPY_MEM_MEM for tiny resource_copy_region calls.
If our resource_copy_region size is a small number of DWords, then
instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after
a CS stall). We also try and select the optimal batch.
Improves performance in Shadow of Mordor on Low settings at 1920x1080
on Skylake GT4e by 0.689096% +/- 0.473968% (n=4). It tries to copy
4 bytes of data to a buffer which was most recently used as a writable
compute shader SSBO. Previously we were switching from compute to the
render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes.
I arbitrarily decided to support 4/8/12/16 bytes. Jason thinks this
is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.