i965/tiled_memcpy: ytiled_to_linear a cache line at a time
author Scott D Phillips <scott.d.phillips@intel.com>
Mon, 30 Apr 2018 17:25:47 +0000 (10:25 -0700)
committer Kenneth Graunke <kenneth@whitecape.org>
Mon, 30 Apr 2018 22:18:36 +0000 (15:18 -0700)
commit 2a08ae3c7cba14b9805d006e1981ba9d762bf241
tree e9f36ce2c3ddd5d852d34197361f11e5abcfa320
parent 682bdaa658d63993e32f95a4244568aeab85642a

As with the transformation applied to linear_to_ytiled, align each
readback from the ytiled source to a cacheline (i.e. transfer a whole
cacheline from the source before moving on to the next column). This
will allow a subsequent patch to use movntdqa (_mm_stream_load_si128)
to obtain near-WB readback performance when accessing the uncached
ytiled memory, an order-of-magnitude improvement.
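
The access pattern described above can be sketched roughly as follows.
This is an illustration only, not the code in intel_tiled_memcpy.c. It
assumes a 4 KiB Y-tile of 128 bytes by 32 rows stored as 16-byte-wide
columns, no bit-6 swizzling, a 64-byte cacheline, and a copy of exactly
one full tile. The names ytile_offset and ytile_to_linear_tile are
hypothetical. Under those assumptions, one source cacheline holds four
consecutive rows of a single column, so the loop consumes four rows of
a column before stepping to the next column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed Y-tile geometry: 4 KiB tile, no bit-6 swizzling. */
enum {
   YTILE_WIDTH        = 128, /* bytes per tile row */
   YTILE_HEIGHT       = 32,  /* rows per tile */
   YTILE_SPAN         = 16,  /* bytes per column row (one OWord) */
   CACHELINE          = 64,
   ROWS_PER_CACHELINE = CACHELINE / YTILE_SPAN,  /* 4 rows */
   COLUMN_SIZE        = YTILE_SPAN * YTILE_HEIGHT /* 512 bytes */
};

/* Byte offset of linear coordinate (x, y) inside one Y-tile. */
static size_t
ytile_offset(unsigned x, unsigned y)
{
   return (size_t)(x / YTILE_SPAN) * COLUMN_SIZE +
          (size_t)y * YTILE_SPAN + x % YTILE_SPAN;
}

/* Hypothetical helper: detile one full Y-tile into a linear buffer,
 * reading the tiled source one whole cacheline at a time.
 */
static void
ytile_to_linear_tile(uint8_t *dst, size_t dst_pitch, const uint8_t *src)
{
   /* Outer loop: groups of four rows (one cacheline tall). */
   for (unsigned y = 0; y < YTILE_HEIGHT; y += ROWS_PER_CACHELINE) {
      /* Inner loop: walk the eight 16-byte columns. */
      for (unsigned x = 0; x < YTILE_WIDTH; x += YTILE_SPAN) {
         const size_t off = ytile_offset(x, y);

         /* Each read starts on a cacheline boundary within the tile. */
         assert(off % CACHELINE == 0);

         /* Scatter the cacheline's four 16-byte rows to the
          * destination before touching the next column.
          */
         for (unsigned r = 0; r < ROWS_PER_CACHELINE; r++) {
            memcpy(dst + (size_t)(y + r) * dst_pitch + x,
                   src + off + (size_t)r * YTILE_SPAN,
                   YTILE_SPAN);
         }
      }
   }
}
```

Because every 64-byte source read in this sketch begins on a cacheline
boundary relative to the tile base, the four 16-byte copies could in
principle be replaced by aligned streaming loads, provided the tile
base itself is suitably aligned.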

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
src/mesa/drivers/dri/i965/intel_tiled_memcpy.c