radeonsi: add new SDMA texture copy code
This implements:
- Linear-to-linear partial copies. (unaligned)
- Tiled-to-linear and linear-to-tiled partial copies.
(unaligned except 1-2 Bpp)
- Tiled-to-tiled partial copies aligned to 8x8.
v2: Extend the SDMA L2T VM fault workaround to T2L.
- Same algorithm, just applied to T2L.
(and using a 0-based address and surface.bo_size instead of buf->size)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>