Cells starting at a position aligned to 8 pixels but wider than
4 blocks are copied with 3 blocks per loop. This creates problems on the
next loop iterations since the routine copying 2 blocks requires the
same alignment on some architectures like ARM NEON.