radeonsi: optimize si_emit_prefetch_L2 when it's split
When using the prefetch with VS_ONLY=true followed by VS_ONLY=false,
we tested the VS_ONLY bits in the mask when executing VS_ONLY=false where
the bits were always 0. It's also useless to clear the prefetch mask when
VS_ONLY=true.
This commit skips those tests by splitting the function properly using
BEFORE_DRAW and AFTER_DRAW template parameters.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8794>