[mlir][linalg] Refine `tensor.extract` vectorisation
This patch updates the vectorisation of `tensor.extract` so that the
permutation map for the transfer_read Op is defined explicitly by the
vectoriser (as opposed to being constructed implicitly by the
transfer_read builder).
This change is needed for cases where the rank of the source tensor is
lower than the rank of the output vector generated by the vectoriser:
```mlir
%17 = vector.transfer_read %arg1[%14, %16], %cst_4 {in_bounds = [true, true]} : tensor<257x24xf32>, vector<1x1x4xf32>
```
In cases like this, the vectoriser will create the following permutation map:
```
(d0, d1) -> (0, d0, d1)
```
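For illustration, the read above might look roughly as follows once the
map is spelled out on the Op. This is a hand-written sketch, not output
taken from the patch: the function name, the value names and the
three-entry `in_bounds` attribute (one entry per result of the map) are
assumptions.

```mlir
// Hypothetical stand-alone function; names are illustrative only.
func.func @read_with_explicit_map(%src: tensor<257x24xf32>,
                                  %i: index, %j: index) -> vector<1x1x4xf32> {
  %pad = arith.constant 0.0 : f32
  // The leading `0` in the map is a broadcast dimension: the rank-2 source
  // has no dimension that corresponds to the outermost vector dimension.
  // Assumption: `in_bounds` carries one entry per map result.
  %v = vector.transfer_read %src[%i, %j], %pad
      {in_bounds = [true, true, true],
       permutation_map = affine_map<(d0, d1) -> (0, d0, d1)>}
      : tensor<257x24xf32>, vector<1x1x4xf32>
  return %v : vector<1x1x4xf32>
}
```

Here d0 and d1 index the two tensor dimensions, so the trailing 1x4
part of the vector is read from the tensor while the leading unit
dimension is filled by broadcast.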
In other cases the behaviour remains unchanged.
Fixes https://github.com/openxla/iree/issues/13036. The test case was
extracted from that issue.
Differential Revision: https://reviews.llvm.org/D148537