llvmpipe: do final the pixel in/out triangle test in the fragment shader
The test to determine which of the pixels in a 2x2 quad is now done in
the fragment shader rather than in the calling C code. This is a little
faster but there's a few more things to do.
Note that the step[] array elements are in a different order now. Rather
than being in row-major order for the 4x4 grid, they're in "quad-major"
order. The setup of the step arrays is a little more complicated now.
So is the course/intermediate tile test code, but some lookup tables
help with that.
Next steps:
- early-cull 2x2 quads which are totally outside the triangle.
- skip the in/out test for fully contained quads
- make the in/out comparison code tighter/faster.