Inline xform_quant() in encode_block_intra().
Also inline some of the block calculations so that the compiler does not
repeat the same work (recomputing an identical offset, or converting
between raster and transform block indices, or between block, mi and
pixel units) many times over.
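The sketch below illustrates the kind of restructuring described above. It is not the libvpx code. Every name in it (pixel_offset, src_ptr, dst_ptr, diff_ptr, block_ptrs, block_pointers_once) is hypothetical. It assumes a block that is (1 << bwl) 4x4 units wide, with `block` indexing those units in raster order. In the first form, each helper re-derives the pixel offset from the block index. In the second form, row and column are computed once and reused for every buffer. Whether a given compiler would already merge the repeated work in the first form depends on the compiler, so the cycle counts below are the real evidence.

```c
#include <stdint.h>

/* Hypothetical layout: `block` indexes 4x4 units in raster order inside a
 * block that is (1 << bwl) units wide. Returns a pixel offset. */
static inline int pixel_offset(int block, int bwl, int stride) {
  const int row = block >> bwl;              /* row, in 4x4 units */
  const int col = block & ((1 << bwl) - 1);  /* column, in 4x4 units */
  return 4 * row * stride + 4 * col;
}

/* "Before": each helper recomputes the same row/column from `block`. */
static const uint8_t *src_ptr(const uint8_t *src, int src_stride, int block,
                              int bwl) {
  return src + pixel_offset(block, bwl, src_stride);
}
static uint8_t *dst_ptr(uint8_t *dst, int dst_stride, int block, int bwl) {
  return dst + pixel_offset(block, bwl, dst_stride);
}
static int16_t *diff_ptr(int16_t *diff, int block, int bwl) {
  /* Residual buffer stride is assumed to be the block width in pixels. */
  return diff + pixel_offset(block, bwl, 4 << bwl);
}

/* "After": row and column are derived once and shared by all buffers. */
typedef struct {
  const uint8_t *src;
  uint8_t *dst;
  int16_t *diff;
} block_ptrs;

static block_ptrs block_pointers_once(const uint8_t *src, int src_stride,
                                      uint8_t *dst, int dst_stride,
                                      int16_t *diff, int block, int bwl) {
  const int row4 = 4 * (block >> bwl);              /* row, in pixels */
  const int col4 = 4 * (block & ((1 << bwl) - 1));  /* column, in pixels */
  block_ptrs p;
  p.src = src + row4 * src_stride + col4;
  p.dst = dst + row4 * dst_stride + col4;
  p.diff = diff + row4 * (4 << bwl) + col4;
  return p;
}
```

Both forms yield the same pointers. The second form just makes the shared subexpressions explicit instead of relying on the optimizer to find them across separate calls.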
Cycle times:
4x4: 584 -> 505 cycles (16% faster)
8x8: 1651 -> 1560 cycles (6% faster)
16x16: 7897 -> 7704 cycles (2.5% faster)
32x32: 16096 -> 15852 cycles (1.5% faster)
In total, this saves about 0.5 seconds (1min49.8 -> 1min49.3) when encoding
the first 50 frames of bus at speed 0 and 1500 kbps, i.e. about 0.5% of
the total encode time.
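A small C check of the arithmetic behind the percentages above. The helper names are illustrative only. It assumes "N% faster" means before / after - 1. That reading reproduces all four rows after rounding (15.6%, 5.8%, 2.5%, 1.5%), whereas (before - after) / before would give 13.5% for 4x4. For the wall-clock figure, 109.8 s -> 109.3 s is about 0.46%, which rounds to the stated 0.5%.

```c
/* Assumed definition of "% faster": how much higher the throughput is,
 * i.e. before / after - 1, expressed as a percentage. */
static double percent_faster(double before, double after) {
  return (before / after - 1.0) * 100.0;
}

/* Convert a "1min49.8"-style duration into seconds. */
static double min_sec_to_seconds(int minutes, double seconds) {
  return 60.0 * minutes + seconds;
}
```

For example, percent_faster(584, 505) is about 15.64, and percent_faster(min_sec_to_seconds(1, 49.8), min_sec_to_seconds(1, 49.3)) is about 0.46.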
Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80