by the GL-level divisor to get the hardware-level divisor. This case is
further divided into two more cases. If the hardware-level divisor is a
power of two, then we just need to shift. The shift amount is specified by
-the shift field, so that the hardware-level divisor is just 2^shift.
+the shift field, so that the hardware-level divisor is just
+:math:`2^\text{shift}`.
If it isn't a power of two, then we have to divide by an arbitrary integer.
For that, we use the well-known technique of multiplying by an approximation
of the inverse. The driver computes the magic multiplier and shift amount,
and then the hardware does the multiplication and shift. The
hardware and driver also use the "round-down" optimization as described in
http://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf.
-The hardware further assumes the multiplier is between 2^31 and 2^32, so the
-high bit is implicitly set to 1 even though it is set to 0 by the driver --
-presumably this simplifies the hardware multiplier a little. The hardware
-first multiplies linear_id by the multiplier and takes the high 32 bits,
-then applies the round-down correction if extra_flags = 1, then finally
-shifts right by the shift field.
+The hardware further assumes the multiplier is between :math:`2^{31}` and
+:math:`2^{32}`, so the high bit is implicitly set to 1 even though it is set
+to 0 by the driver -- presumably this simplifies the hardware multiplier a
+little. The hardware first multiplies linear_id by the multiplier and
+takes the high 32 bits, then applies the round-down correction if
+extra_flags = 1, then finally shifts right by the shift field.
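
As a rough illustration of the multiply-and-shift scheme described above, here is a minimal Python sketch. The helper names ``compute_constants`` and ``hw_divide`` are hypothetical and do not come from the driver. Two things are assumptions rather than statements about the hardware: the round-down correction is modeled as multiplying ``linear_id + 1`` instead of ``linear_id`` (the form used in the ridiculousfish paper), and round-down is selected when the remainder ``e`` is at most ``2^shift``. Python integers are unbounded, so the overflow handling that real hardware needs is not modeled.

```python
MASK32 = 0xFFFFFFFF


def compute_constants(d):
    """Driver side (hypothetical helper): constants for a non-power-of-two d."""
    assert 0 < d <= MASK32 and d & (d - 1) != 0, "d must be a 32-bit NPOT value"
    shift = d.bit_length() - 1          # floor(log2(d))
    t = 1 << (shift + 32)
    m = -(-t // d)                      # ceil(t / d)
    e = t % d
    # Assumed selection rule: use the round-down variant when e <= 2^shift.
    if e <= (1 << shift):
        m, extra_flags = m - 1, 1
    else:
        extra_flags = 0
    assert m >> 31 == 1                 # multiplier lies in [2^31, 2^32)
    # The high bit is implicit, so it is stored as 0.
    return m & 0x7FFFFFFF, shift, extra_flags


def hw_divide(linear_id, magic_divisor, shift, extra_flags):
    """Model of the hardware side: returns linear_id // d."""
    m = magic_divisor | (1 << 31)       # restore the implicit high bit
    # Assumption: the round-down correction is equivalent to using n + 1.
    n = linear_id + 1 if extra_flags else linear_id
    return (n * m) >> (32 + shift)      # take the high 32 bits, then shift


magic, shift, flags = compute_constants(11)
print(hex(magic), shift, flags)             # 0x3a2e8ba3 3 0
print(hw_divide(100, magic, shift, flags))  # 9
```

For a divisor of 3 the same helper picks the round-down variant (``extra_flags = 1``), because the remainder of :math:`2^{33}` divided by 3 is 2, which does not exceed :math:`2^1`.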
There are some differences between ridiculousfish's algorithm and the Mali
hardware algorithm, which means that the reference code from ridiculousfish
doesn't always produce the right constants. Mali does not use the pre-shift
optimization, since that would make a hardware implementation slower (it
would have to always do the pre-shift, multiply, and post-shift operations).
-It also forces the multiplier to be at least 2^31, which means that the
-exponent is entirely fixed, so there is no trial-and-error. Altogether,
-given the divisor d, the algorithm the driver must follow is:
+It also forces the multiplier to be at least :math:`2^{31}`, which means
+that the exponent is entirely fixed, so there is no trial-and-error.
+Altogether, given the divisor d, the algorithm the driver must follow is:
1. Set shift = :math:`\lfloor \log_2(d) \rfloor`.
2. Compute :math:`m = \lceil 2^{\text{shift} + 32} / d \rceil` and :math:`e = 2^{\text{shift} + 32} \bmod d`.