A strange variation was seen in the BogoMIPS figure for the ST40-300.
This was eventually tracked down to sensitivity to the alignment of
the loop. So add an align directive to ensure this doesn't occur.
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
void __delay(unsigned long loops)
{
__asm__ __volatile__(
+ /*
+ * ST40-300 appears to have an issue with this code,
+ * normally taking two cycles each loop, as with all
+ * other SH variants. If however the branch and the
+ * delay slot straddle an 8 byte boundary, this increases
+ * to 3 cycles.
+ * This align directive ensures this doesn't occur.
+ */
+ ".balign 8\n\t"
+
"tst %0, %0\n\t"
"1:\t"
"bf/s 1b\n\t"