scheduling: optionally create schedules with outermost parallelism

Some applications require not only tilable bands, but also parallel
loops inside those bands. The default algorithm favors larger bands
over having parallel loops inside the band. It is always possible
to sacrifice one of the loops in the band to create parallelism by
applying a wavefront transformation. This typically leads to more
complicated schedules, however.
Instead, when this option is enabled, we now force each tilable band
to contain at least one parallel loop.
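
For reference, the wavefront transformation mentioned above can be
illustrated on a small stencil. This is only a sketch and is not taken
from the scheduler itself: the array size N and the names init,
sweep_original, sweep_wavefront and wavefront_matches_original are
hypothetical. Both original loops carry a dependence, so neither is
parallel. After skewing with t = i + j, the outer loop is sequential
and the inner loop has no dependences between its iterations, at the
cost of more complicated loop bounds.

```c
#include <string.h>

#define N 6	/* hypothetical problem size, chosen for illustration */

/* Fill the borders with 1 and the interior with 0. */
static void init(long A[N][N])
{
	for (int i = 0; i < N; ++i)
		for (int j = 0; j < N; ++j)
			A[i][j] = (i == 0 || j == 0) ? 1 : 0;
}

/* Original order: A[i][j] depends on A[i-1][j] and A[i][j-1],
 * so both the i loop and the j loop carry a dependence.
 */
static void sweep_original(long A[N][N])
{
	for (int i = 1; i < N; ++i)
		for (int j = 1; j < N; ++j)
			A[i][j] = A[i - 1][j] + A[i][j - 1];
}

/* Wavefront order: t = i + j is executed sequentially.
 * For a fixed t, every iteration only reads values from wavefront
 * t - 1, so the inner loop could be run in parallel.
 */
static void sweep_wavefront(long A[N][N])
{
	for (int t = 2; t <= 2 * (N - 1); ++t) {
		int lo = t - (N - 1) > 1 ? t - (N - 1) : 1;
		int hi = t - 1 < N - 1 ? t - 1 : N - 1;

		for (int i = lo; i <= hi; ++i) {
			int j = t - i;
			A[i][j] = A[i - 1][j] + A[i][j - 1];
		}
	}
}

/* Return 1 if both execution orders compute the same array. */
static int wavefront_matches_original(void)
{
	long A[N][N], B[N][N];

	init(A);
	init(B);
	sweep_original(A);
	sweep_wavefront(B);
	return memcmp(A, B, sizeof(A)) == 0;
}
```

The bounds lo and hi in the skewed version show why such schedules
tend to be more complicated than ones where a parallel loop is
available in the band directly.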

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>