RT#120692 slow intuit with long utf8 strings
The code in re_intuit_start() that finds the range of chars to which
the BM substr search can be applied uses logic that became very
inefficient once utf8 was enabled. Basically, the code tries to find
the maximum end-point where the substr could be found, by taking the
minimum of:
* start + prog->check_offset_max + length(substr)
* end - prog->check_end_shift
However, these values are in char lengths and need to be converted to
bytes before calling fbm_instr(). The code formerly scanned the whole of
the remaining string to determine how many chars it had.
By doing the calculation a different way, we can avoid this.
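As an illustration only (this is a sketch of one way to avoid the full
scan, not necessarily what the patch does), the minimum can be found by
hopping back check_end_shift chars from the end, then hopping forward
from the start with that point as the limit. The helper names
hop_forward(), hop_back() and max_end_point() are hypothetical, not perl
core functions, and the input is assumed to be well-formed UTF-8:

```c
#include <stddef.h>
#include <stdint.h>

/* True if c is a UTF-8 continuation byte (10xxxxxx). */
static int is_cont(unsigned char c) { return (c & 0xC0) == 0x80; }

/* Advance up to n chars from s, never going past lim (s <= lim). */
static const unsigned char *
hop_forward(const unsigned char *s, size_t n, const unsigned char *lim)
{
    /* A char is at least one byte, so n chars cover at least n bytes:
     * if n reaches the byte distance, the answer is lim, no scan needed. */
    if (n >= (size_t)(lim - s))
        return lim;
    while (n > 0 && s < lim) {
        s++;                              /* step past the lead byte */
        while (s < lim && is_cont(*s))    /* and its continuation bytes */
            s++;
        n--;
    }
    return s;
}

/* Move back up to n chars from s, never going before lim (lim <= s). */
static const unsigned char *
hop_back(const unsigned char *s, size_t n, const unsigned char *lim)
{
    if (n >= (size_t)(s - lim))
        return lim;
    while (n > 0 && s > lim) {
        s--;                              /* last byte of previous char */
        while (s > lim && is_cont(*s))    /* walk back to its lead byte */
            s--;
        n--;
    }
    return s;
}

/* Byte position of min(start + offset_max + substr_chars,
 *                      end - end_shift), with all counts in chars.
 * Work is bounded by the counts hopped, not by the string length. */
static const unsigned char *
max_end_point(const unsigned char *start, const unsigned char *end,
              size_t offset_max, size_t substr_chars, size_t end_shift)
{
    /* Saturate so an "infinite" offset_max cannot overflow. */
    size_t n = (offset_max > SIZE_MAX - substr_chars)
                   ? SIZE_MAX : offset_max + substr_chars;
    const unsigned char *by_end = hop_back(end, end_shift, start);

    /* Using by_end as the forward limit yields the minimum directly. */
    return hop_forward(start, n, by_end);
}
```

With small char counts, as in the two patterns this change speeds up,
each call touches only a few bytes near the start and the end of the
string, however long the string is.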
This makes each of the following two regexps take milliseconds rather
than tens of seconds:
    my $s = 'ab' x 1_000_000;
    utf8::upgrade($s);
    1 while $s =~ m/\Ga+ba+b/g;
    $s =~ /^a{1,2}x/ for 1..10_000;