Fix rules for parsing numeric escapes in regexes
Commit
726ee55d introduced better handling of things like \87 in a
regex, but as an unfortunate side effect broke latex2html.
The rules for handling backslashes in regexen are a bit arcane.
Anything starting with \0 is octal.
The sequences \1 through \9 are always backrefs.
Any other sequence is interpreted as a decimal, and if there
are that many capture buffers defined in the pattern at that point
then the sequence is a backreference. If however it is larger
than the number of buffers the sequence is treated as an octal digit.
A consequence of this is that \118 could be a backreference to
the 118th capture buffer, or it could be the string "\11" . "8". In
other words depending on the context we might even use a different
number of digits for the escape!
This also left an awkward edge case, of multi digit sequences
starting with 8 or 9 like m/\87/ which would result in us parsing
as though we had seen /87/ (iow a null byte at the start) or worse
like /\x{00}87/ which is clearly wrong.
This patches fixes the cases where the capture buffers are defined,
and causes things like the \87 or \97 to throw the same error that
/\8/ would. One might argue we should complain about an illegal
octal sequence, but this seems more consistent with an error like
/\9/ and IMO will be less surprising in an error message.
This patch includes exhaustive tests of patterns of the form
/(a)\1/, /((a))\2/ etc, so that we dont break this again if we
change the logic more.