Currently, an EXACTish node can be SIMPLE only if it consists of a
single UTF-8 invariant character. It turns out that the code works not
just for these, but for any single Latin1 character when the pattern
isn't UTF-8, with one exception: LATIN SMALL LETTER SHARP S when
folding under /d rules.

Marking a node SIMPLE enables better optimizations, such as using CURLY
instead of CURLYM.

A discrepancy remains: non-EXACTish nodes that match a single
character, such as the dot (SANY), can be SIMPLE, but an EXACTish node
must still be exactly one byte long to qualify.
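
As a rough illustration of the new rule, the condition can be sketched as a standalone predicate. This is not the code in the regex compiler: the function name, its parameters, and the SHARP_S constant are hypothetical stand-ins for the node length, the code point, and the FOLD and DEPENDS_SEMANTICS tests used in the change below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Code point of LATIN SMALL LETTER SHARP S (U+00DF). */
#define SHARP_S 0xDFu

/* Hypothetical helper: returns true if a one-byte EXACTish node may be
 * marked SIMPLE.  `fold` stands in for case-insensitive matching and
 * `depends_semantics` for /d rules; both names are assumptions made
 * for this sketch. */
static bool
exact_node_is_simple(size_t len, unsigned code_point,
                     bool fold, bool depends_semantics)
{
    if (len != 1)
        return false;   /* longer nodes never qualify */

    /* The only excluded case is SHARP S folded under /d. */
    return !(code_point == SHARP_S && fold && depends_semantics);
}
```

Under this sketch, 0xE9 qualifies even when folding under /d, while 0xDF qualifies only if folding or /d is off.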
}
*flagp |= HASWIDTH;
- if (len == 1 && UNI_IS_INVARIANT(code_point))
+ if (len == 1 && (code_point != LATIN_SMALL_LETTER_SHARP_S
+ || ! FOLD || ! DEPENDS_SEMANTICS))
+ {
*flagp |= SIMPLE;
+ }
}
/*
do_exactf:
c = (U8)*STRING(p);
- assert(! UTF_PATTERN || UNI_IS_INVARIANT(c));
if (utf8_target || OP(p) == EXACTFU_SS) { /* Use full Unicode fold matching */
char *tmpeol = loceol;