stop callers of rex engine using RX_MATCH_UTF8_set
author David Mitchell <davem@iabyn.com>
Mon, 20 May 2013 13:17:33 +0000 (14:17 +0100)
committer David Mitchell <davem@iabyn.com>
Sun, 2 Jun 2013 21:28:51 +0000 (22:28 +0100)
commit 0603fe5cded5ad964b7ff06f91a5c2c244f93337
tree b6616deee0a3f0010670987085ff949e5a08c581
parent 8adc0f72b0398cece49d44d4acc0962d03543ea9

The way the regex engine learns that the match string is utf8 is
currently a complete mess. It is signalled partly by the utf8 flag of
the passed SV, but also by the RXf_MATCH_UTF8 flag in the regex itself
and by the value of PL_reg_match_utf8.

Currently all the callers of the engine (such as pp_match, pp_split, etc.)
call RX_MATCH_UTF8_set() before calling the engine. This sets both
the RXf_MATCH_UTF8 flag on the regex and PL_reg_match_utf8.

Then the two entry points to the engine (regexec_flags() and
re_intuit_start()) repeat the RX_MATCH_UTF8_set() call themselves
on entry.
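
As a rough illustration of the redundancy described above, here is a
minimal, self-contained sketch. It is not the real Perl source: every
mock_* / MOCK_* name is a hypothetical stand-in, the setter is written as
a function rather than a macro, and the "engine" does nothing except
record utf8ness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real Perl definitions. */
#define MOCK_RXf_MATCH_UTF8 0x1u

typedef struct { unsigned extflags; } mock_regexp;

/* Plays the role of the global PL_reg_match_utf8. */
static bool mock_reg_match_utf8 = false;

/* Old-style setter: updates the flag on the regex and, as a hidden
 * side effect, the global as well. */
static void mock_rx_match_utf8_set_old(mock_regexp *rx, bool utf8)
{
    if (utf8)
        rx->extflags |= MOCK_RXf_MATCH_UTF8;
    else
        rx->extflags &= ~MOCK_RXf_MATCH_UTF8;
    mock_reg_match_utf8 = utf8;
}

/* Stand-in for an engine entry point: it sets the state itself,
 * whatever the caller did beforehand. */
static bool mock_regexec(mock_regexp *rx, bool subject_is_utf8)
{
    assert(rx != NULL);
    mock_rx_match_utf8_set_old(rx, subject_is_utf8);
    return mock_reg_match_utf8;
}

/* Stand-in for a caller such as pp_match: its own call to the setter
 * is repeated immediately by the engine, so it is redundant. */
static bool mock_pp_match_old(mock_regexp *rx, bool subject_is_utf8)
{
    mock_rx_match_utf8_set_old(rx, subject_is_utf8);
    return mock_regexec(rx, subject_is_utf8);
}
```

In this sketch, calling mock_regexec() directly leaves the flag and the
global in exactly the same state as going through mock_pp_match_old(),
which is why the caller-side call can be dropped.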

Remove the usage of RX_MATCH_UTF8_set() by the callers of the engine,
and instead just rely on the engine to do it.

Also, remove the "secret" setting of PL_reg_match_utf8 by
RX_MATCH_UTF8_set(), and do it explicitly.
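
The shape of the change can be sketched roughly as follows. This is an
illustrative model only, under the assumption that the setter becomes a
pure flag update and the engine entry point assigns the global itself;
all sketch_* / SKETCH_* names are hypothetical and do not appear in the
Perl source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real Perl definitions. */
#define SKETCH_RXf_MATCH_UTF8 0x1u

typedef struct { unsigned extflags; } sketch_regexp;

/* Plays the role of the global PL_reg_match_utf8. */
static bool sketch_reg_match_utf8 = false;

/* New-style setter: touches only the flag on the regex. */
static void sketch_rx_match_utf8_set(sketch_regexp *rx, bool utf8)
{
    if (utf8)
        rx->extflags |= SKETCH_RXf_MATCH_UTF8;
    else
        rx->extflags &= ~SKETCH_RXf_MATCH_UTF8;
}

/* Stand-in for an engine entry point: the global is now assigned
 * explicitly, next to the flag update, so the write is visible. */
static bool sketch_regexec(sketch_regexp *rx, bool subject_is_utf8)
{
    assert(rx != NULL);
    sketch_rx_match_utf8_set(rx, subject_is_utf8);
    sketch_reg_match_utf8 = subject_is_utf8;
    return sketch_reg_match_utf8;
}

/* Stand-in for a caller such as pp_match: it no longer sets anything
 * and relies on the engine to do it. */
static bool sketch_pp_match(sketch_regexp *rx, bool subject_is_utf8)
{
    return sketch_regexec(rx, subject_is_utf8);
}
```

With the global written in one explicit place per entry point, every
remaining use of it is easy to find, which is what a later removal would
need.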

This is a prelude to eliminating PL_reg_match_utf8.

pp.c
pp_ctl.c
pp_hot.c
regexec.c
regexp.h