From: Yves Orton Date: Sun, 24 Dec 2006 14:38:15 +0000 (+0100) Subject: Re: Named-capture regex syntax X-Git-Tag: accepted/trunk/20130322.191538~16284 X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=1f1031fe96c14865e4f60fdd3a6a6ce073d190c1;p=platform%2Fupstream%2Fperl.git Re: Named-capture regex syntax Message-ID: <9b18b3110612240538m5c45654br7d27171835f6664@mail.gmail.com> p4raw-id: //depot/perl@29621 --- diff --git a/pod/perldiag.pod b/pod/perldiag.pod index af7c9c4..c219182 100644 --- a/pod/perldiag.pod +++ b/pod/perldiag.pod @@ -3683,6 +3683,11 @@ where the problem was discovered. See L. <-- HERE shows in the regular expression about where the problem was discovered. See L. +=item Sequence \\%s... not terminated in regex; marked by <-- HERE in m/%s/ + +(F) The regular expression expects a mandatory argument following the escape +sequence and this has been omitted or incorrectly written. + =item Sequence (?#... not terminated in regex; marked by <-- HERE in m/%s/ (F) A regular expression comment must be terminated by a closing diff --git a/pod/perlre.pod b/pod/perlre.pod index a876211..6c20496 100644 --- a/pod/perlre.pod +++ b/pod/perlre.pod @@ -250,6 +250,7 @@ X X \g1 Backreference to a specific or previous group, \g{-1} number may be negative indicating a previous buffer and may optionally be wrapped in curly brackets for safer parsing. + \g{name} Named backreference \k Named backreference \N{name} Named unicode character, or unicode escape \x12 Hexadecimal escape sequence @@ -486,7 +487,7 @@ backreference only if at least 11 left parentheses have opened before it. And so on. \1 through \9 are always interpreted as backreferences. -X<\g{1}> X<\g{-1}> X +X<\g{1}> X<\g{-1}> X<\g{name}> X X In order to provide a safer and easier way to construct patterns using backrefs, in Perl 5.10 the C<\g{N}> notation is provided. The curly brackets are optional, however omitting them is less safe as the meaning @@ -494,6 +495,8 @@ of the pattern can be changed by text (such as digits) following it. When N is a positive integer the C<\g{N}> notation is exactly equivalent to using normal backreferences. When N is a negative integer then it is a relative backreference referring to the previous N'th capturing group. +When the bracket form is used and N is not an integer, it is treated as a +reference to a named buffer. Thus C<\g{-1}> refers to the last buffer, C<\g{-2}> refers to the buffer before that. For example: @@ -510,11 +513,12 @@ buffer before that. For example: and would match the same as C. Additionally, as of Perl 5.10 you may use named capture buffers and named -backreferences. The notation is C<< (?...) >> and C<< \k >> -(you may also use single quotes instead of angle brackets to quote the -name). The only difference with named capture buffers and unnamed ones is +backreferences. The notation is C<< (?...) >> to declare and C<< \k >> +to reference. You may also use single quotes instead of angle brackets to quote the +name; and you may use the bracketed C<< \g{name} >> back reference syntax. +The only difference between named capture buffers and unnamed ones is that multiple buffers may have the same name and that the contents of -named capture buffers is available via the C<%+> hash. When multiple +named capture buffers are available via the C<%+> hash. When multiple groups share the same name C<$+{name}> and C<< \k >> refer to the leftmost defined group, thus it's possible to do things with named capture buffers that would otherwise require C<(??{})> code to accomplish. Named @@ -751,12 +755,20 @@ pattern $+{foo} will be the same as $2, and $3 will contain 'z' instead of the opposite which is what a .NET regex hacker might expect. -Currently NAME is restricted to word chars only. In other words, it -must match C. +Currently NAME is restricted to simple identifiers only. +In other words, it must match C or +its Unicode extension (see L), +though it isn't extended by the locale (see L). -=item C<< \k >> +B In order to make things easier for programmers with experience +with the Python or PCRE regex engines the pattern C<< (?Ppattern) >> +maybe be used instead of C<< (?pattern) >>; however this form does not +support the use of single quotes as a delimiter for the name. This is +only available in Perl 5.10 or later. -=item C<< \k'name' >> +=item C<< \k >> + +=item C<< \k'NAME' >> Named backreference. Similar to numeric backreferences, except that the group is designated by name and not number. If multiple groups @@ -768,6 +780,10 @@ earlier in the pattern. Both forms are equivalent. +B In order to make things easier for programmers with experience +with the Python or PCRE regex engines the pattern C<< (?P=NAME) >> +maybe be used instead of C<< \k >> in Perl 5.10 or later. + =item C<(?{ code })> X<(?{})> X X X @@ -989,6 +1005,10 @@ the same name, then it recurses to the leftmost. It is an error to refer to a name that is not declared somewhere in the pattern. +B In order to make things easier for programmers with experience +with the Python or PCRE regex engines the pattern C<< (?P>NAME) >> +maybe be used instead of C<< (?&NAME) >> as of Perl 5.10. + =item C<(?(condition)yes-pattern|no-pattern)> X<(?()> @@ -1980,6 +2000,28 @@ part of this regular expression needs to be converted explicitly $re = customre::convert $re; /\Y|$re\Y|/; +=head1 PCRE/Python Support + +As of Perl 5.10 Perl supports several Python/PCRE specific extensions +to the regex syntax. While Perl programmers are encouraged to use the +Perl specific syntax, the following are legal in Perl 5.10: + +=over 4 + +=item C<< (?Ppattern) >> + +Define a named capture buffer. Equivalent to C<< (?pattern) >>. + +=item C<< (?P=NAME) >> + +Backreference to a named capture buffer. Equivalent to C<< \g{NAME} >>. + +=item C<< (?P>NAME) >> + +Subroutine call to a named capture buffer. Equivalent to C<< (?&NAME) >>. + +=back 4 + =head1 BUGS This document varies from difficult to understand to completely diff --git a/regcomp.c b/regcomp.c index bbab843..9047b1d 100644 --- a/regcomp.c +++ b/regcomp.c @@ -126,6 +126,7 @@ typedef struct RExC_state_t { I32 utf8; HV *charnames; /* cache of named sequences */ HV *paren_names; /* Paren names */ + regnode **recurse; /* Recurse regops */ I32 recurse_count; /* Number of recurse regops */ #if ADD_TO_REGEXEC @@ -135,8 +136,10 @@ typedef struct RExC_state_t { #ifdef DEBUGGING const char *lastparse; I32 lastnum; + AV *paren_name_list; /* idx -> name */ #define RExC_lastparse (pRExC_state->lastparse) #define RExC_lastnum (pRExC_state->lastnum) +#define RExC_paren_name_list (pRExC_state->paren_name_list) #endif } RExC_state_t; @@ -4055,6 +4058,9 @@ Perl_re_compile(pTHX_ char *exp, char *xend, PMOP *pm) RExC_close_parens = NULL; RExC_opend = NULL; RExC_paren_names = NULL; +#ifdef DEBUGGING + RExC_paren_name_list = NULL; +#endif RExC_recurse = NULL; RExC_recurse_count = 0; @@ -4576,7 +4582,14 @@ reStudy: r->paren_names = (HV*)SvREFCNT_inc(RExC_paren_names); else r->paren_names = NULL; - +#ifdef DEBUGGING + if (RExC_paren_names) { + ri->name_list_idx = add_data( pRExC_state, 1, "p" ); + ri->data->data[ri->name_list_idx] = (void*)SvREFCNT_inc(RExC_paren_name_list); + } else + ri->name_list_idx = 0; +#endif + if (RExC_recurse_count) { for ( ; RExC_recurse_count ; RExC_recurse_count-- ) { const regnode *scan = RExC_recurse[RExC_recurse_count-1]; @@ -4660,17 +4673,19 @@ Perl_reg_named_buff_sv(pTHX_ SV* namesv) STATIC SV* S_reg_scan_name(pTHX_ RExC_state_t *pRExC_state, U32 flags) { char *name_start = RExC_parse; - if ( UTF ) { - STRLEN numlen; - while( isIDFIRST_uni(utf8n_to_uvchr((U8*)RExC_parse, - RExC_end - RExC_parse, &numlen, UTF8_ALLOW_DEFAULT))) - { - RExC_parse += numlen; - } - } else { - while( isIDFIRST(*RExC_parse) ) - RExC_parse++; + + if (isIDFIRST_lazy_if(RExC_parse, UTF)) { + /* skip IDFIRST by using do...while */ + if (UTF) + do { + RExC_parse += UTF8SKIP(RExC_parse); + } while (isALNUM_utf8((U8*)RExC_parse)); + else + do { + RExC_parse++; + } while (isALNUM(*RExC_parse)); } + if ( flags ) { SV* sv_name = sv_2mortal(Perl_newSVpvn(aTHX_ name_start, (int)(RExC_parse - name_start))); @@ -4916,10 +4931,46 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) ret = NULL; /* For look-ahead/behind. */ switch (paren) { + case 'P': /* (?P...) variants for those used to PCRE/Python */ + paren = *RExC_parse++; + if ( paren == '<') /* (?P<...>) named capture */ + goto named_capture; + else if (paren == '>') { /* (?P>name) named recursion */ + goto named_recursion; + } + else if (paren == '=') { /* (?P=...) named backref */ + /* this pretty much dupes the code for \k in regatom(), if + you change this make sure you change that */ + char* name_start = RExC_parse; + U32 num = 0; + SV *sv_dat = reg_scan_name(pRExC_state, + SIZE_ONLY ? REG_RSN_RETURN_NULL : REG_RSN_RETURN_DATA); + if (RExC_parse == name_start || *RExC_parse != ')') + vFAIL2("Sequence %.3s... not terminated",parse_start); + + if (!SIZE_ONLY) { + num = add_data( pRExC_state, 1, "S" ); + RExC_rxi->data->data[num]=(void*)sv_dat; + SvREFCNT_inc(sv_dat); + } + RExC_sawback = 1; + ret = reganode(pRExC_state, + (U8)(FOLD ? (LOC ? NREFFL : NREFF) : NREF), + num); + *flagp |= HASWIDTH; + + Set_Node_Offset(ret, parse_start+1); + Set_Node_Cur_Length(ret); /* MJD */ + + nextchar(pRExC_state); + return ret; + } + goto unknown; case '<': /* (?<...) */ if (*RExC_parse == '!') paren = ','; else if (*RExC_parse != '=') + named_capture: { /* (?<...>) */ char *name_start; SV *svname; @@ -4944,6 +4995,10 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) if (!RExC_paren_names) { RExC_paren_names= newHV(); sv_2mortal((SV*)RExC_paren_names); +#ifdef DEBUGGING + RExC_paren_name_list= newAV(); + sv_2mortal((SV*)RExC_paren_name_list); +#endif } he_str = hv_fetch_ent( RExC_paren_names, svname, 1, 0 ); if ( he_str ) @@ -4964,6 +5019,10 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) SvIOK_on(sv_dat); SvIVX(sv_dat)= 1; } +#ifdef DEBUGGING + if (!av_store(RExC_paren_name_list, RExC_npar, SvREFCNT_inc(svname))) + SvREFCNT_dec(svname); +#endif /*sv_dump(sv_dat);*/ } @@ -5009,6 +5068,7 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) char * parse_start; case '&': /* (?&NAME) */ parse_start = RExC_parse - 1; + named_recursion: { SV *sv_dat = reg_scan_name(pRExC_state, SIZE_ONLY ? REG_RSN_RETURN_NULL : REG_RSN_RETURN_DATA); @@ -6323,46 +6383,44 @@ tryagain: ret= reg_namedseq(pRExC_state, NULL); break; case 'k': /* Handle \k and \k'NAME' */ + parse_named_seq: { char ch= RExC_parse[1]; - if (ch != '<' && ch != '\'') { - if (SIZE_ONLY) - vWARN( RExC_parse + 1, - "Possible broken named back reference treated as literal k"); - parse_start--; - goto defchar; + if (ch != '<' && ch != '\'' && ch != '{') { + RExC_parse++; + vFAIL2("Sequence %.2s... not terminated",parse_start); } else { + /* this pretty much dupes the code for (?P=...) in reg(), if + you change this make sure you change that */ char* name_start = (RExC_parse += 2); U32 num = 0; SV *sv_dat = reg_scan_name(pRExC_state, SIZE_ONLY ? REG_RSN_RETURN_NULL : REG_RSN_RETURN_DATA); - ch= (ch == '<') ? '>' : '\''; - + char sch = ch; + ch= (ch == '<') ? '>' : (ch == '{') ? '}' : '\''; if (RExC_parse == name_start || *RExC_parse != ch) - vFAIL2("Sequence \\k%c... not terminated", - (ch == '>' ? '<' : ch)); - + vFAIL2("Sequence %.3s... not terminated",parse_start); + + if (!SIZE_ONLY) { + num = add_data( pRExC_state, 1, "S" ); + RExC_rxi->data->data[num]=(void*)sv_dat; + SvREFCNT_inc(sv_dat); + } + RExC_sawback = 1; ret = reganode(pRExC_state, (U8)(FOLD ? (LOC ? NREFFL : NREFF) : NREF), num); *flagp |= HASWIDTH; - - - if (!SIZE_ONLY) { - num = add_data( pRExC_state, 1, "S" ); - ARG_SET(ret,num); - RExC_rxi->data->data[num]=(void*)sv_dat; - SvREFCNT_inc(sv_dat); - } + /* override incorrect value set in reganode MJD */ Set_Node_Offset(ret, parse_start+1); Set_Node_Cur_Length(ret); /* MJD */ nextchar(pRExC_state); - + } break; - } + } case 'n': case 'r': case 't': @@ -6391,7 +6449,11 @@ tryagain: RExC_parse++; isrel = 1; } - } + if (hasbrace && !isDIGIT(*RExC_parse)) { + if (isrel) RExC_parse--; + RExC_parse -= 2; + goto parse_named_seq; + } } num = atoi(RExC_parse); if (isrel) { num = RExC_npar - num; @@ -6404,6 +6466,8 @@ tryagain: char * const parse_start = RExC_parse - 1; /* MJD */ while (isDIGIT(*RExC_parse)) RExC_parse++; + if (parse_start == RExC_parse - 1) + vFAIL("Unterminated \\g... pattern"); if (hasbrace) { if (*RExC_parse != '}') vFAIL("Unterminated \\g{...} pattern"); @@ -8306,9 +8370,29 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o) } else if (k == WHILEM && o->flags) /* Ordinal/of */ Perl_sv_catpvf(aTHX_ sv, "[%d/%d]", o->flags & 0xf, o->flags>>4); - else if (k == REF || k == OPEN || k == CLOSE || k == GROUPP || OP(o)==ACCEPT) + else if (k == REF || k == OPEN || k == CLOSE || k == GROUPP || OP(o)==ACCEPT) { Perl_sv_catpvf(aTHX_ sv, "%d", (int)ARG(o)); /* Parenth number */ - else if (k == GOSUB) + if ( prog->paren_names ) { + AV *list= (AV *)progi->data->data[progi->name_list_idx]; + SV **name= av_fetch(list, ARG(o), 0 ); + if (name) + Perl_sv_catpvf(aTHX_ sv, " '%"SVf"'", *name); + } + } else if (k == NREF) { + if ( prog->paren_names ) { + AV *list= (AV *)progi->data->data[ progi->name_list_idx ]; + SV *sv_dat=(SV*)progi->data->data[ ARG( o ) ]; + I32 *nums=(I32*)SvPVX(sv_dat); + SV **name= av_fetch(list, nums[0], 0 ); + I32 n; + if (name) { + for ( n=0; nflags) @@ -9116,9 +9200,10 @@ S_dumpuntil(pTHX_ const regexp *r, const regnode *start, const regnode *node, register U8 op = PSEUDO; /* Arbitrary non-END op. */ register const regnode *next; const regnode *optstart= NULL; + RXi_GET_DECL(r,ri); GET_RE_DEBUG_FLAGS_DECL; - + #ifdef DEBUG_DUMPUNTIL PerlIO_printf(Perl_debug_log, "--- %d : %d - %d - %d\n",indent,node-start, last ? last-start : 0,plast ? plast-start : 0); @@ -9129,13 +9214,12 @@ S_dumpuntil(pTHX_ const regexp *r, const regnode *start, const regnode *node, while (PL_regkind[op] != END && (!last || node < last)) { /* While that wasn't END last time... */ - NODE_ALIGN(node); op = OP(node); if (op == CLOSE || op == WHILEM) indent--; next = regnext((regnode *)node); - + /* Where, what. */ if (OP(node) == OPTIMIZED) { if (!optstart && RE_DEBUG_FLAG(RE_DEBUG_COMPILE_OPTIMISE)) @@ -9144,23 +9228,21 @@ S_dumpuntil(pTHX_ const regexp *r, const regnode *start, const regnode *node, goto after_print; } else CLEAR_OPTSTART; - + regprop(r, sv, node); PerlIO_printf(Perl_debug_log, "%4"IVdf":%*s%s", (IV)(node - start), (int)(2*indent + 1), "", SvPVX_const(sv)); - - if (OP(node) != OPTIMIZED) { - if (next == NULL) /* Next ptr. */ - PerlIO_printf(Perl_debug_log, "(0)"); - else if (PL_regkind[(U8)op] == BRANCH && PL_regkind[OP(next)] != BRANCH ) - PerlIO_printf(Perl_debug_log, "(FAIL)"); - else - PerlIO_printf(Perl_debug_log, "(%"IVdf")", (IV)(next - start)); - - /*if (PL_regkind[(U8)op] != TRIE)*/ - (void)PerlIO_putc(Perl_debug_log, '\n'); - } - + + if (OP(node) != OPTIMIZED) { + if (next == NULL) /* Next ptr. */ + PerlIO_printf(Perl_debug_log, " (0)"); + else if (PL_regkind[(U8)op] == BRANCH && PL_regkind[OP(next)] != BRANCH ) + PerlIO_printf(Perl_debug_log, " (FAIL)"); + else + PerlIO_printf(Perl_debug_log, " (%"IVdf")", (IV)(next - start)); + (void)PerlIO_putc(Perl_debug_log, '\n'); + } + after_print: if (PL_regkind[(U8)op] == BRANCHJ) { assert(next); diff --git a/regcomp.h b/regcomp.h index a5eb0fb..7df47d3 100644 --- a/regcomp.h +++ b/regcomp.h @@ -103,10 +103,14 @@ typedef struct regexp_paren_ofs { } regexp_paren_ofs; typedef struct regexp_internal { - regexp_paren_ofs *swap; /* Swap copy of *startp / *endp */ +#ifdef DEBUGGING + int name_list_idx; /* Optional data index of an array of paren names */ +#endif + U32 *offsets; /* offset annotations 20001228 MJD data about mapping the program to the string*/ + regexp_paren_ofs *swap; /* Swap copy of *startp / *endp */ regnode *regstclass; /* Optional startclass as identified or constructed by the optimiser */ struct reg_data *data; /* Additional miscellaneous data used by the program. diff --git a/t/op/pat.t b/t/op/pat.t index 31922e9..f42847f 100755 --- a/t/op/pat.t +++ b/t/op/pat.t @@ -3663,6 +3663,32 @@ SKIP:{ $s=~s/(?'digits'\d+)\k'digits'/$+{digits}/; ok($s eq '123456','Named capture (single quotes) s///'); } + +{ + if (ord("A") == 193) { + for (1..10) { + print "ok $test # Skip: in EBCDIC"; + $test++; + } + } else { + use utf8; + # ñ = U+00F1 (n-tilde) + # ̧ = U+0327 (cedilla) + # ² = U+00B2 (superscript two) + + ok("..foo foo.." =~ /(?'ñ'foo) \k<ñ>/, 'Named capture UTF'); + ok($+{ñ} eq 'foo', 'Named capture UTF'); + ok("..bar bar.." =~ /(?<_ñ>bar) \k'_ñ'/, 'Named capture UTF'); + ok($+{_ñ} eq 'bar', 'Named capture UTF'); + ok("..abc abc.." =~ /(?'ç'abc) \k'ç'/, 'Named capture UTF'); + ok($+{ç} eq 'abc', 'Named capture UTF'); + ok("..xyz xyz.." =~ /(?'ņ̃'xyz) \k'ņ̃'/, 'Named capture UTF'); + ok($+{ņ̃} eq 'xyz', 'Named capture UTF'); + ok("..456 456.." =~ /(?456) \k'a²'/, 'Named capture UTF'); + ok($+{a²} eq '456', 'Named capture UTF'); + } +} + sub iseq($$;$) { my ( $got, $expect, $name)=@_; @@ -4193,6 +4219,9 @@ ok((q(a)x 100) =~ /^(??{'(.)'x 100})/, "Regexp /^(??{'(.)'x 100})/ crashes older perls") or print "# Unexpected outcome: should pass or crash perl\n"; +eval '/\k/'; +ok($@=~/\QSequence \k... not terminated in regex;\E/); + { local $Message = "substitution with lookahead (possible segv)"; $_="ns1ns1ns1"; @@ -4210,7 +4239,7 @@ ok((q(a)x 100) =~ /^(??{'(.)'x 100})/, iseq(0+$::test,$::TestCount,"Got the right number of tests!"); # Don't forget to update this! BEGIN { - $::TestCount = 1573; + $::TestCount = 1584; print "1..$::TestCount\n"; } diff --git a/t/op/re_tests b/t/op/re_tests index 48dbb79..1700588 100644 --- a/t/op/re_tests +++ b/t/op/re_tests @@ -1029,6 +1029,15 @@ X(?<=foo.)[YZ] ..XfooXY.. y pos 8 (?foo|bar|baz)(?[ew]+) snofooewa y $+{m} ew (?foo)|(?bar)|(?baz) snofooewa y $+{n} foo (?foo)(??{ $+{n} }) snofooefoofoowaa y $+{n} foo +(?Pfoo|bar|baz) snofooewa y $1 foo +(?Pfoo|bar|baz) snofooewa y $+{n} foo +(?Pfoo|bar|baz)(?P[ew]+) snofooewa y $+{n} foo +(?Pfoo|bar|baz)(?P[ew]+) snofooewa y $+{m} ew +(?Pfoo)|(?Pbar)|(?Pbaz) snofooewa y $+{n} foo +(?Pfoo)(??{ $+{n} }) snofooefoofoowaa y $+{n} foo +(?P<=n>foo|bar|baz) snofooewa c - Sequence (?P<=...) not recognized +(?Pfoo|bar|baz) snofooewa c - Sequence (?Pfoo|bar|baz) snofooewa c - Sequence (?PX<...) not recognized /(?'n'foo|bar|baz)/ snofooewa y $1 foo /(?'n'foo|bar|baz)/ snofooewa y $+{n} foo /(?'n'foo|bar|baz)(?'m'[ew]+)/ snofooewa y $+{n} foo @@ -1179,7 +1188,6 @@ round\(([^()]++)\) _I(round(xs * sz),1) y $1 xs * sz (x|y|z[QW])*+(longish|loquatious|excessive|overblown[QW])*+ xyzQzWlongishoverblownW y $1-$2 zW-overblownW (x|y|z[QW]){1,5}+(longish|loquatious|excessive|overblown[QW]){1,5}+ xyzQzWlongishoverblownW y $1-$2 zW-overblownW - a*(?!) aaaab n - - a*(*FAIL) aaaab n - - a*(*F) aaaab n - - @@ -1196,9 +1204,64 @@ a*(*F) aaaab n - - (([abc]+) \g-1)(([abc]+) \g{-1}) abc abccba cba y $2-$4 abc-cba (a)(b)(c)\g1\g2\g3 abcabc y $1$2$3 abc - +# \k preceded by a literal /(?'n'foo) \k/ ..foo foo.. y $1 foo /(?'n'foo) \k/ ..foo foo.. y $+{n} foo /(?foo) \k'n'/ ..foo foo.. y $1 foo /(?foo) \k'n'/ ..foo foo.. y $+{n} foo +/(?'a1'foo) \k'a1'/ ..foo foo.. y $+{a1} foo +/(?foo) \k/ ..foo foo.. y $+{a1} foo +/(?'_'foo) \k'_'/ ..foo foo.. y $+{_} foo +/(?<_>foo) \k<_>/ ..foo foo.. y $+{_} foo +/(?'_0_'foo) \k'_0_'/ ..foo foo.. y $+{_0_} foo +/(?<_0_>foo) \k<_0_>/ ..foo foo.. y $+{_0_} foo +/(?'0'foo) bar/ ..foo bar.. c - Sequence (?' +/(?<0>foo) bar/ ..foo bar.. c - Sequence (?< +/(?'12'foo) bar/ ..foo bar.. c - Sequence (?' +/(?<12>foo) bar/ ..foo bar.. c - Sequence (?< +/(?'1a'foo) bar/ ..foo bar.. c - Sequence (?' +/(?<1a>foo) bar/ ..foo bar.. c - Sequence (?< +/(?''foo) bar/ ..foo bar.. c - Sequence (?'' +/(?<>foo) bar/ ..foo bar.. c - Sequence (?<> +/foo \k'n'/ foo foo c - Reference to nonexistent named group +/foo \k/ foo foo c - Reference to nonexistent named group +/foo \k'a1'/ foo foo c - Reference to nonexistent named group +/foo \k/ foo foo c - Reference to nonexistent named group +/foo \k'_'/ foo foo c - Reference to nonexistent named group +/foo \k<_>/ foo foo c - Reference to nonexistent named group +/foo \k'_0_'/ foo foo c - Reference to nonexistent named group +/foo \k<_0_>/ foo foo c - Reference to nonexistent named group +/foo \k'0'/ foo foo c - Sequence \\k' +/foo \k<0>/ foo foo c - Sequence \\k< +/foo \k'12'/ foo foo c - Sequence \\k' +/foo \k<12>/ foo foo c - Sequence \\k< +/foo \k'1a'/ foo foo c - Sequence \\k' +/foo \k<1a>/ foo foo c - Sequence \\k< +/foo \k''/ foo foo c - Sequence \\k' +/foo \k<>/ foo foo c - Sequence \\k< /(?as) (\w+) \k (\w+)/ as easy as pie y $1-$2-$3 as-easy-pie + +# \g{...} with a name as the argument +/(?'n'foo) \g{n}/ ..foo foo.. y $1 foo +/(?'n'foo) \g{n}/ ..foo foo.. y $+{n} foo +/(?foo) \g{n}/ ..foo foo.. y $1 foo +/(?foo) \g{n}/ ..foo foo.. y $+{n} foo +/(?as) (\w+) \g{as} (\w+)/ as easy as pie y $1-$2-$3 as-easy-pie + +# Python style named capture buffer stuff +/(?Pfoo)(?P=n)/ ..foofoo.. y $1 foo +/(?Pfoo)(?P=n)/ ..foofoo.. y $+{n} foo +/(?:(?Pfoo)|(?Pbar))(?P=n)/ ..barbar.. y $+{n} bar +/^(?P(?P.)((?P>PAL)|.?)(?P=CHAR))$/ madamimadam y $& madamimadam +/^(?P(?P.)((?P>PAL)|.?)(?P=CHAR))$/ madamiamadam n - - +/(?Pfoo) (?P=n)/ ..foo foo.. y $1 foo +/(?Pfoo) (?P=n)/ ..foo foo.. y $+{n} foo +/(?Pas) (\w+) (?P=as) (\w+)/ as easy as pie y $1-$2-$3 as-easy-pie + +#check that non identifiers as names are treated as the appropriate lookaround +(?<=bar>)foo bar>foo y $& foo +(?)foo bar>foo n - - +(?<=bar>ABC)foo bar>ABCfoo y $& foo +(?ABC)foo bar>ABCfoo n - - +(?)foo bar>ABCfoo y $& foo +(?ABC)foo bar>ABCfoo y $& ABCfoo \ No newline at end of file