3 PCRE - Perl-compatible regular expressions
4 .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
7 The full syntax and semantics of the regular expressions that are supported by
8 PCRE are described in the \fBpcrepattern\fP
12 documentation. This document contains just a quick-reference summary of the
19 \ex where x is non-alphanumeric is a literal x
20 \eQ...\eE treat enclosed characters as literal
26 \ea alarm, that is, the BEL character (hex 07)
27 \ecx "control-x", where x is any ASCII character
31 \er carriage return (hex 0D)
33 \eddd character with octal code ddd, or backreference
34 \exhh character with hex code hh
35 \ex{hhh..} character with hex code hhh..
41 . any character except newline;
42 in dotall mode, any character whatsoever
43 \eC one byte, even in UTF-8 mode (best avoided)
45 \eD a character that is not a decimal digit
46 \eh a horizontal whitespace character
47 \eH a character that is not a horizontal whitespace character
48 \eN a character that is not a newline
49 \ep{\fIxx\fP} a character with the \fIxx\fP property
50 \eP{\fIxx\fP} a character without the \fIxx\fP property
51 \eR a newline sequence
52 \es a whitespace character
53 \eS a character that is not a whitespace character
54 \ev a vertical whitespace character
55 \eV a character that is not a vertical whitespace character
56 \ew a "word" character
57 \eW a "non-word" character
58 \eX an extended Unicode sequence
60 In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
61 characters, even in UTF-8 mode. However, this can be changed by setting the
65 .SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
94 Pc Connector punctuation
98 Pi Initial punctuation
105 Sm Mathematical symbol
110 Zp Paragraph separator
114 .SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
117 Xan Alphanumeric: union of properties L and N
118 Xps POSIX space: property Z or tab, NL, VT, FF, CR
119 Xsp Perl space: property Z or tab, NL, FF, CR
120 Xwd Perl word: property Xan or underscore
123 .SH "SCRIPT NAMES FOR \ep AND \eP"
147 Egyptian_Hieroglyphs,
162 Inscriptional_Pahlavi,
163 Inscriptional_Parthian,
220 .SH "CHARACTER CLASSES"
223 [...] positive character class
224 [^...] negative character class
225 [x-y] range (can be used for hex characters)
226 [[:xxx:]] positive POSIX named set
227 [[:^xxx:]] negative POSIX named set
233 cntrl control character
235 graph printing, excluding space
236 lower lower case letter
237 print printing, including space
238 punct printing, excluding alphanumeric
240 upper upper case letter
242 xdigit hexadecimal digit
244 In PCRE, POSIX character set names recognize only ASCII characters by default,
245 but some of them use Unicode properties if PCRE_UCP is set. You can use
246 \eQ...\eE inside a character class.
253 ?+ 0 or 1, possessive
256 *+ 0 or more, possessive
259 ++ 1 or more, possessive
262 {n,m} at least n, no more than m, greedy
263 {n,m}+ at least n, no more than m, possessive
264 {n,m}? at least n, no more than m, lazy
265 {n,} n or more, greedy
266 {n,}+ n or more, possessive
267 {n,}? n or more, lazy
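As an illustration of the difference between greedy and lazy counted repeats, here is a minimal sketch. It assumes Python's standard re module as a stand-in; that module is not PCRE, but it accepts the same syntax for the greedy and lazy forms used here. The subject string and variable names are invented for the example.

```python
import re

subject = "aaaaaa"

# {2,4} is greedy: it takes as many repetitions as the bounds allow.
greedy = re.match(r"a{2,4}", subject).group(0)

# {2,4}? is lazy: it takes as few repetitions as the bounds allow.
lazy = re.match(r"a{2,4}?", subject).group(0)

# {3,} has no upper bound, so the greedy form consumes the whole run,
# while the lazy form stops at the minimum of three.
open_greedy = re.match(r"a{3,}", subject).group(0)
open_lazy = re.match(r"a{3,}?", subject).group(0)

print(greedy, lazy, open_greedy, open_lazy)
```

The possessive forms are left out of the sketch because older Python versions do not accept them.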
270 .SH "ANCHORS AND SIMPLE ASSERTIONS"
274 \eB not a word boundary
276 also after internal newline in multiline mode
279 also before newline at end of subject
280 also before internal newline in multiline mode
282 also before newline at end of subject
284 \eG first matching position in subject
287 .SH "MATCH POINT RESET"
290 \eK reset start of match
302 (...) capturing group
303 (?<name>...) named capturing group (Perl)
304 (?'name'...) named capturing group (Perl)
305 (?P<name>...) named capturing group (Python)
306 (?:...) non-capturing group
307 (?|...) non-capturing group; reset group numbers for
308 capturing groups in each alternative
314 (?>...) atomic, non-capturing group
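The difference between capturing, named, and non-capturing groups can be sketched as follows. This assumes Python's standard re module as a stand-in for PCRE; it shares the Python-style named group and the non-capturing group syntax, but the sketch does not cover the atomic or group-number-reset forms. The URL and group name are hypothetical.

```python
import re

# One named group, one non-capturing group, one plain capturing group.
pattern = re.compile(r"(?P<scheme>[a-z]+)://(?:www\.)?([a-z.]+)")

m = pattern.match("https://www.example.org")

# Only the named group and the plain group are numbered;
# the (?:...) group matches "www." but captures nothing.
group_count = pattern.groups
scheme = m.group("scheme")
host = m.group(2)

print(group_count, scheme, host)
```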
322 (?#....) comment (not nestable)
329 (?J) allow duplicate names
331 (?s) single line (dotall)
332 (?U) default ungreedy (lazy)
333 (?x) extended (ignore white space)
334 (?-...) unset option(s)
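The effect of two of these option settings can be sketched as follows, assuming Python's standard re module as a stand-in. It is not PCRE, and it accepts these settings only at the very start of a pattern, but the dotall and extended options behave as summarized above. The subject string is invented for the example.

```python
import re

subject = "first\nsecond"

# Without (?s), "." does not match the newline, so this fails.
default_dot = re.search(r"first.second", subject)

# With (?s) at the start of the pattern, "." also matches the newline.
dotall_dot = re.search(r"(?s)first.second", subject)

# With (?x), unescaped white space in the pattern is ignored,
# so the pattern below is equivalent to first\nsecond.
extended = re.search(r"(?x) first \n second", subject)

print(default_dot is None, dotall_dot is not None, extended is not None)
```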
336 The following are recognized only at the start of a pattern or after one of the
337 newline-setting options with similar syntax:
339 (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
340 (*UTF8) set UTF-8 mode (PCRE_UTF8)
341 (*UCP) set PCRE_UCP (use Unicode properties for \ed etc)
344 .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
347 (?=...) positive lookahead
348 (?!...) negative lookahead
349 (?<=...) positive lookbehind
350 (?<!...) negative lookbehind
352 Each top-level branch of a lookbehind must be of a fixed length.
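A short sketch of these assertions and of the fixed-length rule follows. It assumes Python's standard re module as a stand-in, which is not PCRE and is somewhat stricter: it requires the whole lookbehind to have one fixed width, whereas the rule above applies to each top-level branch separately. The subject strings are invented for the example.

```python
import re

# Positive lookbehind: digits preceded by "USD " (a fixed 4-character branch).
prices = re.findall(r"(?<=USD )[0-9]+", "USD 15, EUR 20, USD 7")

# Negative lookahead: "foo" that is not immediately followed by "bar".
not_bar = re.findall(r"foo(?!bar)[a-z]*", "foobar foobaz")

# A lookbehind whose length is not fixed is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_length_ok = True
except re.error:
    variable_length_ok = False

print(prices, not_bar, variable_length_ok)
```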
358 \en reference by number (can be ambiguous)
359 \egn reference by number
360 \eg{n} reference by number
361 \eg{-n} relative reference by number
362 \ek<name> reference by name (Perl)
363 \ek'name' reference by name (Perl)
364 \eg{name} reference by name (Perl)
365 \ek{name} reference by name (.NET)
366 (?P=name) reference by name (Python)
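As a sketch of how a backreference constrains a match, the following finds a doubled word. It assumes Python's standard re module as a stand-in; it is not PCRE and supports only the Python-style named reference and the plain numeric form from the list above. The subject string and group name are invented for the example.

```python
import re

subject = "it was the the best"

# (?P<word>...) names the group; (?P=word) must match the same text again.
by_name = re.search(r"(?P<word>[a-z]+) (?P=word)", subject)

# The same pattern using a reference by number.
by_number = re.search(r"([a-z]+) \1", subject)

print(by_name.group(0), by_name.group("word"), by_number.group(1))
```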
369 .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
372 (?R) recurse whole pattern
373 (?n) call subpattern by absolute number
374 (?+n) call subpattern by relative number
375 (?-n) call subpattern by relative number
376 (?&name) call subpattern by name (Perl)
377 (?P>name) call subpattern by name (Python)
378 \eg<name> call subpattern by name (Oniguruma)
379 \eg'name' call subpattern by name (Oniguruma)
380 \eg<n> call subpattern by absolute number (Oniguruma)
381 \eg'n' call subpattern by absolute number (Oniguruma)
382 \eg<+n> call subpattern by relative number (PCRE extension)
383 \eg'+n' call subpattern by relative number (PCRE extension)
384 \eg<-n> call subpattern by relative number (PCRE extension)
385 \eg'-n' call subpattern by relative number (PCRE extension)
388 .SH "CONDITIONAL PATTERNS"
391 (?(condition)yes-pattern)
392 (?(condition)yes-pattern|no-pattern)
394 (?(n)... absolute reference condition
395 (?(+n)... relative reference condition
396 (?(-n)... relative reference condition
397 (?(<name>)... named reference condition (Perl)
398 (?('name')... named reference condition (Perl)
399 (?(name)... named reference condition (PCRE)
400 (?(R)... overall recursion condition
401 (?(Rn)... specific group recursion condition
402 (?(R&name)... specific recursion condition
403 (?(DEFINE)... define subpattern for reference
404 (?(assert)... assertion condition
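The absolute reference condition can be sketched as follows, assuming Python's standard re module as a stand-in. It is not PCRE and supports only the numbered and plain-name conditions, not the recursion, DEFINE, or assertion forms. The patterns are invented for the example.

```python
import re

# Group 1 captures an optional "<". The condition (?(1)>) then demands
# a closing ">" only if the opening "<" was matched.
bracketed = re.compile(r"(<)?[a-z]+(?(1)>)")

both = bracketed.fullmatch("<abc>")
neither = bracketed.fullmatch("abc")
open_only = bracketed.fullmatch("<abc")
close_only = bracketed.fullmatch("abc>")

# With a no-pattern: "b" is required after "a", otherwise "c" is required.
yes_no = re.compile(r"(a)?(?(1)b|c)")
took_yes = yes_no.fullmatch("ab")
took_no = yes_no.fullmatch("c")
mismatch = yes_no.fullmatch("ac")

print(bool(both), bool(neither), bool(open_only), bool(close_only))
```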
407 .SH "BACKTRACKING CONTROL"
410 The following act as soon as they are reached:
412 (*ACCEPT) force successful match
413 (*FAIL) force backtrack; synonym (*F)
415 The following act only when a subsequent match failure causes a backtrack to
416 reach them. They all force a match failure, but they differ in what happens
417 afterwards. Those that advance the start-of-match point do so only if the
418 pattern is not anchored.
420 (*COMMIT) overall failure, no advance of starting point
421 (*PRUNE) advance to next starting character
422 (*SKIP) advance start to current matching position
423 (*THEN) local failure, backtrack to next alternation
426 .SH "NEWLINE CONVENTIONS"
429 These are recognized only at the very start of the pattern or after a
430 (*BSR_...) or (*UTF8) or (*UCP) option.
432 (*CR) carriage return only
434 (*CRLF) carriage return followed by linefeed
435 (*ANYCRLF) all three of the above
436 (*ANY) any Unicode newline sequence
439 .SH "WHAT \eR MATCHES"
442 These are recognized only at the very start of the pattern or after a
443 (*...) option that sets the newline convention or UTF-8 or UCP mode.
445 (*BSR_ANYCRLF) CR, LF, or CRLF
446 (*BSR_UNICODE) any Unicode newline sequence
453 (?Cn) callout with data n
459 \fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
460 \fBpcrematching\fP(3), \fBpcre\fP(3).
468 University Computing Service
469 Cambridge CB2 3QH, England.
477 Last updated: 21 November 2010
478 Copyright (c) 1997-2010 University of Cambridge.