Karl Williamson [Mon, 31 Dec 2012 17:00:55 +0000 (10:00 -0700)]
regcomp.c: Collapse switch cases
These cases differ only in that the union is of the complement in one of
them. There is a function that provides both possibilities.
Karl Williamson [Mon, 31 Dec 2012 16:45:06 +0000 (09:45 -0700)]
regcomp.c: Collapse two switch cases
These two cases differ only in that the union is of the complement in
one of them. There is a function that provides both possibilities.
Karl Williamson [Mon, 31 Dec 2012 16:03:14 +0000 (09:03 -0700)]
regcomp.c: Remove redundant code
The statements just above the lines removed by this commit cause the
above-ASCII range to always match, not just under the conditions of the
removed code, so it is redundant.
Karl Williamson [Mon, 31 Dec 2012 15:45:10 +0000 (08:45 -0700)]
regcomp.c: Use data structure to remove special handling
[:digit:] matches nothing in the upper Latin1 range. That means the
inversion list for the whole of Latin1 is the same as the one for the
ASCII range. By copying the ASCII one into the Latin1's slot, we
eliminate the need for code to handle this case specially.
Future commits will remove the switch statement affected by this
commit entirely, and eliminating this special case becomes more
important given this direction.
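A minimal sketch of why the copy works, assuming an inversion list is a sorted array whose even-indexed elements start matching ranges and whose odd-indexed elements start non-matching ranges. The names digit_ascii, digit_latin1 and invlist_contains() are hypothetical and are not the ones used in regcomp.c.

```c
#include <stddef.h>

/* Hypothetical sketch, not the regcomp.c implementation.  An inversion list
 * is a sorted array of code points: element 0 starts a matching range,
 * element 1 starts a non-matching range, and so on alternately. */
static const unsigned long digit_ascii[] = { 0x30, 0x3A };   /* '0'..'9' */

/* [:digit:] matches nothing in 0x80..0xFF, so the Latin1 slot can simply
 * refer to the same data as the ASCII slot. */
static const unsigned long *digit_latin1 = digit_ascii;

/* Return 1 if cp is in the set described by the inversion list. */
static int invlist_contains(const unsigned long *list, size_t len,
                            unsigned long cp)
{
    size_t i = 0;

    /* Count how many range boundaries are <= cp */
    while (i < len && list[i] <= cp)
        i++;

    /* An odd count means cp falls inside a matching range */
    return (int)(i & 1);
}
```

With this layout a lookup through the Latin1 slot needs no special case: any code point in the upper Latin1 range passes both boundaries and is reported as not matching.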
Karl Williamson [Mon, 31 Dec 2012 04:41:44 +0000 (21:41 -0700)]
locale.t, pat_advanced.t: Remove TODOs
The previous commit fixed these TODOs.
Karl Williamson [Mon, 31 Dec 2012 04:14:58 +0000 (21:14 -0700)]
regex: Add pseudo-Posix class: 'cased'
/[[:upper:]]/i and /[[:lower:]]/i should match the Unicode property
\p{Cased}. This commit introduces a pseudo-Posix class, internally named
'cased', to represent this. This class isn't specifiable by the user,
except through using either /[[:upper:]]/i or /[[:lower:]]/i. Debug
output will say ':cased:'.
The regex parsing either of :lower: or :upper: will change them into
:cased:, where already existing logic can handle this, just like any
other class.
This commit fixes the regression introduced in
3018b823898645e44b8c37c70ac5c6302b031381, and also the fact that these
have never worked under 'use locale'. The next commit will un-TODO the
tests for these things.
Karl Williamson [Mon, 31 Dec 2012 03:55:49 +0000 (20:55 -0700)]
handy.h, regcomp.h, regexec.c: Sort initializers, switch()
Until recently, these needed to be (or it made sense for them to be) in
numerical order of what the rhs of each #define evaluates to. But now,
they are all initialized to something else, and the numerical value is
not even apparent. Alphabetical order gives a logical ordering to help
a reader find things.
Karl Williamson [Mon, 31 Dec 2012 03:39:37 +0000 (20:39 -0700)]
locale.t: Add TODO tests
It turns out that Perl has always assumed that the Posix character
classes are closed under folding. For example, if a character is in
[:alpha:], its fold will be in [:alpha:] as well.
This seems like a reasonable assumption except for two classes, where it
is almost certainly wrong. If a character matches [:upper:], its fold
likely won't. Same for [:lower:]. What this means is that a regex of
the form
/[[:lower:]]/i
has never properly matched the uppercased versions of the characters in
the target string.
This commit adds TODO tests for these.
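The closure assumption can be illustrated with the C library's own classification functions. This is only a sketch: the helper names are hypothetical, and tolower() stands in for case folding, which is an approximation.

```c
#include <ctype.h>

/* Hypothetical helpers: does class membership survive case folding?
 * Lowercasing stands in for folding here, which is an approximation. */
static int alpha_closed_under_fold(int c)
{
    /* True when c is not alphabetic, or its fold is alphabetic too */
    return !isalpha(c) || isalpha(tolower(c));
}

static int upper_closed_under_fold(int c)
{
    /* True when c is not uppercase, or its fold is uppercase too */
    return !isupper(c) || isupper(tolower(c));
}
```

For 'A', the first helper holds but the second does not, which is the case the TODO tests are meant to cover.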
Karl Williamson [Mon, 31 Dec 2012 03:28:20 +0000 (20:28 -0700)]
locale.t: Add tests for [:upper:], [:lower:]
If our uc() and lc() functions are working properly, and the locale is
properly set up, things that are uppercase and not lowercase should
match [:upper:], and vice-versa. This adds tests for that.
Karl Williamson [Mon, 31 Dec 2012 03:23:07 +0000 (20:23 -0700)]
locale.t: Add extra debugging info
The tryneoalpha() routine is given an extra parameter to print when
debugging is enabled.
Karl Williamson [Mon, 31 Dec 2012 03:19:35 +0000 (20:19 -0700)]
locale.t: Fix off by 1 error in debug output
The message about skipping tests was giving the wrong initial test
number for the range of those skipped.
Karl Williamson [Mon, 31 Dec 2012 03:15:09 +0000 (20:15 -0700)]
locale.t: Add capability to have TODO tests
This .t has hand-rolled tests.
Karl Williamson [Sun, 30 Dec 2012 16:42:35 +0000 (09:42 -0700)]
re/pat_advanced.t: Add TODO test for recent regression
Commit 3018b823898645e44b8c37c70ac5c6302b031381 added a regression for
the Posix classes [:upper:] and [:lower:] when matching
case-insensitively. If an above-Latin1 code point has been matched by
one of these classes at the time another regex is compiled which also
has the same class as the first one, and the second regex is /i, the
case-insensitivity is ignored.
Karl Williamson [Fri, 28 Dec 2012 17:53:36 +0000 (10:53 -0700)]
regcomp.c: Add comments
James E Keenan [Mon, 31 Dec 2012 00:52:37 +0000 (19:52 -0500)]
Correct typo in test description.
David Mitchell [Sun, 30 Dec 2012 12:56:20 +0000 (12:56 +0000)]
fix g++ and nm
Under some versions of linux, Configure wouldn't work with g++ due to
the forcing of using nm in hints/linux.sh. This forcing is apparently no
longer necessary, so just remove it.
See this thread for more details: <20121204151925.GO1900@iabyn.com>
Karl Williamson [Fri, 28 Dec 2012 16:44:47 +0000 (09:44 -0700)]
regcomp.c: Make sure optimizer handles node; Note for 3018b823
The check that there isn't a simple regnode unknown to the optimizer
has been commented out since at least
653099ff2c52a6af02b3894d684593dfe31dcc17 in 1999. This caused us some
slow down for a while in the 5.17 release, as I added a regnode and
didn't think about it needing to be added to the optimizer. This commit
uncomments the check, but only for debugging builds, so that a
production environment will run with missing optimization, but
developers will be notified of their error.
Also, in commit 3018b823898645e44b8c37c70ac5c6302b031381, I made an
error in the commit message; I'm adding it here so people digging will
find out the relevant information. The message said that the new regex
ops were unknown to the optimizer. I had forgotten that I had included
them in that commit, so that statement was wrong.
Karl Williamson [Thu, 27 Dec 2012 22:26:01 +0000 (15:26 -0700)]
regcomp.c: Use xor to save a test
(Perhaps the C optimizer already figures this out.)
Karl Williamson [Thu, 27 Dec 2012 21:35:46 +0000 (14:35 -0700)]
regcomp.c: Free up ANYOF flag bit
This frees up a flag bit for ANYOF regnodes. The freed bit is currently
not needed for other uses; I decided to make the change now, while how
to do it was fresh in my mind. There are fewer shifts and masks as a
result, as well.
This commit moves the information this bit contains to the otherwise
unused 'next_off' field in the synthetic start class. This paradigm
could be used to pass information to the regex matching code for just
the synthetic start class, but the current bit is used just during
compilation.
Karl Williamson [Thu, 27 Dec 2012 20:49:30 +0000 (13:49 -0700)]
Add new regnode for synthetic start class
This creates a regnode specifically for the synthetic start class, which
is a type of ANYOF node. The flag bit previously used to denote this is
removed. This paves the way for this bit to be freed up, but first the
other use of this bit must also be removed, which will be done in the
next commit.
There are now three ANYOF-type regnodes. This one should be called only
in one place in regexec.c. The other special one is ANYOF_WARN_SUPER.
A synthetic start class node should not do any warning, so there is no
issue of having something need to be both types.
Karl Williamson [Thu, 27 Dec 2012 20:24:06 +0000 (13:24 -0700)]
regcomp.c, regcomp.h: White-space, comment only
No code changes
Karl Williamson [Thu, 27 Dec 2012 20:15:46 +0000 (13:15 -0700)]
regcomp.h: Split two ANYOF flag bits
This essentially reverts 8b27d3db700fc2fce268e3d78e221a16ccaca2e8
and causes ANYOF nodes that are in locale but don't match things like \w
to have a smaller node size.
Karl Williamson [Thu, 27 Dec 2012 17:59:13 +0000 (10:59 -0700)]
Free up regex ANYOF bit.
This uses a regnode type, of which we have many available, to free up
a bit in the ANYOF regnode flag field, of which we have none, and are
trying to have the same bit do double duty. This will enable us to
remove some of that double duty in the next commit.
Karl Williamson [Thu, 27 Dec 2012 19:15:56 +0000 (12:15 -0700)]
regcomp.c: Remove unnecessary flag setting
The function cl_anything() sets things up so that the synthetic start
class parameter will match any character. But this flag doesn't
contribute to that, as it sets things to match under certain conditions
only, and the characters it matches already match by the other actions
of the routine (the ANYOF_BITMAP_SETALL just above).
Karl Williamson [Thu, 27 Dec 2012 19:34:41 +0000 (12:34 -0700)]
regcomp.c: Clean up ANYOF_CLASS handling.
The ANYOF_CLASS flag is used in ANYOF nodes (for [bracketed] and the
synthetic start class) only when matching something like \w, [:punct:]
etc., under /l (locale). It should not be set unless /l is specified.
However, it was always getting set for the synthetic start class. This
commit fixes that. The previous code was masking errors in which it was
being tested for unnecessarily, and for much of the 5.17 series, the
synthetic start class was always set to test for locale, which was a
waste of cpu when no locale was specified.
Karl Williamson [Wed, 26 Dec 2012 17:33:13 +0000 (10:33 -0700)]
regcomp.c: Add #undef
The #defines in the first part of the file are #undef'd to try to keep
anything from inadvertently leaking into the 2nd part; this one was
omitted.
Karl Williamson [Wed, 26 Dec 2012 17:32:43 +0000 (10:32 -0700)]
regcomp.c: typo in comment
Karl Williamson [Wed, 26 Dec 2012 17:14:34 +0000 (10:14 -0700)]
regcomp.c: Handle edge case in inversion list max
The function _invlist_max() cannot currently be called on an inversion
list that was compiled in (via a C language array), because this edge
case is not considered. I don't think current code ever does call this
function on this case, but that could easily change inadvertently.
Karl Williamson [Tue, 25 Dec 2012 19:21:53 +0000 (12:21 -0700)]
regcomp.c: Reword comment
To more accurately reflect newer code
Karl Williamson [Thu, 27 Dec 2012 17:25:48 +0000 (10:25 -0700)]
regcomp.c: Don't iterate while changing an inversion list
This adds functions to prevent accidental (or deliberate) iteration over
an inversion list while it is being modified. This is to catch
development errors, and in production builds, the asserts() are likely
no-ops.
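One way such a guard could look, as a sketch only. The structure, field and function names below are hypothetical and do not come from regcomp.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure; the field and function names are illustrative. */
typedef struct {
    unsigned long data[8];
    size_t len;
    int iterating;          /* nonzero while an iteration is in progress */
} guarded_list;

static void list_iterinit(guarded_list *l)   { l->iterating = 1; }
static void list_iterfinish(guarded_list *l) { l->iterating = 0; }

/* Append a value; modifying during iteration is a development error.
 * With NDEBUG defined, the assert() calls compile to nothing. */
static void list_append(guarded_list *l, unsigned long v)
{
    assert(!l->iterating);
    assert(l->len < sizeof l->data / sizeof l->data[0]);
    l->data[l->len++] = v;
}
```

A caller brackets its loop with list_iterinit() and list_iterfinish(); any list_append() in between trips the assertion in a debugging build.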
Karl Williamson [Thu, 27 Dec 2012 16:55:54 +0000 (09:55 -0700)]
regen/embed.pl: Add sanity test for entries
A void returning function should not be required to have its return
value tested.
Chris 'BinGOs' Williams [Thu, 27 Dec 2012 21:49:23 +0000 (21:49 +0000)]
Update Scalar-List-Utils to CPAN version 1.27
[DELTA]
1.26_001 -- Sun Dec 23 15:58
* Fix multicall refcount bug RT#80646
Chris 'BinGOs' Williams [Thu, 27 Dec 2012 21:43:42 +0000 (21:43 +0000)]
Update HTTP-Tiny to CPAN version 0.025
[DELTA]
0.025 2012-12-26 12:09:43 America/New_York
[ADDED]
- Agent string appends default if it ends in a space, just like LWP
[Chris Weyl]
Matthew Horsfall (alh) [Wed, 26 Dec 2012 19:54:52 +0000 (14:54 -0500)]
Add regcomp.c warning checks to t/porting/diag.t.
* Support regcomp.c ckWARN and vWARN macros
* Update pod/perldiag.pod for fixes discovered with new checks
* Allow t/porting/diag.t to match printfs with flags more liberally
Karl Williamson [Mon, 24 Dec 2012 15:56:22 +0000 (08:56 -0700)]
Perl_instr() implement with libc equivalent.
C89 libc contains the strstr() function that does the same thing as
instr(). Convert to use it under the assumption that it is faster than
our code, and is less for us to maintain. Because early versions of
Linux libc had a bug when the 'little' argument was NULL (fixed in late
1994), check for that explicitly. If we were willing to ignore issues
with such old libc versions, or if we had a Configure probe that tested
for the bug, we could replace the macro instr() completely with a call
to strstr().
The memmem() GNU extension does the same thing as Perl_ninstr(). It
however has bugs in much later libc versions. A Configure probe could
be written to see if it is usable.
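A sketch of the shape of such a wrapper. The name my_instr() is hypothetical, and returning 'big' for a NULL 'little' is an assumption about the desired behaviour, not something stated above.

```c
#include <string.h>

/* Hypothetical wrapper, not the actual Perl_instr() source.  Returns a
 * pointer to the first occurrence of little in big, or NULL if absent. */
static char *my_instr(const char *big, const char *little)
{
    /* Guard the case that old libc versions mishandled.  Returning big
     * here is an assumption made for this sketch. */
    if (little == NULL)
        return (char *)big;

    return strstr(big, little);
}
```

If the NULL case could be ruled out, the wrapper would reduce to a bare strstr() call.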
Karl Williamson [Wed, 26 Dec 2012 16:32:58 +0000 (09:32 -0700)]
Fix comment references to removed regex ops
Commit 3018b823898645e44b8c37c70ac5c6302b031381 removed the regular
expression operations (regnodes) that these comments refer to, replacing
them with different ones. Update the comments to be accurate.
Karl Williamson [Tue, 25 Dec 2012 16:04:05 +0000 (09:04 -0700)]
regcomp.c: Properly indent
Karl Williamson [Tue, 25 Dec 2012 15:55:14 +0000 (08:55 -0700)]
pat_advanced.t: Add tests
I almost broke this, so adding a precautionary test.
David Mitchell [Wed, 26 Dec 2012 13:53:23 +0000 (13:53 +0000)]
Revert "porting/podcheck corrections."
This reverts commit f6a6501216dee24e251d4482bd3a1f6daf4ac0da.
The fix seems wrong and is causing podcheck.t test failures, and (for my
system at least), reverting it removes those errors and doesn't create new
errors. Whatever was originally causing podcheck errors needs to be fixed,
rather than trying to mask it.
James E Keenan [Wed, 26 Dec 2012 03:55:17 +0000 (22:55 -0500)]
porting/podcheck corrections.
James E Keenan [Wed, 26 Dec 2012 03:07:21 +0000 (22:07 -0500)]
Add another address for Renee Baecker.
Keep t/porting/authors.t happy.
James E Keenan [Wed, 26 Dec 2012 02:39:39 +0000 (21:39 -0500)]
ext/Hash-Util/lib/Hash/Util.pm: Bump $VERSION to reflect documentation
changes.
Revert one typographical correction to satisfy t/porting/podcheck.t
reneeb [Tue, 25 Dec 2012 23:06:02 +0000 (00:06 +0100)]
Add new functions of Hash::Util in documentation.
Add call of 'hash_value' to synopsis and fix typo.
Typographical and grammatical touch-ups (by committer).
David Mitchell [Tue, 25 Dec 2012 21:44:58 +0000 (21:44 +0000)]
fix utf8ness in ${"string"}
Commit 28123549 introduced some consistency checks which started warning
against code like:
$c = chr 163;
$x = $$c;
This was caused by a symbol table lookup of a value which isn't utf8
encoded, but which was being treated as utf8 encoded.
David Mitchell [Tue, 25 Dec 2012 21:03:27 +0000 (21:03 +0000)]
Eliminate PL_reg_flags
The previous 3 commits have removed any usage of the 3 flags bits from
this var; remove the (now unused) variable (which is actually #deffed to
PL_reg_state.re_state_reg_flags).
This change brought to you by the Campaign to Remove Global State from the
Regex Engine(tm).
David Mitchell [Tue, 25 Dec 2012 20:51:50 +0000 (20:51 +0000)]
Eliminate RF_tainted flag from PL_reg_flags
This global flag is cleared at the start of execution, and then set if
any locale-based nodes are executed. At the end of execution, the
RXf_TAINTED_SEEN flag on the regex is set/cleared based on RF_tainted.
We eliminate RF_tainted by simply directly setting RXf_TAINTED_SEEN
each time a taintable node is executed.
This is the final step before eliminating PL_reg_flags.
David Mitchell [Tue, 25 Dec 2012 18:24:50 +0000 (18:24 +0000)]
eliminate RF_warned flag from PL_reg_flags
This global flag indicates whether the currently executing regex has
issued a recursion limit warning yet.
Replace it with a boolean var local to the regmatch_info struct.
This is a second step to eliminating PL_reg_flags.
David Mitchell [Tue, 25 Dec 2012 18:09:32 +0000 (18:09 +0000)]
eliminate RF_utf8 flag from PL_reg_flags
This global flag indicates whether the currently executing regex is utf8.
Replace it with a boolean var local to the matching function, and pass
it around via function args, or as a member of the regmatch_info struct.
This is a first step to eliminating PL_reg_flags.
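A sketch of the general pattern of moving a global flag into per-match state. The struct and function here are hypothetical and far simpler than the real regmatch_info.

```c
#include <stddef.h>

/* Hypothetical per-match state; the member name is illustrative. */
typedef struct {
    int is_utf8;    /* formerly a bit in a global flags variable */
} match_info;

/* Count characters in a buffer.  The flag travels with the match, so a
 * later match cannot see a stale value left behind by an earlier one. */
static size_t count_chars(const match_info *info,
                          const unsigned char *s, size_t len)
{
    size_t i, n = 0;

    if (!info->is_utf8)
        return len;                 /* one byte per character */

    for (i = 0; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)  /* skip UTF-8 continuation bytes */
            n++;
    return n;
}
```

Two matches with different settings can then run back to back, each with its own match_info, without resetting anything in between.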
David Mitchell [Tue, 25 Dec 2012 15:39:25 +0000 (15:39 +0000)]
[perl #116148] Pattern utf8ness sticks around
Perl_re_intuit_start would set, but never unset, the RF_utf8 flag in
PL_reg_flags. This meant that two successive patterns, the first utf8 and
the second not, that are processed using only intuit, will get the flag
wrong on the second one. The fix is trivial.
Karl Williamson [Mon, 24 Dec 2012 15:39:58 +0000 (08:39 -0700)]
perlapi: Fix misstatement
According to the comments for Perl_sv_setuv(), for performance reasons,
a UV that fits in an IV is stored as an IV.
Karl Williamson [Mon, 24 Dec 2012 14:52:50 +0000 (07:52 -0700)]
perlapi: Fix typos
H.Merijn Brand [Mon, 24 Dec 2012 15:08:40 +0000 (16:08 +0100)]
Remove register keyword from randfunc tests in Configure
Karl Williamson [Mon, 24 Dec 2012 02:44:03 +0000 (19:44 -0700)]
regcomp.c: Add comment, Note for 3018b8238
This adds a comment for this code, which I mistakenly included in commit
3018b823898645e44b8c37c70ac5c6302b031381, and should have been in its
own commit. The code optimizes something that has a quantifier of zero
into a NOTHING node.
Karl Williamson [Mon, 24 Dec 2012 02:09:28 +0000 (19:09 -0700)]
regexec.c: Comments, white-space; no code changes
Karl Williamson [Mon, 24 Dec 2012 02:04:30 +0000 (19:04 -0700)]
utf8.h: Make sure char* is cast to U8* for unsigned comparison
If a char* is passed prior to this commit, an above-ASCII char could
have been considered negative instead of positive, and thus screwed up
these tests.
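A sketch of the signedness problem, using hypothetical macro names rather than the ones in utf8.h. U8_sketch stands in for Perl's U8.

```c
typedef unsigned char U8_sketch;   /* stand-in for Perl's U8 */

/* Without a cast: where plain char is signed, bytes >= 0x80 are negative
 * and so compare as less than 0x80.  The result depends on the platform. */
#define IS_ASCII_BYTE_UNCAST(p)  (*(p) < 0x80)

/* With the cast, the byte is always compared as a value in 0..255 */
#define IS_ASCII_BYTE(p)  (*(const U8_sketch *)(p) < 0x80)
```

On a platform where char is signed, IS_ASCII_BYTE_UNCAST() applied to a char* pointing at the byte 0xC3 reports it as ASCII, whereas IS_ASCII_BYTE() does not.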
Karl Williamson [Mon, 24 Dec 2012 02:01:34 +0000 (19:01 -0700)]
utf8.h: Parenthesize macro parameter
This apparently hasn't caused us problems, but all uses of a macro
parameter should be parenthesized to prevent surprises.
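The kind of surprise meant here can be shown with a pair of hypothetical macros (these are not the ones changed in utf8.h):

```c
/* Hypothetical macros for illustration only */
#define TWICE_BAD(x)   (x * 2)      /* parameter not parenthesized */
#define TWICE_GOOD(x)  ((x) * 2)    /* parameter parenthesized */
```

TWICE_BAD(1 + 2) expands to (1 + 2 * 2), which is 5, while TWICE_GOOD(1 + 2) expands to ((1 + 2) * 2), which is 6.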
Karl Williamson [Mon, 24 Dec 2012 01:46:15 +0000 (18:46 -0700)]
perlapi: Clarify isSPACE(), document isPSXSPC()
Daniel Dragan [Sun, 16 Dec 2012 22:26:49 +0000 (17:26 -0500)]
uninline panic branch from POPSTACK
This commit saves machine instructions by preventing inlining, and keeps
the error handling code for an extremely rare panic out of hot code. This
should make the interp smaller and faster.
Perl_error_log is a macro that has a very large expansion on threaded
perls, 4 branches and possibly a call to Perl_PerlIO_stderr. POPSTACK is
expanded 18 times, by asm, on my non DEBUGGING threaded Win32 32 bit Perl 5.17
-O1 compiled with VC 2003. POPSTACK is also used in some core XS modules,
for example List::Util and PerlIO::encoding. The .text section of
perl517.dll dropped from 0xc05ff bytes of x86 instructions to 0xc00ff
after applying this for me.
Perl_croak_popstack was made contextless to save a push/move instruction
at each caller (less instructions in the instruction cache) and for more
opportunity for the compiler to optimize. Since Perl_croak_popstack is a
noreturn, some compilers may optimize it to just a conditional jump
instruction. VC 2003 32 bit did this inside perl517.dll and from XS
modules using POPSTACK. Perl_croak_popstack measures at 0x48 bytes of
instructions under -O1 for me, so previously, those 0x48 minus the
dTHX overhead would have been sitting in the caller because of macro
expansion.
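A sketch of the general technique, with hypothetical names. In real use the error routine would be an ordinary exported function in another file, so the compiler cannot inline it back into the macro; it is static here only to keep the example self-contained.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical out-of-line error routine.  It never returns, so the
 * cold path in each caller is just a conditional jump or call to it. */
static void croak_stack_underflow(void)
{
    fputs("panic: stack underflow\n", stderr);
    exit(1);
}

/* Hot-path macro: only the test and the call are expanded at each use,
 * not the error-reporting code itself. */
#define POP_CHECKED(depth)              \
    do {                                \
        if ((depth) <= 0)               \
            croak_stack_underflow();    \
        (depth)--;                      \
    } while (0)
```

Each use of POP_CHECKED() then costs a compare, a branch and a decrement, regardless of how large the error-reporting code is.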
Karl Williamson [Sat, 12 Nov 2011 20:36:53 +0000 (13:36 -0700)]
perlhack: in-line functions need extra care
Karl Williamson [Sun, 23 Dec 2012 20:49:02 +0000 (13:49 -0700)]
handy.h: Add full complement of isIDCONT() macros
This also changes isIDCONT_utf8() to use the Perl definition, which
excludes any \W characters (the Unicode definition includes a few of
these). Tests are also added. These macros remain undocumented for
now.
Karl Williamson [Sun, 23 Dec 2012 20:47:11 +0000 (13:47 -0700)]
ext/XS-APItest/t/handy.t: White-space only
Indent 2 newly formed blocks properly
Karl Williamson [Sun, 23 Dec 2012 20:44:19 +0000 (13:44 -0700)]
ext/XS-APItest/t/handy.t: Work better on platforms sans proper locales
This was skipping a bunch of tests that should have been done when the
platform does not have properly working locales.
Craig A. Berry [Sun, 23 Dec 2012 19:42:47 +0000 (13:42 -0600)]
Proper IEEE overflow semantics for VMS.
Way back in 67597c89125e7e14 we misspelled _IEEE_FP as __IEEE_FP,
with a spurious leading underscore. Which I then copied and
pasted into pp_pack.c in baf3cf9c09c529. This means that on
Alpha and Itanium systems with the default selection of IEEE
floating point, we've actually (for the last decade!) been
using a workaround intended for VAX or Alpha and Itanium builds
that have explicitly selected VAX-compatible floating point
formats. Oh well.
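A sketch of how a misspelled macro name goes unnoticed. The names are hypothetical stand-ins for the real ones, chosen to avoid reserved identifiers.

```c
/* Hypothetical names.  The platform is assumed to define SKETCH_IEEE_FP,
 * but the source mistakenly tests SKETCH__IEEE_FP, which nothing defines. */
#define SKETCH_IEEE_FP 1

#ifdef SKETCH__IEEE_FP      /* misspelled: quietly treated as not defined */
#  define USING_WORKAROUND 0
#else
#  define USING_WORKAROUND 1   /* the fallback branch is compiled instead */
#endif
```

The preprocessor gives no diagnostic for testing an undefined name, so the fallback branch is selected silently.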
Karl Williamson [Sat, 22 Dec 2012 19:45:00 +0000 (12:45 -0700)]
svleak.t: Add a test; make another more robust
The code that these tests are for has recently changed to perhaps
allocate a scalar which is not freed until global destruction.
Therefore, run the loop more times and allow a tolerance of one scalar.
Karl Williamson [Sun, 23 Dec 2012 17:03:16 +0000 (10:03 -0700)]
Deprecate calling isFOO_utf8() with malformed
handy.h has character classification macros to determine if a UTF-8
encoded character is of a given type FOO, such as isALPHA_utf8(), etc.
Code that calls these should have first made sure that the parameter is
legal UTF-8. Prior to this patch, false was silently returned for all
illegal UTF-8. Now, in most instances, a deprecation warning is raised.
This is to catch bugs, and prepare for eventual elimination of this
check, which fails to catch read-off-end-of-buffer malformations anyway.
(One idea would be to leave the check in for DEBUGGING builds.)
The cases where no deprecation warning is raised as a result of this
commit are the classes where the character does not have to be
converted to a code point for its inclusion to be determined. For
example, if malformed UTF-8 is checked to see if it is ASCII, we only
need to check that it is one of the 128 ASCII characters. If it isn't,
we don't bother to see if it is malformed or not. There are other
cases, as well, such as with isSPACE(), where we check if the UTF-8 is
one of a very finite set, without checking for malformedness.
This commit causes a number of apparent bugs to be shown by the Perl
test suite. These do not cause actual failures.
Karl Williamson [Sun, 23 Dec 2012 16:17:27 +0000 (09:17 -0700)]
Add Augustina Blair to AUTHORS
Augustina Blair [Mon, 17 Dec 2012 18:47:52 +0000 (13:47 -0500)]
Removed p5p-faq reference from perlhack.pod.
Original document linked to has been removed because it was out of date and
redundant.
Father Chrysostomos [Sun, 23 Dec 2012 07:26:37 +0000 (23:26 -0800)]
regcomp.pl: Calculate col widths for perldebguts
I put this in originally, but somehow undid it by mistake before
committing 65aa4ca74a9c.
Father Chrysostomos [Sun, 23 Dec 2012 07:07:31 +0000 (23:07 -0800)]
Regenerate the regnode table in perldebguts.pod automatically
Karl Williamson [Sun, 23 Dec 2012 03:45:39 +0000 (20:45 -0700)]
regcomp.c: Change some instances to SvREFCNT_dec_NN
I went through regcomp.c and for the areas I am familiar with, plus
those where it was trivially determinable, converted to SvREFCNT_dec_NN
where possible. I am not saying there aren't more that could be
converted.
Father Chrysostomos [Sun, 23 Dec 2012 01:47:27 +0000 (17:47 -0800)]
Increase $diagnostics::VERSION to 1.31
Matthew Horsfall (alh) [Sun, 16 Dec 2012 23:02:43 +0000 (18:02 -0500)]
RT-89642 - Don't treat ,; as special end-of-line characters.
Support multi-line "=item ..." expressions per the POD spec.
This also allows warnings with white-space differences to match.
Karl Williamson [Sun, 23 Dec 2012 00:54:46 +0000 (17:54 -0700)]
regcomp.c: Yet another move of declaration to proper place
Commit 3fde42ccb5ca4eb8238f0fcbd2a01464a9c6193d did not get the
declarations in the proper order. I don't understand how this got out
of order in the first place, but hopefully this will fix it up.
Craig A. Berry [Sat, 22 Dec 2012 22:38:41 +0000 (16:38 -0600)]
Fix erroneous USE_LONG_DOUBLE in configure.com.
Once upon a time there was a "use64bit" option [1] that only later
became separated into use64bitint, uselongdouble, and use64bitall,
but we didn't properly separate out everything. So if you chose
64-bit integers but not long doubles, you would get the macro
USE_LONG_DOUBLE defined but without other supporting defines and
with incompatible branches followed in various parts of the #ifdef
jungle. So separate them out. Thanks to Thomas Pfau for trying
what's apparently a rare configuration.
[1] See fafa4fee6354847ae7fda.
Craig A. Berry [Sat, 22 Dec 2012 22:32:02 +0000 (16:32 -0600)]
Fix d_nv_preserves_uv on VMS with 64-bit int but no long double.
There was a typo in a5bd55ee8902ea3fcb that left a spurious double
quote in config.h and caused compile failures when compiling with
-Duse64bitint but not also selecting -Duselongdouble. Problem
reported by Thomas Pfau.
Karl Williamson [Sat, 22 Dec 2012 20:29:24 +0000 (13:29 -0700)]
regcomp.c: Move declaration to proper place
Declarations must come before code in C89. I fixed this earlier based
on smoke results, but somehow it got lost, and my compiler doesn't warn
for this.
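For reference, a small function laid out the way C89 requires, with every declaration ahead of the first statement in its block. The function is a hypothetical example, not code from regcomp.c.

```c
/* Valid C89: all declarations precede the first statement of the block.
 * Writing "int total = 0;" after the for loop's first statement, or
 * declaring i inside the for(), would not be accepted by a C89 compiler. */
static int sum_to(int n)
{
    int i;
    int total = 0;

    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
```

With gcc, compiling with -std=c89 -pedantic (or -Wdeclaration-after-statement) reports mixed declarations and code, which helps when the default settings stay silent.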
Karl Williamson [Sat, 22 Dec 2012 18:12:08 +0000 (11:12 -0700)]
Merge character class handling revamp topic branch into blead
This sequence of commits cuts down the amount of duplication in handling
character classes (like \w, [:graph:]). There are three principal
components: 1) refactor a switch statement in regcomp.c to have common
code for most of the possible classes; 2) replace the 30 regops that
handle these classes by just 4 (effectively) distinct ones; 3) deprecate
the functions callable from XS that do character classification. XS
code should instead (if they weren't already) use macros in handy.h to
accomplish this purpose.
Karl Williamson [Thu, 20 Dec 2012 17:22:26 +0000 (10:22 -0700)]
regcomp.c: Reorder some case: statements so can FALL THROUGH
Karl Williamson [Thu, 20 Dec 2012 17:10:34 +0000 (10:10 -0700)]
regcomp.c: White-space only; no code changes
This properly indents some lines, and adds/subtracts white space
elsewhere
Karl Williamson [Thu, 20 Dec 2012 16:39:22 +0000 (09:39 -0700)]
Deprecate all is_(uni|utf8)_foo function uses
Coders should use the macros in handy.h instead of calling these
directly.
Karl Williamson [Thu, 20 Dec 2012 16:29:36 +0000 (09:29 -0700)]
Create internal _is_utf8_mark()
This is so we can deprecate non-core use of the existing one in a future
commit. XS coders should be using the macros in handy.h instead of
calling such functions directly. A future commit will deprecate all of
them, but first the core uses of this one must change so they don't
generate deprecation messages. I will not have a chance to look for
some time, but I suspect that most uses of this function in the core
should be changed to use something else, but in the meantime, the
non-core uses can be deprecated.
Karl Williamson [Wed, 19 Dec 2012 20:54:16 +0000 (13:54 -0700)]
regexec.c: Combine adjacent 'ifs' with same clause
These two instances of 'if (a) { b } if (c) { b }' are combined to
if (a || c) { b }
The final instance is made into an else since the 'if' before it does a
break, so that the break is eliminated.
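A before-and-after sketch of the transformation, using hypothetical functions in which an assignment stands in for the shared body b. The rewrite is only equivalent when running b twice has the same effect as running it once, or when b leaves the enclosing block.

```c
/* Hypothetical before/after; the assignment stands in for the body b */
static int run_before(int a, int c)
{
    int result = 0;

    if (a) { result = 1; }
    if (c) { result = 1; }
    return result;
}

static int run_after(int a, int c)
{
    int result = 0;

    if (a || c) { result = 1; }   /* the two tests share one body */
    return result;
}
```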
Karl Williamson [Wed, 19 Dec 2012 20:53:02 +0000 (13:53 -0700)]
regexec.c: Remove 2 unnecessary break statements
Karl Williamson [Wed, 19 Dec 2012 20:39:45 +0000 (13:39 -0700)]
Remove TODO for test for #114272
This is now fixed
Karl Williamson [Wed, 19 Dec 2012 20:30:33 +0000 (13:30 -0700)]
Remove temporary back-compat PL_ variable names
These names are synonyms for specific array elements, and were used
temporarily until all uses of them were removed. This commit removes
the remaining uses, and the definitions.
Karl Williamson [Wed, 19 Dec 2012 20:27:08 +0000 (13:27 -0700)]
utf8.c: Remove two internal now unused functions.
These functions were used internally as helpers for matching \X in
regular expressions. They are no longer used.
Karl Williamson [Wed, 19 Dec 2012 20:07:48 +0000 (13:07 -0700)]
regexec.c: Revamp the macros to load swashes
This changes the swash-load macros to use swash_init() and swash_fetch()
instead of calling is_utf8_xxx() functions that may only be needed for
this purpose (which will hence become eligible for removal because of
this commit).
The check that a known character matches the loaded swash is now only
done in DEBUGGING builds. And the macro to load the DIGIT swash is
removed, as there are no remaining calls of it.
Karl Williamson [Wed, 19 Dec 2012 19:55:47 +0000 (12:55 -0700)]
handy.h: Improve some comments
Karl Williamson [Wed, 19 Dec 2012 19:09:20 +0000 (12:09 -0700)]
regexec.c: Remove unused macro definitions
These are no longer used.
Karl Williamson [Tue, 18 Dec 2012 20:58:00 +0000 (13:58 -0700)]
regcomp.c: Reorder two if-elses
It is generally easier to understand an
if () { few statements } else { many statements}
than the other way around.
Karl Williamson [Tue, 18 Dec 2012 04:37:40 +0000 (21:37 -0700)]
Consolidate some regex OPS
The regular expression operation POSIXA works on any of the (currently)
16 posix classes (like \w and [:graph:]) under the regex modifier /a.
This commit creates similar operations for the other modifiers: POSIXL
(for /l), POSIXD (for /d), POSIXU (for /u), plus their complements.
It causes these ops to be generated instead of the ALNUM, DIGIT,
HORIZWS, SPACE, and VERTWS ops, as well as all their variants. The net
saving is 22 regnode types.
The reason to do this is for maintenance. As of this commit, there are
now 22 fewer node types for which code has to be maintained. The code
for each variant was essentially the same logic, but on different
operands. It would be easy to make a change to one copy and forget to
make the corresponding change in the others. Indeed, this patch fixes
[perl #114272] in which one copy was out of sync with others.
This patch actually reduces the number of separate code paths to 5:
POSIXA, NPOSIXA, POSIXL, POSIXD, and POSIXU. The complements of the
last 3 use the same code path as their non-complemented version, except
that a variable is initialized differently. The code then XORs this
variable with its result to do the complementing or not. Further, the
POSIXD branch now just checks if the target string being matched is
UTF-8 or not, and then jumps to either the POSIXU or POSIXA code
respectively. So, there are effectively only 4 cases that are coded:
POSIXA, NPOSIXA, POSIXL, and POSIXU. (POSIXA doesn't have to worry
about UTF-8, while NPOSIXA does, hence these for efficiency are coded
separately.)
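The XOR technique described above can be sketched as follows. The function name is hypothetical, and isdigit() stands in for whatever class test the real code performs.

```c
#include <ctype.h>

/* Hypothetical sketch of one code path serving a class and its complement.
 * to_complement is 0 for the plain class and 1 for the complemented one. */
static int match_digit_class(unsigned char c, int to_complement)
{
    /* !! normalizes isdigit()'s nonzero result to 1 before the XOR */
    return to_complement ^ !!isdigit(c);
}
```

The caller sets to_complement once, based on the node type, and the per-character code is then identical for both variants.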
Removing all this code saves memory. The output of the Linux size
command shows that the perl executable was shrunk by 33K bytes on my
platform compiled under -O0 (.7%) and by 18K bytes (1.3%) under -O2.
The reason this patch was doable was previous work in numbering the
POSIX classes, so that they could be indexed in arrays and bit
positions. This is a large patch; I didn't see how to break it into
smaller components.
I chose to make this code more efficient as opposed to saving even more
memory. Thus there is a separate loop that is jumped to after we know
we have to load a swash; this just saves having to test if the swash is
loaded each time through the loop. I avoid loading the swash until
absolutely necessary. In places in the previous version of this code,
the swash was loaded when the input was UTF-8, even if it wasn't yet
needed (and might never be if the input didn't contain anything above
Latin1); apparently to avoid the extra test per iteration.
The Perl test suite runs slightly faster on my platform with this patch
under -O0, and the speeds are indistinguishable under -O2. This is in
spite of these new POSIX regops being unknown to the regex optimizer
(this will be addressed in future commits), and extra machine
instructions being required for each character (the xor, and some
shifting and masking). I expect this is a result of better caching, and
not loading swashes unless absolutely necessary.
Karl Williamson [Mon, 17 Dec 2012 16:09:05 +0000 (09:09 -0700)]
regexec.c: comments, white-space only
No code changes
Karl Williamson [Wed, 19 Dec 2012 19:51:05 +0000 (12:51 -0700)]
handy.h: Refactor some internal macro calls
I didn't plan very well when I added these macros recently. This
refactors them to be more logical.
Karl Williamson [Mon, 17 Dec 2012 03:35:30 +0000 (20:35 -0700)]
regcomp.c: Expand only call of a macro
Karl Williamson [Mon, 17 Dec 2012 03:11:30 +0000 (20:11 -0700)]
regcomp.c: Combine two cases in a switch()
Like [[:^digit:]] done in a previous commit, the non-complemented
[[:digit:]] can be combined with its kin, provided we override a
variable setting for just it.
Karl Williamson [Mon, 17 Dec 2012 01:58:33 +0000 (18:58 -0700)]
regcomp.c: Replace macro by expansion in only place called
Previous commits have removed all but one call of this macro. Replace
that call by its expansion. It also adds some comments.
Karl Williamson [Sun, 16 Dec 2012 21:29:41 +0000 (14:29 -0700)]
regcomp.c: Collapse switch() case
This combines one switch case with another; overriding a variable
setting is all that is needed to make these identical.
Karl Williamson [Sun, 16 Dec 2012 04:10:37 +0000 (21:10 -0700)]
regcomp.c: Expand single instance of macro
This small macro has only one call of it
Karl Williamson [Sun, 16 Dec 2012 04:03:23 +0000 (21:03 -0700)]
regcomp.c: Collapse cases in a switch()
[:upper:] and [:lower:] have the same logic as some other cases in the
switch statement, but differ in that under /i they match more than just
themselves, and this has to be accounted for.
By moving the test for /i to outside the switch(), these cases can be
collapsed. There is a small performance penalty in having to test for
all classes if /i is active, and if so, if the class is one of these
two.
Karl Williamson [Sun, 16 Dec 2012 03:47:53 +0000 (20:47 -0700)]
regcomp.c: Use auto variables set to array elements
This permits slightly clearer reading of the code, and will be useful in
a future commit to allow further collapsing of cases in the switch().