Karl Williamson [Tue, 9 Oct 2012 19:34:08 +0000 (13:34 -0600)]
regexec.c: Change variable name
This actually is a pointer to the pattern string, not to a byte.
Karl Williamson [Tue, 9 Oct 2012 19:32:12 +0000 (13:32 -0600)]
regexp.h: Update comments
These comments should have been changed in commit
c74f6de970ef0f0eb8ba43b1840fde0cf5a45497, but were mistakenly omitted.
Father Chrysostomos [Tue, 16 Oct 2012 23:07:19 +0000 (16:07 -0700)]
[perl #115260] Stop length($obj) from returning undef
When commit 9f621bb00 made length(undef) return undef, it also made it
return undef for objects with string overloading that returns undef.
But stringifying as undef is a contradiction in terms, and this makes
length inconsistent with defined, which returns true for such objects.
Changing this allows us to simplify pp_length, as we can now call
sv_len_utf8 on the argument unconditionally (except under the bytes
pragma). sv_len_utf8 is now careful not to record caches on magical
or overloaded scalars (any non-PV, in fact).
Note that sv_len is now just a wrapper around SvPV_const, so we use
SvPV_const_nomg, as there is no equivalent sv_len_nomg.
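A minimal sketch of the case described above. The class name UndefString is made up for illustration, and the behaviour shown assumes a perl that includes this change: the object is defined, so length() on it should be defined too.

```perl
use strict;
use warnings;

# Hypothetical class for illustration: its "" overload returns undef.
package UndefString {
    use overload '""' => sub { undef }, fallback => 1;
    sub new { return bless {}, shift }
}

package main;

my $obj = UndefString->new;
my $len = do {
    no warnings 'uninitialized';    # stringifying undef may warn
    length $obj;
};

print "defined(\$obj):   ", (defined $obj ? "yes" : "no"), "\n";
print "defined(length): ", (defined $len ? "yes" : "no"), "\n";
```

On a perl from before this commit the second line would be expected to say "no", which is the inconsistency with defined() that the message describes.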
Father Chrysostomos [Tue, 16 Oct 2012 21:36:43 +0000 (14:36 -0700)]
[perl #96230] Stop s/$qr// from reusing last pattern
qr// should not be using the last-successful pattern, because it is
"(?^:)", not the empty pattern. A stringified qr// does not use the
last-successful pattern.
This was fixed for m/$qr/ (and =~ qr//) in commit 7e31363783, but
s/$qr// was left out.
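A small sketch of how the difference shows up, assuming a perl that includes this fix. The variable names are arbitrary.

```perl
use strict;
use warnings;

my $qr = qr//;

# Leave a last-successful pattern behind.
"abc" =~ /b/ or die "setup match failed";

my $str = "abc";
$str =~ s/$qr/X/;

# If $qr is treated as "(?^:)", it matches the empty string at position 0.
# If the last-successful pattern /b/ were reused, the "b" would be replaced.
print "$str\n";
```

With the fix the substitution should insert "X" at the start of the string rather than replace the "b".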
Father Chrysostomos [Tue, 16 Oct 2012 21:09:14 +0000 (14:09 -0700)]
perlδ up to the present
Steffen Mueller [Tue, 16 Oct 2012 15:27:47 +0000 (17:27 +0200)]
cpan: Upgrade AutoLoader to 5.73
Just syncing to CPAN release.
Chris 'BinGOs' Williams [Tue, 16 Oct 2012 11:55:44 +0000 (12:55 +0100)]
Update HTTP-Tiny to CPAN version 0.024
[DELTA]
0.024 2012-10-09 20:44:53 America/New_York
[ADDED]
- SSL connections now auto-retry I/O after SSL renegotiation [Alan
Gardner]
[FIXED]
- User-specified CA bundles take precedence over Mozilla::CA [Alan
Gardner]
[PREREQS]
- SSL support now requires Net::SSLeay 1.49 or greater to support
auto-retry [Mike Doherty]
- Downgraded IO::Socket::SSL and related prereqs to 'suggests' again
0.023 2012-09-19 09:55:46 America/New_York
[PREREQS]
- IO::Socket::SSL and related prereqs changed to 'required' for dev
release to get better failure diagnostics from CPAN Testers
[TESTING]
- Skip live SSL testing unless IO::Socket::SSL 1.56+ installed
Chris 'BinGOs' Williams [Tue, 16 Oct 2012 11:50:46 +0000 (12:50 +0100)]
Sync Module-CoreList in Porting/Maintainers.pl with CPAN
Father Chrysostomos [Tue, 16 Oct 2012 06:56:33 +0000 (23:56 -0700)]
Father Chrysostomos [Tue, 16 Oct 2012 06:06:31 +0000 (23:06 -0700)]
Make PerlIO::encoding handle cows
Commits 667763bdbf and e9a8753af fixed bugs involving buffer
reallocations during encode and decode. But what was not taken into
account was that the COW flags could still be left on even when buffer
reallocations were accounted for. This could result in SvPV_set and
SvLEN_set(sv,0) being called on an SV with the COW flags still on,
so SvPVX would be treated as a key inside a shared_he, resulting in
assertion failures.
Father Chrysostomos [Tue, 16 Oct 2012 05:53:30 +0000 (22:53 -0700)]
Prune some dead code in pp.c:pp_undef
Since commit 6fc9266916, the if (SvFAKE) check under the SVt_PVGV case
in pp_undef has been redundant. And PVBMs are no longer GVs.
Father Chrysostomos [Mon, 15 Oct 2012 06:09:56 +0000 (23:09 -0700)]
Make PerlIO::encoding even more resilient to moving buffers
Commit 667763bdbf was not good enough.
If the buffer passed to an encode method is reallocated, it may be
smaller than the size (bufsiz) stored inside the encoding layer. So
we need to extend the buffer in that case and make sure the buffer
pointer is not pointing to freed memory.
The test as modified by this commit causes malloc errors on stderr
when I try it without the encoding.xs changes.
Steve Hay [Mon, 15 Oct 2012 07:35:59 +0000 (08:35 +0100)]
Bump $Win32CORE::VERSION following commit 35f601d964
Daniel Dragan [Tue, 9 Oct 2012 01:00:06 +0000 (21:00 -0400)]
have Win32CORE use ALIAS/XSANY
Using XSANY in addition to a struct of strings, saved 650 bytes (.rdata
and .text combined, 32bit/MS VC2K3/O1) from the previous implementation of
Win32CORE. Instead of encoding pointers or relative pointer sized offsets
to string literals, use unsigned chars. Instead of creating new XSUB C
function stubs, one per forwarded sub, use the ALIAS/XSANY feature and
have only 1 XSUB which has many names. If a length aware version of newXS
is ever added to perl, the sub names' lengths are already available. See
also commit eff5b9d539e for something similar to this commit.
Karl Williamson [Sun, 14 Oct 2012 00:17:11 +0000 (18:17 -0600)]
regcomp.c: Don't set /i in start class unless /l
There is a deficiency in the optimizer in which it doesn't get rid of
flags that it should. One of these is if it should match /i or not.
Currently it always (perhaps not quite, I don't know) assumes that it
should match under /i, yielding false positives and slowing things down.
But a recent commit changed the flag that tells it to do this, so that it
only gets set if /l is also specified. There is already existing code to
work around the optimizer deficiency for /l. This commit just moves the
/i flag handling to that existing code, so it won't get invoked unless
/l is specified.
Karl Williamson [Sat, 13 Oct 2012 16:00:18 +0000 (10:00 -0600)]
regexp.t: Add 'no warnings "utf8";'
This .t works fine unless there are failures that it tries to output,
and the handle hasn't been opened using utf8. Because we aren't sure if
that operation works, just turn off warnings.
Karl Williamson [Sat, 13 Oct 2012 15:52:42 +0000 (09:52 -0600)]
utf8.h: Correct some values for EBCDIC
It occurred to me that EBCDIC has different maximums for the number of
bytes a character can occupy. This moves the definition in utf8.h to
within an #ifndef EBCDIC, and adds the correct values to utfebcdic.h
Karl Williamson [Sat, 13 Oct 2012 15:20:11 +0000 (09:20 -0600)]
regex: White-space, comment only; no code changes
This outdents code that just had its containing block removed, and
reflows its comments to fill 79 columns; and does some other white space
adjustments, plus a typo in a comment.
Karl Williamson [Sat, 13 Oct 2012 15:15:37 +0000 (09:15 -0600)]
regex: Rename macro to reflect its narrowed use
This macro is now only used under locale; its other use has now been
removed. Change the name to reflect its only use.
Karl Williamson [Sat, 13 Oct 2012 15:07:05 +0000 (09:07 -0600)]
regex: Splice out no longer used array element
A recent commit removed all uses of an array element in the middle of an
array. This moves up the elements that followed it.
Karl Williamson [Sat, 13 Oct 2012 14:49:26 +0000 (08:49 -0600)]
regex: Remove old code that tried to handle multi-char folds
A recent commit has changed the algorithm used to handle multi-character
folding in bracketed character classes. The old code is no longer
needed.
Karl Williamson [Fri, 12 Oct 2012 17:42:38 +0000 (11:42 -0600)]
regcomp.c: Fix-up indentation; no code changes
Indent a newly-formed block
Karl Williamson [Fri, 12 Oct 2012 03:49:31 +0000 (21:49 -0600)]
PATCH: [perl #89774] multi-char fold + its fold in char class
The design for handling characters that fold to multiple characters when
the former are encountered in a bracketed character class is defective.
The ticket reads, "If a bracketed character class includes a character
that has a multi-char fold, and it also includes the first character of
that fold, the multi-char fold will never be matched; just the first
character of the fold.". Thus, in the class /[\0-\xff]/i, \xDF will
never be matched, because its fold is 'ss', the first character of
which, 's', is also in the class.
The reason the design is defective is that it doesn't allow for
backtracking and trying the other options.
This commit solves this by effectively rewriting the above to be
/ (?: \xdf | [\0-\xde\xe0-\xff] ) /xi. And so the backtracking gets
handled automatically by the regex engine.
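A rough illustration of the behaviour, assuming a perl that includes this fix. It uses a much smaller class than the one in the message, and /u to force Unicode rules so that \xDF folds to "ss". The variable names are made up.

```perl
use strict;
use warnings;

# The class contains only \xDF, whose fold is "ss".
my $only_sharp_s = "ss" =~ /\A[\xDF]\z/iu ? 1 : 0;

# The class also contains "s", the first character of that fold.
my $with_s_too = "ss" =~ /\A[\xDFs]\z/iu ? 1 : 0;

# Hand-written alternation in the spirit of the rewrite described above.
my $rewritten = "ss" =~ /\A(?:\xDF|[s])\z/iu ? 1 : 0;

print "[\\xDF]        : $only_sharp_s\n";
print "[\\xDFs]       : $with_s_too\n";
print "(?:\\xDF|[s])  : $rewritten\n";
```

The middle case is the one the ticket is about: without backtracking, the class matches just the first "s" and the anchored match then fails.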
Karl Williamson [Fri, 12 Oct 2012 17:24:34 +0000 (11:24 -0600)]
regen/mk_invlists.pl: Make list for multi-fold chars
This causes charclass_invlists.h to have a new list of all the
characters whose fold is a sequence of more than one character.
Karl Williamson [Fri, 12 Oct 2012 15:10:10 +0000 (09:10 -0600)]
mktables: Add table for chars with multi-char fold
This will be used in a later commit
Karl Williamson [Sat, 13 Oct 2012 14:31:29 +0000 (08:31 -0600)]
regcomp.c: Rename a macro, fix-up comments
This very recently introduced macro's name could be clearer, and it can
be used in another place, and the comment concerning that is slightly
inaccurate.
Dominic Hargreaves [Sat, 13 Oct 2012 10:06:48 +0000 (11:06 +0100)]
Link to 5.14.3 announcement
Father Chrysostomos [Sat, 13 Oct 2012 05:29:04 +0000 (22:29 -0700)]
Handle cow $_ in @INC filter
Setting $_ to a copy-on-write scalar in an @INC filter causes the
parser to modify every other scalar sharing the same string buffer.
It needs to be forced to a regular scalar before the parser sees it.
Father Chrysostomos [Sat, 13 Oct 2012 04:59:47 +0000 (21:59 -0700)]
Allow COW copies in aassign
When the ‘no common vars’ optimisation is not active, list assignment
does not allow COW copies (unless assigning to an empty hash or
array). It has been this way since 61e5f455dc. The recent addition
of sv_mortalcopy_flags gives us an easy way to fix this.
A certain test in tr.t was marked TODO if not given a COW. This test
used to pass before 61e5f455dc, but after that became a failing TODO
test. It makes sense to test that we do have a COW instead of having
a conditional TODO.
Dominic Hargreaves [Fri, 12 Oct 2012 22:42:07 +0000 (23:42 +0100)]
Add references to perl5143delta
Dominic Hargreaves [Fri, 12 Oct 2012 22:39:22 +0000 (23:39 +0100)]
Add the 5.14.3 perldelta
Dominic Hargreaves [Thu, 11 Oct 2012 18:56:22 +0000 (19:56 +0100)]
Add 5.14.3 to perlhist
Dominic Hargreaves [Fri, 12 Oct 2012 21:57:16 +0000 (22:57 +0100)]
add 5.14.3 epigraph
Chris 'BinGOs' Williams [Fri, 12 Oct 2012 21:24:16 +0000 (22:24 +0100)]
Update Module-CoreList Changes file
Chris 'BinGOs' Williams [Fri, 12 Oct 2012 09:26:38 +0000 (10:26 +0100)]
Add v5.14.3 data to Module::CoreList and bump to 2.74
Father Chrysostomos [Fri, 12 Oct 2012 07:08:14 +0000 (00:08 -0700)]
toke.c: Rewrite bogus yylex comment
This comment has been wrong since it was first added. What it des-
cribed was then the code at the beginning of yylex, before the main
tokenizing switch.
Since then, two parts of what it described have moved elsewhere; the
pending identifier code to S_pending_ident, and the sort $a<=>$b check
to op.c:S_simplify_sort.
Father Chrysostomos [Fri, 12 Oct 2012 06:40:09 +0000 (23:40 -0700)]
perlreapi.pod: Consistent spaces after dots
Father Chrysostomos [Fri, 12 Oct 2012 06:31:00 +0000 (23:31 -0700)]
perlreapi.pod: Document RXf_MODIFIES_VARS
Father Chrysostomos [Fri, 12 Oct 2012 06:27:37 +0000 (23:27 -0700)]
perlreapi.pod: Update RXf_SKIPWHITE section
Father Chrysostomos [Fri, 12 Oct 2012 06:26:21 +0000 (23:26 -0700)]
perlreapi.pod: Update RXf_SPLIT section
Father Chrysostomos [Fri, 12 Oct 2012 05:51:44 +0000 (22:51 -0700)]
Disable const repl optimisation for empty pattern
s//$a/ cannot assume that the $a expression is going to return the
same value at each iteration, because the last-used pattern may con-
tain code blocks that clobber *a.
Father Chrysostomos [Fri, 12 Oct 2012 03:26:23 +0000 (20:26 -0700)]
defins.t: Suppress uninit warning
and make the no-warnings test pass.
Brad Gilbert [Tue, 9 Oct 2012 19:24:38 +0000 (14:24 -0500)]
Move tests from t/op/while_readdir.t to t/op/defins.t
It turns out that some of what t/op/while_readdir.t was testing
was also tested by t/op/defins.t
Father Chrysostomos [Fri, 12 Oct 2012 03:22:08 +0000 (20:22 -0700)]
Use const repl optimisation with s///e where possible
In those cases where s///e contains a single variable or a sequence
that is folded to a const op, we can do away with substcont.
PMf_EVAL means that there was an /e. But we don’t actually need to
check that; instead we can just examine the op tree, which we have to
do anyway.
The op tree that s//$x/e and s//"constant"/e compile down to have a
null (a do-block) containing a scope op (block with a single state-
ment, as opposed to op_leave which represents multiple statements)
containing a null followed by the constant or variable.
Father Chrysostomos [Fri, 12 Oct 2012 02:55:20 +0000 (19:55 -0700)]
perl5180delta: B::Generate is fixed
Father Chrysostomos [Thu, 11 Oct 2012 21:38:31 +0000 (14:38 -0700)]
[perl #49190] Don’t prematurely optimise s/foo/bar$baz/
$baz could be aliased to a package variable, so we do need to recon-
catenate for every iteration of s///g. For s/// without /g, only one
more op will be executed, so the speed difference is negligible.
The only cases we can optimise in terms of skipping the evaluation of
the ops on the rhs (by eliminating the substcont op) are s//constant/
and s//$single_variable/. Anything more complicated causes bugs.
A recent commit made s/foo/$bar/g re-stringify $bar for each iteration
(though without having to reevaluate the ops that return $bar). So we
no longer have to special-case match vars at compile time.
This means that s/foo/bar$baz/g will be slower (and less buggy), but
s/foo/$1/g will be faster.
This also caused an existing taint bug in pp_subst to surface. If
get-magic turns off taint on a replacement string, it should not be
considered tainted. So the taint check on the replacement should come
*after* the stringification. This applies to the constant replacement
optimisation. pp_substcont was already doing this correctly.
Father Chrysostomos [Fri, 12 Oct 2012 01:01:40 +0000 (18:01 -0700)]
Don’t taint return value of s///e based on replacement
According to the comments about how taint works above pp_subst in
pp_hot.c, the return value of s/// should not be tainted based on
the taintedness of the replacement. That makes sense, because the
replacement does not affect how many iterations there were. (The
return value is the number of iterations).
It only applies, however, to the cases where the ‘constant replace-
ment’ optimisation applies.
That means /e taints its return value:
$ perl5.16.0 -MDevel::Peek -Te '$_ = "abcd"; $x = s//$^X/; Dump $x'
SV = PVMG(0x822ff4) at 0x824dc0
REFCNT = 1
FLAGS = (pIOK)
IV = 1
NV = 0
PV = 0
$ perl5.16.0 -MDevel::Peek -Te '$_ = "abcd"; $x = s//$^X/e; Dump $x'
SV = PVMG(0x823010) at 0x824dc0
REFCNT = 1
FLAGS = (GMG,SMG,pIOK)
IV = 1
NV = 0
PV = 0
MAGIC = 0x201940
MG_VIRTUAL = &PL_vtbl_taint
MG_TYPE = PERL_MAGIC_taint(t)
MG_LEN = 1
The number pushed on to the stack was becoming tainted due to the set-
ting of PL_tainted. PL_tainted is assigned to and the return value
explicitly tainted if appropriate shortly after the mPUSHi (which
implies sv_setiv, which taints when PL_tainted is true), so setting
PL_tainted to 0 just before the mPUSHi is safe.
Father Chrysostomos [Thu, 11 Oct 2012 21:05:29 +0000 (14:05 -0700)]
Remove PMf_MAYBE_CONST
It was added in ce862d02d but has never been used.
Father Chrysostomos [Thu, 11 Oct 2012 09:03:35 +0000 (02:03 -0700)]
[perl #49190] Stringify repl repeatedly in s///g
pm_runtime in op.c examines the rhs of s/// to see whether it is safe
to execute that set of ops just once. If it sees a match var or an
expression with side effects, it creates a pp_substcont op, which
results in the rhs being executed multiple times.
If the rhs seems constant enough, pp_subst does the substitution in a
tight loop.
This unfortunately causes s/a/$a/ to fail if *a has been aliased to
*1. Furthermore, $REGMARK and $REGERROR did not count as match vars.
pp_subst actually has two separate loops. One of them modifies the
target in place. The other appends to a new scalar and then copies it
back to the target. The first loop is used if it seems safe.
This commit makes $REGMARK, $REGERROR and aliases to match vars work
when the replacement consists solely of the variable.
It does this by setting PL_curpm before stringifying the replacement,
so that $1 et al. see the right pattern. It also stringifies the
variable for each iteration of the second loop, so that $1 and
$REGMARK update.
The first loop, which requires the rhs to be constant, is skipped if
the regexp contains the special backtracking control verbs that mod-
ify $REGMARK and $REGERROR.
Father Chrysostomos [Thu, 11 Oct 2012 15:37:44 +0000 (08:37 -0700)]
RXf_MODIFIES_VARS
regcomp.c sets this new flag whenever regops that could modify
$REGMARK or $REGERROR have been seen. pp_subst will use this
to tell whether it should repeatedly stringify the replacement.
Father Chrysostomos [Thu, 11 Oct 2012 16:27:18 +0000 (09:27 -0700)]
Define RXf_SPLIT and RXf_SKIPWHITE as 0
They are no longer used in core, and we need room for more flags.
The only CPAN modules that use them check whether RXf_SPLIT is set
(which no longer happens) before setting RXf_SKIPWHITE (which is
ignored).
Father Chrysostomos [Thu, 11 Oct 2012 07:54:56 +0000 (00:54 -0700)]
pp_hot.c:pp_subst: add comment
Father Chrysostomos [Thu, 11 Oct 2012 07:24:18 +0000 (00:24 -0700)]
Simplify the fix for bug #41530
We don’t need to upgrade the target string and redo the pattern match
if the replacement is in utf8. We can simply convert during concate-
nation, using the more recently added SV_CATUTF8 and SV_CATBYTES flags
to sv_catpvn_flags.
This should make things faster, too, as sv_catpvn_flags does not need
to allocate extra SVs or string buffers.
This happened to trigger an existing COW bug, causing test failures.
SvIsCOW and sv_force_normal_flags were being called on TARG before
get-magic. So a magical scalar returning a COW could have that COW
modified in place.
I added a test for something I nearly broke.
Karl Williamson [Fri, 12 Oct 2012 04:04:12 +0000 (22:04 -0600)]
perldelta for [perl #114982]
Karl Williamson [Fri, 12 Oct 2012 02:43:47 +0000 (20:43 -0600)]
regcomp.c: Use more precise definition of folding chars
Previously, in the Latin1 range, whether a character is alphabetic or
not has served as a surrogate for if the character participates in
folds, as it is a superset of the folding class, with two characters
which are alpha but not folding: the masculine and feminine ordinal
indicators. But we have plenty of bits available in the bit array for
Latin1 character classifications, so this commit makes the definition
precise.
Karl Williamson [Fri, 12 Oct 2012 02:40:09 +0000 (20:40 -0600)]
handy.h: Add macro which returns if a char is folding
This adds a macro for regcomp.c to use to determine if a Latin1 range
character participates in any folds
Karl Williamson [Fri, 12 Oct 2012 02:25:04 +0000 (20:25 -0600)]
regen/mk_PL_charclass.pl: Add bit for if character folds
This takes the existing mktables-generated table that lists all
characters that participate in any way in a fold, and creates a bit for
it in l1_char_class_tab.h
Karl Williamson [Thu, 11 Oct 2012 18:15:53 +0000 (12:15 -0600)]
regcomp.c: Optimize EXACTFish nodes without folds to EXACT
Often, case folding will be applied to the entire regular expression
(such as by using "/i"), but there will be components in it that are the
same, folded or not. These components could be represented as EXACT
nodes with no loss of information. The regex optimizer is then able to
apply more optimizations to them than it could otherwise, and pattern
matching will execute faster.
This commit turns any EXACTFish node (except those under locale rules,
whose folding rules are not known until runtime) that contains entirely
unfoldable characters into the equivalent EXACT node.
This optimization brings up the idea of possibly splitting an EXACTFish
node that contains a sufficiently long contiguous string of non-folding
characters into the portions that have folding and the portions that
don't. That might or might not be beneficial; I'm not undertaking the
experiments to check that out.
Karl Williamson [Thu, 11 Oct 2012 20:56:27 +0000 (14:56 -0600)]
regexec.c: Fix EXACT node handling in regrepeat()
Commit b40a2c17551b484a78122be98db5dc06bb4614d5 introduced a bug in
handling EXACT nodes when the pattern is in UTF-8. This cleans that up.
Steve Hay [Thu, 11 Oct 2012 12:08:45 +0000 (13:08 +0100)]
Fix indentation after a19baa613c
Daniel Dragan [Thu, 11 Oct 2012 05:47:19 +0000 (01:47 -0400)]
stop Win32 VC miniperl from exporting functions
miniperl.exe does not load XS modules. It has no reason to export anything.
About 130 things are exported by VC Win32 miniperl. 90% of them are
the win32_* functions. All but a couple Perl_* exports are gone in the
exporting miniperl. See perl #115216 for the full list of accidentally
exported items. Also stop trying to find Win32CORE's boot function in
Perl_init_os_extras through the export table. It is not exported and not
in the miniperl image, so GetProcAddress will never return non-NULL. By
removing this GetProcAddress call, miniperl stops importing GetProcAddress
from kernel32 and saves a tiny bit of startup time during the full
perl build process. By removing the exports the compiler is free to use
more random (not cdecl) calling conventions and/or optimize away more code
than before. Also by removing the export entries, and the GetProcAddress
import, RO strings are removed from the miniperl image. This commit only
affects the VC miniperl. The Mingw miniperl remains unmodified except
for not trying to load Win32CORE through the export table and some of the
.c files being compiled with PERL_IS_MINIPERL when previously they were
not.
Daniel Dragan [Wed, 10 Oct 2012 21:23:27 +0000 (17:23 -0400)]
fix C++ builds broken by cdc4a174060
In commit cdc4a174060 a static noreturn function, on a C++ build (specific
example, GCC), got a post-preprocessor prototype of
"extern "C" static void S_fn_doesnt_return(". GCC generates a compile error
if "extern "C"" and static are used together. Plain C builds were not affected.
This commit fixed the problem by creating 2 new static exclusive macros, so
extern "C" does not wind up on statics in a C++ build. The macros allow
enough flexibility so any compiler/platform that needs a noreturn
declaration specifier instead of a noreturn function attribute can have
one.
Father Chrysostomos [Wed, 10 Oct 2012 20:14:31 +0000 (13:14 -0700)]
[perl #114658] Fix line numbers at the end of string eval
$ perl -e 'eval "{;"; print $@'
Missing right curly or square bracket at (eval 1) line 1, at end of line
syntax error at (eval 1) line 1, at EOF
$ perl -e 'eval "{"; print $@'
Missing right curly or square bracket at (eval 1) line 2, at end of line
syntax error at (eval 1) line 2, at EOF
Notice how the line number goes up when there is no semicolon.
What happens is that eval tacks "\n;" on to the end of the string if
it does not already end with a semicolon.
I actually changed this in blead in commit 11076590 to tack "\n;"
on to the end all the time, to make eval "q;;" and
eval "return #comment;" work.
This caused the line number to increase for eval "{;".
This commit fixes both examples above by modifying S_incline to
account for the "\n;" at the end of a string eval.
Existing tests had to be modified, as they were testing for the wrong
line number.
Steve Hay [Wed, 10 Oct 2012 17:27:07 +0000 (18:27 +0100)]
Add extern "C" to definitions of four win32_ functions
This makes them match their declarations in perlhost.h, which fixes linker
errors when linking perl5XX.dll in a C++ build with VC.
David Mitchell [Wed, 10 Oct 2012 15:39:43 +0000 (16:39 +0100)]
[MERGE] revamp dist/B-Deparse/t/core.t
rewrite core.t to be a comprehensive test of most keywords
along with various lengths of args of the form foo($a, $b, $c),
where $a etc may or may not be lexical.
(Previously this test file only checked that CORE::foo deparsed as
CORE::foo when sub foo was defined: it generated args, but didn't check
how they were deparsed).
This branch also includes a number of fixups to Deparse.pm that were found
while working on core.t. Note there are a few places in core.t marked
with 'XXX' where there is some dodgy stuff going on that core.t just
ignores for now.
David Mitchell [Wed, 10 Oct 2012 15:32:52 +0000 (16:32 +0100)]
Deparse/t/core.t: add support for lex vars
Enlarge the testing regime: before, for each op it tested
foo($a,$b,$c,...)
now it also does
foo(my $a,$b,$c,...)
my ($a,$b,$c,...); foo($a,$b,$c,...)
David Mitchell [Wed, 10 Oct 2012 11:28:38 +0000 (12:28 +0100)]
overhaul dist/B-Deparse/t/core.t
Originally, this test file just checked that CORE::foo got correctly
deparsed as CORE::foo, hence the name. This commit expands it
to fully test both CORE:: versus none, plus that any arguments
are correctly deparsed. It tests many more keywords, and it also
cross-checks against regen/keywords.pl to make sure we've tested all
keywords, and with the correct strength.
(There is very little of the original file left.)
David Mitchell [Wed, 10 Oct 2012 10:32:57 +0000 (11:32 +0100)]
fix deparsing of select(F)
Because select doesn't have a prototype (it's really two different functions
with the same name), the code that handles "first arg as filename" was
skipping select(F). This meant that 'select $fh' was being deparsed as
'select *$fh'.
Make select behave the same as open etc.
(There's still an issue that 'select/open *$fh' is deparsed as
'select/open $fh')
David Mitchell [Tue, 9 Oct 2012 16:12:09 +0000 (17:12 +0100)]
Deparse: handle system/exec prog arg,arg,,..
Deparse wasn't handling the form of system and exec where
the extra first arg (without comma) gave the program name.
These now deparse ok, without an additional comma:
system $prog $arg1,$arg2;
exec $prog $arg1,$arg2;
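One way to look at this form with B::Deparse is sketched below. The anonymous sub and its variable names are made up, and the sub is only deparsed, never called. The exact layout of the deparsed text depends on the B::Deparse version, so none is shown here.

```perl
use strict;
use warnings;
use B::Deparse;

# Only deparsed, never called, so no program is actually run.
my $code = sub {
    my ($prog, @args) = @_;
    system $prog @args;
};

my $text = B::Deparse->new->coderef2text($code);
print $text, "\n";
```

The thing to check in the output is that no comma appears between the program name and the argument list.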
David Mitchell [Tue, 9 Oct 2012 15:49:26 +0000 (16:49 +0100)]
Deparse: grep($a) became grep($a,); ditto map
Not fatal, but ugly, and messed up the test format I'm currently
working on in core.t
David Mitchell [Tue, 9 Oct 2012 14:50:12 +0000 (15:50 +0100)]
Deparse: handle some strong keywords better
In general, a strong keyword 'foo' will get deparsed as plain 'foo'
rather than 'CORE::foo', even in the presence of a sub foo{}.
However, these weren't:
glob
pos
prototype
scalar
study
undef
This was due to them having prototypes.
David Mitchell [Tue, 9 Oct 2012 10:59:37 +0000 (11:59 +0100)]
Deparse crashed on argless sort()
This would crash
@a = sort;
(Test will come in a separate commit)
Father Chrysostomos [Wed, 10 Oct 2012 03:47:18 +0000 (20:47 -0700)]
[perl #115206] Don’t crash when vivifying $|
It was trying to read the currently-selected handle without checking
whether it was selected. It is actually not necessary to initialise
the variable this way, as the next use of get-magic on it will clobber
the cached value.
This initialisation was originally added in commit d8ce0c9a45. The
bug it was fixing was probably caused by missing FETCH calls that are
no longer missing.
Father Chrysostomos [Wed, 10 Oct 2012 03:43:00 +0000 (20:43 -0700)]
Test perl #4760
Steve Hay [Wed, 10 Oct 2012 13:01:43 +0000 (14:01 +0100)]
Add $(EXTRACFLAGS) to $(CFLAGS) for MinGW/gcc build on Windows
This is useful if anything is ever put into $(EXTRACFLAGS) (e.g. I'm
currently experimenting with optionally putting $(CXX_FLAG) into it for a
C++ build), and is already done in the VC case (here, and in
win32/Makefile).
Steve Hay [Wed, 10 Oct 2012 12:56:36 +0000 (13:56 +0100)]
Fix VC compilation of universal.c as C++ following commit 613875e219
Steve Hay [Wed, 10 Oct 2012 12:48:00 +0000 (13:48 +0100)]
Minor tidy-ups from 624a1c42c1
Fix indents and update comments to reflect the fact that the compiler's
msvcr*.dll is now used rather than loading msvcrt.dll too.
Daniel Dragan [Tue, 9 Oct 2012 09:15:55 +0000 (05:15 -0400)]
clean up vmem.h, remove unused instrumentation hooks
Removed virtual. Removed dyn loading msvcrt.dll and function pointers.
Replaced with Compiler's native CRT's malloc. Moved the CS parts
into _USE_LINKED_LIST blocks. There is nothing to protect if we aren't
putting headers on. Faster startup time is the result of this commit.
Before .text be8df .rdata 21171, after .text be88f .rdata 21121. I did
turn off _USE_LINKED_LIST as an experiment, it compiled successfully and
passed the /t/op/*.t tests (only ones I performed). I did not try the Knuth
stuff. See also this msg by Jan Dubois
https://rt.perl.org/rt3/Ticket/Display.html?id=88840#txn-1144384
Steve Hay [Wed, 10 Oct 2012 08:07:18 +0000 (09:07 +0100)]
Fixes to enable building win32 files as C++ with VC
Mostly providing explicit casts where required by VC with /TP option,
plus one renamed variable (can't have a variable called 'new' in C++).
Daniel Dragan [Tue, 9 Oct 2012 23:00:24 +0000 (19:00 -0400)]
add const to Perl_boot_core_UNIVERSAL's xsub registration struct
Move struct xsub_details details to RO memory from RW memory. This increases
the amount of bytes of the image that can be shared between Perl processes
by the OS. The inverse is each perl process takes less process specific
memory. I saw no change in .text, .rdata went from 0x21121 to 0x21391
(+0x270), .data went from 0x3b18 to 0x38b8 (-0x260). 32 bit Visual C.
Father Chrysostomos [Tue, 9 Oct 2012 20:34:54 +0000 (13:34 -0700)]
[perl #26986] Skip subst const repl optimisation for logops
pm_runtime iterates through the ops that make up the replacement part
of s///, to see whether the ops on the rhs can have side effects or
contain match vars (in which case they must only be evaluated after the
pattern). If they do not have side-effects, the rhs is presumed to be
constant and evaluated first, and then pp_subst hangs on to the return
value and reuses it in each iteration of s///g.
This iteration simply follows op_next pointers. Logops are not that
simple, so it is possible to hide match vars inside them, resulting in
incorrect optimisations:
"g" =~ /(.)/;
@l{'a'..'z'} = 'a'..'z';
$_ = "hello";
s/(.)/$l{$a||$1}/g;
print;
__END__
ggggg
This commit skips the optimisation whenever a logop is present.
This does not fix all the optimisation problems. See ticket #49190.
Daniel Dragan [Mon, 8 Oct 2012 06:13:22 +0000 (02:13 -0400)]
have embed.pl add PERL_CALLCONV_NO_RET to noreturn statics
In commit 12a2785c7e8 PERL_CALLCONV_NO_RET was added to allow MS Visual C's
noreturn to work. In that commit, statics did not get a PERL_CALLCONV_NO_RET
so Visual C may not always figure out that a certain static is a noreturn.
This patch fixes that and allows statics to be Visual C noreturns. I
observed a drop in the .text section from 0xBEAAF to 0xBE8CF on no
DEBUGGING 32 bit VC 2003 -O1 -GL/-LTCG after applying this.
Karl Williamson [Tue, 9 Oct 2012 17:07:18 +0000 (11:07 -0600)]
perlreapi.pod: grammar and other nits
Karl Williamson [Sat, 6 Oct 2012 21:06:53 +0000 (15:06 -0600)]
regexec.c: White-space only; comment only; no code changes
Karl Williamson [Sun, 7 Oct 2012 20:08:42 +0000 (14:08 -0600)]
regexec.c: Refactor slightly for clarity
This reverses the sense of an if...else, so that the tiny trivial code
is done after the 'if', and the larger, non-trivial part is done in the
else. This makes it easier to understand. It also is clear that the
label and goto are no longer needed, if they ever were.
Karl Williamson [Sat, 6 Oct 2012 21:08:19 +0000 (15:08 -0600)]
utf8.c: Remove an unnecessary conditional
The unconditional array lookup is clearer, and I suspect faster than
testing for a special case.
Karl Williamson [Thu, 4 Oct 2012 04:17:19 +0000 (22:17 -0600)]
PATCH: [perl #114982]: case-insensitive regex bug with UTF8-flagged strings
It turns out that this whole area in regexec.c has never worked
properly, we just didn't have test coverage for it. Indeed, the portion
that deals with CURLYM had never been updated to include UTF-8!
The failure rate for the new tests added by this commit on the blead
that existed just prior to this commit was 96%. (4% pass)
This commit refactors the part of regexec.c that deals with quantifiers
and case insensitive matching, and adds comprehensive tests.
Consider a regex of the form A*B. How do we know how many of A* to
match? The answer is we don't, until we try B. But if B begins with
text, we can easily rule out places where the next thing isn't a B.
Only if the next character (following where we are) is the first
character of B, is this a potential match; otherwise it isn't, so there
is no need to even try B.
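The pruning idea above can be sketched in a few lines of Python. This is only an illustration, not the regexec.c code: it assumes A is a single literal character and B begins with a literal character, and split_points() and its parameter names are hypothetical.

```python
def split_points(text, a_char, b_first):
    """Return the positions where A* may stop and B is worth trying.

    text    -- the string being matched, starting at index 0
    a_char  -- the single character that A matches (assumed literal)
    b_first -- the first literal character of B
    """
    # Find how far A* can extend greedily.
    longest = 0
    while longest < len(text) and text[longest] == a_char:
        longest += 1

    # Back off one character at a time, keeping only the positions whose
    # next character could begin B. Every other position is rejected
    # without running B at all.
    return [i for i in range(longest, -1, -1)
            if i < len(text) and text[i] == b_first]
```

For the text "aaab" with A = "a" and B starting with "b", only position 3 survives; for "aaac" nothing survives, so B is never attempted.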
Code existed to do this, but it was wrong in many ways. If B can match
case-insensitively under /i, then there may be two or more possible
characters that could begin B. The previous code assumed there was a
max of two. It didn't adequately account for the differences between
/a, /d, /l, and /u; and as I mentioned above, in one place it didn't
know that there was such a thing as UTF-8. Also, it assumed that all
code points can fit into an I32.
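To see why a maximum of two starting characters is not enough, here is a small Python sketch. It assumes that str.casefold() (full Unicode case folding as bundled with Python) is a reasonable stand-in for the fold tables Perl consults; possible_first_chars() and its limit parameter are hypothetical and exist only for the illustration.

```python
def possible_first_chars(ch, limit=0x3000):
    """Return every character below `limit` that case-folds to the same
    string as `ch`.

    Assumption: str.casefold() approximates the folding Perl applies
    under /i. `limit` only keeps the scan short.
    """
    target = ch.casefold()
    return {chr(cp) for cp in range(limit) if chr(cp).casefold() == target}


starters = possible_first_chars("k")
# `starters` contains at least 'k', 'K' and U+212A KELVIN SIGN, so a
# representation with room for only two candidates cannot hold them all.
```

The same happens for "s", which is also reached from U+017F LATIN SMALL LETTER LONG S.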
To do this right takes quite a bit more intelligence than was there;
and it needs to be done in two different places. So, this commit
extracts out the code into a single subroutine, heavily modified to
account for the various oversights that were there previously.
The pre-existing infrastructure of fold_grind.t allowed the tests to be
added easily. I did not add tests for the above-I32 problems, as it
turns out that there are other bugs with these; see [perl #115166].
Karl Williamson [Sat, 6 Oct 2012 21:03:06 +0000 (15:03 -0600)]
Allow _swash_inversion_hash() to be called in regexec.c
To prevent this very-internal core function from being used by XS
writers, it isn't defined except if the preprocessor indicates it is
compiling certain .c files. Add regexec.c to the list.
Karl Williamson [Sat, 6 Oct 2012 20:57:38 +0000 (14:57 -0600)]
regex: Allow any single char to be SIMPLE
This commit relaxes the previous requirement that an EXACTish node must
contain a single Latin1-range character in order to be considered
SIMPLE. Now it allows any single character, not just Latin1.
This allows above-Latin1 characters to be in optimizations like STAR or
CURLY, instead of having to match with the more complex CURLYM; and it
brings EXACTish nodes in alignment with other SIMPLE nodes, such as
those matching \w or the dot metacharacter, which all along have
supported any code point being SIMPLE.
Karl Williamson [Sat, 6 Oct 2012 17:21:02 +0000 (11:21 -0600)]
regcomp.c: Slightly relax restriction of SIMPLE nodes
Currently all EXACTish nodes that are SIMPLE must be a single UTF-8
invariant character. It turns out that the code works not just for
these, but for all Latin1 characters (when the pattern isn't UTF-8)
except the SHARP S under /d folding.
SIMPLE nodes allow for better optimization possibilities, such as CURLY
instead of CURLYM.
There is still a discrepancy in that non-EXACTish nodes that match a
single character, such as the dot (SANY), can be SIMPLE, but EXACTish
nodes have to be just a single byte.
Karl Williamson [Thu, 4 Oct 2012 17:01:57 +0000 (11:01 -0600)]
regexec.c: Turn test into an assertion
Commit 31c15ce5372b770c3ca899df6cf102f1ed6866ba should have made it so
that the situation of a quantifier {m,n} with m>n never happens. Remove
the check for it, but replace it with an assertion.
Karl Williamson [Thu, 4 Oct 2012 04:24:21 +0000 (22:24 -0600)]
regexec.c: White-space only
Add some spacing for clarity.
Karl Williamson [Thu, 4 Oct 2012 01:42:39 +0000 (19:42 -0600)]
regexec.c: indent properly and reflow some comments to 80 cols
Karl Williamson [Wed, 3 Oct 2012 22:35:00 +0000 (16:35 -0600)]
regcomp.c: Refactor join_exact() to handle all multi-char folds
join_exact() prior to this commit returned a delta for 3 problematic
sequences showing that the minimum length they match is less than their
nominal length. It turns out that this is needed for all
multi-character fold sequences; our test suite just did not have the
tests in it to show that. Tests that do show this will be added in a
future commit, but code elsewhere must be fixed before they pass.
regcomp.c
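As an illustration of why the minimum match length can be shorter than the nominal length for a multi-character fold, here is a short Python sketch. It is not join_exact() itself: it uses str.casefold() as a stand-in for Perl's fold rules, and min_len_delta() is a hypothetical helper.

```python
def min_len_delta(folded_pattern, shortest_match):
    """Return how many characters shorter than the pattern a match can be.

    Both arguments must be equal under full case folding; otherwise the
    pair is not an example of a multi-character fold.
    """
    if folded_pattern.casefold() != shortest_match.casefold():
        raise ValueError("strings do not match case-insensitively")
    return len(folded_pattern) - len(shortest_match)


# "ss" is nominally 2 characters, yet U+00DF (sharp s) folds to "ss",
# so a 1-character string can match it.
delta_ss = min_len_delta("ss", "\u00df")

# "ffi" is nominally 3 characters; U+FB03 (ffi ligature) folds to "ffi".
delta_ffi = min_len_delta("ffi", "\ufb03")
```

Here delta_ss is 1 and delta_ffi is 2, both measured in characters.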
Karl Williamson [Wed, 3 Oct 2012 20:42:51 +0000 (14:42 -0600)]
regen/regcharclass.pl: Generate macros for multi-char fold sequences
These will be used in future commits.
Karl Williamson [Wed, 3 Oct 2012 03:48:26 +0000 (21:48 -0600)]
Add regen/regcharclass_multi_char_folds.pl
This takes as input the current Unicode Character Database, and outputs
lists of the multi-character folds in it, in a form suitable for input
to regen/regcharclass.pl
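A rough Python analogue of what such a generator does is sketched below. It is an assumption-laden illustration: it relies on the Unicode data bundled with the Python interpreter (via str.casefold()) rather than reading the Unicode data files, and the line layout produced by as_input_lines() is made up here; it is not the format regen/regcharclass.pl actually reads.

```python
import sys


def multi_char_folds(limit=sys.maxunicode + 1):
    """Map each code point whose full case fold is longer than one
    character to that fold, e.g. U+00DF -> "ss"."""
    folds = {}
    for cp in range(limit):
        folded = chr(cp).casefold()
        if len(folded) > 1:
            folds[cp] = folded
    return folds


def as_input_lines(folds):
    """Format the folds one per line as 'SOURCE => FOLD', with every code
    point in hex. The layout is hypothetical, chosen for this sketch."""
    return ["%04X => %s" % (cp, " ".join("%04X" % ord(c) for c in folded))
            for cp, folded in sorted(folds.items())]
```

Calling as_input_lines(multi_char_folds(0x10000)) restricts the scan to the Basic Multilingual Plane, which keeps the run short.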
Karl Williamson [Wed, 3 Oct 2012 03:45:45 +0000 (21:45 -0600)]
regen/regcharclass.pl: Simplify regex
There doesn't need to be a quantifier or capturing on this regex.
Karl Williamson [Wed, 3 Oct 2012 03:42:59 +0000 (21:42 -0600)]
regen/regcharclass.pl: Add ability for more complex inputs
This adds the capability to get input to this program from another
program, thus allowing essentially arbitrary input.
This will be used in future commits.
Karl Williamson [Sun, 30 Sep 2012 15:41:51 +0000 (09:41 -0600)]
regcomp.c: min len is chars, not bytes
The traditionally-called tricky folds occur because, under /i, a
6-byte/3-character sequence can match a 2-byte/1-character sequence.
The code here has assumed that the delta quantity is measured in bytes
(6-2=4), whereas everywhere else (AFAICT) assumes the measure is to be
in characters (3-1=2).
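One concrete pair with these sizes is U+0390 and its full case fold. The Python sketch below measures both in characters and in UTF-8 bytes; it assumes str.casefold() matches the fold Perl uses for this character, and lengths() is a hypothetical helper.

```python
def lengths(s):
    """Return (characters, UTF-8 bytes) for a string."""
    return len(s), len(s.encode("utf-8"))


single = "\u0390"           # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
folded = single.casefold()  # U+03B9 U+0308 U+0301 under full case folding

# The same pair gives two different deltas depending on the unit used.
char_delta = lengths(folded)[0] - lengths(single)[0]
byte_delta = lengths(folded)[1] - lengths(single)[1]
```

The folded form is 3 characters in 6 bytes and the single character is 1 character in 2 bytes, so byte_delta is 4 while char_delta is 2.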
Father Chrysostomos [Tue, 9 Oct 2012 06:05:04 +0000 (23:05 -0700)]
[perl #114658] perl5180delta: Mention B::Hooks::Parser