Rafael Garcia-Suarez [Sat, 27 Nov 2010 14:47:44 +0000 (15:47 +0100)]
Fix a warning (that spotted a potential mro bug that I could not produce)
Craig A. Berry [Sat, 27 Nov 2010 00:45:24 +0000 (18:45 -0600)]
Skip multi-arg piped open in autodie test on VMS.
Awaiting upstream integration at:
https://rt.cpan.org/Ticket/Display.html?id=59123
Craig A. Berry [Sat, 27 Nov 2010 00:33:54 +0000 (18:33 -0600)]
Skip t/porting/FindExt.t on VMS.
win32::FindExt doesn't work on VMS, but it's only intended to work
on Windows, so there's not much reason to port it or to test it.
Craig A. Berry [Sat, 27 Nov 2010 00:29:42 +0000 (18:29 -0600)]
Fix Time::HiRes probes on VMS.
We have not been correctly building this on VMS since the location
was changed in core because we have not been able to locate include
files. This mirrors the upstream patch at:
https://rt.cpan.org/Ticket/Display.html?id=63363
Father Chrysostomos [Fri, 26 Nov 2010 23:52:54 +0000 (15:52 -0800)]
[perl #78908] Reinstate mod for one more stable release
Chris 'BinGOs' Williams [Fri, 26 Nov 2010 23:24:22 +0000 (23:24 +0000)]
Update MIME-Base64 to CPAN version 3.13
[DELTA]
2010-10-26 Gisle Aas <gisle@ActiveState.com>
Release 3.13
The fix in v3.12 to try to preserve the SvUTF8 flag was buggy
and actually managed to set the flag on strings that did not
have it originally.
This resolves the test failures for Encode::Encoder
Father Chrysostomos [Fri, 26 Nov 2010 22:51:07 +0000 (14:51 -0800)]
Clarify op_lvalue’s docs
as requested by Reini Urban [perl #78908]
Father Chrysostomos [Fri, 26 Nov 2010 22:24:30 +0000 (14:24 -0800)]
[perl #78810] PERLDB_NOOPT ignored by adjacent nextstate optimisation
As mentioned in the RT ticket, ac56e7d did not take PERLDB_NOOPT
into account.
Rafael Garcia-Suarez [Fri, 26 Nov 2010 22:30:06 +0000 (23:30 +0100)]
No need to nest printfs. DIE() takes format strings.
Father Chrysostomos [Fri, 26 Nov 2010 22:00:56 +0000 (14:00 -0800)]
Fix compiler warning
Father Chrysostomos [Fri, 26 Nov 2010 20:44:43 +0000 (12:44 -0800)]
Avoid redundant hv_delete call in pp_entereval
This commit just avoids a redundant hv_delete call if leave_scope is
already going to do it.
Father Chrysostomos [Fri, 26 Nov 2010 15:22:47 +0000 (07:22 -0800)]
[perl #78634] Conflict in defining constant INIT
When gv_init tries to turn a constant named INIT into a CV, the auto-
matic special processing of the INIT ‘block’ kicks in, which removes
the CV from the GV.
This should not happen with gv_init, as $::{INIT} = \5 is supposed
to be equivalent to *INIT = sub(){5}, which does not do that.
This commit makes gv_init check for that, increase the reference
count, and reassign the CV. It does not stop the CV from being called
as a special block, but it is harmless to call a constant CV.
David Mitchell [Mon, 22 Nov 2010 19:18:49 +0000 (19:18 +0000)]
Make PerlIO marginally reentrant
Currently if an operation on a file handle is interrupted, and if
the signal handler accesses that same file handle (e.g. closes it),
then perl will crash. See [perl #75556].
This commit provides some basic infrastructure to avoid segfaults.
Basically it adds a lock count field to each handle (by re-purposing the
unused flags field in the PL_perlio array), then each time a signal
handler is called, the count is incremented. Then various parts of PerlIO
use a positive count to change behaviour. Most importantly, when layers
are popped, the PerlIOl structure is cleared, but not freed, and is left
in the chain of layers. This means that callers still holding pointers to
the various layers won't access freed structures. It does however mean
that PerlIOl structs may be leaked, and possibly slots in PL_perlio. But
this is better than crashing.
Not much has been done to give sensible behaviour on re-entrancy; for
example, a buffer that has already been written once might get written
again. Fixing this sort of thing would require a large-scale audit of
perlio.c.
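The "clear but do not free" idea described above can be sketched roughly as follows. This is only an illustration, not the code in perlio.c: the RLayer and RHandle types, the lock_count field name and rlayer_pop() are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much simplified stand-in for a PerlIOl layer. */
typedef struct rlayer RLayer;
struct rlayer {
    RLayer     *next;  /* next layer down the chain */
    const void *tab;   /* function table; NULL marks a cleared layer */
};

/* Hypothetical handle: top of the layer chain plus a lock count. */
typedef struct {
    RLayer *top;
    int     lock_count;  /* > 0 while a signal handler may be running */
} RHandle;

/* Pop the top layer. While the handle is locked, callers further up the
 * stack may still hold pointers to the layer, so it is cleared and left
 * in the chain (a deliberate leak) rather than freed. */
static void rlayer_pop(RHandle *h)
{
    RLayer *l;

    assert(h != NULL && h->top != NULL);
    l = h->top;
    if (h->lock_count > 0) {
        l->tab = NULL;       /* neutralise in place; memory stays valid */
    } else {
        h->top = l->next;    /* normal case: unlink and free */
        free(l);
    }
}
```

Code that walks the chain then has to tolerate layers whose tab is NULL, which is why the null-function-table guards were extended in the earlier commit.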
David Mitchell [Fri, 19 Nov 2010 17:23:17 +0000 (17:23 +0000)]
perlio: always guard against null function table
In some places it already checks for a null tab field; extend that
coverage. This is in preparation for a commit which may leave active
layers with a null tab field.
David Mitchell [Wed, 17 Nov 2010 16:29:04 +0000 (16:29 +0000)]
add PerlIO_init_table() to initialise PL_perlio
Previously, the PL_perlio table was initialised by calling PerlIO_allocate,
and throwing away the result. Since a slot with a null ->next was regarded
as freed, the next call to PerlIO_allocate would reuse that slot, which is
important, as STDIN etc are expected to occupy slots 1,2,3.
Once reference counting of the slots is introduced, however, the first
slot will leak, and STDIN etc will be assigned to the wrong slots. So do it
properly now.
David Mitchell [Tue, 16 Nov 2010 22:44:34 +0000 (22:44 +0000)]
add 'head' field to PerlIOl struct
This allows any layer to find the top of the layer stack,
or more specifically, the entry in PL_perlio that points to
the top.
Needed for the next commit, which will implement a reference counting
scheme.
There's currently a bug in MakeMaker which causes several extensions to
miss the dependency on perliol.h having changed, so this commit includes
a gratuitous whitespace change to perl.h to hopefully force recompilation.
David Mitchell [Mon, 15 Nov 2010 17:06:37 +0000 (17:06 +0000)]
make PL_perlio an array of PerlIOl, not PerlIO *
Layers in PerlIO are implemented as a linked list of PerlIOl structs;
each one has a 'next' field pointing to the next layer. Now here's the
clever bit: When PerlIO* pointers are passed around to refer to a
particular handle, these are actually pointers to the 'next' field of the
*parent* layer (so to access the flags field say of a PerlIOl, you have to
double-deref it, e.g. (*f)->flags). The big advantage of this is that
it's easy for a layer to pop itself; when you call PerlIO_pop(f), f is a
pointer to the parent's 'next' field, so pop(f) can just do
*f = (*f)->next.
This means that there has to be a fake 'next' field above the topmost
layer. This is where PL_perlio comes in: it's a pointer to an arena of
arrays of pointers, each one capable of pointing to a PerlIOl structure.
When a new handle is created, a spare arena slot is grabbed, and the
address of that slot is returned. This also allows for a handle with no
layers.
What this commit does is change PL_perlio from being an array of
PerlIO* into an array of PerlIOl structures - i.e. each element in the
array goes from being a single pointer, to having several fields. These
will be made use of in follow-up commits.
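The pointer-to-parent's-'next' trick described above can be illustrated with a small stand-alone sketch. The Layer and Handle types and the layer_push()/layer_pop() helpers are hypothetical stand-ins for PerlIOl, PerlIO and PerlIO_pop(), with every other field and all error handling left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified layer; the real PerlIOl has more fields. */
typedef struct layer Layer;
struct layer {
    Layer *next;   /* next layer down */
    int    flags;
};

/* Plays the role of PerlIO: a pointer to a layer. A "Handle *" is then
 * the address of whichever 'next' field (or table slot) points at it. */
typedef Layer *Handle;

/* Push a new layer above the one that f currently refers to. */
static void layer_push(Handle *f, int flags)
{
    Layer *l = malloc(sizeof *l);

    assert(f != NULL && l != NULL);
    l->next  = *f;
    l->flags = flags;
    *f = l;
}

/* Pop the layer f refers to. Because f is the address of the parent's
 * 'next' field, unlinking is a single assignment: *f = (*f)->next. */
static void layer_pop(Handle *f)
{
    Layer *l;

    assert(f != NULL && *f != NULL);
    l  = *f;
    *f = l->next;
    free(l);
}
```

In this sketch a plain "Handle head = NULL;" variable plays the part of the fake 'next' field above the topmost layer, which is the job a PL_perlio slot does in the description above; it also shows how a handle with no layers is representable.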
Nicholas Clark [Fri, 26 Nov 2010 15:29:07 +0000 (15:29 +0000)]
In deparse.t, give a description to every test. Remove the test numbers.
Pass all test descriptions to Test::More. Remove one duplicated test.
Father Chrysostomos [Fri, 26 Nov 2010 14:25:36 +0000 (06:25 -0800)]
Stop eval "BEGIN{die}" from leaking
This fixes the rest of [perl #78438].
eval "BEGIN{die}" creates a *{"_<(eval 1)"} glob regardless of $^P’s
setting in non-threaded builds as of change f9bddea (5.12.0).
Here are the results with various configurations:
version  threaded  eval text   $^P  Is *{"_<(eval 1)"} set?
-------  --------  ---------   ---  -----------------------
5.10.1   yes       BEGIN{}     0    no
5.10.1   yes       BEGIN{die}  0    no
5.10.1   yes       BEGIN{}     0xA  yes
5.10.1   yes       BEGIN{die}  0xA  no
5.10.1   no        BEGIN{}     0    no
5.10.1   no        BEGIN{die}  0    no
5.10.1   no        BEGIN{}     0xA  yes
5.10.1   no        BEGIN{die}  0xA  no
5.13.7   yes       BEGIN{}     0    no
5.13.7   yes       BEGIN{die}  0    no
5.13.7   yes       BEGIN{}     0xA  yes
5.13.7   yes       BEGIN{die}  0xA  yes
5.13.7   no        BEGIN{}     0    no
5.13.7   no        BEGIN{die}  0    yes
5.13.7   no        BEGIN{}     0xA  yes
5.13.7   no        BEGIN{die}  0xA  yes
Notice that, for non-threaded builds, BEGIN{die} goes from never sav-
ing the text to always saving it.
The commit in question is:
commit f9bddea7d2a0d824366014c8ee6ba57e7dedd8c3
Author: Nicholas Clark <nick@ccl4.org>
Date: Tue Dec 2 20:43:58 2008 +0000
Implement PERLDBf_SAVESRC_INVALID, which saves source lines for string
evals that fail to compile.
p4raw-id: //depot/perl@34985
It stops unconditionally using the scoping mechanism to delete
$::{"_<(eval $num)"} on compilation failure:
- safestr = savepvn(tmpbuf, len);
- SAVEDELETE(PL_defstash, safestr, len);
but instead does it explicitly in this block:
+    if (doeval(gimme, NULL, runcv, seq)) {
+        if (was != PL_breakable_sub_gen /* Some subs defined here. */
+            ? (PERLDB_LINE || PERLDB_SAVESRC)
+            : PERLDB_SAVESRC_NOSUBS) {
+            /* Retain the filegv we created. */
+        } else {
+            char *const safestr = savepvn(tmpbuf, len);
+            SAVEDELETE(PL_defstash, safestr, len);
+        }
+        return DOCATCH(PL_eval_start);
+    } else {
+        /* We have already left the scope set up earler thanks to the LEAVE
+           in doeval(). */
+        if (PERLDB_SAVESRC_INVALID) {
+            /* Retain the filegv we created. */
+        } else {
+            (void)hv_delete(PL_defstash, tmpbuf, len, G_DISCARD);
+        }
+        return PL_op->op_next;
+    }
In the case of BEGIN{die}, that doeval() never returns, so the
clean-up code is not reached.
S_doeval never returns because call_list calls Perl_croak if it
catches a BEGIN error (appending the extra ‘BEGIN failed--compilation
aborted’, etc.). That takes execution all the way back to perl_run, so
it bypasses the clean-up code in pp_entereval.
What’s leaking is the GV created earlier in pp_entereval by this line:
CopFILE_set(&PL_compiling, tmpbuf+2);
CopFILE_set simply stores a string under threads, but creates a GV
under non-threaded builds.
This commit solves the problem by scheduling a deletion *before* call-
ing doeval, if the source lines have not been saved.
This works because the usual code to handle it is only bypassed when
there is a BEGIN block (a subroutine), so PL_breakable_sub_gen will
have gone up. So we never need to delete the saved lines when that
code is bypassed.
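The general pattern (register the clean-up before calling something that may jump out instead of returning) can be sketched in plain C with setjmp/longjmp. This is only an analogy, not the pp_entereval code: the clean-up stack, compile(), eval_like() and entry_exists are all hypothetical, and the real commit additionally schedules the deletion only when the source lines have not been saved.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define MAX_CLEANUPS 8

/* Hypothetical clean-up stack, loosely analogous to perl's savestack. */
static void (*cleanup_fn[MAX_CLEANUPS])(void);
static size_t  cleanup_top;
static jmp_buf top_level;
static int     entry_exists;   /* stands in for the glob that leaked */

static void schedule_cleanup(void (*fn)(void))
{
    assert(fn != NULL && cleanup_top < MAX_CLEANUPS);
    cleanup_fn[cleanup_top++] = fn;
}

/* Run and discard every scheduled clean-up, newest first. */
static void unwind_cleanups(void)
{
    while (cleanup_top > 0)
        cleanup_fn[--cleanup_top]();
}

static void delete_entry(void)
{
    entry_exists = 0;
}

/* Stands in for doeval(): on a hard failure it jumps straight out,
 * so nothing written after the call in the caller is ever executed. */
static void compile(int fail_hard)
{
    if (fail_hard)
        longjmp(top_level, 1);
}

/* Returns 1 if compile() returned normally, 0 if it jumped out. */
static int eval_like(int fail_hard)
{
    if (setjmp(top_level) != 0) {
        unwind_cleanups();           /* scheduled work still runs */
        return 0;
    }
    entry_exists = 1;
    schedule_cleanup(delete_entry);  /* registered BEFORE the risky call */
    compile(fail_hard);
    unwind_cleanups();
    return 1;
}
```

Either way entry_exists ends up 0, because the deletion no longer depends on control returning to the code after the call.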
H.Merijn Brand [Fri, 26 Nov 2010 09:03:46 +0000 (10:03 +0100)]
Special compiler settings only change for 64bitall, not for 64bitint
Chris 'BinGOs' Williams [Fri, 26 Nov 2010 00:46:37 +0000 (00:46 +0000)]
Update MIME-Base64 to CPAN version 3.12
[DELTA]
2010-10-25 Gisle Aas <gisle@ActiveState.com>
Release 3.12
Don't change SvUTF8 flag on the strings encoded [RT#60105]
Documentation tweaks
Jan Dubois [Fri, 26 Nov 2010 00:26:51 +0000 (16:26 -0800)]
Can't spawn fresh Perl interpreter with an empty PATH
Jan Dubois [Thu, 25 Nov 2010 22:26:21 +0000 (14:26 -0800)]
Sync win32/Makefile with win32/makefile.mk
Jan Dubois [Thu, 25 Nov 2010 22:14:18 +0000 (14:14 -0800)]
Pass STATIC_EXT to t/porting/FindExt.t
The list of static extensions on Windows is only known
inside win32/Makefile and win32/makefile.mk, so we need
to somehow pass it to t/porting/FindExt.t to give it
a chance to pass on Windows.
Unfortunately this means that PERL_STATIC_EXT will have
to be set manually if this test is to be run directly
and not via the Makefile.
Jan Dubois [Thu, 25 Nov 2010 20:23:05 +0000 (12:23 -0800)]
Include ws2tcpip.h to get IPv6 definitions
This commit also moves down the 'extern "C"' wrapper so that
it doesn't apply to any #included headers because they may
generate C++ code (templates) which doesn't conform to "C"
linkage (when this header is included in C++ mode, e.g. while
compiling win32/perllib.c).
Paul Evans [Thu, 25 Nov 2010 20:10:25 +0000 (20:10 +0000)]
[PATCH 5/5] Added Paul Evans to AUTHORS
Signed-off-by: Chris 'BinGOs' Williams <chris@bingosnet.co.uk>
Paul Evans [Thu, 25 Nov 2010 20:09:15 +0000 (20:09 +0000)]
[PATCH 4/5] Adjust unit tests to cope with new sockaddr_in6 functions in Socket (pulled in via IO::Socket)
Signed-off-by: Chris 'BinGOs' Williams <chris@bingosnet.co.uk>
Paul Evans [Thu, 25 Nov 2010 20:08:05 +0000 (20:08 +0000)]
[PATCH 3/3] Provide wrappers for IN6ADDR_ANY and IN6ADDR_LOOPBACK
Signed-off-by: Chris 'BinGOs' Williams <chris@bingosnet.co.uk>
Paul Evans [Thu, 25 Nov 2010 20:07:23 +0000 (20:07 +0000)]
[PATCH 2/3] Implement sockaddr_in6 wrapper
Signed-off-by: Chris 'BinGOs' Williams <chris@bingosnet.co.uk>
Paul Evans [Thu, 25 Nov 2010 20:06:36 +0000 (20:06 +0000)]
[PATCH 1/3] Implement Socket::pack_sockaddr_in6() and unpack_sockaddr_in6()
Signed-off-by: Chris 'BinGOs' Williams <chris@bingosnet.co.uk>
Father Chrysostomos [Thu, 25 Nov 2010 14:14:40 +0000 (06:14 -0800)]
[perl #78438] Memory leak with 'use v5.42'
Father Chrysostomos [Thu, 25 Nov 2010 14:11:12 +0000 (06:11 -0800)]
Stop eval "use 6" from leaking
Father Chrysostomos [Thu, 25 Nov 2010 14:06:31 +0000 (06:06 -0800)]
Stop eval "no 5" from leaking
Nicholas Clark [Thu, 25 Nov 2010 17:08:18 +0000 (17:08 +0000)]
Make BEGIN {require 5.12.0} behave as documented.
Previously in a BEGIN block, require was behaving identically to use 5.12.0 -
ie erroneously executing the use feature ':5.12.0'; and use strict;
use warnings behaviour, which only use was documented to provide.
Nicholas Clark [Thu, 25 Nov 2010 14:58:42 +0000 (14:58 +0000)]
Extend -d:foo=bar to make -d:-foo expand to C<no foo>, consistent with -M-foo
Nicholas Clark [Thu, 25 Nov 2010 13:53:28 +0000 (13:53 +0000)]
Use newSVpvs_flags() instead of sv_2mortal(newSVpvs())
And similarly for newSVpvn() for a known length.
Chris 'BinGOs' Williams [Thu, 25 Nov 2010 12:48:21 +0000 (12:48 +0000)]
Update MIME-Base64 to CPAN version 3.11
[DELTA]
2010-10-24 Gisle Aas <gisle@ActiveState.com>
Release 3.11
Provide encode_base64url and decode_base64url functions to process
the base64 scheme for "URL applications".
The decode_base64() does not issue warnings on suspect input data
any more.
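For readers unfamiliar with the "URL applications" scheme: it is the usual base64 with '-' and '_' in place of '+' and '/', and with the '=' padding left off. A minimal C sketch of the encoding side follows; b64url_encode() is a hypothetical helper written for illustration, not code from MIME-Base64.

```c
#include <assert.h>
#include <stddef.h>

/* Encode len bytes from src as unpadded base64url into dst.
 * dst needs room for 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
static size_t b64url_encode(const unsigned char *src, size_t len, char *dst)
{
    static const char tab[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    size_t i = 0, o = 0;
    unsigned long v;

    assert(src != NULL && dst != NULL);

    /* Whole 3-byte groups become 4 characters each. */
    while (len - i >= 3) {
        v = ((unsigned long)src[i] << 16)
          | ((unsigned long)src[i + 1] << 8)
          |  (unsigned long)src[i + 2];
        dst[o++] = tab[(v >> 18) & 63];
        dst[o++] = tab[(v >> 12) & 63];
        dst[o++] = tab[(v >> 6) & 63];
        dst[o++] = tab[v & 63];
        i += 3;
    }

    /* A 1- or 2-byte tail becomes 2 or 3 characters, with no '='. */
    if (len - i == 1) {
        v = (unsigned long)src[i] << 16;
        dst[o++] = tab[(v >> 18) & 63];
        dst[o++] = tab[(v >> 12) & 63];
    } else if (len - i == 2) {
        v = ((unsigned long)src[i] << 16)
          | ((unsigned long)src[i + 1] << 8);
        dst[o++] = tab[(v >> 18) & 63];
        dst[o++] = tab[(v >> 12) & 63];
        dst[o++] = tab[(v >> 6) & 63];
    }
    dst[o] = '\0';
    return o;
}
```

With this sketch the two bytes 0xFB 0xFF encode as "-_8", where ordinary base64 would give "+/8=".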
Nicholas Clark [Thu, 25 Nov 2010 11:53:10 +0000 (11:53 +0000)]
Refactor ENAME dumping in Perl_do_sv_dump() to simplify the code slightly.
Simpler code avoids the need for a comment explaining how the complex code was
working. Also use newSVpvs_flags() in place of sv_newmortal() and sv_setpv().
Nicholas Clark [Thu, 25 Nov 2010 11:52:09 +0000 (11:52 +0000)]
Test dumping stashes, with various combinations of NAME and ENAME.
Nicholas Clark [Thu, 25 Nov 2010 10:24:22 +0000 (10:24 +0000)]
Refactor Peek.t to give more useable diagnostics.
Change the numeric test IDs to meaningful names. Provide the names as test
descriptions. Describe the purpose of the second test. Only output the line
numbers if the tests fail. Swap from an explicit plan to done_testing().
Florian Ragwitz [Thu, 25 Nov 2010 03:09:23 +0000 (04:09 +0100)]
It's a little late to get changes into 5.12
Florian Ragwitz [Thu, 25 Nov 2010 01:43:27 +0000 (02:43 +0100)]
Fix signature of sv_unmagic in perlguts.pod
Nicholas Clark [Wed, 24 Nov 2010 21:55:15 +0000 (21:55 +0000)]
Explicitly export Perl_sv_compile_2op_is_broken(), for ext/re.
Frustratingly, because regcomp.c is also compiled as ext/re/re_comp.c, anything
it needs has to be exported. So this has to be X. I'd rather it wasn't.
David Golden [Wed, 24 Nov 2010 19:51:27 +0000 (14:51 -0500)]
minor amendment to documentation of ?PATTERN?
Zefram [Wed, 24 Nov 2010 19:32:50 +0000 (14:32 -0500)]
Deprecate ?PATTERN? without explicit m operator
Deprecate ?PATTERN?, recommending the equivalent m?PATTERN? syntax, in
order to eventually allow the question mark to be used in new operators
that would currently be ambiguous.
(With minor reconciliation edits by David Golden)
Signed-off-by: David Golden <dagolden@cpan.org>
Nicholas Clark [Wed, 24 Nov 2010 17:56:39 +0000 (17:56 +0000)]
Deprecate sv_compile_2op()
It attempted to provide an API to compile code down to an optree, but failed
to bind correctly to lexicals in the enclosing scope. It's not possible to
fix this problem within the constraints of its parameters and return value.
Searches suggest that nothing on CPAN is using it, so removing it should have
zero impact.
Nicholas Clark [Wed, 24 Nov 2010 17:53:32 +0000 (17:53 +0000)]
Fix a typo introduced by 15d9236d3878cc50. (The wrong member of a union).
This would have caused no functional changes, just compiler warnings.
Max Maischein [Wed, 24 Nov 2010 15:54:52 +0000 (16:54 +0100)]
Add fold_latin1 to the list of exported variable symbols (unbreaking Win32+gcc build)
Rafael Garcia-Suarez [Wed, 24 Nov 2010 13:43:30 +0000 (14:43 +0100)]
Don't use "try" as a variable name
"try" being a C++ keyword, this produces compilation warnings.
Nicholas Clark [Wed, 24 Nov 2010 11:59:50 +0000 (11:59 +0000)]
Remove unused variable from S_set_regclass_bit_fold()
David Golden [Wed, 24 Nov 2010 11:50:08 +0000 (06:50 -0500)]
Clarify m?PATTERN? is ok and only ?PATTERN? is not
Nicholas Clark [Wed, 24 Nov 2010 11:36:36 +0000 (11:36 +0000)]
Convert xhv_name in struct xpvhv_aux to be a union of HEK* and HEK**
This avoids a lot of casting. Nothing outside the perl core code is accessing
that member directly.
Chris 'BinGOs' Williams [Tue, 23 Nov 2010 19:12:21 +0000 (19:12 +0000)]
Update Unicode-Collate to CPAN version 0.68
[DELTA]
0.68 Tue Nov 23 20:17:22 2010
- doc: clarified about (backwards => [ ]) and (backwards => undef).
- separated t/backwds.t from t/test.t.
- added cjk_b5.t, cjk_gb.t, cjk_ja.t, cjk_ko.t, cjk_py.t, cjk_st.t in t
for CJK/*.pm without Locale.pm.
Father Chrysostomos [Tue, 23 Nov 2010 17:56:19 +0000 (09:56 -0800)]
Document the refcount of version functions’ retval
Nicholas Clark [Tue, 23 Nov 2010 11:48:51 +0000 (11:48 +0000)]
When dup'ing CVs, only take the OP_REFCNT_LOCK if it is needed.
Previously it was being taken for all CVs, including XSUBS.
Also, refactor other non-XSUB specific code into the same if block.
Nicholas Clark [Tue, 23 Nov 2010 11:46:43 +0000 (11:46 +0000)]
No need to clone pad name 0, as it's never used.
Pad entry 0 is for @_, but no name is recorded for it, so the name slot is
always &PL_sv_undef.
Chris 'BinGOs' Williams [Tue, 23 Nov 2010 12:21:02 +0000 (12:21 +0000)]
Update IPC-Cmd to CPAN version 0.66
[DELTA]
Changes for 0.66 Tue Nov 23 12:10:24 GMT 2010
=================================================
* Apply documentation patch from Dan Dascalescu [RT # 63250]
* Apply another documentation patch from Dan Dascalescu [RT #63251]
* Fix an issue with _split_like_shell_win32() raised by tunakermit [RT #62961]
Craig A. Berry [Tue, 23 Nov 2010 04:38:12 +0000 (22:38 -0600)]
Fix multiple perldelta entries from buildtoc on VMS.
Karl Williamson [Tue, 23 Nov 2010 00:41:23 +0000 (17:41 -0700)]
regcomp.h: Restore separate bit for LOC class
This commit partially reverts cefafd73018b048fa66d2b22250431112141955a
which unconditionally used a bitmap for classes like \w in ANYOF nodes
used in locales. Unfortunately, I forgot to unconditionally allocate
that space, so things were getting corrupted. It is scary that that did
not show up in my testing, but locales are hard to test. It showed up
in a workspace without DEBUGGING.
This commit now causes the bitmap to be used only when necessary, at the
expense of using a precious bit in the flags field to indicate that it
is being used. However, as events have turned out since that commit,
that flags bit isn't as precious as I thought. It looks like we will
have to split the ANYOF node into two similar nodes, one of which is
variable length, as there are bugs due to the optimizer thinking it is
of length 1, when in fact it doesn't currently have to be. That split
should allow more bits to be freed.
I'm retaining for now some ancillary code that was to help improve the
efficiency when that bit was removed; just in case we have to redo this.
But if we do, we have to unconditionally allocate the space we think we
are using.
Signed-off-by: David Golden <dagolden@cpan.org>
Father Chrysostomos [Tue, 23 Nov 2010 00:55:19 +0000 (16:55 -0800)]
Only call mro_package_moved on new substashes
Commit 298d65111 added this mro_package_moved call.
It does not need to happen if the substash already exists, as it will
already have had effective names assigned to it.
It also may not be a good idea to set it in such cases, as it may make
a recursive call to mro_get_linear_isa. I know this is utter paranoia,
but someone may write a mro plugin that is not reëntrant. (The speed
gain is worth it, though.)
Father Chrysostomos [Tue, 23 Nov 2010 00:42:00 +0000 (16:42 -0800)]
Don’t CLONE nameless hashes
The cloning code was trying to call CLONE on nameless hashes that nonetheless had an effective name (HvENAME).
This can happen if a nameless hash is assigned over a stash, as in
*foo:: = {}
or if a stash is undefined:
undef %foo::
(The effective name is how perl tracks the location internally, for
the sake of updating MRO caches.)
Father Chrysostomos [Tue, 23 Nov 2010 00:22:08 +0000 (16:22 -0800)]
Clarify the hekp assignment in dump.c
Karl Williamson [Mon, 15 Nov 2010 20:51:24 +0000 (13:51 -0700)]
[bracketed char class] fixes
This patch adds two functions for setting the ANYOF node bitmaps. The
one for dealing with folds has intelligence as to what to do if unicode
semantics is in effect.
Together with previous commits, this fixes the unicode bug for bracketed
character classes, as far as known bugs go, so pods are updated as well.
Karl Williamson [Mon, 15 Nov 2010 20:19:18 +0000 (13:19 -0700)]
fold_grind.t: Only test [char classes]
I meant to do this on the initial commit for this .t. The
non-char-class tests show many failures now, as they are more
comprehensive than the reg_fold.t ones. Until I iron those out, use
reg_fold.t for these.
Karl Williamson [Mon, 15 Nov 2010 19:56:49 +0000 (12:56 -0700)]
handy.h: New #define to use new bit
This creates a new macro for use by regcomp to test the new bit
regarding non-ascii folds.
Because the semantics may change in the future to deal with multi-char
folds, the name of the macro is unwieldy and specific enough that no one
should be tempted to use it.
Karl Williamson [Mon, 15 Nov 2010 19:55:15 +0000 (12:55 -0700)]
l1_char_class_tab.h: Add new bit to table.
The output of the revised Porting/mk_charclass.pl is here incorporated
into this .h., with a #define for the new bit that signifies if a
character participates in a fold with a non-latin1 character.
Karl Williamson [Mon, 15 Nov 2010 19:53:27 +0000 (12:53 -0700)]
mk_PL_charclass.pl: Find non-latin1 folds
The output of this .pl is to be used as the main table in
l1_char_class_tab.h. Add a new bit to indicate if a Latin1 character
participates in a simple fold with a character outside the Latin1
range. This will be used by regcomp.c to make decisions about how to
compile regexes.
Karl Williamson [Mon, 15 Nov 2010 19:49:34 +0000 (12:49 -0700)]
regexec.c: indent code in new block
This is a white-space, formatting only patch. The previous commit
placed existing code in a new {block}. This now indents it and
reformats it to still fit in 80 columns. No logic changes are in it.
Karl Williamson [Mon, 15 Nov 2010 19:48:20 +0000 (12:48 -0700)]
regexec.c: utf8 could fold to ascii/latin1
Some non-Latin1 characters fold to that range, and hence should be
tested against the generated bit map.
Karl Williamson [Mon, 15 Nov 2010 19:40:16 +0000 (12:40 -0700)]
regcomp.c: Add comment
Karl Williamson [Mon, 15 Nov 2010 19:38:20 +0000 (12:38 -0700)]
regcomp.c: Add comment
Karl Williamson [Mon, 15 Nov 2010 19:10:28 +0000 (12:10 -0700)]
reg_fold.t: Don't duplicate fold_grind.t
This removes testing of bracketed character classes from reg_fold.t,
which are better tested by the new fold_grind.t
Karl Williamson [Mon, 15 Nov 2010 19:05:43 +0000 (12:05 -0700)]
Add fold_grind.t
This .t is designed to mostly replace reg_fold.t. It provides much more
comprehensive tests, but has a huge number of failures that should be
TODOs that I found it difficult to classify, so have deferred adding it
until now, when in a few commits, those will be whittled way down.
And, those would only come in testing things that aren't character
classes, which are currently commented out, and reg_fold.t is relied on
for testing those.
Karl Williamson [Mon, 15 Nov 2010 17:49:52 +0000 (10:49 -0700)]
pp.c: tiny performance enhancement
I believe on many processors two increments are faster than two
additions
Karl Williamson [Mon, 15 Nov 2010 17:27:02 +0000 (10:27 -0700)]
pp.c, utf8.c: Convert to use TWO_BYTE_UTF8_TO_UNI
Karl Williamson [Mon, 15 Nov 2010 17:18:58 +0000 (10:18 -0700)]
utf8.h: Add macro TWO_BYTE_UTF8_TO_UNI()
The code to do this isn't obvious, as it was wrong in 5 different places
in two different files (forgetting one or both of the required
conversions to UTF, which is a no-op except on EBCDIC machines, or it
would have been detected sooner).
Some of that code depended on left shifting being truncated in a U8.
This adds UTF_START_MASK so it can work in a larger width variable.
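For illustration, here is roughly what a two-byte decode looks like on an ASCII platform. This is a sketch under stated assumptions, not the macro added to utf8.h: the names below are hypothetical, and the native/UTF conversions that matter on EBCDIC are deliberately omitted.

```c
#include <assert.h>

/* Hypothetical names; ASCII platforms only. */
#define START_MASK_2  0x1fu   /* payload bits of a 2-byte start byte */
#define CONT_MASK     0x3fu   /* payload bits of a continuation byte */
#define CONT_SHIFT    6

/* Combine a 2-byte UTF-8 sequence into its code point. The start byte
 * is masked BEFORE shifting, so the result is correct in a variable
 * wider than 8 bits and does not rely on the shift being truncated. */
static unsigned int two_byte_utf8_to_cp(unsigned char hi, unsigned char lo)
{
    assert((hi & 0xe0u) == 0xc0u);   /* 110xxxxx */
    assert((lo & 0xc0u) == 0x80u);   /* 10xxxxxx */
    return ((hi & START_MASK_2) << CONT_SHIFT) | (lo & CONT_MASK);
}
```

For example, the bytes 0xC3 0xA9 give 0xE9 (LATIN SMALL LETTER E WITH ACUTE).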
Karl Williamson [Mon, 15 Nov 2010 16:28:28 +0000 (09:28 -0700)]
utfebcdic.h: comment additions, fix typo
Karl Williamson [Sun, 14 Nov 2010 21:00:47 +0000 (14:00 -0700)]
regexec.c: Correct indent
Karl Williamson [Sun, 14 Nov 2010 19:20:40 +0000 (12:20 -0700)]
mk_PL_charclass.pl: Correct comment
Karl Williamson [Sun, 14 Nov 2010 19:11:12 +0000 (12:11 -0700)]
utf8_heavy: Guard against infinite recursion
If things aren't just so, it could be that utf8_heavy calls something
which requires a pattern, such as split or just a pattern match that
ends up calling utf8_heavy again, ad infinitum. When this happens,
memory gets eaten up and the machine grinds to a halt, likely requiring a
manual forced reboot.
To prevent this undesirable situation, utf8_heavy now stacks all its
calls in progress, and if any is a repeat, panics.
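utf8_heavy itself is Perl, but the guard is language-neutral. A sketch in C, with hypothetical names and a fixed-size stack for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IN_PROGRESS 16

/* Hypothetical stack of the lookups currently in progress. */
static const char *in_progress[MAX_IN_PROGRESS];
static size_t      in_progress_count;

/* Record that a lookup for key is starting. Returns 1 on success, or
 * 0 if the same key is already in progress, meaning we have recursed
 * into ourselves and the caller should panic rather than continue. */
static int guard_enter(const char *key)
{
    size_t i;

    assert(key != NULL);
    for (i = 0; i < in_progress_count; i++) {
        if (strcmp(in_progress[i], key) == 0)
            return 0;
    }
    assert(in_progress_count < MAX_IN_PROGRESS);
    in_progress[in_progress_count++] = key;
    return 1;
}

/* Record that the most recently started lookup has finished. */
static void guard_leave(void)
{
    assert(in_progress_count > 0);
    in_progress_count--;
}
```

Nested lookups of different keys are still allowed; only a repeat of a key that has not yet finished is refused.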
Karl Williamson [Sun, 14 Nov 2010 17:37:41 +0000 (10:37 -0700)]
Split ANYOF_NONBITMAP into two components
ANYOF_NONBITMAP means that the node can match things that aren't in its
bitmap. Some things can match only when the target string is in utf8,
and some things can match even if it isn't. If the target string is not
in utf8, and we know that the only possible match is when it is in utf8,
we know it can't match, and avoid a fruitless, expensive swash load.
This change also fixes a number of problems shown in t/re/fold_grind.t
that I will deliver soon.
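The shortcut amounts to a test like the one below. The flag names and the helper are hypothetical and do not match the actual regcomp.h bits; the sketch only shows the decision being made.

```c
#include <assert.h>

/* Hypothetical flag bits for "can match outside the bitmap". */
#define NODE_NONBITMAP_UTF8_ONLY  0x01u  /* only if the target is UTF-8 */
#define NODE_NONBITMAP_NON_UTF8   0x02u  /* even if the target is not */

/* Returns 1 if the slow out-of-bitmap lookup (the swash load) is worth
 * doing for this node and target, 0 if it cannot possibly match. */
static int needs_slow_lookup(unsigned int flags, int target_is_utf8)
{
    assert(target_is_utf8 == 0 || target_is_utf8 == 1);
    if (flags & NODE_NONBITMAP_NON_UTF8)
        return 1;
    if ((flags & NODE_NONBITMAP_UTF8_ONLY) && target_is_utf8)
        return 1;
    return 0;
}
```

With a single combined bit, the non-UTF-8 target with a UTF-8-only node would have to take the slow path just to find out that nothing matches.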
Karl Williamson [Sun, 14 Nov 2010 17:32:28 +0000 (10:32 -0700)]
regcomp.c: Optimizer wrongly turning off bit
ANYOF_UNICODE_ALL and ANYOF_NONBITMAP are not mutually exclusive, so
there is no need for the optimizer to make them so.
Karl Williamson [Sun, 14 Nov 2010 05:39:23 +0000 (22:39 -0700)]
regcomp.c: Add explanatory comment
Karl Williamson [Sun, 14 Nov 2010 03:55:49 +0000 (20:55 -0700)]
regcomp.h: Add comment
Karl Williamson [Sun, 14 Nov 2010 03:51:08 +0000 (20:51 -0700)]
regcomp.h: Renumber ANYOF_EOS bit
This is in preparation for adding a new bit which for debugging ease
ought to be adjacent to another one.
Karl Williamson [Sun, 14 Nov 2010 03:41:44 +0000 (20:41 -0700)]
regcomp.c: Fix indent
Karl Williamson [Sun, 14 Nov 2010 00:21:14 +0000 (17:21 -0700)]
rename ANYOF_UNICODE to ANYOF_NONBITMAP
I am about to hone the meaning of this to mean that there is something
outside the bitmap that is matchable by the node, and the new name
reflects that more accurately.
I am not retaining the old name because I'm about to remove it from the
flags field to save a bit and avoid masking operations, and any code
that would be using it would break at that point.
Karl Williamson [Fri, 12 Nov 2010 16:33:52 +0000 (09:33 -0700)]
perl.h: Add latin1 fold table
This adds a folding table that works on the full Latin1 character set,
except for three problematic characters that need special handling.
It is accessed by PL_fold_latin1.
Karl Williamson [Fri, 12 Nov 2010 16:26:04 +0000 (09:26 -0700)]
regcomp.sym: Clarify comment
make regen needed
Karl Williamson [Fri, 12 Nov 2010 16:07:48 +0000 (09:07 -0700)]
Nits in perlunicode.pod
Karl Williamson [Fri, 12 Nov 2010 16:06:50 +0000 (09:06 -0700)]
regexec.c: Split EXACT, folding nodes in regrepeat
As I started working on fixing more bugs in regrepeat, I realized that
the EXACT node had enough different things going on from the folding
nodes that it was better to give it its own case statement. This patch
does this and refactors the remaining code to compensate, and for
clarity.
Karl Williamson [Fri, 12 Nov 2010 16:05:19 +0000 (09:05 -0700)]
PL_fold wrong for EBCDIC platforms.
The PL_fold table map on EBCDIC only works on the ASCII-subrange
characters, not the full native Latin1.
To fix this, I moved the table to utfebcdic.h for EBCDIC platforms, and
actually changed it to three tables, one for each of the code pages
known to Perl.
There is no EBCDIC platform available to test on. What I did was hack
together a program from existing code that does EBCDIC transforms. I
ran it in ASCII mode, and verified that the generated table was
identical to the Latin1 table I had previously constructed by hand and
extensively tested. I then ran it on each of the three EBCDIC
transforms, and verified that each matched the places in the original
table that I knew were correct, all the ASCII alphabetics, the controls,
and a few other code points.
So these tables are at least as correct as the existing one, as they are
identical to it for [A-Z], [a-z].
Karl Williamson [Fri, 12 Nov 2010 16:04:37 +0000 (09:04 -0700)]
perl.h: Expand comment
Karl Williamson [Fri, 12 Nov 2010 16:02:46 +0000 (09:02 -0700)]
re/pat.t: Skip tests on EBCDIC
There's no convenient way to translate to EBCDIC in these tests, since
they don't use the normal test routines which have this facility.
Therefore, have to skip these tests on those platforms.
Karl Williamson [Fri, 12 Nov 2010 15:41:46 +0000 (08:41 -0700)]
regcomp.sym: Fix descriptions
requires regen
Karl Williamson [Fri, 12 Nov 2010 03:07:09 +0000 (20:07 -0700)]
regex free up bit in ANYOF node
This patch causes all locale ANYOF nodes to have a class bitmap (4
bytes) even if they don't have a class (such as \w, \d, [:posix:]).
This frees up a bit in the flags field that was used to signal if the
node had the bitmap. I intend to use it instead to signal that loading
a swash, which is slow, can be bypassed. Thus this is a time/space
tradeoff, applicable to not just locale nodes: adding a word to the
locale nodes saves time for all nodes.
I added the ANYOF_CLASS_TEST_ANY_SET() macro to determine quickly if
there are actually any classes in the node.
Minimal code was changed, so this can be easily reversed if another bit
frees up.
Another possibility is to share with the ANYOF_EOS bit instead, as this
is used just in the optimizer's start class, and only in regcomp.c. But
this requires more careful coding.
Another possibility is to add a byte (hence likely at least 4 because of
alignment issues) to store extra flags.
And still another possibility is to add just the byte for the start
class, which would not need to affect other ANYOF nodes, since the EOS
bit is not used outside regcomp.c. But various routines in regcomp
assume that the start class and other ANYOF nodes are interchangeable,
so this option would require more code changes.
Karl Williamson [Fri, 12 Nov 2010 02:20:40 +0000 (19:20 -0700)]
regcomp.h: Add comment
Karl Williamson [Fri, 12 Nov 2010 02:00:13 +0000 (19:00 -0700)]
regcomp.c: Remove references to old #define
Two #defines were recently collapsed to mean the same thing.
Standardize on using one of them.
Karl Williamson [Thu, 11 Nov 2010 22:56:24 +0000 (15:56 -0700)]
regcomp.h: Reorder statements for clarity
Reorder #defines of bits so they are in numerical order
Father Chrysostomos [Mon, 22 Nov 2010 17:25:40 +0000 (09:25 -0800)]
Newly-created stashes may need effective names added