David Mitchell [Tue, 4 Feb 2014 18:55:28 +0000 (18:55 +0000)]
re_intuit_start(): eliminate saved_s var
In the "find other substr" block, we enter with s pointing to the "check"
substr. We save s to saved_s, then use its value, then use s for
something else, then finally restore s from saved_s.
However, at entry to this block, we have already set check_at to s,
so use check_at rather than s as the input to the block; then there's
also no need to use saved_s to remember its value. In fact it turns
out we don't need to restore s to the old value at all, as the next block
of code always assigns to s anyway.
David Mitchell [Tue, 4 Feb 2014 18:52:08 +0000 (18:52 +0000)]
re_intuit_start(): localise t
The function-wide variable t is now only used locally within two
separate blocks, so remove the outer declaration and add two inner
declarations.
David Mitchell [Sun, 2 Feb 2014 19:49:51 +0000 (19:49 +0000)]
re_intuit_start(): remove try_at_* labels
Now that both "other" blocks have been merged into one block, there's
only one occurrence of the following rather than two:
if (rx_origin == strpos)
goto try_at_start;
goto try_at_offset;
which allows us to eliminate these two gotos and just fall through into
the 'if (rx_origin == strpos)' just before the two code blocks marked by
those two labels.
Also introduce another label, 'postprocess_substr_matches',
which is needed by the stclass code now that those other two labels have
gone.
David Mitchell [Sun, 2 Feb 2014 19:29:35 +0000 (19:29 +0000)]
re_intuit_start(): simplify check-only origin test
In the case where there's no "other" substring, we check whether
the regex origin would be at the start of string. However, a few commits
ago we introduced the rx_origin var, and we can now use this value to
simplify the test, which was effectively re-calculating rx_origin each
time.
David Mitchell [Sun, 2 Feb 2014 17:49:57 +0000 (17:49 +0000)]
re_intuit_start(): merge anch and float "other"
When processing the "other" substring, there are two very similar
branches: one if "other" is anchored, the other if it's floating.
Merge these two branches.
The diff output makes it look a lot messier than it actually is; really
it's a bit like
if (other_ix) {
A;
B1;
C;
}
else {
A;
B2;
C;
}
becomes
{
A;
if (other_ix)
B1;
else
B2;
C;
}
where each statement such as B, that differs between the two branches,
is handled separately.
David Mitchell [Sun, 2 Feb 2014 14:47:34 +0000 (14:47 +0000)]
re_intuit_start(): calc fbm_instr() end in bytes
When calculating the end limit of the string to pass to fbm_instr(),
we usually have a pointer to the latest point where the substr could
start, whereas fbm_instr() expects a pointer to the latest point where
the substr could end.
Since fbm_instr() purely matches bytes (it cares not whether those bytes
are part of a utf8 stream or not), the value of the latest end point will
always be:
(latest start point) + SvCUR(sv) - !!SvTAIL(sv)
i.e. work in bytes, even if we have utf8 values.
In some of the places where fbm_instr() is used, the calculation is being
done partially or fully in chars rather than bytes. This is not incorrect,
and indeed may in theory calculate a slightly lower end limit sometimes
and thus stop the fbm earlier. But this comes at the cost of having to do
utf8 length calculations and HOPs back from the end of the string.
So we're trading off not having to do utf8 skips on the last few chars
against the fbm not uselessly searching the last few chars. These roughly
cancel each other out. But since we no longer do HOPs before starting the
fbm, we win every time the fbm doesn't get near the end of the string.
So in conclusion, simpler code and better than or equal performance.
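As a rough illustration of the byte-based calculation above, here is a
minimal sketch. It is not the perl source: latest_end_in_bytes() and its
parameters are hypothetical names, with substr_bytes standing in for
SvCUR(sv) and has_tail standing in for !!SvTAIL(sv).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from regexec.c: given the latest byte
 * position at which the substring may start, return the latest position
 * at which it may end. substr_bytes plays the role of SvCUR(sv) and
 * has_tail the role of !!SvTAIL(sv), so has_tail must be 0 or 1. */
static const char *
latest_end_in_bytes(const char *latest_start, size_t substr_bytes, int has_tail)
{
    assert(has_tail == 0 || has_tail == 1);
    assert(substr_bytes >= (size_t)has_tail);
    /* Pure pointer arithmetic in bytes: no utf8 length calculation
     * and no hopping back from the end of the string. */
    return latest_start + substr_bytes - (size_t)has_tail;
}
```

The point of the sketch is only that the limit is derived with constant
work, whatever the encoding of the string being searched.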
David Mitchell [Sun, 2 Feb 2014 14:14:47 +0000 (14:14 +0000)]
re_intuit_start(): move a line of code earlier
This makes no functional difference, but makes the two branches
of "other substr" calculate the three values last1, last, s in the same
order.
David Mitchell [Sun, 2 Feb 2014 14:02:24 +0000 (14:02 +0000)]
re_intuit_start(): re-indent after brace removal
The previous commit removed one level of {} from a block of code;
re-indent to match.
Whitespace-only change
David Mitchell [Sun, 2 Feb 2014 13:56:14 +0000 (13:56 +0000)]
re_intuit_start(): move do_other_anchored label up
The "other substr" code currently looks like
if (anchored) {
do_other_anchored:
{
...
}
}
else {
....
}
Replace it with
do_other_substr:
if (anchored) {
...
}
else {
....
}
and make the two places that currently do 'goto do_other_anchored'
do 'goto do_other_substr' instead, after first asserting that the "other
substr" is indeed always anchored.
This would appear to be infinitesimally less efficient, but it is part
of a plan to make the two branches of the "other substr" code more similar,
allowing them eventually to be merged.
David Mitchell [Sun, 2 Feb 2014 13:44:07 +0000 (13:44 +0000)]
re_intuit_start(): reduce use of *_offset macros
There are a number of macros with definitions like
#define anchored_offset substrs->data[0].min_offset
#define float_min_offset substrs->data[1].min_offset
In the two "other substr" branches, replace uses of these macros, e.g.
{
...
foo = prog->float_min_offset;
...
}
with
{
struct reg_substr_datum *other = &prog->substrs->data[other_ix];
...
foo = other->min_offset;
...
}
As well as making debugging easier (a debugger might display real fields
but not macros), and potentially making the binary more compact and faster
(unless the compiler is clever enough to optimise away every use of the
'prog->substrs->data[0]' dereference), it also helps make the two "other
substr" branches more similar, bringing us closer to eventually merging
them.
David Mitchell [Sat, 1 Feb 2014 00:33:11 +0000 (00:33 +0000)]
re_intuit_start(): harmonise other_last++
In the other=anchored branch, at the end on failure or success, we
set other_last to HOP3(last, 1) or HOP3(s, 1) respectively,
indicating the minimum point we should start matching if we ever
have to try again. Clearly for failure, we know the substring can't be
found at any position up to, or including last, so next time we should try
at last+1. For success, if we return later it means that some other
constraint failed, and we already know that the substr wasn't found at
positions up to s-1, and that if we tried position s again we'd just
repeat the previous failure. So in both cases set to N+1.
In the other=float branch however, other_last is set to last or s on
failure or success, with a big "XXX is this right?" against the
"other_last = s" code. It turns out that "other_last = s" *is* right, for
the special reasons explained in the code comments added by this commit;
while "other_last = last" is changed to be "other_last = HOP3(last,1)".
David Mitchell [Fri, 31 Jan 2014 23:58:14 +0000 (23:58 +0000)]
re_intuit_start(): simplify other=anchored block
This block of code calculates 2 limits: last, last2; plus a third,
last1 = min(last, last2)
It turns out that (as explained below), last is always <= last2, which
allows us to simplify the code. In particular, this means that last always
equals last1, so eliminate last1 and always use last instead.
At the same time, rename last2 to last1, so the vars have the same names /
meanings as in the other=float branch.
Here's the math (ignoring char/byte differences for simplicity's sake):
last = s (== start of just matched float substr)
- float_min_offset
+ anchored_offset
last2 = strend - minlen + anchored_offset
Let
delta = last2 - last
= (strend - minlen + anchored_offset)
- (s - float_min_offset + anchored_offset)
= (strend - s) - (minlen - float_min_offset) [1]
Now, we've just matched a floating substring at s. But this previous
match was constrained to *end* no later than end_shift chars before
strend, so it was constrained to *start* no later than
end_shift + length(float) chars before strend; i.e.
strend - s >= end_shift + length(float) [2].
Also, more or less by definition,
minlen = float_min_offset + length(float) + end_shift
or
end_shift = minlen - float_min_offset - length(float) [3]
So, combining [2] and [3] gives
strend - s >= (minlen - float_min_offset - length(float)) + length(float)
strend - s >= minlen - float_min_offset
Therefore, from [1],
delta >= 0
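The inequality above can also be sanity-checked by brute force over a
small range of values. The following is only an illustrative sketch: the
function name and the integer model (plain ints for positions, ignoring
char/byte differences as the derivation does) are assumptions, not part
of the commit.

```c
#include <assert.h>

/* Hypothetical check: returns 1 if delta >= 0 for every combination in a
 * small search space bounded by max, 0 as soon as a counter-example is
 * found. Positions are modelled as plain ints with the string starting
 * at 0. */
static int
delta_never_negative(int max)
{
    assert(max >= 0);
    for (int fmo = 0; fmo <= max; fmo++)            /* float_min_offset */
    for (int len = 1; len <= max; len++)            /* length(float) */
    for (int end_shift = 0; end_shift <= max; end_shift++) {
        /* minlen = float_min_offset + length(float) + end_shift */
        int minlen = fmo + len + end_shift;
        for (int strend = minlen; strend <= minlen + max; strend++) {
            /* the float match starts no later than
             * end_shift + length(float) before strend */
            for (int s = 0; s <= strend - end_shift - len; s++) {
                /* delta as given by equation [1] */
                int delta = (strend - s) - (minlen - fmo);
                if (delta < 0)
                    return 0;
            }
        }
    }
    return 1;
}
```

A call such as delta_never_negative(6) walks every combination in that
range; it is a check of the algebra, not a proof.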
David Mitchell [Fri, 31 Jan 2014 23:45:01 +0000 (23:45 +0000)]
re_intuit_start(): add tmp assertion
This assertion confirms it is safe to strip out some redundant code that
will be removed (and explained) in the next commit
David Mitchell [Thu, 30 Jan 2014 16:12:14 +0000 (16:12 +0000)]
re_intuit_start(): fixup some code comments
Based on some feedback from Hugo, this makes some of the comments I've
added recently less confusing (hopefully).
In particular, it standardises on one set of terminology for string
positions: earliest/first to latest/last, avoiding others like
smallest/least/minimum to greatest/most/maximum, and
bottom/lowest to top/highest.
David Mitchell [Sun, 26 Jan 2014 16:07:17 +0000 (16:07 +0000)]
re_intuit_start(): update rx_origin after check
Previously the code for updating rx_origin after a 'check' match or an
'other' match looked a bit like this:
s = fbm_instr(check);
if (other exists) {
if (other is anchored) {
rx_origin = HOP3c(s, -prog->check_offset_max);
....
}
else {
rx_origin = HOP3c(s, -prog->check_offset_min);
....
}
}
else
rx_origin = HOP3c(s, -prog->check_offset_max);
This commit changes it to
s = fbm_instr(check);
rx_origin = HOP3c(s, -prog->check_offset_max);
if (other exists) {
if (other is anchored) {
....
}
else {
....
}
}
Of course in each case the 'HOP3' code was slightly different, but they
all happened to be equivalent, especially as for an anchored string,
check_offset_min == check_offset_max.
The only complication was a goto do_other_anchored, but it turns
out that setting rx_origin in that case was easy.
David Mitchell [Sun, 26 Jan 2014 14:19:47 +0000 (14:19 +0000)]
regex substrs: record index of check substr
Currently prog->substrs->data[] is a 3 element array of structures.
Elements 0 and 1 record the longest anchored and floating substrings,
while element 2 ('check'), is a copy of the longest of 0 and 1.
Record in a new field, prog->substrs->check_ix, the index of which element
was copied. (Eventually I intend to remove the copy altogether.)
Also for the anchored substr, set max_offset equal to min offset.
Previously it was left as zero and ignored, although if copied to check,
the check copy of max *was* set equal to min. Having this always set will
allow us to make the code simpler.
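A minimal sketch of the layout described above, using made-up struct and
function names rather than the real definitions in regexp.h. The
tie-break when both substrings have the same length is an assumption
made for the example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct substr_datum_sketch {
    long   min_offset;
    long   max_offset;
    size_t len;              /* length of the substring */
};

struct substrs_sketch {
    struct substr_datum_sketch data[3]; /* 0: anchored, 1: floating, 2: check */
    int check_ix;                       /* which of 0/1 was copied to 2 */
};

/* Record which element is the longer one, copy it to the 'check' slot,
 * and make the anchored max_offset always equal to its min_offset. */
static void
record_check(struct substrs_sketch *sub)
{
    assert(sub != NULL);
    sub->data[0].max_offset = sub->data[0].min_offset;
    /* Assumption: ties go to the anchored substring. */
    sub->check_ix = (sub->data[1].len > sub->data[0].len) ? 1 : 0;
    sub->data[2] = sub->data[sub->check_ix];
}
```

With the index recorded, code that needs the original of the check
substring can use data[check_ix] instead of relying on the copy.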
David Mitchell [Sat, 25 Jan 2014 10:30:51 +0000 (10:30 +0000)]
re_intuit_start(): use the rx_origin var more
Make the rx_origin variable (introduced in the previous commit, and which
specifies the current minimum legal place the regex could match at) to
also be used at the start and end of the "other" substr match: the origin
is now passed in this var to the other parts of the code that use it,
rather than in the anonymous "t" variable, which is slowly being reduced
in function to a temporary generic char pointer.
David Mitchell [Fri, 24 Jan 2014 16:39:40 +0000 (16:39 +0000)]
re_intuit_start(): introduce rx_origin var
re_intuit_start() is a bit of a mess. It uses two general function-scope
vars, s and t, to point at string offsets while processing. These vars
mean different things at different times. Introduce a new var, rx_origin,
which indicates the current minimum position that the regex could begin
matching at. It starts off at strpos, and gradually moves up as various
constraints are rejected. It will be the value eventually returned.
For the moment, s and/or t will continue serving that function at various
points in the code; this commit just makes rx_origin valid at the entry to
the 'restart:' block.
David Mitchell [Fri, 24 Jan 2014 14:41:56 +0000 (14:41 +0000)]
re_intuit_start(): use different var for tmp value
Make the anchored branch more similar to the floating branch by using s to
hold the start position for fbm rather than t. Should be functionally
equivalent.
Note that on failure in the anchored branch, we leave with t holding a
different value than before, but it shouldn't matter, since the value of t
is only used in the success case.
David Mitchell [Fri, 24 Jan 2014 13:48:21 +0000 (13:48 +0000)]
re_intuit_start(): substr SV cannot be undef
Commit
7e0d5ad7c removed the code that sometimes set the substr
to &PL_sv_undef, so there's no need to test for that value any more.
David Mitchell [Mon, 20 Jan 2014 16:51:31 +0000 (16:51 +0000)]
re_intuit_start(): simplify fixed offset_max code
Since we now assert that all offsets are non-negative, this code can
be simplified a bit. Also, by using HOP3lim() rather than HOP3(), we can
remove a trailing conditional.
David Mitchell [Mon, 20 Jan 2014 16:28:08 +0000 (16:28 +0000)]
re_intuit_start(): thinko from a few commits ago
A bit of code I modified a few commits ago was supposed to be
subtracting the offset 'start_shift' if it was positive, but the
test condition I coded was 'end_shift > 0' by mistake.
It turns out this is harmless, since start_shift is always positive
anyway, and if the shift wasn't subtracted, it just made the code slightly
less efficient. (So it worked either way).
Fix it anyway.
David Mitchell [Sun, 19 Jan 2014 00:15:57 +0000 (00:15 +0000)]
Perl_regexec_flags(): use HOP4c in another place
Now that we have this macro, use it.
David Mitchell [Sat, 18 Jan 2014 23:46:49 +0000 (23:46 +0000)]
re_intuit_start(): bias last* vars; revive reghop4
In the "just matched float substr, now match fixed substr" branch,
initially add an extra prog->anchored_offset to the last and last2 vars;
since a lot of the later calculations involve adding anchored_offset,
doing this early to the last* vars means less work in some cases. In
particular, last is calculated from s by a single
HOP4(s, prog->anchored_offset-start_shift,...)
rather than two separate
HOP3(s, -start_shift,...);
HOP3(..., prog->anchored_offset,...);
which may mostly cancel each other out.
Similarly with last2. Later, we can skip adding prog->anchored_offset to
last1, since its antecedents already have the bias added.
In the case of failure, calculating a new start position involves an extra
HOP to s, but removes a HOP from other_last, so the two cancel out.
To make this work, I revived the reghop4() function which had been
commented out, and added a HOP4c() wrapper macro. This is like HOP3c(),
but allows you to specify both lower and upper limits. Useful when you
don't know the sign of the offset in advance.
(Yves had earlier added this function, but had commented it out until such
time as it was actually used.)
I also added some extra comments to this block and removed the comment
about it being maybe broken under utf8, since I'm auditing the code for
utf8-safeness.
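To illustrate the idea of a hop with both a lower and an upper limit,
here is a small self-contained sketch. It is not the reghop4() source:
hop4_sketch() and is_continuation() are hypothetical names, and the code
assumes well-formed UTF-8 with s starting on a character boundary.

```c
#include <assert.h>
#include <stddef.h>

/* True if byte b is a UTF-8 continuation byte (10xxxxxx). */
static int
is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical stand-in for a HOP4c()-style hop: move s by off characters
 * (forwards if off > 0, backwards if off < 0), never going before llim
 * or beyond rlim. Because both limits are supplied, the caller does not
 * need to know the sign of off in advance. */
static const unsigned char *
hop4_sketch(const unsigned char *s, ptrdiff_t off,
            const unsigned char *llim, const unsigned char *rlim)
{
    assert(llim <= s && s <= rlim);
    while (off > 0 && s < rlim) {
        s++;                                  /* step over the lead byte */
        while (s < rlim && is_continuation(*s))
            s++;                              /* and its continuation bytes */
        off--;
    }
    while (off < 0 && s > llim) {
        s--;                                  /* step back one byte ... */
        while (s > llim && is_continuation(*s))
            s--;                              /* ... until a lead byte */
        off++;
    }
    return s;
}
```

A single call with a net offset (such as anchored_offset - start_shift)
replaces a backwards hop followed by a forwards hop, which is where the
saving described above comes from.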
David Mitchell [Fri, 17 Jan 2014 16:09:23 +0000 (16:09 +0000)]
re_intuit_start(): add some more code comments
David Mitchell [Thu, 16 Jan 2014 17:07:13 +0000 (17:07 +0000)]
re_intuit_start(): delete srch_(start|end)_shift
remove these two vars; these are now just unmodified copies of
start_shift, end_shift; so just use those two vars directly.
David Mitchell [Thu, 16 Jan 2014 16:59:50 +0000 (16:59 +0000)]
re_intuit_start(): assert substr offsets are >= 0.
Some parts of this function handle the negative offset case, while other
parts don't. Also, nothing in the test suite generates negative offsets.
So for now, assert that all offsets are non-negative, and strip out any
code that handles negative offsets. This will make my current activity
in fixing and refactoring this function easier. If at some future point
someone wants to add support for negative offsets (e.g. with look-behind)
then they'll have to add support fully to re_intuit_start() from scratch.
David Mitchell [Thu, 16 Jan 2014 16:00:41 +0000 (16:00 +0000)]
re_intuit_start(): fix another utf8 slowdown
The code that looks for a floating substr after a fixed substr has
already been found, was very slow on long utf8 strings. For example
this used to take an hour or more, and now takes millisecs:
$s = "ab" x 1_000_000;
utf8::upgrade($s);
$s =~ /ab.{1,2}x/;
When calculating the maximum position at which the floating substr could
start, there are two possible upper limits.
First, the absolute max position, ignoring the results of the previous
fixed substr match - this is the end-of-string less a bit (last1);
Second, float_max_offset on from the current origin of the regex (this
is dependent on where the fixed substr previously matched).
To decide which of these two values to use (the smaller), it used to
calculate the distance in chars from the regex origin to last1, and if
this was greater than float_max_offset, it used origin + float_max_offset
(in chars) instead.
This distance calculation involved doing a utf8 length calculation on the
majority of the string, which for long strings was a big slowdown.
Fix this by instead always using HOP3(origin + float_max_offset), but
using last1 as the upper HOP limit rather than strend, so it's always
limited to <= last1.
If L is number of chars that had to be hopped over for the distance
calculation (which could be most of the string), and if M is the
chars hopped for origin + float_max_offset (typically either small or
infinite), then we:
previously hopped: (M>=L ? L : L+M) chars
now hop: min(L,M) chars; or if M is infinite, hop 0 chars
Which is always less than or equal to the amount of work done previously,
and is a very big win for long strings with smallish maximum float
offsets.
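The hop counts above can be written out as two small functions, purely
to make the comparison concrete. The names and the M_INFINITE sentinel
are assumptions for this sketch and do not appear in the perl source.

```c
#include <assert.h>
#include <limits.h>

/* Assumed stand-in for an unbounded float_max_offset. */
#define M_INFINITE LONG_MAX

/* Chars hopped by the old code: the distance calculation always costs L,
 * and the origin + float_max_offset hop costs a further M when M < L. */
static long
hops_before(long L, long M)
{
    assert(L >= 0 && M >= 0);
    return (M >= L) ? L : L + M;
}

/* Chars hopped by the new code: a single hop limited by last1, or no
 * hop at all when the maximum offset is unbounded. */
static long
hops_after(long L, long M)
{
    assert(L >= 0 && M >= 0);
    if (M == M_INFINITE)
        return 0;
    return (L < M) ? L : M;
}
```

For instance, with L around two million chars and M equal to 2, the old
scheme hops L + M chars per attempt while the new one hops 2.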
David Mitchell [Thu, 16 Jan 2014 15:16:34 +0000 (15:16 +0000)]
re_intuit_start(): document floating code better
Add some comments to the float-finding code that explains what the t, last
and last1 vars are, and how they're calculated.
Also make the calculation of last separate from last1;
it should logically be the same, but clearer. i.e. change
last = last1 = ...;
if (cond)
last = ....;
to
last1 = ...;
last = cond ? .... : last1;
David Mitchell [Wed, 8 Jan 2014 16:30:32 +0000 (16:30 +0000)]
re_intuit_start(): add more debugging output
Add some debugging output to some parts of the code without them, so it's
easier to follow progress through intuit(); also add an initial "we're in
intuit" message.
Make all the debugging output, apart from the initial and final intuit
messages, indented by 2 chars so that they are seen to be things happening
within intuit.
Dump the substrs data array.
Fix up a few of the existing outputs to be more informative.
David Mitchell [Fri, 27 Dec 2013 23:23:12 +0000 (23:23 +0000)]
re_intuit_start(): simplify ml_anch evaluation
rather than enumerating all the anchor flag combos where ml_anch *isn't*
true, enumerate the flags for which it *is* true. This is slightly simpler
logic, and involves one less negation, which makes it easier to
understand.
David Mitchell [Fri, 27 Dec 2013 23:16:23 +0000 (23:16 +0000)]
test for single-line ^ within /m
This combo doesn't appear to be tested anywhere; specifically, adding this
in re_intuit_start() didn't trigger the assertion when run against the
test suite:
if (prog->extflags & RXf_ANCH_BOL)
assert(!multiline);
David Mitchell [Fri, 27 Dec 2013 22:28:31 +0000 (22:28 +0000)]
eliminate RXf_ANCH_SINGLE
This macro defines two flag bits:
#define RXf_ANCH_SINGLE (RXf_ANCH_SBOL|RXf_ANCH_GPOS)
but is only used twice in core (and not on CPAN), and
doesn't really add any value, but increases cognitive complexity.
David Mitchell [Fri, 27 Dec 2013 22:12:31 +0000 (22:12 +0000)]
re_intuit_start(): add comments to a block of code
explain what it does!
David Mitchell [Fri, 27 Dec 2013 22:06:36 +0000 (22:06 +0000)]
re_intuit_start(): refactor an if/else block
change
if (X) { do nothing } else if (Y) { Z }
to
if (!X && Y) { Z }
David Mitchell [Thu, 26 Dec 2013 22:47:33 +0000 (22:47 +0000)]
re_intuit_start(): rationalise ml_anch var
Make ml_anch bool rather than I32, since that's all it's used for.
Also, unconditionally initialise it to zero then only set where needed.
This eliminates an else branch that just sets it to zero.
David Mitchell [Thu, 26 Dec 2013 22:29:39 +0000 (22:29 +0000)]
re_intuit_start(): eliminate max_shift var
I introduced this var a few commits ago, but after a bit of refactoring,
I can now eliminate it and just use its antecedents directly.
David Mitchell [Thu, 26 Dec 2013 22:00:24 +0000 (22:00 +0000)]
re_intuit_start(): merge two similar code branches
When the check string is anchored at a fixed offset from the start
of the string (e.g. /^..abc/), there are two similar code branches
that do a quick memNE() reject - the difference being whether SvTAIL() is
true, i.e. whether the pattern ends with $ and may match a \n.
Merge the common bits of the two branches. Technically this makes the
non-SvTAIL() branch slightly less efficient, since it will always retrieve
SvCUR(check) even when it's not needed, but in practice it will already be
in the cache since we have to access SvPVX(check), and the I-cache misses
will have been reduced by reducing the code size.
It also removes one label.
David Mitchell [Thu, 26 Dec 2013 20:54:08 +0000 (20:54 +0000)]
re_intuit_start(): factor out some common code
3 assignments are done at the end of both branches of an if/else;
factor them out after the end.
David Mitchell [Mon, 16 Dec 2013 14:18:58 +0000 (14:18 +0000)]
regexp.h: document the fields of reg_substr_datum
In particular, specify that the various offset fields are char rather
than byte counts.
David Mitchell [Fri, 13 Dec 2013 16:35:14 +0000 (16:35 +0000)]
RT#120692 slow intuit with long utf8 strings
Some code in re_intuit_start() that tries to find the range of chars
to which the BM substr find can be applied, uses logic that is very
inefficient once utf8 is enabled. Basically the code tries to find
the maximum end-point where the substr could be found, by taking the
minimum of:
* start + prog->check_offset_max + length(substr)
* end - prog->check_end_shift
Except that these values are in char lengths and need to be converted to
bytes before calling fbm_instr(). The code formerly involved scanning the
whole of the remaining string to determine how many chars it had.
By doing the calculation a different way, we can avoid this.
This makes the following two regexps each take milliseconds rather than
10s of seconds:
my $s = 'ab' x 1_000_000;
utf8::upgrade($s);
1 while $s =~ m/\Ga+ba+b/g;
$s=~ /^a{1,2}x/ for 1..10_000;
David Mitchell [Fri, 13 Dec 2013 13:40:15 +0000 (13:40 +0000)]
re_intuit_start(): re-indent a block of code
(whitespace-only changes)
A block of code had a confusing mixture of 4- and 2-space indents.
Re-indent to be consistently 4 chars. Also, wrap some long lines
and add a few blank lines for clarity.
David Mitchell [Tue, 10 Dec 2013 17:17:06 +0000 (17:17 +0000)]
regcomp utf8 len cache panic
Compiling this regex:
/\x{100}[xy]\x{100}{2}/
caused this:
panic: sv_len_utf8 cache 1 real 2
This was due to the code in S_study_chunk() mixing up char and byte
lengths when updating the utf8 length cache on a utf8 string that
had been extended by repeatedly duplicating the last n chars.
(The second test is for an issue introduced during an initial attempt to
fix this).
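The char/byte distinction behind this panic can be shown with a tiny
sketch. utf8_chars() is a hypothetical helper written for this note, not
a perl API, and it assumes well-formed UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters in a well-formed UTF-8 buffer by counting the
 * bytes that are not continuation bytes (10xxxxxx). */
static size_t
utf8_chars(const unsigned char *buf, size_t nbytes)
{
    size_t chars = 0;
    assert(buf != NULL || nbytes == 0);
    for (size_t i = 0; i < nbytes; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

U+0100 encodes as the two bytes 0xC4 0x80, so a string holding it twice
is 4 bytes but only 2 chars long; a cached char length updated with a
byte count (or the reverse) will disagree with the real length.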
Tony Cook [Fri, 7 Feb 2014 09:52:47 +0000 (20:52 +1100)]
perldelta updates
Zefram [Thu, 6 Feb 2014 18:53:09 +0000 (18:53 +0000)]
merge basic zefram/purple_signatures into blead
Karl Williamson [Thu, 6 Feb 2014 17:48:57 +0000 (10:48 -0700)]
t/run/locale.t: Rmv test that isn't generally valid
The return from setlocale() on a new locale is documented in Linux as
opaque, even though the Linux smokers we have return the name of the
new locale. It turns out that VMS actually does return something
different, and this test fails there. So, the test is testing for
something that just happens to be currently true on many of our systems;
hence isn't general, and hence shouldn't be tested for.
And, the test isn't necessary, as we can infer from the other tests in
the file that a locale passed via the environment actually takes hold,
as we do so for a locale which has a comma radix, and test that the
radix is correctly set.
Matthew Horsfall [Thu, 6 Feb 2014 15:34:16 +0000 (07:34 -0800)]
Fix typo in perldelta.pod
Karl Williamson [Thu, 6 Feb 2014 05:06:02 +0000 (22:06 -0700)]
Make a literal "{" fatal after \b and \B in regexes
These have been deprecated since v5.14.
Karl Williamson [Thu, 6 Feb 2014 04:59:23 +0000 (21:59 -0700)]
Karl Williamson [Sat, 30 Nov 2013 23:53:32 +0000 (16:53 -0700)]
util.c: Add comment.
I wonder if we need to worry about compatibility with a 1994 libc?
Brian Fraser [Thu, 6 Feb 2014 01:38:41 +0000 (22:38 -0300)]
DynaLoader: On Android, define DLOPEN_WONT_DO_RELATIVE_PATHS
Android's linker will actually do relative paths just fine; the problem
is that it won't search from the current directory, only on
/vendor/lib, /system/lib, and whatever is in LD_LIBRARY_PATH.
While the core handles that just fine, bits of CPAN rather rightfully
expect this to work:
use lib 'foo' # puts foo/ in @INC
use My::Module::In::Foo; # calls dlopen() with foo/My/Module/...
# which will likely fail
So we take this route instead.
Brian Fraser [Wed, 5 Feb 2014 10:37:32 +0000 (07:37 -0300)]
CBuilder::Android: fix ->link() to respect wantarray.
Brian Fraser [Wed, 5 Feb 2014 10:34:52 +0000 (07:34 -0300)]
Fix more tests to work on systems that don't define LC_ALL and friends
Karl Williamson [Wed, 5 Feb 2014 22:42:35 +0000 (15:42 -0700)]
dquote_static.c: White-space only
Outdent code whose surrounding block was removed by the previous commit
Karl Williamson [Wed, 5 Feb 2014 21:54:47 +0000 (14:54 -0700)]
Forbid "\c{" and \c{non-ascii}
These constructs have been deprecated since v5.14 with the intention of
making them fatal in 5.18. This wasn't done; and is being done now.
Karl Williamson [Sat, 13 Apr 2013 17:41:04 +0000 (11:41 -0600)]
handy.h Special case toCTRL('?') for EBCDIC
There is no change for ASCII platforms. For EBCDIC ones, toCTRL('?')
and its inverse are special cased to map to/from the APC control
character, which is the outlier control on these platforms. The reason
to special case this is that otherwise toCTRL('?') would map to a
graphic character, not a control. By outlier, I mean it is the one
control not in the single block where all the other controls are placed.
Further, it corresponds on two of the platforms with 0xFF, which
would be an EBCDIC rub-out character corresponding to an ASCII rub-out
(or DEL) 0x7F, which is what toCTRL('?') maps to on ASCII. This is an
outlier control on ASCII, not being a member of the C0 nor C1 controls.
Hence this makes '?' mean the outlier control on both platforms.
Karl Williamson [Wed, 5 Feb 2014 21:56:50 +0000 (14:56 -0700)]
B.pm: Bump version
Karl Williamson [Wed, 5 Feb 2014 20:01:45 +0000 (13:01 -0700)]
Allow blead to compile under some g++
Various platforms are refusing to compile blead with g++ [perl #121151]
This patch, suggested by Tony Cook, seems to work. There may be a better
way to do it, so I'm not closing the ticket, but this gets things
working again.
David Golden [Wed, 5 Feb 2014 19:16:19 +0000 (14:16 -0500)]
fix Module::CoreList::is_core version comparison
David Golden [Wed, 5 Feb 2014 19:08:10 +0000 (14:08 -0500)]
fix Module::CoreList::is_core default perl version
Yves Orton [Wed, 5 Feb 2014 17:39:30 +0000 (01:39 +0800)]
Fix regression in floating mandatory string optimisation
In
304ee84bde82d4eee33b0d0ff03080b360eae72b I introduced a regression
where floating mandatory strings which were at known offsets started
being treated as though they were allowed "anywhere".
This patch fixes the bug, and adds tests to make sure it does not come
back.
See also discussion in Perl RT #121182.
https://rt.perl.org/Public/Bug/Display.html?id=121182
Yves Orton [Wed, 5 Feb 2014 16:56:23 +0000 (00:56 +0800)]
Eliminate stupid macro
Years ago I was lazy. False laziness. Today I undo that laziness. Sigh
Brian Fraser [Wed, 5 Feb 2014 06:36:15 +0000 (03:36 -0300)]
File::Spec: More extensive fix for #120593
The original fix only handled '.', this one handles all relative
paths.
Brian Fraser [Wed, 5 Feb 2014 06:29:13 +0000 (03:29 -0300)]
t/uni/fold.t: use &LC_ALL with & to avoid strict errors
Brian Fraser [Wed, 5 Feb 2014 02:35:57 +0000 (23:35 -0300)]
sv.c, sv_cmp_locale_flags: flags is unused if locales are disabled
Brian Fraser [Tue, 4 Feb 2014 21:32:36 +0000 (18:32 -0300)]
sv.c: Remove leftover ifdef from the %vd format
Brian Fraser [Tue, 4 Feb 2014 04:00:40 +0000 (01:00 -0300)]
Added missing prototypes.
This was mostly for XS functions defined in the core, but also
for a handful of functions in the :stdio layer.
Brian Fraser [Tue, 4 Feb 2014 03:47:12 +0000 (00:47 -0300)]
Fix the prototypes of some functions without context
Their prototypes are (void), but the implementation was ()
Brian Fraser [Tue, 4 Feb 2014 03:38:39 +0000 (00:38 -0300)]
Avoid compiler warnings by consistently using #ifdef instead of plain #if
Brian Fraser [Fri, 31 Jan 2014 18:24:30 +0000 (15:24 -0300)]
.gitignore: Handle cross-compilation files better
Tony Cook [Wed, 5 Feb 2014 01:44:47 +0000 (12:44 +1100)]
perldelta for
0ecf23179326, none needed for
a5368aebf374
Jerry D. Hedden [Tue, 4 Feb 2014 23:38:44 +0000 (18:38 -0500)]
Upgrade to threads 1.92
Jerry D. Hedden [Tue, 4 Feb 2014 21:51:26 +0000 (16:51 -0500)]
Upgrade to threads::shared 1.46
Tom Hukins [Tue, 21 Jan 2014 19:48:21 +0000 (19:48 +0000)]
perldelta for IO::Socket::IP version 0.27
Tony: grammar fix and fix module sort order
Tom Hukins [Tue, 21 Jan 2014 19:48:09 +0000 (19:48 +0000)]
Upgrade IO::Socket::IP to version 0.27
Tony Cook [Tue, 4 Feb 2014 21:36:52 +0000 (08:36 +1100)]
perldelta for
e40f8e806ef
Karl Williamson [Tue, 4 Feb 2014 19:15:14 +0000 (12:15 -0700)]
Don't test locales that are invalid for needed categories
When looking for locales to test, skip ones which aren't defined in
every locale category we care about. This was motivated by a NetBSD
machine which has a Pig Latin locale, but it is defined only for
LC_MESSAGES.
This necessitated adding parameters to pass the desired locale(s), and
renaming a test function to indicate the current category it is valid
for.
Karl Williamson [Tue, 4 Feb 2014 17:43:42 +0000 (10:43 -0700)]
Revert "'use utf8' should imply /u regex matching"
This reverts commit
bfa0ee78b652802412c3cab86bb873ed67ea6550.
This commit turned out to be contentious, and since we are past the
contentious features freeze date, no matter what else, it should be
reverted.
The argument for the commit essentially boils down to 'use utf8'
indicates that the text within its scope should be treated as utf8.
That means that any patterns with literals in them should be treated as
utf8, but utf8-encoded patterns follow Unicode rules by definition.
The arguments against it are that code relies on the way it has always
worked (even if that was an oversight), and in fact several CPAN modules
were broken by it, [perl #121162]. Also it has been the stated intent
that 'use utf8' will eventually become a no-op, meaning all text will be
treated as utf8, and that shouldn't have to mean that backwards
compatibility will be broken then.
Steve Hay [Tue, 4 Feb 2014 12:55:44 +0000 (12:55 +0000)]
Add note of blead customizations for previous commit
Brian Fraser [Tue, 4 Feb 2014 09:38:55 +0000 (06:38 -0300)]
vutil.c, vxs.inc: Avoid warnings from -Wmissing-prototypes -Wundef -Wunused-label
-Wmissing-prototypes was complaining about declaring XS()
functions without previously declaring a prototype.
-Wundef didn't like using #if foo instead of #ifdef foo
-Wunused-label warned because VER_{IV,NM,PV} were defined on all
versions of perl, but only used on < 5.17.2
Yves Orton [Tue, 4 Feb 2014 10:48:42 +0000 (18:48 +0800)]
Add tests and fix new fatal errors related to $/
In
b3a2acfa0c0e4f8e48e1f6eb4d6fd143f293d2c6 I added new exceptions, but
forgot to test them properly. In the process I managed to partially break
the functionality, and since it was not tested I did not notice.
Ilmari on #p5p pointed out I forgot the test, and in the end I had to completely
rewrite the original patch.
Now tested as fully as I could. Thanks Ilmari.
Chris 'BinGOs' Williams [Tue, 4 Feb 2014 10:16:55 +0000 (10:16 +0000)]
Add porting test for Module-CoreList
When the perl version is bumped in blead, Module-CoreList
should be prepared to include a stub entry for this new version.
This tests for the existence of the stub-entry.
Steve Hay [Tue, 4 Feb 2014 09:36:45 +0000 (09:36 +0000)]
Upgrade libnet from version 1.24 to 1.25
Tony Cook [Tue, 4 Feb 2014 08:55:16 +0000 (19:55 +1100)]
ignore a test data file for a Locale-Codes test we don't ship
Yves Orton [Tue, 4 Feb 2014 08:18:16 +0000 (16:18 +0800)]
do not overflow when the pattern is unbounded
Steve Hay [Tue, 4 Feb 2014 08:20:17 +0000 (08:20 +0000)]
Exclude new vutil/Makefile.PL in cpan/version/
John Peacock [Mon, 3 Feb 2014 23:42:20 +0000 (18:42 -0500)]
And now the rest of the sync to 0.9908
Karl Williamson [Tue, 4 Feb 2014 02:12:16 +0000 (19:12 -0700)]
Add -DL option to trace setlocale calls
This will help field debugging of locale issues.
Karl Williamson [Tue, 4 Feb 2014 02:52:54 +0000 (19:52 -0700)]
Revert "Fix handy.t for systems without $Config{d_isblank}."
This reverts commit d61570b1bbf3e2d76cc293690156fb361b054272. This
commit was made unnecessary by commit
3f9a3488327f59f53c00adc132d91f19840e2a50.
Karl Williamson [Mon, 3 Feb 2014 19:18:38 +0000 (12:18 -0700)]
Regenerate podcheck db due to recent 79col fixes
Commits 51b4c035919497f474ce46dcbdac1d2f3fd18a84 and
02257115537194d7a3b36a956d5643069f78c54f fixed some too-long verbatim
line issues. I'm not sure why commit
b3a2acfa0c0e4f8e48e1f6eb4d6fd143f293d2c6 added them to the db, as they
were fixed before it was applied. My guess is that the workspace had
not been rebased recently enough.
Chris 'BinGOs' Williams [Mon, 3 Feb 2014 23:11:20 +0000 (23:11 +0000)]
Module-CoreList prepared for v5.19.9
Tony Cook [Wed, 22 Jan 2014 04:14:59 +0000 (15:14 +1100)]
[perl #121028] avoid creating a shell process
Chris 'BinGOs' Williams [Mon, 3 Feb 2014 21:41:24 +0000 (21:41 +0000)]
Update Pod-Parser to CPAN version 1.62
[DELTA]
02-Feb-2014 Marek Rouchal <marekr@cpan.org>
-----------------------------------------------------------------------------
Version 1.62
+ CPAN#87891: More sanity checks in podselect()
documentation patches by florent.angly@gmail.com, and a bit of stricter
checking what clients pass to podselect()
Brian Fraser [Mon, 3 Feb 2014 20:36:31 +0000 (21:36 +0100)]
DynaLoader: On android, dl_load_flags should always be 0x00
The linker ignores all the flags and works as if under RTLD_LOCAL,
so don't give users the headache of seeing warnings ala
"Can't make loaded symbols global on this platform while loading %s"
when using a module that subclasses DynaLoader and defines
dl_load_flags to anything else.
Brian Fraser [Mon, 3 Feb 2014 20:22:58 +0000 (21:22 +0100)]
CBuilder, link: On Android, always return absolute paths to libraries
Several modules on CPAN expect being able to pass the library
name returned by ->link to DynaLoader::dl_load_file and have it Just Work.
However, because ->link returns relative paths, those modules ran afoul
of Android's linker, which will only look in a handful of hardcoded
system directories for relative libraries, plus whatever LD_LIBRARY_PATH
pointed to at the start of execution.
This commit makes ->link on Android always return an absolute path,
which will be found by the linker.
Chris 'BinGOs' Williams [Mon, 3 Feb 2014 20:35:24 +0000 (20:35 +0000)]
Update Compress-Raw-Zlib to CPAN version 2.065
[DELTA]
2.065 3 February 2014
* [PATCH] resolve c++ build failure in core
[#92657]
* gcc -g3: final link failed: Memory exhausted
[#88936]
Karl Williamson [Mon, 3 Feb 2014 18:52:24 +0000 (11:52 -0700)]
Fix [[:blank:]] handling when no isblank() on platform
isblank() is a C99 construct; Perl tries to handle the use of this on
C89 platforms by using the standard hard-coded definition. However,
this code was not updated when handling for UTF-8 locales was recently
added (31f05a37c); in a UTF-8 locale the no-break space is also
considered to be a blank.
This commit fixes that. Previously regcomp.c generated the hard-coded
definitions when there was no isblank(), using #ifdef'd code. That
special handling was removed, and [:blank:] is always treated just like
any other POSIX class. The specialness of it is hidden entirely in
handy.h. This simplifies the regcomp.c code slightly. I considered
removing the special handling for isascii(), also a C99 construct, in
the name of simplicity over the slight speed that would be lost. But
the special handling is only a single line in two places, so I left it
in.
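As a rough illustration of the fallback described above, and not the actual handy.h macros, a hard-coded blank test for code points 0..255 might look like the sketch below. The function names and the in_utf8_locale parameter are assumptions made for the example.

```c
#include <assert.h>

/* C89-style fallback: the hard-coded blanks are space and tab. */
static int demo_isblank_c89(int c)
{
    return c == ' ' || c == '\t';
}

/* Code-point variant for 0..255. In a UTF-8 locale U+00A0
 * (NO-BREAK SPACE) is also considered a blank, as the commit
 * message states. */
static int demo_isblank_cp(unsigned int cp, int in_utf8_locale)
{
    assert(cp <= 0xFF);
    if (cp < 0x80)
        return demo_isblank_c89((int)cp);
    return in_utf8_locale && cp == 0xA0;
}
```

Keeping the locale decision inside one helper mirrors the idea of hiding the specialness in a single header, so callers treat [:blank:] like any other class.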
Yves Orton [Mon, 3 Feb 2014 14:20:13 +0000 (22:20 +0800)]
deal with assignment to $/ better, deprecate edge cases, and forbid others
The actual behavior of $/ under various settings and how it is documented
varies quite a bit. Clarify the documentation, and add various checks
that are validated when setting $/.
The gist of the problem was that the way that weirdo ref assignments were
handled was mostly broken:
* setting to a reference to an array, hash, or other higher level
construct would behave similarly to setting it to a reference to
an integer, by numifying the ref and using it as an integer. This
behavior was entirely undocumented.
* setting to a reference to 0 or to -1 was *documented* as triggering
"slurp" behavior, but actually did not. Instead it would set the
separator to the stringified form of the ref, which would *appear* as
slurp behavior due to the unlikelihood of a file actually containing
a string which matched, but was less efficient, and if someone's
luck were *terrible* might actually behave as a split.
In the future we wish to support more sophisticated ways of setting the
input record separator, possibly supporting things like:
$/= [ "foo", "bar" ];
$/= qr/foo|bar/;
Accordingly this patch *forbids* the use of a non scalar ref, and raises
a fatal exception when one does so.
Additionally it treats non-positive refs *exactly* the same as assigning
undef, *including* ignoring the original value and setting $/ to undef.
This means the implementation now matches the documentation. However
since this might involve some crazy script changing in behavior (as one
can't fetch back the original ref from $/) I have added a warning in
category "deprecated" advising the user what has happened and
recommending setting to "undef" explicitly.
As far as I can tell this will only *break* code doing extremely dodgy
things with $/.
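The rules above can be summarised as a small decision function. This is a model for illustration only, not the code in perl's magic handling: the enums and classify_rs() are hypothetical, and special string values such as the empty string are lumped in with ordinary separators.

```c
#include <assert.h>

/* Hypothetical model of the value being assigned to $/. */
enum rs_value_kind {
    RS_UNDEF,          /* undef */
    RS_STRING,         /* a plain string */
    RS_REF_TO_SCALAR,  /* a reference to a scalar holding an integer */
    RS_REF_TO_OTHER    /* a reference to an array, hash, code, ... */
};

enum rs_outcome {
    RS_SLURP,          /* read the whole file */
    RS_SEPARATOR,      /* split input on the string */
    RS_FIXED_RECORD,   /* read records of up to N units */
    RS_SLURP_WARN,     /* treated as undef, "deprecated" warning */
    RS_FATAL           /* fatal exception */
};

/* referent is only consulted for RS_REF_TO_SCALAR. */
static enum rs_outcome classify_rs(enum rs_value_kind kind, long referent)
{
    switch (kind) {
    case RS_UNDEF:
        return RS_SLURP;
    case RS_STRING:
        return RS_SEPARATOR;
    case RS_REF_TO_SCALAR:
        /* Non-positive referents now behave exactly like undef. */
        return referent > 0 ? RS_FIXED_RECORD : RS_SLURP_WARN;
    case RS_REF_TO_OTHER:
        /* Reserved for possible future use, so rejected outright. */
        return RS_FATAL;
    }
    return RS_FATAL;
}
```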
While putting together this patch I encountered numerous problems with
porting tests. First off was porting/podcheck.t, which failed without
saying why or what to do, even under TEST_VERBOSE=1. Then when I did a
regen to update the exceptions database and then used that information
to try to fix the reported problems it seems that it does not work properly
anyway. Specifically you aren't allowed to have a / in the interesting
parts of a L<> reference. If you replace the / with an E<0x2f> then the
link is valid POD, but podcheck.t then considers it a broken link. If
you then replace the / in perldiag with E<0x2f> as well then
porting/diag.t complains that you have an undocumented diagnostic!
Accordingly I used the --regen option of podcheck.t to add exceptions to
the exception database. I have no idea if the pod is correctly formatted
or not.
Yves Orton [Sun, 2 Feb 2014 15:37:37 +0000 (23:37 +0800)]
Add RXf_UNBOUNDED_QUANTIFIER and regexp->maxlen
The flag tells us that a pattern may match an infinitely long string.
The new member in the regexp struct tells us how long the string might
be.
With these two items we can implement regexp-based $/.
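One way a caller might combine the two pieces of information is sketched below. The struct, the flag value, and the helper are stand-ins invented for the example; the real regexp struct has many more members, and what maxlen holds for an unbounded pattern is not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for RXf_UNBOUNDED_QUANTIFIER; the real value differs. */
#define DEMO_UNBOUNDED_QUANTIFIER 0x1u

/* Toy regexp with only the two members of interest. */
struct demo_regexp {
    unsigned int flags;
    long maxlen;       /* longest possible match when bounded */
};

/* Returns how many bytes of look-ahead are enough to decide a match,
 * or -1 when no finite window is sufficient. */
static long demo_window_for_separator(const struct demo_regexp *rx)
{
    assert(rx != NULL);
    if (rx->flags & DEMO_UNBOUNDED_QUANTIFIER)
        return -1;
    return rx->maxlen;
}
```

A reader implementing a pattern-based record separator could use a bounded window to size its buffer, and fall back to a slower strategy when the helper reports -1.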
Chris 'BinGOs' Williams [Mon, 3 Feb 2014 13:14:02 +0000 (13:14 +0000)]
Bump version version and remove/update customisations