Karl Williamson [Thu, 5 Jan 2012 22:17:18 +0000 (15:17 -0700)]
regcomp.c: Avoid leaking a scalar
Karl Williamson [Thu, 5 Jan 2012 20:27:35 +0000 (13:27 -0700)]
regcomp.c: truncate long debug dump output
What an ANYOF node matches could theoretically be millions of characters
long. This commit outputs only the first portion of very long ones.
Karl Williamson [Thu, 5 Jan 2012 20:21:57 +0000 (13:21 -0700)]
regcomp.c: in debug output, don't duplicate code points
The non-bitmap portion of an ANYOF node may also be in the bitmap
portion. There is no sense in having duplicate output
Karl Williamson [Thu, 5 Jan 2012 20:17:19 +0000 (13:17 -0700)]
regcomp.c: Change debug dump of bitmap/non-bitmap
Instead of '...' separating the two components of the output, change it
to a single space, which is output only if the first component isn't
null.
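A minimal sketch of that separator rule, not the regcomp.c code itself. The function name format_anyof_dump and its parameters are hypothetical; the two strings stand in for the bitmap and non-bitmap components of the dump.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<bitmap_part> <rest>" into out, emitting the
 * single-space separator only when bitmap_part is non-empty.
 * Returns snprintf()'s result: the length the full text needs. */
int format_anyof_dump(char *out, size_t cap,
                      const char *bitmap_part, const char *rest)
{
    assert(out != NULL && bitmap_part != NULL && rest != NULL);
    const char *sep = strlen(bitmap_part) > 0 ? " " : "";
    return snprintf(out, cap, "%s%s%s", bitmap_part, sep, rest);
}
```

With an empty first component the output starts directly with the second one, so no stray leading space appears.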
Karl Williamson [Thu, 5 Jan 2012 20:13:55 +0000 (13:13 -0700)]
regcomp.c: Change \t to a - in debug dumping ranges
This changes the separator in the output of a range from a tab to a
hyphen, which is clearer.
Karl Williamson [Thu, 5 Jan 2012 20:01:36 +0000 (13:01 -0700)]
regcomp.c: White-space only
Remove trailing tabs
Karl Williamson [Thu, 5 Jan 2012 18:44:48 +0000 (11:44 -0700)]
regcomp.c: put_byte wants an ord, not a utf8 char
These were calling put_byte() incorrectly, with a utf8 char instead of
the ordinal.
Karl Williamson [Thu, 5 Jan 2012 18:41:36 +0000 (11:41 -0700)]
regcomp.c: White-space only
These lines were indented one stop too many for the enclosing block
Karl Williamson [Tue, 29 Nov 2011 21:57:02 +0000 (14:57 -0700)]
regcomp.c: Don't read beyond input
This code was assuming that there were several more bytes in the input
stream, when there may not be. This was discovered by valgrind.
Karl Williamson [Mon, 28 Nov 2011 19:32:02 +0000 (12:32 -0700)]
regcomp.c: Optimize a single Unicode property in a [character class]
All Unicode properties actually turn into bracketed character classes,
whether explicitly done or not. A swash is generated for each property
in the class. If that is the only thing not in the class's bitmap, it
specifies completely the non-bitmap behavior of the class, and can be
passed explicitly to regexec.c. This avoids having to regenerate the
swash. It also means that the same swash is used for multiple instances
of a property. And that means the number of duplicated data structures
is greatly reduced. This currently doesn't extend to cases where
multiple Unicode properties are used in the same class:
[\p{greek}\p{latin}] will not share the same swash as another character
class with the same components. This is because I don't know of an
efficient method to determine if a new class being parsed has the
same components as one already generated. I suppose some sort of
checksum could be generated, but that is for future consideration.
Karl Williamson [Mon, 28 Nov 2011 17:26:28 +0000 (10:26 -0700)]
Move Unicode property defn processing to compile time
This patch moves the processing of most Unicode property definitions
from execution (regexec.c) to compilation (regcomp.c). There is a cost
to do this. By deferring it to execution, it may be that the affected
path will never be taken, and hence the work won't have to be done;
whereas, it's always done if it gets done at compilation.
However, doing it at compilation has many advantages. We can't
optimize what we don't know about, so this allows for better
optimization, as well as feature enhancements, such as set
manipulations, restricting matches to certain scripts, etc. A big one,
about to be committed, allows for significantly reducing the number of
copies of the data structure used for each property. (Currently, every
mention in every regular expression of a given property will generate a
new instance of its hash, and so results of look-ups of code points in
one instance aren't automatically known to other instances, so the code
point has to be looked-up again.)
This commit leaves the processing to execution time when the class is to
be inverted. This was done purely to make the commit smaller, and will
be removed in a future commit; hence the redundant test here will be
removed shortly.
It also has to leave to execution time processing of properties whose
definition is not known yet. That can happen when the property is
user-defined. We call _core_swash_init(), and if it fails, we assume
that it's because it's such a property, and if it turns out that it was
an unknown property, we leave to execution time the raising of a warning
for it, just as before.
Currently, the processing of properties in inverted character classes is
also left to execution time. This restriction will be lifted in a
future commit, and this patch assumes that, and doesn't indent some code
that it otherwise would, in anticipation of the surrounding 'if' tests
being removed.
Karl Williamson [Mon, 28 Nov 2011 16:43:54 +0000 (09:43 -0700)]
regcomp.c: Pass inversion list directly to regexec.c
Currently, any generated inversion list is stringified and passed in the
data structure to regexec.c as such. regexec.c then calls
_core_swash_init() to convert it into a swash and back into an inversion
list. This intermediate step is wasteful, and this commit dispenses
with it, based on preparatory commits in regexec.c and utf8.c
Karl Williamson [Mon, 28 Nov 2011 16:25:45 +0000 (09:25 -0700)]
regexec.c: Prepare for inversion lists in ANYOF nodes
Future commits will start passing inversion lists to regexec.c from the
compilation phase. This commit causes regexec.c to accept them, trace
them for debug output, and pass them along to utf8.c
Karl Williamson [Mon, 28 Nov 2011 16:20:12 +0000 (09:20 -0700)]
regcomp.c: Add _invlist_contents() to compactly dump inversion list
This will be used in future commits for debug traces
Karl Williamson [Mon, 28 Nov 2011 16:00:52 +0000 (09:00 -0700)]
utf8.c: White-space only
As a result of previous commits adding and removing if() {} blocks,
indent and outdent and reflow comments and statements to not exceed 80
columns.
Karl Williamson [Mon, 28 Nov 2011 15:36:54 +0000 (08:36 -0700)]
utf8.c: Add ability to pass inversion list to _core_swash_init()
Add a new parameter to _core_swash_init() that is an inversion list to
add to the swash, along with a boolean to indicate if this inversion
list is derived from a user-defined property. This capability will prove
useful in future commits
Karl Williamson [Mon, 28 Nov 2011 15:24:07 +0000 (08:24 -0700)]
utf8.c: Add flag to swash_init() to not croak on error
This adds the capability, to be used in future commits, for swash_init()
to return NULL instead of croaking if it can't find a property, so that
the caller can choose how to handle the situation.
Karl Williamson [Mon, 28 Nov 2011 03:55:33 +0000 (20:55 -0700)]
regcomp.c: Use '*a == b', not 'a == &b'
The latter doesn't always work. The consequences of this failure were
memory leaks
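The difference between the two comparisons can be sketched as follows. This is an illustration under assumptions, not the regcomp.c code: the Item type and both function names are hypothetical, with 'a' taken to be a pointer to a slot and 'b' a pointer stored somewhere else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int value; } Item;   /* hypothetical element type */

/* 'a == &b' form: true only if the slot IS the caller's own variable,
 * which a slot living inside some other array never is. */
bool slot_is_wanted_by_address(Item **slot, Item **wanted_var)
{
    return slot == wanted_var;
}

/* '*a == b' form: true whenever the pointer stored in the slot equals
 * 'wanted', wherever the slot itself lives. */
bool slot_holds_wanted(Item **slot, Item *wanted)
{
    assert(slot != NULL);
    return *slot == wanted;
}
```

If the address form is used to decide whether something needs freeing, a false negative of this kind is one way a leak could arise.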
Karl Williamson [Mon, 28 Nov 2011 03:53:56 +0000 (20:53 -0700)]
regcomp.c: decrement ptr ref cnt before invalidating ptr
Otherwise there could be memory leaks.
Karl Williamson [Mon, 28 Nov 2011 03:49:59 +0000 (20:49 -0700)]
regcomp.c: Add some assertions
Subsequent code assumes that these are true
Karl Williamson [Sun, 27 Nov 2011 22:45:22 +0000 (15:45 -0700)]
regcomp.c: Don't overallocate space for cloned SV
The length passed to _new_invlist() is in elements and not bytes, so
this was overallocating space because the number of bytes is multiplied
by a platform-dependent value.
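A small sketch of the units mix-up, under stated assumptions: bytes_reserved_for is a hypothetical stand-in for an allocator that takes a length in elements, and Elem stands in for the platform-dependent element type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned long Elem;   /* stand-in for a platform-dependent element */

/* Hypothetical allocator front end: takes a length in ELEMENTS and does
 * the multiplication by the element size itself. */
size_t bytes_reserved_for(size_t n_elements)
{
    assert(n_elements <= SIZE_MAX / sizeof(Elem));
    return n_elements * sizeof(Elem);
}

/* Buggy caller: hands over a byte count where elements are expected,
 * so the size is multiplied by sizeof(Elem) twice. */
size_t reserved_when_passing_bytes(size_t n_elements)
{
    size_t n_bytes = n_elements * sizeof(Elem);
    return bytes_reserved_for(n_bytes);
}

/* Fixed caller: hands over the element count. */
size_t reserved_when_passing_elements(size_t n_elements)
{
    return bytes_reserved_for(n_elements);
}
```

The overallocation is harmless to correctness but wastes sizeof(Elem) times the needed space.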
Karl Williamson [Sun, 27 Nov 2011 22:41:26 +0000 (15:41 -0700)]
regcomp.c: Make sure invlist_clone length set correctly
The cloned inversion list was getting initialized with sufficient space,
but because a Copy was used, it did not know how much of that space is
occupied. There are no tests, because this was found through valgrind,
and it otherwise depends on whatever was in the uninitialized data at
the time
Karl Williamson [Sun, 27 Nov 2011 22:34:52 +0000 (15:34 -0700)]
utf8.c: Prevent reading before buffer start
Make sure there is something before the character being read before
reading it.
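The guard amounts to a bounds check before looking backwards. A minimal sketch, with a hypothetical function name and signature rather than the utf8.c code:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the byte just before p, or -1 when p is at the start of the
 * buffer and there is nothing in front of it to read. */
int byte_before(const unsigned char *start, const unsigned char *p)
{
    assert(start != NULL && p >= start);
    if (p == start)
        return -1;          /* nothing precedes the first byte */
    return p[-1];
}
```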
Karl Williamson [Sun, 27 Nov 2011 01:24:46 +0000 (18:24 -0700)]
utf8.c: Generate and use inversion lists for binary swashes
Prior to this patch, every time a code point was matched against a swash,
and the result was not previously known, a linear search through the
swash was performed. This patch changes that to generate an inversion
list whenever a swash for a binary property is created. A binary search
is then performed for missing values.
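A binary search over an inversion list can be sketched as below. This assumes the usual representation: a sorted array in which even-indexed entries start a range inside the set and odd-indexed entries start a range outside it. The function names are hypothetical and are not the ones used in the Perl source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the last entry <= cp, or -1 if cp precedes them all. */
ptrdiff_t sketch_invlist_find(const uint32_t *list, size_t len, uint32_t cp)
{
    assert(list != NULL || len == 0);
    size_t low = 0, high = len;           /* search window is [low, high) */
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (list[mid] <= cp)
            low = mid + 1;                /* answer is at mid or to its right */
        else
            high = mid;                   /* answer is to the left of mid */
    }
    return (ptrdiff_t)low - 1;
}

/* cp is in the set when the entry found starts an "inside" range,
 * that is, when its index is even. */
bool sketch_invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    ptrdiff_t i = sketch_invlist_find(list, len, cp);
    return i >= 0 && i % 2 == 0;
}
```

For example, the list { 0x41, 0x5B, 0x61, 0x7B } describes A-Z and a-z: 'A' lands on index 0 (inside), '[' on index 1 (outside). Each lookup costs O(log n) comparisons instead of a linear walk.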
This change does not have much effect on the speed of Perl's regression
test suite, but the speed-up in worst-case scenarios is huge. The
program at the end of this commit is crafted to avoid the caching that
hides many of the current inefficiencies. At character classes of 100
isolated code points, the new method is about an order of magnitude
faster; two orders of magnitude at 1000 code points. The program at the
end of this commit message took 97s to execute on my box using blead,
and 1.5 seconds using this new scheme. I was surprised to see that even
with classes containing fewer than 10 code points, the binary search
trumped, by a little, the linear search.
Even after this patch, under the current scheme, one can easily run out
of memory due to the permanent storing of results of swash lookups in
hashes. The new search mechanism might be fast enough to enable the
elimination of that memory usage. Instead, each inversion list could
keep a simple cache of its previous result, and check whether that
result is still valid before starting the search, under the
assumption, which the current scheme also makes, that probes
will tend to be clustered together, as nearby code points are often in
the same script.
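The one-entry cache idea could look roughly like this. It is a sketch of the proposal only, not anything that exists in the source: the CachedInvlist type, its fields, and the function name are all hypothetical, and the searches counter is there purely so the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint32_t *list;     /* sorted inversion list */
    size_t len;
    ptrdiff_t last;           /* index found by the previous lookup, or -1 */
    unsigned long searches;   /* how many binary searches were needed */
} CachedInvlist;

bool cached_contains(CachedInvlist *il, uint32_t cp)
{
    assert(il != NULL);
    ptrdiff_t i = il->last;

    /* The cached index is still valid if cp falls in the same range:
     * list[i] <= cp < list[i + 1] (or i is the final entry). */
    bool hit = i >= 0
        && il->list[i] <= cp
        && ((size_t)i + 1 == il->len || cp < il->list[i + 1]);

    if (!hit) {
        size_t low = 0, high = il->len;
        while (low < high) {
            size_t mid = low + (high - low) / 2;
            if (il->list[mid] <= cp)
                low = mid + 1;
            else
                high = mid;
        }
        i = (ptrdiff_t)low - 1;
        il->last = i;
        il->searches++;
    }
    return i >= 0 && i % 2 == 0;
}
```

Clustered probes such as 'a' followed by 'b' are answered from the cached range without a second search, at the cost of one extra index per list rather than a growing hash.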
===============================================
# This program creates longer and longer character class lists while
# testing code point matches against them. By adding or subtracting
# 65 from the previous member, caching of results is eliminated (as of
# this writing), so this essentially tests for how long it takes to
# search through swashes to see if a code point matches or not.
use Benchmark ':hireswallclock';
my $string = "";
my $class_cp = 2**30; # Divide the code space in half, approx.
my $string_cp = $class_cp;
my $iterations = 10000;
for my $j (1..2048) {
    # Append the next character to the [class]
    my $hex_class_cp = sprintf("%X", $class_cp);
    $string .= "\\x{$hex_class_cp}";
    $class_cp -= 65;
    next if $j % 100 != 0; # Only test certain ones
    print "$j: lowest is [$hex_class_cp]: ";
    timethis(1, "no warnings qw(portable non_unicode);my \$i = $string_cp; for (0 .. $iterations) { chr(\$i) =~ /[$string]/; \$i+= 65 }");
    $string_cp += ($iterations + 1) * 65;
}
Karl Williamson [Fri, 25 Nov 2011 19:59:51 +0000 (12:59 -0700)]
regcomp.c: Add _invlist_populate_swatch()
This function will be used in future commits
Karl Williamson [Fri, 25 Nov 2011 19:49:02 +0000 (12:49 -0700)]
regcomp.c: Add invlist_search()
This function does a binary search on an inversion list. It will be
used in future commits
Karl Williamson [Thu, 24 Nov 2011 03:11:26 +0000 (20:11 -0700)]
utf8.c: Refactor code slightly in prep
Future commits will split up the necessary initialization into two
components. This patch prepares for that without adding anything new.
Karl Williamson [Tue, 22 Nov 2011 20:37:04 +0000 (13:37 -0700)]
regcomp.c: Change internal #define name
I have struggled to come up with a good name for this concept; and like
the new one better than the old
Karl Williamson [Tue, 22 Nov 2011 19:06:41 +0000 (12:06 -0700)]
utf8.c: New function to retrieve non-copy of swash
Currently, swash_init returns a copy of the swash it finds. The core
portions of the swash are read-only, and the non-read-only portions are
derived from them. When the value for a code point is looked up, the
results for it and adjacent code points are stored in a new element,
so that the lookup never has to be performed again. But since a copy is
returned, those results are stored only in the copy, and any other uses
of the same logical swash don't have access to them, so the lookups have
to be performed for each logical use.
Here's an example. If you have 2 occurrences of /\p{Upper}/ in your
program, there are 2 different swashes created, both initialized
identically. As you start matching against code points, say "A" =~
/\p{Upper}/, the swashes diverge, as the results for each match are
saved in the one applicable to that match. If you match "A" in each
swash, it has to be looked up in each swash, and an (identical) element
will be saved for it in each swash. This is wasteful of both time and
memory.
This patch renames the function and returns the original and not a copy,
thus eliminating the overhead for swashes accessed through the new
interface. The old function name is serviced by a new function which
merely wraps the new name result with a copy, thus preserving the
interface for existing calls.
Thus, in the example above, there is only one swash, and matching "A"
against it results in only one new element, and so the second use will
find that, and not have to go out looking again. In a program with lots
of regular expressions, the savings in time and memory can be quite
large.
The new name is restricted to use only in regcomp.c and utf8.c (unless
XS code cheats the preprocessor), where we will code so as to not
destroy the original's data. Otherwise, a change to that would change
the definition of a Unicode property everywhere in the program.
Note that there are no current callers of the new interface; these will
be added in future commits.
Karl Williamson [Tue, 22 Nov 2011 16:18:28 +0000 (09:18 -0700)]
utf8_heavy.pl: Add inversion status to cache key
Contrary to what the debug statement said, what is being returned is a
swash, and that swash is different from one that comes from the same
file but differs in inversion, and so changing the INVERT_IT element
messes things up for any existing one. Heretofore it hasn't mattered
because the swash returned is always a copy, and so it actually hasn't
created any problems. But future commits will stop the copying, so this
would create problems then.
The file will now have to be re-'do'ne to get an inverted list from it.
Karl Williamson [Mon, 21 Nov 2011 02:10:19 +0000 (19:10 -0700)]
uni/cache.t: Fix to handle regex compile time Uni props
Future commits are planned to move the resolution of Unicode properties
from regex execution time to compile time. By moving the code into a
BEGIN block, this .t can now handle both types. Before this patch, it
wouldn't show any activity at all if things are done at compile time.
Karl Williamson [Sun, 20 Nov 2011 16:03:26 +0000 (09:03 -0700)]
embed.fnc: swash_init() return value should not be ignored
Otherwise can have memory leaks
Karl Williamson [Sat, 19 Nov 2011 23:50:33 +0000 (16:50 -0700)]
utf8.c: Change name of static function
This function has always confused me, as it doesn't return a swash, but
a swatch.
Karl Williamson [Sat, 19 Nov 2011 21:49:20 +0000 (14:49 -0700)]
utf8_heavy: Allow to be called from regcomp.c
Future commits will cause regcomp.c to try to compile user-defined
properties. The caller stack is different for this, and there may be a
package name as well that differs from the existing scheme. This commit
allows for this.
Karl Williamson [Sat, 19 Nov 2011 21:47:33 +0000 (14:47 -0700)]
utf8_heavy: Add DEBUG statement
This helps keep track of the recursion used.
Karl Williamson [Sat, 19 Nov 2011 21:41:26 +0000 (14:41 -0700)]
utf8.c: Move test out of loops
We set the upper limit of the loops before entering them to the min of
the two possible limits, thus avoiding a test each time through
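The transformation can be illustrated with a generic loop. This is a sketch, not the utf8.c code; both function names and the counting task are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Before: both limits are tested on every iteration. */
size_t count_nonzero_two_tests(const unsigned char *buf,
                               size_t buf_len, size_t max_items)
{
    assert(buf != NULL);
    size_t n = 0;
    for (size_t i = 0; i < buf_len; i++) {
        if (i >= max_items)
            break;
        if (buf[i] != 0)
            n++;
    }
    return n;
}

/* After: the smaller limit is chosen once, leaving one test per iteration. */
size_t count_nonzero_one_test(const unsigned char *buf,
                              size_t buf_len, size_t max_items)
{
    assert(buf != NULL);
    size_t limit = buf_len < max_items ? buf_len : max_items;
    size_t n = 0;
    for (size_t i = 0; i < limit; i++) {
        if (buf[i] != 0)
            n++;
    }
    return n;
}
```

Both versions visit exactly the same elements; only the per-iteration work differs.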
Karl Williamson [Sat, 19 Nov 2011 21:37:35 +0000 (14:37 -0700)]
mktables: Add a little stress to the tests
This simply reverses the sort order so that the generated tests
use the highest ranges instead of the lowest, making it less likely that
tests will pass by chance; and also increasing performance issues in
finding matches.
Karl Williamson [Sat, 19 Nov 2011 21:22:00 +0000 (14:22 -0700)]
utf8_heavy: Skip unnecessary operations
The mktables generated tables are well-formed, already sorted, and with
no extra items such as "+utf8::foo". Thus we don't have to do these
operations on them, but they are required on user-defined properties,
and should $list be passed in as a parameter.
This patch moves the code that does this to just the user-defined area
Karl Williamson [Sat, 19 Nov 2011 18:17:17 +0000 (11:17 -0700)]
uni/class.t: Add test
This new test makes sure that a regular expression that forward
references a user-defined property works.
Karl Williamson [Fri, 18 Nov 2011 17:22:56 +0000 (10:22 -0700)]
utf8_heavy: remove unused variable
Karl Williamson [Fri, 18 Nov 2011 15:36:43 +0000 (08:36 -0700)]
Comment additions, typos, white-space.
And the reordering for clarity of one test
Karl Williamson [Thu, 17 Nov 2011 03:03:35 +0000 (20:03 -0700)]
regexec.c: Add some comments to regclass_swash()
Karl Williamson [Wed, 16 Nov 2011 16:45:07 +0000 (09:45 -0700)]
regexec.c: Remove unnecessary intermediate values
Ricardo Signes [Fri, 13 Jan 2012 15:46:34 +0000 (10:46 -0500)]
un-break blead-breakage introduced by Porting/perl5160delta.pod
Ricardo Signes [Fri, 13 Jan 2012 15:18:18 +0000 (10:18 -0500)]
create perl5160delta-to-be
This way, we can begin summarizing now, rather than
summarize everything during the final release
period.
Also, note the new "Future Deprecations" section,
in which we can, where possible, announce that we
feel likely to formally deprecate things in the
next major release.
Nicholas Clark [Thu, 12 Jan 2012 22:07:20 +0000 (23:07 +0100)]
In Perl_sv_del_backref(), don't panic if svp is NULL during global destruction.
It's possible that the referencing SV is being freed partway through the
freeing of the reference target. If this happens, the backreferences array of
the target has already been freed, and so svp will be NULL. If this is the
case, do nothing and return. Previously, this condition was not recognised
and the code would panic.
Nicholas Clark [Thu, 12 Jan 2012 21:51:55 +0000 (22:51 +0100)]
In Perl_sv_del_backref(), don't panic if the backref array is already freed.
During global destruction, it's possible for the array containing
backreferences to be freed before the SV that owns it. If this happens, don't
mistake it for a scalar backreference stored directly, and then get confused
and panic because things seem inconsistent.
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2011-12/msg00039.html
gives more information.
Nicholas Clark [Thu, 12 Jan 2012 21:23:16 +0000 (22:23 +0100)]
Better panic diagnostics in Perl_sv_del_backref()
If panicking with a croak(), include in the panic message the values which
caused the croak. This reveals something about the cause of the panic, and
more subtly, which of the two possible panic locations this is.
Father Chrysostomos [Thu, 12 Jan 2012 22:32:30 +0000 (14:32 -0800)]
Make -T HANDLE set the last stat type
This was brought up in bug #77388.
Father Chrysostomos [Thu, 12 Jan 2012 21:33:24 +0000 (13:33 -0800)]
Simplify logic in pp_sys.c:pp_fttty
name is only set when there is no gv. As of a couple of commits ago,
tmpsv is only used in that else block where it is set.
Father Chrysostomos [Thu, 12 Jan 2012 21:24:26 +0000 (13:24 -0800)]
[perl #77388] Make stacked -T and -B work
They just need to pick up the _ filehandle in stacked mode, like the
other ops (which actually rely on my_stat[_flags] to do it). This
apparently was forgotten when stacked filetests were added.
Father Chrysostomos [Thu, 12 Jan 2012 21:17:59 +0000 (13:17 -0800)]
Make -t, -T and -B with a handle pop it off the stack
This is something I broke carelessly with commit 094a3eec8.
No, this does not fix bug #77388.
Father Chrysostomos [Thu, 12 Jan 2012 20:55:21 +0000 (12:55 -0800)]
In pp_sys.c:pp_fttext, don’t call cGVOP_gv on an UNOP
Otherwise we might get a crash. cGVOP_gv is only valid when the
OPf_REF flag is set. In either case, gv already holds the GV we
want anyway.
This code has been buggy this way since this commit:
commit 5f05dabc4054964aa3b10f44f8468547f051cdf8
Author: Perl 5 Porters <perl5-porters@africa.nicoh.com>
Date: Thu Dec 19 16:44:00 1996 +1200
[inseparable changes from patch from perl5.003_11 to perl5.003_12]
but apparently has not actually caused a crash until just now.
I was trying to add a test for another bug (fixed in the next commit),
and it happened to trigger this one. My attempt to reduce this to
something small and reproducible failed.
Father Chrysostomos [Tue, 10 Jan 2012 21:23:34 +0000 (13:23 -0800)]
[perl #24237] @& should not stop $& from working
Mentioning $& in a program slows everything down, because it forces
regular expressions to do a pre-match copy.
It used to happen for any symbol named &, e.g., @& and %&. This was
changed in commit b4a9608f339, but that commit did not take into
account that the code path in question is only followed on creation of
the *& glob.
It should still be applying magic to $&, even if it is not setting
PL_sawampersand. The other place in gv_fetchpvn_flags that magical-
ises scalars (which currently handles %- %+ %! $] and @ISA), should
also turn on PL_sawampersand for $&.
All of the above applies to $' and $` as well.
Ricardo Signes [Tue, 10 Jan 2012 21:16:05 +0000 (16:16 -0500)]
update the schedule: rjbs will do 5.16
Father Chrysostomos [Tue, 10 Jan 2012 16:55:08 +0000 (08:55 -0800)]
[perl #35865, #43011] FETCH after autovivifying
After autovivification, perl should not assume that the value it has
assigned to a magical scalar is the one that would have been returned.
The result of this assumption is that any tie class that copies things
assigned to it will cause autovivification to assign to a temporary
aggregate without warning, in cases like this:
$tied{nonexistent}{foo} = 7;
The hash implicitly assigned to $tied{nonexistent} ends up being freed
after the =7 assignment.
This commit changes autovivification to do FETCH immediately after
doing STORE.
This required changing some recently-added tests in gmagic.t.
Without this change, you end up with horrific workarounds (using B.pm
to get the reference count), like the one in JE::Object (which I’m
pasting here, in case it has changed by the time you read this):
sub STORE {
    my($self, $key, $val) = @_;
    my $global = $self->global;
    if(ref $val eq 'HASH' && !blessed $val
       && !%$val && svref_2object($val)->REFCNT == 2) {
        $val = tie %$val, __PACKAGE__, __PACKAGE__->new(
            $global);
    } elsif (ref $val eq 'ARRAY' && !blessed $val && !@$val &&
             svref_2object($val)->REFCNT == 2) {
        require JE::Object::Array;
        $val = tie @$val, 'JE::Object::Array',
            JE::Object::Array->new($global);
    }
    $self->prop($key => $global->upgrade($val))
}
Father Chrysostomos [Tue, 10 Jan 2012 06:29:17 +0000 (22:29 -0800)]
Correct bad wording in perlsub
It seemed to imply that CORE:: syntax was introduced in 5.16. What it
was supposed to say was that CORE:: breaking through the feature.pm
barrier was introduced in 5.16. (Which sounds a little odd, as 5.16
is still in the future, but whatever.)
Father Chrysostomos [Tue, 10 Jan 2012 06:18:56 +0000 (22:18 -0800)]
Fix crash in hv_undef
Commit 60edcf09a was supposed to make things less buggy, but putting
the ENTER/LEAVE in h_freeentries was a mistake, as both hv_undef and
hv_clear still access the hv after calling h_freeentries. Why it
didn’t crash for me is anyone’s guess.
Joshua ben Jore [Tue, 10 Jan 2012 04:39:22 +0000 (20:39 -0800)]
[perl #40333] Another test
This test includes an explosive class, that overload::Overloaded
should be able to handle.
Father Chrysostomos [Tue, 10 Jan 2012 04:35:35 +0000 (20:35 -0800)]
[perl #40333] Stop overload::Overloaded from calling ->can
It’s possible, and too easy, for classes to define a can method to
deal with AUTOLOAD, without taking overloading into account. Since
AUTOLOAD is the main reason for overriding can, and since overloading
does not respect autoloading, can overrides should not be expected to
deal with it.
Since overload.pm already has a mycan function that fits this purpose,
this commit changes Overloaded to use that.
The test includes an example of a class structure that the previous
Overloaded implementation could not handle.
Father Chrysostomos [Tue, 10 Jan 2012 03:59:45 +0000 (19:59 -0800)]
Document that [ah]v_undef/clear may free the [ah]v
Father Chrysostomos [Tue, 10 Jan 2012 03:54:26 +0000 (19:54 -0800)]
Better fix for perl #107440
> > Actually, the simplest solution seem to be to put the av or hv on
> > the mortals stack in pp_aassign and pp_undef, rather than in
> > [ah]v_undef/clear.
>
> This makes me nervous. The tmps stack is typically cleared only on
> statement boundaries, so we run the risks of
>
> * user-visible delaying of freeing elements;
> * large tmps stack growth might be possible with
>   certain types of loop that repeatedly assign to an array without
>   freeing tmps (eg map? I think I fixed most map/grep tmps leakage
>   a while back, but there may still be some edge cases).
>
> Surely an ENTER/SAVEFREESV/LEAVE inside pp_aassign is just as
> efficient, without any attendant risks?
>
> Also, although pp_aassign and pp_undef are now fixed, the
> [ah]v_undef/clear functions aren't, and they're part of the public API
> that can be called independently of pp_aassign etc. Ideally they
> should be fixed (so they don't crash in mid-loop), and their
> documentation updated to point out that on return, their AV/HV arg
> may have been freed.
This commit takes care of the first part; it changes pp_aassign to use
ENTER/SAVEFREESV/LEAVE and adds the same to h_freeentries (called both
by hv_undef and hv_clear), av_undef and av_clear.
It effectively reverts the C code part of 9f71cfe6ef2.
Father Chrysostomos [Mon, 9 Jan 2012 20:50:59 +0000 (12:50 -0800)]
Remove TODO functionality from assignwarn.t
Father Chrysostomos [Mon, 9 Jan 2012 20:46:01 +0000 (12:46 -0800)]
[perl #44895] += warning on uninit magic var
The only uses of USE_LEFT in core now occur when SvGETMAGIC has
already been called. So returning true for magical SVs is not neces-
sary. In fact, it was never correct.
Also, the code in do_vop (which handles bitwise operations on strings)
to avoid an uninitialized warning had the same buggy SvGMAGICAL check.
Now, the warning from $uninit += 1 is suppressed for all undefined
vars, not just amagical ones.
This causes 6 to-do tests in assignwarn.t to pass.
Father Chrysostomos [Mon, 9 Jan 2012 17:45:54 +0000 (09:45 -0800)]
Remove magical dPOPXnnrl_ul dPOPXiirl_ul macros
These are undocumented and unused on CPAN and in the core.
The core now uses _nomg variants.
Chris 'BinGOs' Williams [Mon, 9 Jan 2012 19:50:54 +0000 (19:50 +0000)]
Merge the POSIX notes in perldelta.
H.Merijn Brand [Mon, 9 Jan 2012 18:05:44 +0000 (19:05 +0100)]
missed in prev commit
H.Merijn Brand [Mon, 9 Jan 2012 17:20:10 +0000 (18:20 +0100)]
'A' is not blank
H.Merijn Brand [Mon, 9 Jan 2012 17:10:21 +0000 (18:10 +0100)]
Add probe for isblank() (requested by khw)
Father Chrysostomos [Mon, 9 Jan 2012 16:37:01 +0000 (08:37 -0800)]
Test for perl #43663
Father Chrysostomos [Mon, 9 Jan 2012 02:14:03 +0000 (18:14 -0800)]
[perl #92254, #92256] Fix SAVE_DEFSV to do refcounting
The current definition of SAVE_DEFSV doesn’t take reference count-
ing into account. Every instance of it in the perl core is buggy
as a result.
Most are also followed by DEFSV_set, which is likewise buggy.
This commit implements SAVE_DEFSV in terms of save_gp and
SAVEGENERICSV if PERL_CORE is defined. save_gp and SAVEGENERICSV are
what local(*_) = \$foo uses. Changing the definition for XS code is
probably too risky this close to 5.16. It should probably be changed
later, though.
DEFSV_set is now changed to do reference counting too.
Father Chrysostomos [Mon, 9 Jan 2012 01:18:23 +0000 (17:18 -0800)]
grep.t: require test.pl in BEGIN block
for parenthetical omissions.
Karl Williamson [Sun, 8 Jan 2012 21:59:13 +0000 (14:59 -0700)]
need backwards-compatible to_utf8_foo()
These 4 functions have been replaced by variants to_utf8_foo_flags(),
but for XS code that called the old ones in the Perl_to_utf8_foo()
forms, backwards compatibility versions need to be created.
For calls of just the to_utf8_foo() forms, macros have been used to
automatically call the new forms without the performance penalty of
going through the compatibility functions.
Karl Williamson [Sun, 8 Jan 2012 21:57:59 +0000 (14:57 -0700)]
embed.fnc: Revise comment
The 'd' flag doesn't mean that the documentation has to be even in the
same file as the source for that function; just somewhere in the source.
Karl Williamson [Sun, 8 Jan 2012 21:17:58 +0000 (14:17 -0700)]
embed.fnc: add comment
Chris 'BinGOs' Williams [Sun, 8 Jan 2012 22:36:01 +0000 (22:36 +0000)]
Update perlfaq to CPAN version 5.0150037
[DELTA]
5.0150037 Sun 8 Jan 2012 21:24:39 +0100
* Better XML parsing recommendations (apeiron)
* Remove various old questions & update a few (ranguard)
* Change auto generate of questions a bit (ranguard)
* Autogenerate question index in perlfaq.pod (doherty)
* Cleanups / typos, updating nested expressions (dami, reviewed by schwern)
Father Chrysostomos [Sun, 8 Jan 2012 20:30:45 +0000 (12:30 -0800)]
[perl #67490] Don’t call DELETE on scalar-tied elem
This little snippet:
sub TIESCALAR{bless[]}
sub STORE{}
tie $^H{foo}, '';
$^H{foo} = 1;
delete $^H{foo};
dies with ‘Can't locate object method "DELETE"...’.
This bug was introduced for %^H by commit b3ca2e834c, but it is
actually an older bug that already affected %ENV before that.
Clear-magic on a scalar is only called when it is an element of a mag-
ical aggregate.
For hashes, this clear-magic is called whenever the hash itself
is RMAGICAL.
Tied scalars and elements of tied aggregates use the same magic vta-
ble, under the assumption that mg_clear will never be called on a tied
scalar. That assumption is wrong.
Commit b3ca2e834c is the one that made %^H magical, which is why it
caused this problem for %^H.
The obvious solution, giving tied scalars their own vtable, is not as
simple as it sounds, because then tied scalars are no longer RMAGICAL,
and at least some of the tie code assumes that they are.
So the easiest fix is to skip the DELETE call in Perl_magic_clearpack
if the type of magic is PERL_MAGIC_tiedscalar.
Father Chrysostomos [Sun, 8 Jan 2012 02:38:35 +0000 (18:38 -0800)]
Simplify magic logic in av.c:av_store
David Mitchell [Sun, 8 Jan 2012 15:45:39 +0000 (15:45 +0000)]
clarify how $SIG{__DIE__} can return
It can return via 'goto &sub', but not via 'goto LABEL'.
The docs originally just said 'via goto'
See [perl #44367].
Father Chrysostomos [Sat, 7 Jan 2012 16:29:49 +0000 (08:29 -0800)]
perlfunc: spaces after dots
Tom Hukins [Sat, 7 Jan 2012 15:37:38 +0000 (15:37 +0000)]
Make localtime()'s documentation more succinct
It's now twelve years since Y2K, so the documentation should not make
such a fuss about it.
Father Chrysostomos [Sat, 7 Jan 2012 07:36:38 +0000 (23:36 -0800)]
[perl #85670] Copy magic to ary elems properly
On Tue Mar 08 07:26:35 2011, thospel wrote:
> #!/usr/bin/perl -l
> use Data::Dumper;
> use Scalar::Util qw(weaken);
> our @ISA;
>
> for (1..2) {
> @ISA = qw(Foo);
> weaken($a = \@ISA);
> weaken($a = \$ISA[0]);
> print STDERR Dumper(\@ISA);
> }
>
> This prints:
> $VAR1 = [
> 'Foo'
> ];
> $VAR1 = [
> 'Foo',
> \$VAR1->[0]
> ];
>
> So the first time it's the expected @ISA, but the second time round it
> automagically added a reference to to the first ISA element
>
> (bug also exists in blead)
Shorter:
#!/usr/bin/perl -l
use Scalar::Util qw(weaken);
weaken($a = \@ISA);
@ISA = qw(Foo);
use Devel::Peek; Dump \@ISA;
weaken($a = \$ISA[0]);
print scalar @ISA; # prints 2
The dump shows the problem. backref magic is being copied to the ele-
ment. Put the magic in a different order, and everything is fine:
#!/usr/bin/perl -l
use Scalar::Util qw(weaken);
weaken($a = $b = []);
*ISA = $a;
@ISA = qw(Foo);
use Devel::Peek; Dump \@ISA;
weaken($a = \$ISA[0]);
print scalar @ISA; # prints 2
This code in av_store is so wrong:
    if (SvSMAGICAL(av)) {
        const MAGIC* const mg = SvMAGIC(av);
        if (val != &PL_sv_undef) {
            sv_magic(val, MUTABLE_SV(av), toLOWER(mg->mg_type), 0, key);
        }
        if (PL_delaymagic && mg->mg_type == PERL_MAGIC_isa)
            PL_delaymagic |= DM_ARRAY_ISA;
        else
            mg_set(MUTABLE_SV(av));
    }
It doesn’t follow the magic chain at all. So anything magic could get
attached to the @ISA array, and that will be copied to the element
instead of isa magic.
Notice that MUTABLE_SV(av) is the second argument to sv_magic, so
mg->mg_obj for the element always points back to the array.
Since backref magic’s mg->mg_obj points to the backrefs array, @ISA
ends up being used as this element’s backrefs array.
What if arylen_p gets copied instead? Let’s see:
$#ISA = -1;
@ISA = qw(Foo);
$ISA[0] = "Bar";
main->ber;
sub Bar::ber { warn "shave" }
__END__
Can't locate object method "ber" via package "main" at - line 7.
I’ve fixed this by making av_store walk the magic chain, copying any
magic for which toLOWER(mg->mg_type) != mg->mg_type.
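The chain walk described above can be sketched in standalone C. This is only a simplified model, not the code in av.c: the Magic and Value structs and the attach_magic and copy_element_magic functions are hypothetical stand-ins for perl's MAGIC, SV, and sv_magic. The type letters follow perl's convention that container magic is an upper-case letter with a lower-case element counterpart ('I' and 'i' for isa and isaelem), while a type such as '<' has no such counterpart.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for perl's MAGIC struct. */
typedef struct magic {
    struct magic *next;  /* next entry in the chain */
    char type;           /* container magic uses an upper-case letter */
    const void *obj;     /* what this magic points back to */
} Magic;

/* Hypothetical stand-in for an SV that carries a magic chain. */
typedef struct {
    Magic *chain;
} Value;

/* Prepend one magic entry to v's chain. Returns 0 on success, -1 on
 * allocation failure. */
int attach_magic(Value *v, char type, const void *obj)
{
    Magic *mg = malloc(sizeof *mg);
    if (mg == NULL)
        return -1;
    mg->type = type;
    mg->obj = obj;
    mg->next = v->chain;
    v->chain = mg;
    return 0;
}

/* Walk the whole chain of the container and give the element the
 * lower-case counterpart of each entry that has one. Entries whose type
 * is unchanged by tolower() are skipped. Returns the number of entries
 * copied, or -1 on allocation failure. */
int copy_element_magic(const Value *container, Value *elem)
{
    int copied = 0;
    for (const Magic *mg = container->chain; mg != NULL; mg = mg->next) {
        char lower = (char)tolower((unsigned char)mg->type);
        if (lower == mg->type)
            continue;  /* no element-level counterpart: do not copy */
        if (attach_magic(elem, lower, container) != 0)
            return -1;
        copied++;
    }
    return copied;
}
```

In this model, a container carrying both 'I' and '<' entries passes only 'i' on to the element, whichever entry happens to be first in the chain.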
Father Chrysostomos [Fri, 6 Jan 2012 21:50:35 +0000 (13:50 -0800)]
[perl #107440] Save av/hv on mortals stack when clearing
In pp_undef and pp_aassign, we should put the av or hv that is being
cleared on the mortals stack (with an increased refcount), so that
destructors fired during the clearing do not free the av or hv.
I was going to put this in av_undef, etc., but pp_aassign also needs
to access the aggregate after clearing it. We still get a crash with
that approach.
Putting the aggregate on the mortals stack in av_undef, av_clear and
hfreeentries would work, too, but might cause the aggregate to leak
too far. That may cause problems, e.g., if it is %^H, because it may
last until the end of the current compilation unit.
Directly inside a runloop (in a pp function), it should be OK to use
the mortals stack, as it *will* be cleared ‘soon’. This seems the
least intrusive approach.
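The idea can be illustrated with a toy refcounting model in C. Nothing here is perl's API: Aggregate, keep_alive_until_free_tmps, free_tmps, agg_clear and guarded_clear are hypothetical names, and "freeing" is only recorded in a flag so that the effect can be inspected safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted aggregate; 'freed' records that the count
 * reached zero instead of actually releasing memory. */
typedef struct {
    int refcnt;
    int freed;
    int nelems;
} Aggregate;

#define MORTAL_MAX 16
static Aggregate *mortal_stack[MORTAL_MAX];
static size_t mortal_top;

/* Drop one reference; mark the aggregate freed when none remain. */
void agg_dec(Aggregate *a)
{
    if (--a->refcnt == 0)
        a->freed = 1;
}

/* Take an extra reference and schedule the matching decrement for the
 * next free_tmps() call (the model's "mortals stack"). */
void keep_alive_until_free_tmps(Aggregate *a)
{
    assert(mortal_top < MORTAL_MAX);
    a->refcnt++;
    mortal_stack[mortal_top++] = a;
}

/* Release everything scheduled so far. */
void free_tmps(void)
{
    while (mortal_top > 0)
        agg_dec(mortal_stack[--mortal_top]);
}

typedef void (*Destructor)(Aggregate *);

/* Remove every element, running a destructor callback for each one.
 * The callback may drop references to the aggregate itself. */
void agg_clear(Aggregate *a, Destructor on_elem_free)
{
    while (a->nelems > 0) {
        a->nelems--;
        if (on_elem_free != NULL)
            on_elem_free(a);
    }
}

/* Clear with the aggregate protected, then report whether it is still
 * alive for the caller to use (0 means not freed). */
int guarded_clear(Aggregate *a, Destructor on_elem_free)
{
    keep_alive_until_free_tmps(a);
    agg_clear(a, on_elem_free);
    return a->freed;
}
```

With one element whose destructor drops the only outside reference, guarded_clear leaves the aggregate alive until free_tmps runs, whereas a bare agg_clear marks it freed while the clearing code is still using it.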
Michael Witten [Fri, 6 Jan 2012 21:11:37 +0000 (13:11 -0800)]
[perl #90632] perlfunc: Rewrite `split'
I couldn't stand the way the documentation for `split' was written;
it felt like a kludge of broken English dumped into a messy pile by
several people, each of whom was unaware of the other's work.
This variation completes sentences, adds new ones, rearranges ideas,
expands on ideas, simplifies and unifies examples, and includes more
cross references.
While the original text seemed to be written in a way that touched upon
the arguments in reverse order (which did have a hint of elegance), this
version attempts to provide the reader with the most useful information
upfront.
Thanks to Brad Baxter and Thomas R. Sibley for their constructive
criticism.
[Modified by the committer to incorporate suggestions from Aristotle
Pagaltzis and Tom Christiansen.]
Ricardo Signes [Fri, 6 Jan 2012 21:07:45 +0000 (16:07 -0500)]
document the upgrade of Perldoc
Ricardo Signes [Fri, 6 Jan 2012 20:58:12 +0000 (15:58 -0500)]
Pod-Perldoc now includes test documents
do not test them as if they were Pod we ship
Ricardo Signes [Fri, 6 Jan 2012 12:58:33 +0000 (07:58 -0500)]
Upgrade Pod-Perldoc to CPAN version 3.15_15
Father Chrysostomos [Fri, 6 Jan 2012 19:20:18 +0000 (11:20 -0800)]
Uncomment evals in sort.t
These were commented out temporarily during development.
I forgot to uncomment them before committing.
Father Chrysostomos [Fri, 6 Jan 2012 07:55:32 +0000 (23:55 -0800)]
PerlIO::scalar: allow writing to SvIOK SVs
It used to crash if the PVX buffer happened to be null.
If the PVX buffer happened to be left over from before,
it would use that instead of the numeric value, even for
!SvPOK scalars.
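A rough model of the stale-buffer problem, in standalone C. The Scalar struct, the F_IOK and F_POK flags and force_string are hypothetical and far simpler than a real SV (for one thing, the buffer here is a fixed array and can never be null); the sketch only shows why the string slot has to be regenerated from the number when the string flag is off.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flags: which slot of the scalar is currently valid. */
enum { F_IOK = 1, F_POK = 2 };

/* Hypothetical scalar: pv may hold leftover text while F_POK is clear. */
typedef struct {
    unsigned flags;
    long iv;
    char pv[32];
} Scalar;

/* Make the string slot authoritative before any byte-level write.
 * If the string flag is off, the buffer contents are ignored and
 * rebuilt from the integer value (or emptied if there is none). */
const char *force_string(Scalar *s)
{
    if (!(s->flags & F_POK)) {
        if (s->flags & F_IOK)
            snprintf(s->pv, sizeof s->pv, "%ld", s->iv);
        else
            s->pv[0] = '\0';
        s->flags = F_POK;
    }
    return s->pv;
}
```

A write routine in this model would call force_string first and only then append to or overwrite the buffer.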
Father Chrysostomos [Fri, 6 Jan 2012 07:48:16 +0000 (23:48 -0800)]
In PerlIO::scalar’s write, stringify refs
Otherwise, it won’t work with an overloaded object.
Father Chrysostomos [Fri, 6 Jan 2012 07:13:59 +0000 (23:13 -0800)]
perlsyn: spaces after dots
Father Chrysostomos [Fri, 6 Jan 2012 07:10:15 +0000 (23:10 -0800)]
regen pod issues
Father Chrysostomos [Fri, 6 Jan 2012 07:10:04 +0000 (23:10 -0800)]
perlsyn: wrap long verbatim line
Father Chrysostomos [Fri, 6 Jan 2012 07:07:42 +0000 (23:07 -0800)]
Increase $PerlIO::scalar::VERSION to 0.13
Father Chrysostomos [Fri, 6 Jan 2012 07:05:16 +0000 (23:05 -0800)]
perlsyn: Correct ... example
Father Chrysostomos [Fri, 6 Jan 2012 06:55:45 +0000 (22:55 -0800)]
[perl #92706] In PerlIO::scalar::seek, don’t assume SvPOKp
Otherwise we get assertion failures.
In fact, since seeking might be just for reading, we can’t coerce and
SvGROW either.
In fact, since the scalar might be modified between seek and write,
there is no *point* in SvGROW during seek, even for SvPOK scalars.
PerlIO::scalar assumes in too many places that the scalar it is using
is its own private scalar that nothing else can modify. Nothing could
be farther from the truth.
This commit moves the zero-fill that usually happens when seeking past
the end from seek to write. During a write, if the current position
is past the end of the string, the intervening bytes are zero-filled
at that point, since the seek hasn’t done it.
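The seek/write split can be sketched with a small in-memory file model in C. MemFile, mem_seek and mem_write are hypothetical names used only for illustration, not PerlIO::scalar's functions; the point is that seeking merely records a position, and the gap is zero-filled by the write that first needs it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory file backed by a growable byte buffer. */
typedef struct {
    char *buf;   /* contents; may be NULL while len is 0 */
    size_t len;  /* current length of the contents */
    size_t pos;  /* file position; may exceed len after a seek */
} MemFile;

/* Seeking only records the position. The buffer is neither grown nor
 * modified, so a seek that is followed by a read has no side effects. */
void mem_seek(MemFile *f, size_t offset)
{
    f->pos = offset;
}

/* Write n bytes at the current position. If the position lies past the
 * end, the gap between the old end and the position is zero-filled here.
 * Returns the number of bytes written, or -1 on allocation failure. */
long mem_write(MemFile *f, const char *data, size_t n)
{
    size_t end;

    if (n == 0)
        return 0;
    end = f->pos + n;
    if (end > f->len) {
        char *grown = realloc(f->buf, end);
        if (grown == NULL)
            return -1;
        f->buf = grown;
        if (f->pos > f->len)
            memset(f->buf + f->len, 0, f->pos - f->len);
        f->len = end;
    }
    memcpy(f->buf + f->pos, data, n);
    f->pos = end;
    return (long)n;
}
```

Seeking to offset 3 in an empty file and then writing two bytes yields a five-byte buffer whose first three bytes are zero; the seek alone leaves the buffer untouched.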
Father Chrysostomos [Fri, 6 Jan 2012 04:50:55 +0000 (20:50 -0800)]
perlsyn: add triple-dot index entries and alias
This adds the index entries to perlsyn that were removed in
the previous commit, and mentions in perlsyn that the ellipsis
is also called a triple-dot.
Father Chrysostomos [Fri, 6 Jan 2012 04:48:49 +0000 (20:48 -0800)]
perlop: remove triple-dot
This has been superseded by c2f1e229, which adds it to perlsyn.
Father Chrysostomos [Fri, 6 Jan 2012 04:41:08 +0000 (20:41 -0800)]
[perl #90064] warn once for dbmopen with undef 3rd arg
Father Chrysostomos [Fri, 6 Jan 2012 04:24:32 +0000 (20:24 -0800)]
Increase $overload::VERSION to 1.17