Karl Williamson [Sat, 18 Aug 2012 17:44:09 +0000 (11:44 -0600)]
lib/unicore/README.perl: Make usable as shell script
This adds comment symbols and redirects error messages to /dev/null for
things that are likely to fail.
Karl Williamson [Sat, 18 Aug 2012 16:01:07 +0000 (10:01 -0600)]
Revert "Experimentally Use Unicode 6.2 beta"
This reverts commit 5435c3759c4567a1bb51384f6641c04822ec6391.
A new beta has been released, and so we should use that instead.
Karl Williamson [Sun, 26 Aug 2012 17:30:57 +0000 (11:30 -0600)]
perldelta for Unicode property performance gains
Steve Hay [Sun, 26 Aug 2012 13:34:22 +0000 (14:34 +0100)]
Upgrade Socket from 2.004 to 2.006
H.Merijn Brand [Sun, 26 Aug 2012 12:52:26 +0000 (14:52 +0200)]
Add Configure probe for ip_mreq_source
Needed to upgrade Socket from CPAN
Father Chrysostomos [Sun, 26 Aug 2012 05:27:33 +0000 (22:27 -0700)]
Correct typo in flag name
Father Chrysostomos [Sun, 26 Aug 2012 01:48:46 +0000 (18:48 -0700)]
Banish boolkeys
Since 6ea72b3a1, rv2hv and padhv have had the ability to return
booleans in scalar context, instead of bucket stats, if flagged the
right way. sub { %hash || ... } is optimised to take advantage of this.
If the || is in unknown context at compile time, the %hash is flagged as
being maybe a true boolean. When flagged that way, it returns a boolean
if block_gimme() returns G_VOID.
If rv2hv and padhv can already do this, then we don’t need the
boolkeys op any more. We can just flag the rv2hv to return a boolean.
In all the cases where boolkeys was used, we know at compile time that
it is true boolean context, so we add a new flag for that.
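The run-time decision described above, with one flag meaning "only truth is wanted" and another meaning "ask the caller's context", can be sketched in C roughly as follows. The flag names, the ctx enum and the toy hash structure are hypothetical stand-ins, not perl's real op flags or pp functions, and the "bucket stats" here are two stored counts rather than an actual walk over the buckets.

```c
#include <assert.h>

/* Hypothetical stand-ins for perl's caller contexts (G_VOID etc.). */
enum ctx { CTX_VOID, CTX_SCALAR, CTX_LIST };

/* Hypothetical op flags modelled on the description above. */
#define FLAG_TRUEBOOL       0x1 /* compile time: only truth is wanted */
#define FLAG_MAYBE_TRUEBOOL 0x2 /* run time: ask the caller's context */

/* Toy hash: only the counts matter here. */
struct toy_hash {
    unsigned keys;
    unsigned used_buckets;
    unsigned total_buckets;
};

/* What %hash yields in scalar context in this toy model. */
struct scalar_result {
    int is_bool;           /* 1 if the cheap boolean path was taken */
    int truth;             /* valid when is_bool */
    unsigned used, total;  /* bucket stats, valid when !is_bool */
};

static int want_boolean(unsigned flags, enum ctx caller_ctx)
{
    if (flags & FLAG_TRUEBOOL)
        return 1;                       /* decided at compile time */
    if (flags & FLAG_MAYBE_TRUEBOOL)
        return caller_ctx == CTX_VOID;  /* the block_gimme() style check */
    return 0;
}

static struct scalar_result hash_in_scalar(const struct toy_hash *h,
                                           unsigned flags,
                                           enum ctx caller_ctx)
{
    struct scalar_result r = {0, 0, 0, 0};
    if (want_boolean(flags, caller_ctx)) {
        r.is_bool = 1;
        r.truth = h->keys != 0;
    } else {
        r.used = h->used_buckets;   /* the expensive part in real perl */
        r.total = h->total_buckets;
    }
    return r;
}
```

With both flags available, the separate boolkeys op has nothing left to do: the same op either answers "true/false" cheaply or falls back to the full statistics.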
Karl Williamson [Tue, 21 Aug 2012 14:17:51 +0000 (08:17 -0600)]
regexec.c: White-space only
Indent inside newly formed block
Karl Williamson [Tue, 21 Aug 2012 04:03:22 +0000 (22:03 -0600)]
regex: Speed up \X processing
For most Unicode releases, GCB=prepend matches absolutely nothing. And
that appears to be the case going forward, as they added things to it,
and removed them later based on field experience.
An earlier commit has improved the performance of this significantly by
using a binary search of an empty array instead of a swash hash.
However, that search requires several layers of function calls to
discover that it is empty, which this commit avoids.
This patch will use whatever swash_init() returns unless it is empty,
preserving backwards compatibility with older Unicode releases. But if
it is empty, the routine sets things up so that future calls will always
fail without further testing.
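The "check once whether the set is empty, then fail immediately" idea can be sketched like this. The prop_cache structure, lookup_slow() (a linear inversion-list scan standing in for the multi-layer swash lookup) and the slow_calls counter are all hypothetical; the real regexec.c code is organised differently.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long UV;

/* Hypothetical per-property cache, e.g. for GCB=Prepend. */
struct prop_cache {
    int initialized;
    int known_empty;      /* set once if the property matches nothing */
    const UV *invlist;    /* sorted range boundaries; may be empty */
    size_t len;
};

static size_t slow_calls = 0;  /* counts how often the slow path runs */

/* Stand-in for the expensive lookup: cp matches if an odd number of
   inversion-list elements are <= cp. */
static int lookup_slow(const struct prop_cache *pc, UV cp)
{
    size_t n = 0;
    slow_calls++;
    while (n < pc->len && pc->invlist[n] <= cp)
        n++;
    return (n & 1) != 0;
}

static int prop_matches(struct prop_cache *pc, UV cp)
{
    if (!pc->initialized) {
        pc->initialized = 1;
        pc->known_empty = (pc->len == 0);
    }
    if (pc->known_empty)
        return 0;         /* fail without further testing */
    return lookup_slow(pc, cp);
}
```

For an empty property, every call after the first returns 0 without entering lookup_slow() at all, which is the effect the commit describes.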
Karl Williamson [Sat, 25 Aug 2012 20:54:10 +0000 (14:54 -0600)]
utf8.c: indent in new block: White space-only
Karl Williamson [Sat, 25 Aug 2012 20:49:47 +0000 (14:49 -0600)]
utf8.c: Prefer binsearch over swash hash for small swashes
A binary swash is a hash of bitmaps used to cache the results of looking
up if a code point matches a Unicode property or regex bracketed
character class. An inversion list is a data structure that also holds
information about what code points match a Unicode property or character
class. It is implemented as an SV* to a sorted C array, and hence can
be searched using a binary search.
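A minimal sketch of inversion-list membership by binary search, assuming the usual encoding in which even-indexed elements start a matching range and odd-indexed elements start a non-matching one. The function names are hypothetical, not the ones used in regcomp.c or utf8.c.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long UV;

/* Returns the index of the last element <= cp, or -1 if there is none. */
static long invlist_search(const UV *list, size_t len, UV cp)
{
    long lo = 0, hi = (long)len - 1, found = -1;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp) {
            found = mid;      /* candidate; look further right */
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return found;
}

/* cp is in the set iff the last element <= cp has an even index. */
static int invlist_contains(const UV *list, size_t len, UV cp)
{
    long i = invlist_search(list, len, cp);
    return i >= 0 && (i % 2) == 0;
}
```

For example, {0x30, 0x3A, 0x41, 0x47} encodes the hex digits 0-9 and A-F. A list of 512 elements needs at most 9 or 10 probes, which is the scale the commit's limit is based on.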
This patch converts to using a binary search of an inversion list
instead of a hash look-up for inversion lists that are no more than 512
elements (9 iterations of the search loop). That number can be easily
adjusted, if necessary.
Theoretically, a hash is faster than a binary lookup over a very long
period. So this may negatively impact long-running servers. But in the
short run, where most programs reside, the binary search is
significantly faster.
A swash is filled as necessary as time goes on, caching each new
distinct code point it is called with. If it is called with many, many
such code points, its performance can degrade as collisions increase. A
binary search does not have that drawback. However, most real-world
scenarios do not have a program being called with huge numbers of
distinct code points. Mostly, the program will be called with code
points from just one or a few of the world's scripts, so will remain
sparse. The bitmaps in a swash are each 64 bits long (except for ASCII,
where it is 128). That means that when the swash is populated, a lookup
of a single code point that hasn't been checked before will have to
look up the 63 adjoining code points as well, increasing its startup
overhead. Of course, if one of those 63 code points is later accessed,
no further population is needed. That is the typical case, as a
language's code points are all near each other.
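The "populate 64 code points at once" behaviour can be illustrated with a deliberately simplified cache that holds a single 64-bit block. Real swashes are hashes of many such bitmaps; struct block_cache, property_matches() (here just ASCII uppercase) and the populations counter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned long UV;

/* One cached bitmap covering the 64 code points that share cp >> 6. */
struct block_cache {
    int valid;
    UV block;        /* cp >> 6 for the cached block */
    uint64_t bits;   /* bit (cp & 63) set iff that code point matches */
};

static unsigned populations = 0;

/* Stand-in for the real property lookup. */
static int property_matches(UV cp) { return cp >= 'A' && cp <= 'Z'; }

static int cached_lookup(struct block_cache *c, UV cp)
{
    UV block = cp >> 6;
    if (!c->valid || c->block != block) {
        /* Populate: look up all 64 code points in the block at once. */
        unsigned i;
        c->bits = 0;
        for (i = 0; i < 64; i++)
            if (property_matches((block << 6) | i))
                c->bits |= (uint64_t)1 << i;
        c->block = block;
        c->valid = 1;
        populations++;
    }
    return (int)((c->bits >> (cp & 63)) & 1);
}
```

The first lookup of 'A' pays for 64 property checks; later lookups of 'B' or '@' in the same block are a shift and a mask.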
The bottom line, though, is in the short term, this patch speeds up the
processing of \X regex matching about 35-40%, with modern Korean (which
has uniquely complicated \X processing) closer to 40%, and other scripts
closer to 35%.
The 512 boundary means that over 90% of the official Unicode properties
are handled using binary search. I settled on that number by
experimenting with several properties besides \X and with various
powers-of-2 limits. Until I got that high, performance kept improving
when the property went from being a swash to a binary search. \X
improved even up to 2048, which encompasses 100% of the official Unicode
properties.
The implementation changes so that an inversion list instead of a swash
is returned by swash_init() when the input flags allow it to do so, for
all inversion lists shorter than the compiled-in constant of 512
(actually <= 512). The other functions that access swashes have added
intelligence to deal with an object of either type. Should someone in
CPAN be using the public swash_init() interface, they will not see any
difference, as the option to get an inversion list is not available to
them.
Karl Williamson [Sat, 25 Aug 2012 20:51:11 +0000 (14:51 -0600)]
utf8.c: Bypass a subroutine wrapper
We might as well call the core swash initialization directly, since we
are the core here, and the public one merely wraps it.
Karl Williamson [Sat, 25 Aug 2012 19:27:25 +0000 (13:27 -0600)]
utf8.c: Add comment about speed-up attempt
This might keep someone later from attempting the speedup, which didn't
actually help, so I didn't commit it.
Karl Williamson [Sat, 25 Aug 2012 17:42:55 +0000 (11:42 -0600)]
utf8.c: Shorten hash key for speed
Experiments have shown that longer hash keys impact performance. See
the thread at
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2012-08/msg00869.html
This patch shortens a key used very frequently. There are other keys in
this hash which are used frequently in some circumstances, but I expect
to change the code to use fewer of them in the future, so I am not
changing them now.
Karl Williamson [Sat, 25 Aug 2012 14:58:42 +0000 (08:58 -0600)]
utf8.c: collapse a function parameter
Now that we have a flags parameter, we can pass this parameter as
just another flag, giving a cleaner interface to this internal-only
function. This also renames the flag parameter to <flag_p> to indicate
it needs to be dereferenced.
Karl Williamson [Sat, 25 Aug 2012 14:06:30 +0000 (08:06 -0600)]
regexec.c: Reword comment
This portion of the comment is unnecessary, and doesn't really reflect
the implementation.
Karl Williamson [Fri, 24 Aug 2012 20:38:02 +0000 (14:38 -0600)]
regexec.c: Use get method instead of internals
A new get method has been written to access the internals of a swash;
it's best to use it.
This also moves the error checking into the method.
Karl Williamson [Fri, 24 Aug 2012 20:20:41 +0000 (14:20 -0600)]
embed.fnc: Turn null wrapper function into macro
This function only does something on EBCDIC platforms. On ASCII ones,
make it a macro, like similar ones, to avoid useless function nesting.
Karl Williamson [Fri, 24 Aug 2012 20:00:22 +0000 (14:00 -0600)]
utf8.c: Revise internal API of swash_init()
This revises the API for the version of swash_init() that is usable
by core Perl. The external interface is unaffected. There is now a
flags parameter to allow for future growth. And the core internal-only
function that returns if a swash has a user-defined property in it or
not has been removed. This information is now returned via the new
flags parameter upon initialization, and is unavailable afterwards.
This is to prepare for the flexibility to change the swash that is
needed in future commits.
Karl Williamson [Fri, 24 Aug 2012 17:11:57 +0000 (11:11 -0600)]
embed.fnc: Mark internal function as "may change"
This function is not designed for a public API, and should have been so
listed.
Karl Williamson [Thu, 23 Aug 2012 19:47:37 +0000 (13:47 -0600)]
Add caching to inversion list searches
Benchmarking showed some speed-up when the result of the previous
search in an inversion list is cached, thus potentially avoiding a
search in the next call. This adds a field to each inversion list which
caches its previous search result.
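One way to cache the previous search result is to remember the index the last search returned and check first whether the new code point falls in the same interval. This is a sketch under that assumption; the struct layout, last_index field and full_searches counter are hypothetical, not the actual field added to perl's inversion lists.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long UV;

/* Hypothetical inversion list carrying the previous search result. */
struct invlist {
    const UV *elems;
    size_t len;
    long last_index;   /* index returned by the previous search, or -1 */
};

static unsigned full_searches = 0;

/* Returns the index of the last element <= cp, or -1. */
static long invlist_search_cached(struct invlist *il, UV cp)
{
    long lo, hi, found = -1;
    long c = il->last_index;

    /* Fast path: cp lies in the same interval as last time. */
    if (c >= 0 && (size_t)c < il->len && il->elems[c] <= cp
        && ((size_t)c + 1 == il->len || cp < il->elems[c + 1]))
        return c;

    full_searches++;
    lo = 0;
    hi = (long)il->len - 1;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        if (il->elems[mid] <= cp) { found = mid; lo = mid + 1; }
        else                      { hi = mid - 1; }
    }
    il->last_index = found;
    return found;
}
```

Text in a single script tends to hit the same interval repeatedly, so the two comparisons of the fast path often replace the whole binary search.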
Karl Williamson [Sat, 18 Aug 2012 18:20:42 +0000 (12:20 -0600)]
regexec.c: Use xor to save a branch
Probably this gets optimized this way anyway.
Karl Williamson [Tue, 21 Aug 2012 16:22:00 +0000 (10:22 -0600)]
Comment out unused function
In looking at \X handling, I noticed that this function, which is
intended for use in it, actually isn't used. This function may someday
be useful, so I'm leaving the source in.
Karl Williamson [Tue, 21 Aug 2012 15:30:08 +0000 (09:30 -0600)]
utf8.c: Speed up \X processing of Korean
\X matches according to a complicated pattern that is hard-coded in
regexec.c. Part of that pattern involves checking if a code point is a
component of a Hangul Syllable or not. For Korean code points, this
involves checking against multiple tables. It turns out that two of
those tables are arranged so that the checks for them can be done via an
arithmetic expression; Unicode publishes algorithms for determining
various characteristics based on their very structured ordering.
This patch converts the routines that check these two tables to instead
use the arithmetic expression.
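As an illustration of the kind of arithmetic Unicode publishes, the Hangul syllable algorithm in the Unicode Standard lets LV and LVT syllables be told apart without a table. That the two tables converted by this commit are the LV and LVT lists is an assumption here, and the function names are hypothetical.

```c
#include <assert.h>

typedef unsigned long UV;

/* Constants from the Unicode Standard's Hangul syllable algorithm. */
#define SBASE  0xAC00UL
#define TCOUNT 28UL
#define SCOUNT 11172UL   /* LCount (19) * VCount (21) * TCount (28) */

static int is_hangul_syllable(UV cp)
{
    return cp >= SBASE && cp < SBASE + SCOUNT;
}

/* An LV syllable has no trailing consonant: its offset is a multiple
   of TCount. */
static int is_hangul_lv(UV cp)
{
    return is_hangul_syllable(cp) && (cp - SBASE) % TCOUNT == 0;
}

/* Every other precomposed syllable is LVT. */
static int is_hangul_lvt(UV cp)
{
    return is_hangul_syllable(cp) && (cp - SBASE) % TCOUNT != 0;
}
```

U+AC00 and U+AC1C are LV; U+AC01 and U+D7A3 (the last syllable) are LVT. Each check is a range test plus one modulus, instead of a search over more than 11,000 code points.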
Karl Williamson [Thu, 23 Aug 2012 16:36:13 +0000 (10:36 -0600)]
regcomp.c: Move functions to inline_invlist.c
This populates inline_invlist.c with some static inline functions and
macro defines. These are the ones that are anticipated to be needed in
the near term outside regcomp.c.
Karl Williamson [Thu, 23 Aug 2012 16:19:51 +0000 (10:19 -0600)]
regcomp.c: Rename 2 functions to indicate private nature
These two functions will be moved into a header in a future commit,
where they will be accessible outside regcomp.c. Prefix their names with
an underscore to emphasize that they are private.
Karl Williamson [Thu, 23 Aug 2012 14:37:58 +0000 (08:37 -0600)]
regcomp.c: Silence compiler warning.
The warning that this variable can be used uninitialized is spurious,
but silence it nonetheless.
Karl Williamson [Thu, 23 Aug 2012 00:30:59 +0000 (18:30 -0600)]
Add empty inline_invlist.c
This will be used for things needed to handle inversion lists in the
three files that currently use them. I'm putting this in a separate header,
because inversion lists are very internal-only, so should not be grouped
in with things that there is an external API for. It is a dot-c file so
that the functions can continue to be declared with embed.fnc, and
porting/args_assert.t will continue to work, as it looks only in .c
files.
Karl Williamson [Tue, 21 Aug 2012 17:24:48 +0000 (11:24 -0600)]
regcomp.c: Add assertion, comments
Karl Williamson [Sat, 18 Aug 2012 20:23:12 +0000 (14:23 -0600)]
regcomp.c: Allow search to work on empty inversion lists
You cannot retrieve the array of an empty inversion list, so the code
has to be reordered to do that after the list is known to be non-empty.
I haven't been able to find a case where this currently fails, but
future commits open up the possibility.
Karl Williamson [Sat, 18 Aug 2012 18:19:00 +0000 (12:19 -0600)]
regcomp.c: Special case /[UV_MAX]/
The highest code point representable on the machine has to be special
cased. Earlier commits for 5.14 did this for ranges ending in this code
point, but it turns out there needs to be a special-special case when
the range contains just it.
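In the usual inversion-list encoding, a range [start, end] is stored as start followed by end + 1. When end is UV_MAX there is no end + 1, so one plausible treatment, sketched below, is to leave the final range open; a range consisting of UV_MAX alone then becomes a single element. This only illustrates why the boundary needs care; it is not the regcomp.c fix, and the fixed-size struct invlist and invlist_append_range() are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef unsigned long UV;
#define UV_MAX ULONG_MAX

/* Hypothetical fixed-capacity inversion list. */
struct invlist {
    UV elems[16];
    size_t len;
};

/* Appends [start, end], which must lie above everything already present.
   The closing element is end + 1, which cannot be represented when
   end == UV_MAX, so such a range is left open at the top. */
static void invlist_append_range(struct invlist *il, UV start, UV end)
{
    assert(start <= end);
    assert(il->len % 2 == 0);   /* the previous range must be closed */
    assert(il->len + 2 <= sizeof il->elems / sizeof il->elems[0]);
    assert(il->len == 0 || il->elems[il->len - 1] <= start);

    if (il->len > 0 && il->elems[il->len - 1] == start)
        il->len--;              /* abuts the previous range: extend it */
    else
        il->elems[il->len++] = start;

    if (end != UV_MAX)
        il->elems[il->len++] = end + 1;
}
```

Appending [UV_MAX, UV_MAX] to an empty list yields the single element UV_MAX; appending [0x41, 0x5A] and then [0x5B, UV_MAX] merges into one open range starting at 0x41.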
Karl Williamson [Mon, 20 Aug 2012 19:28:31 +0000 (13:28 -0600)]
mktables: Fix bug when deleting final range
When a Range_List is emptied, there is a bug which causes a runtime
error when trying to refer to a non-existent element. This avoids that.
A future commit would have run afoul of this bug.
Father Chrysostomos [Sat, 25 Aug 2012 21:43:33 +0000 (14:43 -0700)]
Increase $B::Concise::VERSION to 0.93
Father Chrysostomos [Sat, 25 Aug 2012 20:22:46 +0000 (13:22 -0700)]
Optimise %hash in sub { %hash || ... }
In %hash || $foo, the %hash is in scalar context, so it has to iterate
through the buckets to produce statistics on bucket usage.
If the || is in void context, the value returned by %hash is only ever
used as a boolean (as || doesn’t have to return it). We already optimise
it by adding a boolkeys op when it is known at compile time that
|| will be in void context.
In sub { %hash || $foo } it is not known at compile time that it will
be in void context, so it wasn’t optimised.
This commit optimises it by flagging the %hash at compile time as
being possibly in ‘true boolean’ context. When that flag is set,
the rv2hv and padhv ops call block_gimme() to see whether || is in
void context.
This speeds things up significantly. Here is what I got after optimising
rv2hv but before doing padhv:
$ time ./miniperl -e '%hash = 1..10000; sub { %hash || 1 }->() for 1..100000'
real 0m0.179s
user 0m0.101s
sys 0m0.005s
$ time ./miniperl -e 'my %hash = 1..10000; sub { %hash || 1 }->() for 1..100000'
real 0m5.446s
user 0m2.419s
sys 0m0.015s
(That example is slightly misleading because of the closure, but the
closure version takes 1 sec. when optimised.)
Yves Orton [Sat, 25 Aug 2012 16:35:25 +0000 (18:35 +0200)]
improve and fix the documentation of the PERL_HASH function
Yves Orton [Sat, 25 Aug 2012 10:28:38 +0000 (12:28 +0200)]
minor doc patches to api stuff
Father Chrysostomos [Sat, 25 Aug 2012 07:12:26 +0000 (00:12 -0700)]
Apply boolkeys optimisation to %hash?:
and consequently if(%hash) followed by else.
Father Chrysostomos [Sat, 25 Aug 2012 07:07:21 +0000 (00:07 -0700)]
Apply boolkeys optimisation to scalar(%hash)
Father Chrysostomos [Sat, 25 Aug 2012 06:52:36 +0000 (23:52 -0700)]
[perl #114576] Optimise if(%hash) in non-void context
The boolkeys optimisation (867fa1e2da1) was only applying to an and
(or if) in void context. If an if occurs as the last thing in a
subroutine, the void context is not known at compile time, so the
optimisation does not apply.
In the case of || (to which the boolkeys optimisation also applies),
we can’t optimise it in non-void context, because someone might be
writing $bucket_info = %hash || '0/0';
In the case of &&, we can optimise it, even in non-void context,
because a true value will always be discarded in %hash && foo.
The false value it returns for an empty hash is always the integer 0.
That would change if we simply applied boolkeys to
my $ret = %hash && foo; because boolkeys returns &PL_sv_no (the dualvar
you get from !1). But since boolkeys’ return value is never directly
visible to perl code, we can safely change that.
Father Chrysostomos [Sat, 25 Aug 2012 06:03:44 +0000 (23:03 -0700)]
pp.c: pp_boolkeys does not need to pop
If it’s going to consume and return exactly one item, it doesn’t need
to decrement and increment the stack pointer.
Daniel Dragan [Fri, 24 Aug 2012 21:07:59 +0000 (17:07 -0400)]
[perl #114572] perl.c: fix locality/rmv redundant nulls in call_sv/eval_sv
Small tweaks to improve locality/more opportunity for C compiler to
optimize. Also remove redundant nulls, since the OP structs are
null filled a line or 2 before.
Father Chrysostomos [Fri, 24 Aug 2012 19:39:40 +0000 (12:39 -0700)]
parser.t: Move a test above ‘Add new tests here’
Father Chrysostomos [Fri, 24 Aug 2012 16:33:51 +0000 (09:33 -0700)]
pad.h: Rename PadnameSTATE; make it a proper boolean
I used PadnameIs* for OUR, because I was copying
PAD_COMPNAME_FLAGS_isOUR. STATE should be consistent with it. And it
was missing the double bang, making the docs wrong.
Tony Cook [Thu, 1 Mar 2012 10:16:50 +0000 (21:16 +1100)]
rt #111126 - don't empty a file with copy("foo/bar", "foo/");
Tony Cook [Wed, 29 Feb 2012 13:11:56 +0000 (00:11 +1100)]
rt #111126 - TODO test for copy foo/file to foo/
Tony Cook [Fri, 24 Aug 2012 07:46:14 +0000 (17:46 +1000)]
close the Peek.t temp file so the END block can unlink it
This was leaving detritus on Win32 builds
Tony Cook [Fri, 24 Aug 2012 07:42:48 +0000 (17:42 +1000)]
oops, left some debugging code
left from fixing perl #112272
Daniel Dragan [Wed, 22 Aug 2012 06:19:55 +0000 (02:19 -0400)]
don't use PerlHost's getenv after perl_destruct
On Win32, perl_free calls PerlHost's getenv, which calls win32_getenv.
win32_getenv and its children use SVs and the mortal stack. After
perl_destruct, SVs and the mortal stack don't exist, but the old
Itmps_stack pointer remains unchanged (not nulled). Depending on the
randomness of the memory allocator, a previously mortalised SV would be
written to memory that the allocator had freed but whose page was still
allocated, and it silently worked. Recently, in 5.17, the page started
to be freed, and now this bug segfaults. This patch fixes the problem
by using PL_perl_destruct_level and calling getenv earlier.
Steve Hay [Thu, 23 Aug 2012 08:20:16 +0000 (09:20 +0100)]
Announcement template - Current development track is 5.17
Steve Hay [Thu, 23 Aug 2012 08:05:38 +0000 (09:05 +0100)]
RMG - CPAN /src and /src/README.html are the same
Steve Hay [Thu, 23 Aug 2012 07:52:00 +0000 (08:52 +0100)]
RMG - corelist.pl uses HTTP::Tiny, not wget or curl
It also fetches files remotely even when using a local CPAN mirror if
the files are missing.
Nicholas Clark [Thu, 23 Aug 2012 17:48:14 +0000 (19:48 +0200)]
Record the story behind the pack format specifiers H, h, B and b.
Father Chrysostomos [Thu, 23 Aug 2012 16:32:03 +0000 (09:32 -0700)]
Increase $Module::CoreList::VERSION to 2.73
Even though cmp_version.t doesn’t mind 2.72, we need a version bump,
as 2.72 is already on CPAN.
David Leadbeater [Wed, 22 Aug 2012 15:03:43 +0000 (17:03 +0200)]
Clean up data for ExtUtils::Miniperl in Module::CoreList
Some corelist data was constructed without ExtUtils::Miniperl being
present, presumably because perl wasn't fully built at the time.
David Leadbeater [Wed, 22 Aug 2012 14:50:13 +0000 (16:50 +0200)]
Clean up data for Pod::Perldoc::ToTk in Module::CoreList
It was alternating between 'undef' and undef.
David Leadbeater [Wed, 22 Aug 2012 14:41:16 +0000 (16:41 +0200)]
Clean up data for Carp::Heavy in Module::CoreList
It was lagging behind by about one release -- presumably due to it being
based on $Carp::VERSION.
David Leadbeater [Wed, 22 Aug 2012 14:27:29 +0000 (16:27 +0200)]
Fix the version of Scalar::Util in corelist for 5.7.3
Father Chrysostomos [Thu, 23 Aug 2012 07:19:55 +0000 (00:19 -0700)]
pad.h: PadnameSTATE
Father Chrysostomos [Thu, 23 Aug 2012 04:48:56 +0000 (21:48 -0700)]
Use FooBAR convention for new pad macros
After a while, I realised that it can be confusing for PAD_ARRAY and
PAD_MAX to take a pad argument, but for PAD_SV to take a number and
PAD_SET_CUR a padlist.
I was copying the HEK_KEY convention, which was probably a bad idea.
This is what we use elsewhere:
TypeMACRO
----=====
AvMAX
CopFILE
PmopSTASH
StashHANDLER
OpslabREFCNT_dec
Furthermore, heks are not part of the API, so what convention they use
is not so important.
So these:
PADNAMELIST_*
PADLIST_*
PADNAME_*
PAD_*
are now:
Padnamelist*
Padlist*
Padname*
Pad*
Father Chrysostomos [Thu, 23 Aug 2012 01:15:34 +0000 (18:15 -0700)]
Increase $B::Deparse::VERSION to 1.17
Father Chrysostomos [Thu, 23 Aug 2012 01:15:11 +0000 (18:15 -0700)]
B::Deparse: Suppress trailing ; in formats
While it doesn’t change the behaviour, nobody writes formats that way,
and this makes the output match 5.17.2 and earlier.
Father Chrysostomos [Thu, 23 Aug 2012 01:11:33 +0000 (18:11 -0700)]
pad.h: Let PADNAME_PV return null
Father Chrysostomos [Wed, 22 Aug 2012 23:48:45 +0000 (16:48 -0700)]
pad.h: typos in macro definitions
It would help to define these macros properly.
Father Chrysostomos [Wed, 22 Aug 2012 23:33:06 +0000 (16:33 -0700)]
pad.h: PADNAME_SV
If CPAN modules should not assume that pad names are SVs, we need
to provide a better way than newSVpvn(PADNAME_PV(pn),PADNAME_LEN(pn))
to get an SV out of it, as, knowing that pad names are just SVs, the
core can do it more efficiently by simply returning the pad name
itself.
Father Chrysostomos [Wed, 22 Aug 2012 23:24:37 +0000 (16:24 -0700)]
pad.[ch]: PADNAME_OUTER
I think this is the last bit of pad-as-sv stuff that was not
abstracted away in pad-specific macros.
Father Chrysostomos [Wed, 22 Aug 2012 22:59:23 +0000 (15:59 -0700)]
toke.c: Extreme paranoia
Karl Williamson [Wed, 22 Aug 2012 20:50:43 +0000 (14:50 -0600)]
PATCH: Devel::Peek doesn't compile under C++
Commit c9795579db61c900bacee2790bdceb7bad3dd45d introduced
an error in C++: it's missing a cast.
Father Chrysostomos [Wed, 22 Aug 2012 21:07:44 +0000 (14:07 -0700)]
[perl #114040] Fix here-docs in multiline re-evals
Commit 5097bf9b8 only partially fixed this, or, rather, did the
groundwork for fixing it.
If we have a pattern like this:
/(?{<<foo . baz
bar
foo
})/
Then PL_linestr contains this while we are parsing the block:
"(?{<<foo . baz\nbar\nfoo\n})"
The code for parsing a here-doc in a multiline PL_linestr buffer
(which applies to here-docs in string evals or in quote-like
operators) likes to modify PL_linestr to contain everything after the
<<heredoc marker except the here-doc body, which has been stolen (but
it oddly includes the last character of the marker, which does not
matter, as PL_bufptr is set to PL_linestart+1):
"o . baz\n})"
The regexp block parsing code expects to be able to extract the entire
block (as a string) from PL_linestr after parsing it. So it is not
helpful for S_scan_heredoc to go and modify it like that.
Before modifying PL_linestr, we can set aside a copy of the source
code (in PL_sublex_info.re_eval_str) from the beginning of the regexp
block to the end of PL_linestr, so that the regexp block code can
retrieve the original source from there.
We also adjust PL_sublex_info.re_eval_start so that at the end of the
regexp block PL_bufptr - PL_sublex_info.re_eval_start is the length of
the block.
Instead of clobbering PL_linestr, we can copy everything after the
here-doc to where the body begins. This helps for two reasons: it
requires less allocation (I would have made that change in the end
anyway, for efficiency), and it makes it easier to calculate how much
to subtract from re_eval_start.
This fix does not apply to here-docs in quotes in multiline string
evals, which crash and always have.
Father Chrysostomos [Wed, 22 Aug 2012 19:52:15 +0000 (12:52 -0700)]
Peek.t: Test that DeadCode doesn’t crash
I broke it, but Karl Williamson’s commit (the previous) with my tweaks
fixes it. This function was not at all exercised by the test suite.
Karl Williamson [Wed, 22 Aug 2012 17:16:55 +0000 (11:16 -0600)]
Devel::Peek: Fix so compiles under C++
Commit 86b9d29366aea0e71ad75b61d04f56f1fe5b0d4d created a new PADLIST
type. However, this broke the compilation of Devel::Peek with C++.
type. However, this broke the compilation of Devel::Peek with C++.
This commit gets it to compile again, and pass our regression test
suite.
[Modified by the committer to use the correct PADLIST_ macros;
otherwise it will crash.]
Father Chrysostomos [Wed, 22 Aug 2012 16:46:28 +0000 (09:46 -0700)]
toke.c: -DT should report forced tokens under -Dmad
I was wondering why the -DT output was missing things out.
This is why:
#ifdef PERL_MAD
/* FIXME - can these be merged? */
return next_type;
#else
return REPORT(next_type);
#endif
Father Chrysostomos [Wed, 22 Aug 2012 15:43:40 +0000 (08:43 -0700)]
heredoc.t: Add a CRLF test
I nearly broke this in recent bug fixes
Father Chrysostomos [Wed, 22 Aug 2012 01:02:39 +0000 (18:02 -0700)]
[Merge] New PADLIST type
To fix a bug (db4cf31d1d) and to facilitate the lexical subs I’m
working on, I needed to be able to add extra fields to a padlist. But
padlists are AVs, making that nontrivial.
There is no reason they need to be AVs, and they take less memory when
they are not, so I made a new padlist struct.
This is going to break CPAN modules that manipulate padlists.
To avoid having to patch those modules again later if we change pads
from AVs into their own types, I have added APIs for accessing the
contents of pads.
There is also a new PADNAMELIST type (currently equivalent to AV), in
case the pad holding the names needs to be a different type from a pad
some time in the future.
Father Chrysostomos [Wed, 22 Aug 2012 01:02:10 +0000 (18:02 -0700)]
pad.c: fix pod link
Father Chrysostomos [Tue, 21 Aug 2012 23:52:15 +0000 (16:52 -0700)]
Increase $XS::APItest::VERSION to 0.43
Father Chrysostomos [Tue, 21 Aug 2012 23:51:48 +0000 (16:51 -0700)]
Increase $B::VERSION to 1.38
Father Chrysostomos [Sat, 18 Aug 2012 19:12:36 +0000 (12:12 -0700)]
pad.c: CvPADLIST docs: one more thing
Father Chrysostomos [Sat, 18 Aug 2012 18:46:40 +0000 (11:46 -0700)]
pad.c: Use PAD_ARRAY rather than AvARRAY in curpad docs
Father Chrysostomos [Sat, 18 Aug 2012 18:38:50 +0000 (11:38 -0700)]
Use new types for comppad and comppad_name
I know that a few times I’ve looked at perl source files to find out
what type to use in ‘<type> foo = PL_whatever’. So I am changing
intrpvar.h as well as the api docs.
Father Chrysostomos [Sat, 18 Aug 2012 18:36:32 +0000 (11:36 -0700)]
pad.c: CvPADLIST doc update
Father Chrysostomos [Fri, 17 Aug 2012 21:21:37 +0000 (14:21 -0700)]
More PAD APIs
If we are making padlists their own type, and no longer AVs, it makes
sense to add APIs for pads, too, so that CPAN code that needs to
change now will only have to change once if we ever stop pads
themselves from being AVs.
There is no reason pad names have to be SVs, so I am adding separate
APIs for pad names, too. The AV containing pad names is
now officially a PADNAMELIST, which is accessed, not via
*PADLIST_ARRAY(padlist), but via PADLIST_NAMES(padlist).
Future optimisations may even merge the padlist with its name list, so
I have also added macros to access the parts of the name list directly
from the padlist.
Father Chrysostomos [Fri, 17 Aug 2012 20:01:49 +0000 (13:01 -0700)]
Fix format closure bug with redefined outer sub
CVs close over their outer CVs. So, when you write:
my $x = 52;
sub foo {
sub bar {
sub baz {
$x
}
}
}
baz’s CvOUTSIDE pointer points to bar, bar’s CvOUTSIDE points to foo,
and foo’s to the main cv.
When the inner reference to $x is looked up, the CvOUTSIDE chain is
followed, and each sub’s pad is looked at to see if it has an $x.
(This happens at compile time.)
It can happen that bar is undefined and then redefined:
undef &bar;
eval 'sub bar { my $x = 34 }';
After this, baz will still refer to the main cv’s $x (52), but, if baz
had ‘eval '$x'’ instead of just $x, it would see the new bar’s $x.
(It’s not really a new bar, as its refaddr is the same, but it has a
new body.)
This particular case is harmless, and is obscure enough that we could
define it any way we want, and it could still be considered correct.
The real problem happens when CVs are cloned.
When a CV is cloned, its name pad already contains the offsets into
the parent pad where the values are to be found. If the outer CV
has been undefined and redefined, those pad offsets can be completely
bogus.
Normally, a CV cannot be cloned except when its outer CV is running.
And the outer CV cannot have been undefined without also throwing
away the op that would have cloned the prototype.
But formats can be cloned when the outer CV is not running. So it
is possible for cloned formats to close over bogus entries in a new
parent pad.
In this example, \$x gives us an array ref. It shows ARRAY(0xbaff1ed)
instead of SCALAR(0xdeafbee):
sub foo {
my $x;
format =
@
($x,warn \$x)[0]
.
}
undef &foo;
eval 'sub foo { my @x; write }';
foo
__END__
And if the offset that the format’s pad closes over is beyond the end
of the parent’s new pad, we can even get a crash, as in this case:
eval
'sub foo {' .
'{my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l,$m,$n,$o,$p,$q,$r,$s,$t,$u)}'x999
. q|
my $x;
format =
@
($x,warn \$x)[0]
.
}
|;
undef &foo;
eval 'sub foo { my @x; my $x = 34; write }';
foo();
__END__
So now, instead of using CvROOT to identify clones of
CvOUTSIDE(format), we use the padlist ID instead. Padlists don’t
actually have an ID, so we give them one. Any time a sub is cloned,
the new padlist gets the same ID as the old. The format needs to
remember what its outer sub’s padlist ID was, so we put that in the
padlist struct, too.
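The ID scheme above can be sketched in C: a fresh padlist gets a new ID, a clone keeps its prototype's ID, and a format remembers its outer sub's ID so it can tell a clone of the original outer sub from a redefined one. The struct, the global counter and all function names are hypothetical, and what the format does on a mismatch is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int U32;

/* Hypothetical padlist carrying only the two IDs the text describes. */
struct padlist {
    U32 id;      /* shared by a sub's prototype and all its clones */
    U32 outid;   /* id of the enclosing sub's padlist at compile time */
};

static U32 next_padlist_id = 1;

/* Compiling a sub (or redefining it) creates a padlist with a new id. */
static struct padlist new_padlist(const struct padlist *outer)
{
    struct padlist pl;
    pl.id = next_padlist_id++;
    pl.outid = outer ? outer->id : 0;
    return pl;
}

/* A clone keeps the prototype's id. */
static struct padlist clone_padlist(const struct padlist *proto)
{
    return *proto;
}

/* Only close over a running pad with the id recorded at compile time. */
static int format_can_close_over(const struct padlist *format_pl,
                                 const struct padlist *running_outer)
{
    return format_pl->outid == running_outer->id;
}
```

In the example above, undef &foo followed by eval 'sub foo { ... }' gives foo a padlist with a new ID, so the format no longer treats foo's new pad offsets as its own.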
Father Chrysostomos [Thu, 16 Aug 2012 23:47:38 +0000 (16:47 -0700)]
Increase $B::Xref::VERSION from 1.03 to 1.04
Father Chrysostomos [Thu, 16 Aug 2012 23:46:20 +0000 (16:46 -0700)]
Stop padlists from being AVs
In order to fix a bug, I need to add new fields to padlists. But I
cannot easily do that as long as they are AVs.
So I have created a new padlist struct.
This not only allows me to extend the padlist struct with new members
as necessary, but also saves memory, as we now have a three-pointer
struct where before we had a whole SV head (3-4 pointers) + XPVAV (5
pointers).
This will unfortunately break half of CPAN, but the pad API docs
clearly say this:
NOTE: this function is experimental and may change or be
removed without notice.
This would have broken B::Debug, but a patch sent upstream has already
been integrated into blead with commit 9d2d23d981.
Father Chrysostomos [Thu, 16 Aug 2012 05:27:54 +0000 (22:27 -0700)]
Use PADLIST in more places
Much code relies on the fact that PADLIST is typedeffed as AV.
PADLIST should be treated as a distinct type.
Father Chrysostomos [Thu, 16 Aug 2012 05:11:46 +0000 (22:11 -0700)]
Move PAD(LIST) typedefs to perl.h
otherwise they can only be used in some header files.
Father Chrysostomos [Tue, 21 Aug 2012 23:39:10 +0000 (16:39 -0700)]
[Merge] Enter inline.h
This is a home for static inline functions that cannot go in other
headers because they depend on proto.h or struct definitions.
This allows us to avoid repeating macros with GCC and non-GCC
versions. It also makes it easier to avoid evaluating macro
arguments twice.
I’ve moved just enough things into it to offset the additional lines
added by the comments at the top. The ‘net code removal’ of this
branch is 4 lines.
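The double-evaluation problem that static inline functions avoid is easy to demonstrate. In this sketch the macro evaluates its argument once per mention, while the inline function evaluates it exactly once, without needing GCC-only brace groups. SQUARE_MACRO, square_inline() and next_value() are illustrative names, not perl macros.

```c
#include <assert.h>

/* A macro evaluates its argument once per mention... */
#define SQUARE_MACRO(x) ((x) * (x))

/* ...while a static inline function evaluates it exactly once. */
static inline long square_inline(long x)
{
    return x * x;
}

static int calls = 0;

/* Argument with a side effect, used to count evaluations. */
static long next_value(void)
{
    return ++calls;
}
```

SQUARE_MACRO(next_value()) calls next_value() twice and computes 1 * 2, whereas square_inline(next_value()) calls it once and computes 1 * 1.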
Father Chrysostomos [Sat, 18 Aug 2012 20:16:31 +0000 (13:16 -0700)]
Move S_CvDEPTHp from cv.h to inline.h; shrink macros
This allows us to use assert() inside S_CvDEPTHp, so we no longer need
GCC and non-GCC variants of the macro that calls it.
Father Chrysostomos [Sat, 18 Aug 2012 19:58:38 +0000 (12:58 -0700)]
Static inline functions for SvPADTMP and SvPADSTALE
This allows non-GCC compilers to have assertions and avoids
repeating the macros.
Father Chrysostomos [Sat, 18 Aug 2012 19:39:40 +0000 (12:39 -0700)]
Use fast SvREFCNT_dec for non-GCC
Father Chrysostomos [Sat, 18 Aug 2012 19:34:33 +0000 (12:34 -0700)]
Use static inline functions for SvREFCNT_inc
This avoids the need to repeat the macros in GCC and non-GCC versions.
For non-GCC compilers capable of inlining, this should speed things up
slightly, too, as PL_Sv is no longer needed.
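Why PL_Sv is no longer needed can be shown with a toy version of the pattern. A portable macro that must evaluate its argument only once stores it in a global temporary; an inline function's parameter already serves as that temporary. struct toy_sv, toy_PL_Sv and both helpers are hypothetical and only loosely modelled on perl's SvREFCNT_inc.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sv { unsigned refcnt; };

/* Portable macro style: a global temporary holds the argument so it is
   evaluated only once, without GCC statement expressions. */
static struct toy_sv *toy_PL_Sv;
#define TOY_REFCNT_INC_MACRO(sv) \
    ((toy_PL_Sv = (sv)) ? (++toy_PL_Sv->refcnt, toy_PL_Sv) : NULL)

/* Inline style: the parameter itself is the temporary. */
static inline struct toy_sv *toy_refcnt_inc(struct toy_sv *sv)
{
    if (sv)
        sv->refcnt++;
    return sv;
}
```

Both forms return their argument after bumping the count, but only the macro writes to a shared global, which is the overhead the commit removes for non-GCC compilers that can inline.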
Father Chrysostomos [Fri, 17 Aug 2012 04:54:53 +0000 (21:54 -0700)]
[perl #113718] Add inline.h
We can put static inline functions here, and they can depend on
function prototypes and struct definitions from other header
files.
Chris 'BinGOs' Williams [Tue, 21 Aug 2012 22:55:41 +0000 (23:55 +0100)]
Sync Module-CoreList in Maintainers.pl for CPAN release
Chris 'BinGOs' Williams [Tue, 21 Aug 2012 22:46:13 +0000 (23:46 +0100)]
Update Changes for Module-CoreList and bump to version 2.72
Father Chrysostomos [Tue, 21 Aug 2012 21:13:02 +0000 (14:13 -0700)]
[Merge] Here-doc parsing
I was waiting for 5.17.3 to be released, before merging my work on
padlists (which is blocking lexical subs), since I thought it would be
mean to inflict it on blead at the last minute before a release.
So, in the meantime, I decided to fix a small here-doc parsing bug
that prevented them from occurring inside regexp code blocks.
As often happens, it turned out to be more involved than that....
I ended up writing a history of here-doc parsing, which you can find
in the commit message for 5097bf9b8d. It shows that the way here-docs
have interacted with other quote-like operators (or with other
here-docs) has changed over time in interesting ways.
While I was fixing those, I started to find other bugs. Since I was
modifying the code, I decided to try applying David Nicol’s patch that
allows a here-doc terminator with no newline after it, to avoid
creating more conflicts through my changes. The patch didn’t work.
And while I was resolving what conflicts there were, I figured out a
simpler approach. So, instead of investigating why the patch didn’t
work, I just wrote my own version, which used less code.
Instead of backtracking on error to see whether we could have
accepted a terminator without a newline, we can just append a newline
to the string buffer at EOF and let the rest of the code handle it the
usual way.
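A minimal sketch of the normalise-then-search idea in plain C. This is not toke.c's code: ensure_trailing_newline and has_terminator_line are hypothetical helpers. Once the buffer is guaranteed to end in a newline, a terminator on the last line is found by the same code path as any other terminator:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of buf that is guaranteed to end in '\n'
   (the "append a newline at EOF" step). Caller frees. */
static char *ensure_trailing_newline(const char *buf)
{
    size_t len = strlen(buf);
    bool needs_nl = len == 0 || buf[len - 1] != '\n';
    char *out = malloc(len + (needs_nl ? 2 : 1));
    if (!out)
        return NULL;
    memcpy(out, buf, len);
    if (needs_nl)
        out[len++] = '\n';
    out[len] = '\0';
    return out;
}

/* Looks for a line consisting exactly of term. A final line with no
   '\n' is never examined, which is the case the EOF newline removes. */
static bool has_terminator_line(const char *buf, const char *term)
{
    size_t tlen = strlen(term);
    const char *line = buf;
    while (*line) {
        const char *nl = strchr(line, '\n');
        if (!nl)
            return false;
        if ((size_t)(nl - line) == tlen && memcmp(line, term, tlen) == 0)
            return true;
        line = nl + 1;
    }
    return false;
}
```

Searching "Just another Perl hacker,\nEND" directly fails, but searching the copy produced by ensure_trailing_newline succeeds.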
I continued to find more bugs as I went, till my ‘Yay, another bug!’
started to become ‘What? *Another* bug?’.
In the end:
• I fixed here-doc parsing, such that the body starts on the line
following the <<foo marker, regardless of whether it is inside quotes,
string evals, or what have you (but see remaining bugs below). This
was contrary to the documentation, but the documentation was actually
wrong half the time, so I corrected it.
• Here-doc terminators no longer require a final newline at EOF.
• Edge cases that used to crash no longer do.
• Null characters in comments no longer confuse the here-doc parser.
And, finally, one bug that I fixed was not related to here-docs per
se, but got in the way. It deserves its own JAPH:
s/${s|||, \""}Just another Perl hacker,
/anything/;
print
There are still two bugs remaining:
• Here-docs whose markers occur in single-line s/// patterns where the
replacement part is multi-line or starts on a subsequent line are
still screwed.
• CR and CR LF line terminators are treated inconsistently inside and
outside of string evals.
I’ve decided to set those aside for later and merge what I’ve
done so far.
Father Chrysostomos [Tue, 21 Aug 2012 21:09:51 +0000 (14:09 -0700)]
perlop.pod: Update here-doc-in-quotes parsing rules
Father Chrysostomos [Tue, 21 Aug 2012 08:11:34 +0000 (01:11 -0700)]
smoke-me diag
nt,hun
Father Chrysostomos [Tue, 21 Aug 2012 08:45:15 +0000 (01:45 -0700)]
toke.c:scan_heredoc: Use PL_tokenbuf less
When scanning for a heredoc terminator in a string eval or quote-like
operator, the first character we are looking for is always a newline.
So instead of setting term to *PL_tokenbuf in those two code paths,
we can just hard-code '\n'.
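A simplified sketch of why the constant works. It assumes, as the commit implies, that the terminator key begins with a newline. The exact key layout "\nEND\n" and the function find_terminator are hypothetical. Every possible match starts at a '\n', so memchr can scan for that constant byte rather than reading it from a buffer:

```c
#include <assert.h>
#include <string.h>

/* Finds key (which is assumed to start with '\n') in [s, end).
   The first byte of any match is '\n', so that byte is hard-coded
   in the memchr call instead of being taken from key[0]. */
static const char *find_terminator(const char *s, const char *end,
                                   const char *key, size_t keylen)
{
    while (s < end) {
        const char *nl = memchr(s, '\n', (size_t)(end - s));
        if (!nl || (size_t)(end - nl) < keylen)
            return NULL;
        if (memcmp(nl, key, keylen) == 0)
            return nl;
        s = nl + 1;
    }
    return NULL;
}
```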
Father Chrysostomos [Tue, 21 Aug 2012 06:58:59 +0000 (23:58 -0700)]
Fix substitution in substitution pattern
Guess what this prints:
s/${s|||, \""}Just another Perl hacker,
/anything/;
print
And look at this:
$ perl5.6.2 -e 's/${s|||;\""}/foo\n/; print;'
$ perl5.16.0 -e 's/${s|||;\""}/foo\n/; print;'
$ perl5.17.2 -e 's/${s|||;\""}/foo\n/; print;'
Bus error
$ ./miniperl -e 's/${s|||;\""}/foo\n/; print;'
Bus error
The first two gave no output, though they should have shown "foo".
And bleadperl now crashes.
When the lexer parses a quote-like operator, it begins by extracting
what is between the quotes. It puts it in an SV stored in the variable
PL_lex_stuff. Then, if it is y/// or s///, it scans the replacement
part and puts it in an SV in PL_lex_repl. When it has finished with
the replacement, it sets PL_lex_repl to NULL.
Now, if you put s/// in the pattern part of s/// (or y in s), the
inner s/// will clobber PL_lex_repl with its own replacement string.
So, when the outer s/// finishes parsing its pattern and wants its
replacement string, the string is no longer there. Because PL_lex_repl
is not set (whether it is set is how the lexer remembers which half of
s/// it is parsing), it assumes it has already parsed the replacement,
and proceeds to feed bad code to the parser, resulting in a bad op
tree.
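The failure can be reduced to a tiny state model. This is a hypothetical sketch, not toke.c's code. lex_repl, scan_subst, finish_repl and repl_pending are illustrative names, and one global slot plays the role of PL_lex_repl:

```c
#include <assert.h>
#include <stddef.h>

/* One global slot standing in for PL_lex_repl. Whether it is non-NULL
   is also how the lexer remembers that a replacement is still pending. */
static const char *lex_repl;

/* Scanning s/PAT/REPL/ stores REPL in the shared slot, overwriting
   whatever an enclosing s/// stored there. */
static void scan_subst(const char *repl)
{
    lex_repl = repl;
}

/* Finishing a replacement part clears the slot. */
static void finish_repl(void)
{
    lex_repl = NULL;
}

/* Nonzero if the lexer believes a replacement part is still pending. */
static int repl_pending(void)
{
    return lex_repl != NULL;
}
```

Scanning the outer s///, then scanning and finishing an inner s||| inside its pattern, leaves the slot NULL. repl_pending() then reports that the outer replacement was already handled, which is the wrong conclusion described above.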
PL_lex_repl needs to be localised when a quote-like operator is
parsed. Localisation for quote-like operators happens in a separate
yylex call (yylex calls sublex_push, which does it) after the string
delimiters are found, but by then PL_lex_repl has already been set,
clobbering the previous value. So we change the delimiter-scanning
code (scan_{str,trans,subst}) to use the new PL_sublex_info.repl
instead; sublex_push now copies it into PL_lex_repl after localising
the latter.
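The fix can be sketched as a small state model. The sketch is hypothetical. pending_repl stands in for PL_sublex_info.repl, a small array stands in for the savestack, and the functions are only named after their Perl counterparts. The scanner writes solely to the pending slot, and the push step saves the old value before installing the new one:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16

struct lexer {
    const char *lex_repl;          /* stands in for PL_lex_repl */
    const char *pending_repl;      /* stands in for PL_sublex_info.repl */
    const char *saved[MAX_DEPTH];  /* stands in for the savestack */
    int depth;
};

/* The delimiter scanner now writes only to the pending slot, so finding
   an inner s/// cannot clobber the outer one's lex_repl. */
static void scan_subst(struct lexer *lx, const char *repl)
{
    lx->pending_repl = repl;
}

/* Entering a lexing scope: localise lex_repl first, then install the
   pending replacement as the new scope's value. */
static void sublex_push(struct lexer *lx)
{
    assert(lx->depth < MAX_DEPTH);
    lx->saved[lx->depth++] = lx->lex_repl;
    lx->lex_repl = lx->pending_repl;
    lx->pending_repl = NULL;
}

/* Finishing a replacement part clears the current scope's slot. */
static void finish_repl(struct lexer *lx)
{
    lx->lex_repl = NULL;
}

/* Leaving a lexing scope restores the enclosing scope's value. */
static void sublex_pop(struct lexer *lx)
{
    assert(lx->depth > 0);
    lx->lex_repl = lx->saved[--lx->depth];
}
```

After the inner s||| finishes and its scope is popped, lex_repl again holds the outer replacement.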
Father Chrysostomos [Tue, 21 Aug 2012 02:08:57 +0000 (19:08 -0700)]
Fix here-docs in nested quote-like operators
When the lexer encounters a quote-like operator, it extracts the
contents of the quotes and starts an inner lexing scope.
To handle eval "s//<<FOO/e\n...", the here-doc parser peeks into the
outer lexing scope’s PL_linestr (current line buffer, which inside an
eval contains the entire string of code being parsed; for quote-like
operators, that is where the contents of the quote are stored). It
only does this inside a string eval. When parsing a file, the input
comes in one line at a time. So the here-doc parser steals lines from
the input stream for s//<<FOO/e outside an eval.
This approach fails in this case, as the peekee is the linestr for
s///, not for the eval:
eval ' s//"${\<<END}"/e; print
Just another Perl hacker,
END
'or die $@
__END__
Can't find string terminator "END" anywhere before EOF at (eval 1) line 1.
We also need to do this peeking stuff outside of a string eval, to
solve this:
s//"${\<<END}"
Just another Perl hacker,
END
/e; print
__END__
Can't find string terminator "END" anywhere before EOF at - line 1.
In the first example above, we need to look not in the parent lexing
scope’s linestr, but in that of the grandparent.
To solve the second example, we need to check whether the outer lexing
scope is a quote-like operator when we are not in an eval.
For parsing here-docs in quotes in eval, we currently store two
things, the former buffer pointer and the former linestr, in
PL_sublex_info.super_{bufp,lines}tr. The values for upper scopes are
stashed away on the savestack somewhere.
We need to be able to iterate through the outer lexer scopes till we
find one with multiple lines. Retrieving the information from the
savestack would be too complex and error-prone.
Since PL_linestr is an SV, we can abuse a couple of fields in it.
Upgrading it to PVNV gives it both IVX and NVX fields, which are big
enough to store pointers.
IVX is already used to hold an op number. So for the innermost quoted
scope we still need to use PL_sublex_info.super_bufptr. When entering
a new lexing scope (in sublex_push), we can localise the IVX field of
the outer PL_linestr SV and set it to what PL_sublex_info.super_bufptr
was in that scope. SvIVX(linestr) is only used for an op number when
that linestr’s lexing scope is the innermost one.
PL_sublex_info.super_linestr can be eliminated and replaced with
SvNVX(PL_linestr).
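The outward walk can be sketched with some simplifications. As I read the commit, the saved buffer pointer goes in IVX and the link to the former linestr goes in NVX of the upgraded PVNV. In this sketch those become ordinary named fields, so the example does not depend on storing pointers in numeric slots. struct linestr and find_multiline_scope are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One lexing scope's line buffer. */
struct linestr {
    const char *text;          /* buffer contents for this scope */
    const char *saved_bufptr;  /* role of IVX: where lexing had got to */
    struct linestr *outer;     /* role of NVX: the enclosing scope */
};

/* Walks outward from the current scope until it finds a buffer with a
   newline at or after the saved position, i.e. one that can still
   supply here-doc body lines. Returns NULL if no scope qualifies. */
static const struct linestr *find_multiline_scope(const struct linestr *ls)
{
    for (; ls; ls = ls->outer) {
        const char *from = ls->saved_bufptr ? ls->saved_bufptr : ls->text;
        if (strchr(from, '\n'))
            return ls;
    }
    return NULL;
}
```

For the first eval example, starting from the string inside the s///e replacement, the walk skips the single-line s/// buffer and stops at the grandparent, the eval's buffer.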