review.tizen.org Git - platform/upstream/perl.git/log

commit | commitdiff | tree

Father Chrysostomos [Wed, 29 Aug 2012 05:37:10 +0000 (22:37 -0700)]

toke.c: S_scan_heredoc: prune dead code

This incorrect code (using a pointer after finding it to be null)
is the result of the refactoring in 60f40a3895. It was trying to
account for a string eval with no line break in it. But that can’t
happen as of 11076590 (if it could it would crash).

So remove it and add an assertion, along with a comment explaining the
assertion.

commit | commitdiff | tree

Nicholas Clark [Thu, 30 Aug 2012 13:34:33 +0000 (15:34 +0200)]

Refactor t/op/die.t to re-use the same $SIG{__DIE__} handler where possible.

Restore testing that the $SIG{__DIE__} handler is called for the case of
C<die bless [ 7 ], "Error";> which was removed by the previous refactoring.
Re-using the same $SIG{__DIE__} handler results in 4 more tests of isa_ok()
for an 'ARRAY' - this isn't going to hurt anyone.

commit | commitdiff | tree

Colin Kuskie [Wed, 18 Jul 2012 04:59:30 +0000 (21:59 -0700)]

Refactor t/op/die.t to use test.pl instead of making TAP by hand.

[With a few whitespace tweaks]

commit | commitdiff | tree

Jerry D. Hedden [Wed, 29 Aug 2012 14:55:12 +0000 (10:55 -0400)]

Fix Cygwin build warnings

Fixes the following build warnings under Cygwin:

cygwin.c: In function 'do_spawn':
cygwin.c:132:5: warning: assignment from incompatible pointer type
cygwin.c: In function 'XS_Cygwin_posix_to_win_path':
cygwin.c:346:9: warning: 'err' may be used uninitialized in this function
cygwin.c: In function 'XS_Cygwin_win_to_posix_path':
cygwin.c:257:9: warning: 'err' may be used uninitialized in this function

commit | commitdiff | tree

Nicholas Clark [Wed, 29 Aug 2012 20:23:19 +0000 (22:23 +0200)]

Remove a no-longer needed lexical from t/op/lop.t

Jim Keenan spotted the commented out code referencing the variable $test.
Turns out that it is completely redundant, so its declaration can go too.

commit | commitdiff | tree

Colin Kuskie [Sat, 11 Aug 2012 03:24:09 +0000 (20:24 -0700)]

Document the last five tests of t/op/lop.t

commit | commitdiff | tree

Colin Kuskie [Sat, 28 Jul 2012 21:14:45 +0000 (14:14 -0700)]

Update t/op/lop.t to use test.pl instead of making TAP by hand.

commit | commitdiff | tree

Colin Kuskie [Thu, 19 Jul 2012 01:35:19 +0000 (18:35 -0700)]

Refactor t/uni/case.pl to use test.pl instead of making TAP by hand.

commit | commitdiff | tree

Colin Kuskie [Wed, 18 Jul 2012 05:21:21 +0000 (22:21 -0700)]

Refactor t/porting/checkcase.t to use test.pl instead of making TAP by hand.

commit | commitdiff | tree

Colin Kuskie [Wed, 18 Jul 2012 05:07:54 +0000 (22:07 -0700)]

Refactor t/re/no_utf8_pt.t to use test.pl instead of making TAP by hand.

commit | commitdiff | tree

Nicholas Clark [Tue, 28 Aug 2012 19:22:51 +0000 (21:22 +0200)]

Add /\.gif\z/ files to the non-Pod exceptions in t/porting/podcheck.t

commit | commitdiff | tree

Nicholas Clark [Tue, 28 Aug 2012 19:09:24 +0000 (21:09 +0200)]

t/porting/podcheck.t now passes no_chdir to File::Find::find().

File::Find::find() can call warnings::warnif(), which in turn attempts to lazy
load Carp, which doesn't work for a test using relative paths in @INC with
the current directory changed.

commit | commitdiff | tree

Nicholas Clark [Tue, 28 Aug 2012 15:23:10 +0000 (17:23 +0200)]

t/porting/dual-life.t now passes no_chdir to File::Find::find().

File::Find::find() can call warnings::warnif(), which in turn attempts to lazy
load Carp, which doesn't work for a test using relative paths in @INC with
the current directory changed.

commit | commitdiff | tree

Nicholas Clark [Tue, 28 Aug 2012 15:14:37 +0000 (17:14 +0200)]

t/porting/exec-bit.t isn't using File::{Basename,Find,Spec::Functions}.

No point loading modules that it uses nothing from.

commit | commitdiff | tree

Nicholas Clark [Tue, 28 Aug 2012 15:05:18 +0000 (17:05 +0200)]

t/porting/checkcase.t now passes no_chdir to File::Find::find().

This avoids the test occasionally aborting due to File::Find::find() calling
warnings::warnif(), which in turn attempts to lazy load Carp, which doesn't
work for a test using relative paths in @INC with the current directory
changed.

commit | commitdiff | tree

Nicholas Clark [Tue, 28 Aug 2012 20:01:19 +0000 (22:01 +0200)]

Refactor t/porting/filenames.t to shrink the code and the TAP generated.

Fold the function validate_file_name() into its only caller. Put the tested
pathname into each test description to avoid a call to note() - this halves
the size of the TAP generated. Fold the chained tests into a chained
if/elsif/else sequence. Eliminate the use of File::Spec, as all platforms
can cope internally with F<../MANIFEST>.

commit | commitdiff | tree

Karl Williamson [Tue, 28 Aug 2012 21:37:22 +0000 (15:37 -0600)]

regexec.c: White-space only

This outdents a block whose enclosing braces have been removed, and
reflows things to correspond.

commit | commitdiff | tree

Karl Williamson [Tue, 28 Aug 2012 21:29:42 +0000 (15:29 -0600)]

Avoid duplicate table look ups.

These two spots both match 'c+', where 'c' is some character, against a
Unicode table. Prior to this patch, if the code matched a single 'c', it
would fall into a while loop, where it matched that same 'c' again. Now it
simply increments the pointer past the first match, so the while loop
starts looking for succeeding matches at the next character in the
input.
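
A minimal sketch of the pattern described above, not the actual regexec.c
code. The names table_matches() and match_plus() are hypothetical, and an
ASCII digit test stands in for the Unicode table lookup:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Unicode table lookup: matches ASCII digits. */
static int table_matches(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Returns the length of the 'c+' run at the start of s, or 0 if the
 * first character does not match.  The first character is looked up
 * once; the loop then starts at the character after it. */
size_t match_plus(const unsigned char *s, size_t len)
{
    const unsigned char *p = s;
    const unsigned char *end = s + len;

    assert(s != NULL);
    if (p == end || !table_matches(*p))
        return 0;
    p++;                      /* step past the match already found */
    while (p < end && table_matches(*p))
        p++;
    return (size_t)(p - s);
}
```

Without the p++ before the loop, the first character would be looked up
twice: once by the initial test and once by the first loop iteration.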

commit | commitdiff | tree

Karl Williamson [Tue, 28 Aug 2012 21:25:48 +0000 (15:25 -0600)]

Refactor \X regex handling to avoid a typical case table lookup

Prior to this commit 98.4% of Unicode code points that went through \X
had to be looked up to see if they begin a grapheme cluster; then looked
up again to find that they didn't require special handling. This commit
refactors things so only one look-up is required for those 98.4%. It
changes the table generated by mktables to accomplish this, and hence
the name of it, and references to it are changed to correspond.

commit | commitdiff | tree

Karl Williamson [Tue, 28 Aug 2012 03:50:03 +0000 (21:50 -0600)]

regexec.c: Remove no longer needed comments

These comments showed how the actual implementation of \X was derived
from the published Unicode algorithm for determining what goes into it.

The new version of the Unicode text will be much more like what we've
implemented, so the derivation is no longer necessary; it is also about
to become obsolete because of the revised Unicode document and some
changes to how we process.

commit | commitdiff | tree

Steve Hay [Tue, 28 Aug 2012 17:32:14 +0000 (18:32 +0100)]

perldelta for 43ddfa5614 and 39b80fd98d.

commit | commitdiff | tree

Steve Hay [Tue, 28 Aug 2012 10:33:00 +0000 (11:33 +0100)]

Revert File::Copy::copy() to fail when copying a file onto itself

Copying a file onto itself was made a fatal error by 96a91e0163.
This was changed in 754f2cd0b9 from an undesirable croak() to return 1,
but the documentation was never changed from it being a fatal error.
It should probably have remained an error as per the documentation (but
updated not to say fatal) for consistency with cases of copying a file
onto itself via symbolic links or hard links.

commit | commitdiff | tree

Steve Hay [Mon, 27 Aug 2012 17:09:11 +0000 (18:09 +0100)]

Fix File::Copy test failure on Windows

Failure was introduced by 43ddfa5614 which looks for a warning message from
code that isn't run on Windows.

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 17:22:04 +0000 (10:22 -0700)]

note CPAN pod link target; regen pod issues

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 17:18:12 +0000 (10:18 -0700)]

perldtrace.pod: typo

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 17:15:08 +0000 (10:15 -0700)]

perldtrace.pod: Remove a stray =item

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 17:11:00 +0000 (10:11 -0700)]

Add another address for Shawn Moore to checkAUTHORS.pl

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 17:09:21 +0000 (10:09 -0700)]

Add t/run/dtrace.pl to MANIFEST

commit | commitdiff | tree

Shawn M Moore [Sun, 19 Aug 2012 15:12:27 +0000 (17:12 +0200)]

"loading-file" and "loaded-file" DTrace probes

commit | commitdiff | tree

Shawn M Moore [Fri, 24 Aug 2012 08:35:08 +0000 (10:35 +0200)]

"op-entry" DTrace probe

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 08:22:13 +0000 (01:22 -0700)]

op.c: Two more boolean %hash optimisations

In commit c8fe3bdf72 I used the wrong flag for ?:, causing it to slow
down unless the ?: was in void context.

OP_NOT has been sensitive to void context all along, which was never
necessary.

These two should be just as fast. The second should not be slower:

!%hash;
!!%hash;

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 08:11:30 +0000 (01:11 -0700)]

Use PL_parser->lex_shared instead of Sv[IN]VX(PL_linestr)

Unfortunately, PL_parser->linestr and PL_parser->bufptr are both
part of the API, so we can’t just move them to PL_parser->lex_shared.
Instead, we have to copy them in sublex_push, to make them visible to
inner lexing scopes.

This allows the SvIVX(PL_linestr) and SvNVX(PL_linestr) hack to
be removed.

It should also speed things up slightly. We are already allocating
PL_parser->lex_shared in sublex_push, so there should be no need to
upgrade PL_linestr to SvNVX as well.

I was pleasantly surprised to see how the here-doc code seemed to
shrink all by itself when modified to account.

PL_sublex_info.super_bufptr is also superseded by the addition of
->ls_bufptr to the LEXSHARED struct. Its old values when localised
were not visible, being stashed away on the savestack, so it was
harder to use.

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 07:51:27 +0000 (00:51 -0700)]

caller.t: Fix ‘Caller’ test

This string eval was always failing, leaving @c with its previous
value, which just happened to be what we were expecting.

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 06:03:36 +0000 (23:03 -0700)]

Stop here-docs from gutting (caller $n)[6]

(caller $n)[6] returns the text of the eval. Actually, it would
return, not the text of the eval, but the text with all the here-doc
bodies missing.

In this commit, I’m abusing the SvSCREAM flag to indicate that the
eval text stored in the context stack is refcounted.

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 05:38:57 +0000 (22:38 -0700)]

Stop (caller $n)[6] from including final "\n;"

String eval appends "\n;" to the string before evaluating it.
(caller $n)[6], which returns the text of the eval, was giving the
modified string, rather than the original.

In fact, it was returning the actual string buffer that the parser
uses. This commit changes it to create a new mortal SV from that
string buffer, but without the last two characters.

It unfortunately breaks this JAPH:

eval'BEGIN{${\(caller 2)[6]}=~y< !"$()+\-145=ACHMT^acfhinrsty{}>
<nlrhta"o Pe e,\nkrcrJ uthspeia">}say if+chr(1) -int"145"!=${^MATCH}'

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 01:26:11 +0000 (18:26 -0700)]

Fix eval 'q;;'

The parser expects a semicolon at the end of every statement, so the
lexer provides one. It was not doing so for evals ending with a
semicolon, even if that semicolon was not a statement-terminating
semicolon.

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 01:19:24 +0000 (18:19 -0700)]

Revert "smoke-me diag"

This reverts commit 372a31d8f53707bcfa9c233ce02a93f778b7bb4b.

I missed this when I was merging that branch. It should never have
made its way into blead. It was to find out why the Windows smokes
were temporarily failing, by dumping Config_heavy.pl in the logs.
This was what led to 0ee364945bd.

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 01:19:12 +0000 (18:19 -0700)]

parser.h: Document copline with more detail

It took me a while to figure this out, so here it is for
future readers.

commit | commitdiff | tree

Father Chrysostomos [Tue, 28 Aug 2012 01:16:34 +0000 (18:16 -0700)]

Fix line numbers inside here-docs

A previous commit put the number of lines in a here-doc in a separate
parser field, which was added on to the line number at the next
CopLINE_inc (actually COPLINE_INC_WITH_HERELINES, now used throughout
toke.c).

Code interpolated inside the here-doc was picking up that value,
throwing line numbers off.

Before that, they were already off by one.

This commit fixes both.

I removed the CLINE from S_scan_heredoc and stopped using TERM (which
uses CLINE) for here-docs.  CLINE sets PL_copline, which is used to
pass a specific line number to newSTATEOP, which may or may not be the
same number as CopLINE(PL_curcop).  newSTATEOP grabs that number and
sets PL_copline to -1 (aka NOLINE).  I assume this was used to make
the statement containing the <<foo marker have the right line number.
But it didn’t fully work out, as subsequent statements on the same
line had the wrong number.  That I fixed a few commits ago when I
introduced herelines, making CopLINE(PL_curcop) have the right line
number for that line.  So the CLINE is not actually necessary anymore.
It was causing a problem also with the first statement inside the
heredoc (in ${...}), which would ‘steal’ the line number of the
<<foo marker.

This also means that <FH> and <.*> no longer do CLINE, but it is not
necessary, as they cannot span multiple lines.

commit | commitdiff | tree

Father Chrysostomos [Mon, 27 Aug 2012 21:56:52 +0000 (14:56 -0700)]

op.c: newSTATEOP: don’t check PL_parser after using it

If it is null, we would already have crashed when reaching this
statement.

commit | commitdiff | tree

Father Chrysostomos [Mon, 27 Aug 2012 16:18:29 +0000 (09:18 -0700)]

Add PL_parser->lex_shared struct; move herelines into it

PL_parser->herelines needs to be visible to inner lexing scopes, which
also need to have their own copy of it, so that the here-doc parser
can modify the right herelines variable corresponding to the
PL_linestr from which it is stealing its body. (A subsequent commit
will take care of that.)

commit | commitdiff | tree

Father Chrysostomos [Mon, 27 Aug 2012 06:46:09 +0000 (23:46 -0700)]

Stop invalid y/// ranges from leaking

commit | commitdiff | tree

Father Chrysostomos [Mon, 27 Aug 2012 06:27:45 +0000 (23:27 -0700)]

toke.c: Merge KEY_tr and KEY_y

commit | commitdiff | tree

Father Chrysostomos [Mon, 27 Aug 2012 06:10:28 +0000 (23:10 -0700)]

Stop unterminated here-docs from leaking memory

commit | commitdiff | tree

Father Chrysostomos [Mon, 27 Aug 2012 00:51:37 +0000 (17:51 -0700)]

[perl #114070] Fix line nums after <<foo

The line numbers for operators after a here-doc marker on the same
line were off by the length of the here-doc.

This is because the here-doc parser would artificially increase the
line number as it went, because it was stealing lines out of the
input stream.

Instead, we can record the number of lines in the here-doc, and add it
to the line number the next time we need to increment it.
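
The deferred counting described above can be sketched as follows. This is
an illustration under simplifying assumptions, not the toke.c
implementation; struct lexer, consume_heredoc() and next_line() are
hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified lexer state. */
struct lexer {
    unsigned long line;       /* current line number */
    unsigned long herelines;  /* here-doc lines taken but not yet counted */
};

/* Called when a here-doc of nlines lines is taken out of the input. */
void consume_heredoc(struct lexer *lx, unsigned long nlines)
{
    assert(lx != NULL);
    lx->herelines += nlines;  /* remember them; leave lx->line alone */
}

/* Called at each line break in the main input. */
void next_line(struct lexer *lx)
{
    assert(lx != NULL);
    lx->line += 1 + lx->herelines;  /* add the deferred lines now */
    lx->herelines = 0;
}
```

Anything lexed between consume_heredoc() and the next call to next_line()
still sees the line number of the line holding the <<foo marker.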

This also fixes the line numbers after s//<<END/e to the end of the
file, which were off because the line number adjusted by the <<END was
localised to the s///.

Since herelines is visible to inner lexing scopes, the outer lexing
scope can see changes made by the inner one.

The lack of localisation does cause problems with line numbers inside
quote-like operators (but they were off by one already), which will be
addressed in subsequent commits.

commit | commitdiff | tree

Karl Williamson [Mon, 27 Aug 2012 02:26:37 +0000 (20:26 -0600)]

Add utility and .h for character's UTF-8

This adds regen/utf8_strings.pl, which takes Unicode characters and
generates utf8_strings.h, containing #defines for macros that translate
from the name to the UTF-8. This is needed in a few places, where previously
things were manually figured out and hard-coded in. Doing this instead
makes this easier, and removes EBCDIC dependencies/bugs, as the file
would simply be regen'd on an EBCDIC platform.

commit | commitdiff | tree

Karl Williamson [Thu, 16 Aug 2012 04:48:47 +0000 (22:48 -0600)]

regen/regcharclass.pl: Comment out obsolete code

Tricky folds have been removed from the code, so the removed #defines
are obsolete. I'm leaving this in so it can conveniently be
referred to in case we ever need it again.

commit | commitdiff | tree

Father Chrysostomos [Sun, 26 Aug 2012 19:34:43 +0000 (12:34 -0700)]

Opcode.pm: wrap long pod lines

commit | commitdiff | tree

Father Chrysostomos [Sun, 26 Aug 2012 19:29:07 +0000 (12:29 -0700)]

Increase $Opcode::VERSION to 1.24

commit | commitdiff | tree

Father Chrysostomos [Sun, 26 Aug 2012 19:28:45 +0000 (12:28 -0700)]

Remove boolkeys op

commit | commitdiff | tree

Father Chrysostomos [Sun, 26 Aug 2012 18:59:19 +0000 (11:59 -0700)]

pp_hot.c: pp_rv2av: Squash repetitive code

The LVRET that I removed (in the if(SvTYPE(sv) == type) block) actually
never evaluates to true, because that block is only entered for
%hash->{elem} or @array->[elem], in which the parent op is helem or
aelem, not leavesublv or return. LVRET only returns true if the current
op is the last op in an lvalue sub. Likewise, the OPpMAYBE_LVSUB
flag is never set in that case, so checking it now is harmless (the
cases that used to enter the if(SvTYPE(sv)==type) block now fall
through to the OPpMAYBE_LVSUB check).

(Using LVRET in pp_rv2av is actually incorrect, and I corrected most
instances in 40c94d11, but this one remained.)

commit | commitdiff | tree

Father Chrysostomos [Sun, 26 Aug 2012 18:10:18 +0000 (11:10 -0700)]

Croak for \local %{\%foo}

See the previous commit.

When I moved the check for local %$ref earlier, I didn’t move it
early enough.

commit | commitdiff | tree

Father Chrysostomos [Sun, 26 Aug 2012 18:06:39 +0000 (11:06 -0700)]

Restore ‘Can’t localize through ref’ to lv subs

In commit 40c94d11, I put an if statement inside an if statement,
skipping the else that followed if the outer if was true:

  if (...) {

  }
  else if {

became

  if (...) {
     ...
     if (...) {
        ...
     }
  }
  else if {

The result was that ‘Can’t localize through a reference’ no longer
happened if the operator (%{} or @{}) was the last thing in an lvalue
sub, if the lvalue sub was not called in lvalue context.

$ perl5.14.0 -e 'sub foo :lvalue { local %{\%foo} } foo(); print "ok\n"'
Can't localize through a reference at -e line 1.
$ perl5.16.0 -e 'sub foo :lvalue { local %{\%foo} } foo(); print "ok\n"'
ok

If the sub is called in lvalue context, the bug exists there, too, but
is much older (probably 82d039840b9):

$ perl5.6.2 -e 'sub f :lvalue { local %{\%foo} } (f()) =3; print "ok\n"'
Can't localize through a reference at -e line 1.
$ perl5.8.1 -e 'sub f :lvalue { local %{\%foo} } (f()) =3; print "ok\n"'
ok

The simplest solution is to change the order of the conditions.  If
the rv2hv or rv2av op is passed a reference, and has ‘local’ in front
of it (OPf_MOD && OPpLVAL_INTRO), that should die, regardless of
whether it is the last thing in an lvalue sub.

commit | commitdiff | tree

Karl Williamson [Sun, 26 Aug 2012 17:49:26 +0000 (11:49 -0600)]

Use new Unicode 6.2 beta

These supposedly are the final data files for 6.2. Earlier changes
originally proposed for 6.2 have been deferred until a later release.
Thus there is no change in the general category of ASCII characters in
these files from what they were in 6.1 and earlier, unlike what had been
proposed.

Unlike the previous experimental beta, code is now in place in Perl to
handle the revised definition of \X in 6.2. The current working draft
of that definition is at http://unicode.org/draft/reports/tr29/tr29.html

commit | commitdiff | tree

Karl Williamson [Sun, 26 Aug 2012 17:25:13 +0000 (11:25 -0600)]

Prepare for Unicode 6.2

This changes code to be able to handle Unicode 6.2, while continuing to
handle all previous releases.

The major change was a new definition of \X, which adds a property to
its calculation. Unfortunately \X is hard-coded into regexec.c, and so
has to be revised whenever there is a change of this magnitude in Unicode,
which fortunately isn't all that often. I refactored the code in
mktables to make it easier next time there is a change like this one.

commit | commitdiff | tree

Karl Williamson [Sun, 26 Aug 2012 15:47:48 +0000 (09:47 -0600)]

mktables: Re-order some code, change comments

Unicode 6.2 is changing some of these things; this re-ordering will make
that more convenient.

commit | commitdiff | tree

Karl Williamson [Sun, 26 Aug 2012 15:29:13 +0000 (09:29 -0600)]

mktables: Correct generated table comment

commit | commitdiff | tree

Karl Williamson [Sat, 18 Aug 2012 17:44:09 +0000 (11:44 -0600)]

lib/unicore/README.perl: Make usable as shell script

This adds comment symbols and redirects error messages to /dev/null for
things that are likely to fail.

commit | commitdiff | tree

Karl Williamson [Sat, 18 Aug 2012 16:01:07 +0000 (10:01 -0600)]

Revert "Experimentally Use Unicode 6.2 beta"

This reverts commit 5435c3759c4567a1bb51384f6641c04822ec6391.
A new beta has been released, and so we should use that instead.

commit | commitdiff | tree

Karl Williamson [Sun, 26 Aug 2012 17:30:57 +0000 (11:30 -0600)]

perldelta for Unicode property performance gains

commit | commitdiff | tree

Steve Hay [Sun, 26 Aug 2012 13:34:22 +0000 (14:34 +0100)]

Upgrade Socket from 2.004 to 2.006

commit | commitdiff | tree

H.Merijn Brand [Sun, 26 Aug 2012 12:52:26 +0000 (14:52 +0200)]

Add Configure probe for ip_mreq_source

Needed to upgrade Socket from CPAN

commit | commitdiff | tree

Father Chrysostomos [Sun, 26 Aug 2012 05:27:33 +0000 (22:27 -0700)]

Correct typo in flag name

commit | commitdiff | tree

Father Chrysostomos [Sun, 26 Aug 2012 01:48:46 +0000 (18:48 -0700)]

Banish boolkeys

Since 6ea72b3a1, rv2hv and padhv have had the ability to return
booleans in scalar context, instead of bucket stats, if flagged the right
way.  sub { %hash || ... } is optimised to take advantage of this.  If
the || is in unknown context at compile time, the %hash is flagged as
being maybe a true boolean.  When flagged that way, it returns a
boolean if block_gimme() returns G_VOID.

If rv2hv and padhv can already do this, then we don’t need the
boolkeys op any more.  We can just flag the rv2hv to return a boolean.
In all the cases where boolkeys was used, we know at compile time that
it is true boolean context, so we add a new flag for that.

commit | commitdiff | tree

Karl Williamson [Tue, 21 Aug 2012 14:17:51 +0000 (08:17 -0600)]

regexec.c: White-space only

Indent inside newly formed block

commit | commitdiff | tree

Karl Williamson [Tue, 21 Aug 2012 04:03:22 +0000 (22:03 -0600)]

regex: Speed up \X processing

For most Unicode releases, GCB=prepend matches absolutely nothing. And
that appears to be the case going forward, as they added things to it,
and removed them later based on field experience.

An earlier commit has improved the performance of this significantly by
using a binary search of an empty array instead of a swash hash.
However, that search requires several layers of function calls to
discover that it is empty, which this commit avoids.

This patch will use whatever swash_init() returns unless it is empty,
preserving backwards compatibility with older Unicode releases. But if
it is empty, the routine sets things up so that future calls will always
fail without further testing.

commit | commitdiff | tree

Karl Williamson [Sat, 25 Aug 2012 20:54:10 +0000 (14:54 -0600)]

utf8.c: indent in new block: White space-only

commit | commitdiff | tree

Karl Williamson [Sat, 25 Aug 2012 20:49:47 +0000 (14:49 -0600)]

utf8.c: Prefer binsearch over swash hash for small swashes

A binary swash is a hash of bitmaps used to cache the results of looking
up if a code point matches a Unicode property or regex bracketed
character class.  An inversion list is a data structure that also holds
information about what code points match a Unicode property or character
class.  It is implemented as an SV* to a sorted C array, and hence can
be searched using a binary search.

This patch converts to using a binary search of an inversion list
instead of a hash look-up for inversion lists that are no more than 512
elements (9 iterations of the search loop).  That number can be easily
adjusted, if necessary.
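
As an illustration of the data structure described above, here is a
minimal binary search over an inversion list. It is a sketch that assumes
a plain sorted array of uint32_t rather than Perl's SV-based
representation, and invlist_contains() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if cp is in the set described by the inversion list.
 * list[] is sorted ascending; elements at even indices begin ranges
 * that are in the set, elements at odd indices begin ranges that are
 * not. */
int invlist_contains(const uint32_t *list, size_t n, uint32_t cp)
{
    size_t lo = 0, hi = n;    /* list[lo] <= cp, and cp < list[hi] if hi < n */

    assert(n == 0 || list != NULL);
    if (n == 0 || cp < list[0])
        return 0;
    /* Narrow down to the last index whose element is <= cp. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }
    return (lo % 2) == 0;
}
```

For example, the list { 0x41, 0x5B, 0x61, 0x7B } describes the ASCII
letters: 'A' lands on index 0 (in the set), while '[' lands on index 1
(not in the set). Each pass halves the window, which is where the figure
of 9 iterations for 512 elements comes from.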

Theoretically, a hash is faster than a binary lookup over a very long
period.  So this may negatively impact long-running servers.  But in the
short run, where most programs reside, the binary search is
significantly faster.

A swash is filled as necessary as time goes on, caching each new
distinct code point it is called with.  If it is called with many, many
such code points, its performance can degrade as collisions increase.  A
binary search does not have that drawback.  However, most real-world
scenarios do not have a program being called with huge numbers of
distinct code points.  Mostly, the program will be called with code
points from just one or a few of the world's scripts, so will remain
sparse.  The bitmaps in a swash are each 64 bits long (except for ASCII,
where it is 128).  That means that when the swash is populated, a lookup
of a single code point that hasn't been checked before will have to
lookup the 63 adjoining code points as well, increasing its startup
overhead.  Of course, if one of those 63 code points is later accessed,
no extra populate happens.  This is a typical case where a language's
code points are all near each other.

The bottom line, though, is in the short term, this patch speeds up the
processing of \X regex matching about 35-40%, with modern Korean (which
has uniquely complicated \X processing) closer to 40%, and other scripts
closer to 35%.

The 512 boundary means that over 90% of the official Unicode properties
are handled using binary search.  I settled on that number by
experimenting with several properties besides \X and with various
powers-of-2 limits.  Until I got that high, performance kept improving
when the property went from being a swash to a binary search.  \X
improved even up to 2048, which encompasses 100% of the official Unicode
properties.

The implementation changes so that an inversion list instead of a swash
is returned by swash_init() when the input flags allow it to do so, for
all inversion lists shorter than the compiled in constant of 512
(actually <= 512).  The other functions that access swashes have added
intelligence to deal with an object of either type.  Should someone in
CPAN be using the public swash_init() interface, they will not see any
difference, as the option to get an inversion list is not available to
them.

commit | commitdiff | tree

Karl Williamson [Sat, 25 Aug 2012 20:51:11 +0000 (14:51 -0600)]

utf8.c: Bypass a subroutine wrapper

We might as well call the core swash initialization, since we are the
core here, since the public one merely wraps it.

commit | commitdiff | tree

Karl Williamson [Sat, 25 Aug 2012 19:27:25 +0000 (13:27 -0600)]

utf8.c: Add comment about speed-up attempt

This might keep someone later from attempting the speedup which didn't
actually help, so I didn't commit it

commit | commitdiff | tree

Karl Williamson [Sat, 25 Aug 2012 17:42:55 +0000 (11:42 -0600)]

utf8.c: Shorten hash key for speed

Experiments have shown that longer hash keys impact performance. See
the thread at
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2012-08/msg00869.html

This patch shortens a key used very frequently. There are other keys in
this hash which are used frequently in some circumstances, but I expect
to change to use fewer in the future, so am not changing them now

commit | commitdiff | tree

Karl Williamson [Sat, 25 Aug 2012 14:58:42 +0000 (08:58 -0600)]

utf8.c: collapse a function parameter

Now that we have a flags parameter, we can pass this parameter as
just another flag, giving a cleaner interface to this internal-only
function. This also renames the flag parameter to <flag_p> to indicate
it needs to be dereferenced.

commit | commitdiff | tree

Karl Williamson [Sat, 25 Aug 2012 14:06:30 +0000 (08:06 -0600)]

regexec.c: Reword comment

This portion of the comment is unnecessary, and doesn't really reflect
the implementation

commit | commitdiff | tree

Karl Williamson [Fri, 24 Aug 2012 20:38:02 +0000 (14:38 -0600)]

regexec.c: Use get method instead of internals

A new get method has been written to access the internals of a swash;
it's best to use it.

This also moves the error checking to the method

commit | commitdiff | tree

Karl Williamson [Fri, 24 Aug 2012 20:20:41 +0000 (14:20 -0600)]

embed.fnc: Turn null wrapper function into macro

This function only does something on EBCDIC platforms. On ASCII ones,
make it a macro, like similar ones, to avoid useless function nesting.

commit | commitdiff | tree

Karl Williamson [Fri, 24 Aug 2012 20:00:22 +0000 (14:00 -0600)]

utf8.c: Revise internal API of swash_init()

This revises the API for the version of swash_init() that is usable
by core Perl.  The external interface is unaffected.  There is now a
flags parameter to allow for future growth.  And the core internal-only
function that returns if a swash has a user-defined property in it or
not has been removed.  This information is now returned via the new
flags parameter upon initialization, and is unavailable afterwards.
This is to prepare for the flexibility to change the swash that is
needed in future commits.

commit | commitdiff | tree

Karl Williamson [Fri, 24 Aug 2012 17:11:57 +0000 (11:11 -0600)]

embed.fnc: Mark internal function as "may change"

This function is not designed for a public API, and should have been so
listed.

commit | commitdiff | tree

Karl Williamson [Thu, 23 Aug 2012 19:47:37 +0000 (13:47 -0600)]

Add caching to inversion list searches

Benchmarking showed some speed-up when the result of the previous
search in an inversion list is cached, thus potentially avoiding a
search in the next call. This adds a field to each inversion list which
caches its previous search result.
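
A sketch of the caching idea, assuming a simplified inversion list held in
a plain C struct. The struct layout, the field names and
cached_contains() are all hypothetical; the searches counter exists only
to make the effect of the cache visible:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inversion list that remembers its previous search. */
struct cached_invlist {
    const uint32_t *list;    /* sorted range boundaries */
    size_t n;                /* number of elements in list */
    size_t prev;             /* index found by the previous search */
    unsigned long searches;  /* full binary searches done (illustration) */
};

int cached_contains(struct cached_invlist *il, uint32_t cp)
{
    size_t lo, hi;

    assert(il != NULL);
    if (il->n == 0 || cp < il->list[0])
        return 0;
    /* Cache hit: cp lies in the same range as the previous result. */
    if (il->prev < il->n && il->list[il->prev] <= cp
        && (il->prev + 1 == il->n || cp < il->list[il->prev + 1]))
        return (il->prev % 2) == 0;
    /* Cache miss: do the full binary search and remember the result. */
    il->searches++;
    lo = 0;
    hi = il->n;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (il->list[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }
    il->prev = lo;
    return (lo % 2) == 0;
}
```

Consecutive lookups of code points from the same range, which is the
common case for text in a single script, cost two comparisons instead of
a full search.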

commit | commitdiff | tree

Karl Williamson [Sat, 18 Aug 2012 18:20:42 +0000 (12:20 -0600)]

regexec.c: Use xor to save a branch

Probably this gets optimized this way anyway.
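
The commit message does not show the expression involved, so the
following is only a generic illustration of the technique, with
hypothetical function and parameter names: a result that is inverted
under some condition can be computed with xor instead of a branch.

```c
#include <stdbool.h>

/* Branching form: invert the raw result when the condition is set. */
bool match_branch(bool raw_match, bool negated)
{
    if (negated)
        return !raw_match;
    return raw_match;
}

/* Branch-free form: xor flips raw_match exactly when negated is true. */
bool match_xor(bool raw_match, bool negated)
{
    return raw_match ^ negated;
}
```

Both functions return the same value for all four input combinations.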

commit | commitdiff | tree

Karl Williamson [Tue, 21 Aug 2012 16:22:00 +0000 (10:22 -0600)]

Comment out unused function

In looking at \X handling, I noticed that this function which is
intended for use in it, actually isn't used. This function may someday
be useful, so I'm leaving the source in.

commit | commitdiff | tree

Karl Williamson [Tue, 21 Aug 2012 15:30:08 +0000 (09:30 -0600)]

utf8.c: Speed up \X processing of Korean

\X matches according to a complicated pattern that is hard-coded in
regexec.c.  Part of that pattern involves checking if a code point is a
component of a Hangul Syllable or not.  For Korean code points, this
involves checking against multiple tables.  It turns out that two of
those tables are arranged so that the checks for them can be done via an
arithmetic expression; Unicode publishes algorithms for determining
various characteristics based on their very structured ordering.

This patch converts the routines that check these two tables to instead
use the arithmetic expression.
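
The commit message does not name the two tables. As an illustration of
the kind of arithmetic Unicode's Hangul layout permits, the sketch below
assumes they are the LV and LVT syllable types. The constants come from
the Unicode Standard's Hangul syllable composition algorithm; the
function names are hypothetical:

```c
#include <stdint.h>

enum {
    HANGUL_SBASE  = 0xAC00,  /* first precomposed Hangul syllable */
    HANGUL_SCOUNT = 11172,   /* number of precomposed syllables */
    HANGUL_TCOUNT = 28       /* trailing-consonant slots per LV block */
};

static int is_hangul_syllable(uint32_t cp)
{
    return cp >= HANGUL_SBASE && cp < HANGUL_SBASE + HANGUL_SCOUNT;
}

/* LV syllables (no trailing consonant) begin each block of 28. */
int is_hangul_lv(uint32_t cp)
{
    return is_hangul_syllable(cp)
        && (cp - HANGUL_SBASE) % HANGUL_TCOUNT == 0;
}

/* LVT syllables are the other 27 code points of each block. */
int is_hangul_lvt(uint32_t cp)
{
    return is_hangul_syllable(cp)
        && (cp - HANGUL_SBASE) % HANGUL_TCOUNT != 0;
}
```

A range check plus one modulo replaces a table lookup for every
precomposed syllable.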

commit | commitdiff | tree

Karl Williamson [Thu, 23 Aug 2012 16:36:13 +0000 (10:36 -0600)]

regcomp.c: Move functions to inline_invlist.c

This populates inline_invlist.c with some static inline functions and
macro defines. These are the ones that are anticipated to be needed in
the near term outside regcomp.c

commit | commitdiff | tree

Karl Williamson [Thu, 23 Aug 2012 16:19:51 +0000 (10:19 -0600)]

regcomp.c: Rename 2 functions to indicate private nature

These two functions will be moved into a header in a future commit,
where they will be accessible outside regcomp.c. Prefix their names with
an underscore to emphasize that they are private.

commit | commitdiff | tree

Karl Williamson [Thu, 23 Aug 2012 14:37:58 +0000 (08:37 -0600)]

regcomp.c: Silence compiler warning.

The warning that this variable can be used uninitialized is spurious,
but silence it nonetheless.

commit | commitdiff | tree

Karl Williamson [Thu, 23 Aug 2012 00:30:59 +0000 (18:30 -0600)]

Add empty inline_invlist.c

This will be used for things needed to handle inversion lists in the three
files that currently use them. I'm putting this in a separate header,
because inversion lists are very internal-only, so should not be grouped
in with things that there is an external API for. It is a dot-c file so
that the functions can continue to be declared with embed.fnc, and
porting/args_assert.t will continue to work, as it looks only in .c
files.

commit | commitdiff | tree

Karl Williamson [Tue, 21 Aug 2012 17:24:48 +0000 (11:24 -0600)]

regcomp.c: Add assertion, comments

commit | commitdiff | tree

Karl Williamson [Sat, 18 Aug 2012 20:23:12 +0000 (14:23 -0600)]

regcomp.c: Allow search to work on empty inversion lists

You cannot retrieve the array of an empty inversion list, so the code
has to be reordered to do that after the list is known to be non-empty.
I haven't been able to find a case where this currently fails, but
future commits open up the possibility.

commit | commitdiff | tree

Karl Williamson [Sat, 18 Aug 2012 18:19:00 +0000 (12:19 -0600)]

regcomp.c: Special case /[UV_MAX]/

The highest code point representable on the machine has to be special
cased. Earlier commits for 5.14 did this for ranges ending in this code
point, but it turns out there needs to be a special-special case when
the range contains just it.
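
One reason the highest code point needs care can be illustrated like this. The sketch assumes an inversion list stores a closed range as its start followed by end + 1, which cannot be represented when end is the maximum value; `append_range` is a hypothetical helper, `uint64_t` stands in for UV, and this is not the code in regcomp.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append the closed range [start, end] to a sorted inversion list and
 * return the new length. The caller must provide room for two more
 * elements and ensure start is above every existing element. */
static size_t append_range(uint64_t *arr, size_t len,
                           uint64_t start, uint64_t end)
{
    assert(start <= end);

    arr[len++] = start;          /* the range begins here */

    /* end + 1 would wrap to 0 at the maximum value, so a range that
     * reaches the top is left open instead of being closed. */
    if (end != UINT64_MAX)
        arr[len++] = end + 1;

    return len;
}
```

With this layout, a range consisting only of the maximum value yields a single element, so the list has odd length; code that assumes ranges always come in start/end pairs has to account for that.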

commit | commitdiff | tree

Karl Williamson [Mon, 20 Aug 2012 19:28:31 +0000 (13:28 -0600)]

mktables: Fix bug when deleting final range

When a Range_List is emptied, there is a bug which causes a runtime
error when trying to refer to a non-existent element. This avoids that.
A future commit would have run afoul of this bug.

commit | commitdiff | tree

Father Chrysostomos [Sat, 25 Aug 2012 21:43:33 +0000 (14:43 -0700)]

Increase $B::Concise::VERSION to 0.93

commit | commitdiff | tree

Father Chrysostomos [Sat, 25 Aug 2012 20:22:46 +0000 (13:22 -0700)]

Optimise %hash in sub { %hash || ... }

In %hash || $foo, the %hash is in scalar context, so it has to iterate
through the buckets to produce statistics on bucket usage.

If the || is in void context, the value returned by %hash is only ever
used as a boolean (as || doesn’t have to return it).  We already
optimise it by adding a boolkeys op when it is known at compile time that
|| will be in void context.

In sub { %hash || $foo } it is not known at compile time that it will
be in void context, so it wasn’t optimised.

This commit optimises it by flagging the %hash at compile time as
being possibly in ‘true boolean’ context.  When that flag is set,
the rv2hv and padhv ops call block_gimme() to see whether || is in
void context.

This speeds things up significantly.  Here is what I got after
optimising rv2hv but before doing padhv:

$ time ./miniperl -e '%hash = 1..10000; sub { %hash || 1 }->() for 1..100000'

real 0m0.179s
user 0m0.101s
sys 0m0.005s
$ time ./miniperl -e 'my %hash = 1..10000; sub { %hash || 1 }->() for 1..100000'

real 0m5.446s
user 0m2.419s
sys 0m0.015s

(That example is slightly misleading because of the closure, but the
closure version takes 1 sec. when optimised.)

commit | commitdiff | tree

Yves Orton [Sat, 25 Aug 2012 16:35:25 +0000 (18:35 +0200)]

improve and fix the documentation of the PERL_HASH function

commit | commitdiff | tree

Yves Orton [Sat, 25 Aug 2012 10:28:38 +0000 (12:28 +0200)]

minor doc patches to api stuff

commit | commitdiff | tree

Father Chrysostomos [Sat, 25 Aug 2012 07:12:26 +0000 (00:12 -0700)]

Apply boolkeys optimisation to %hash?:

and consequently if(%hash) followed by else.

commit | commitdiff | tree

Father Chrysostomos [Sat, 25 Aug 2012 07:07:21 +0000 (00:07 -0700)]

Apply boolkeys optimisation to scalar(%hash)

commit | commitdiff | tree

Father Chrysostomos [Sat, 25 Aug 2012 06:52:36 +0000 (23:52 -0700)]

[perl #114576] Optimise if(%hash) in non-void context

The boolkeys optimisation (867fa1e2da1) was only applying to an and
(or if) in void context.  If an if occurs as the last thing in a
subroutine, the void context is not known at compile time, so the
optimisation does not apply.

In the case of || (to which the boolkeys optimisation also applies),
we can’t optimise it in non-void context, because someone might be
writing $bucket_info = %hash || '0/0';

In the case of &&, we can optimise it, even in non-void context,
because a true value will always be discarded in %hash && foo.
The false value it returns for an empty hash is always the
integer 0.  That would change if we simply applied boolkeys to
my $ret = %hash && foo; because boolkeys returns &PL_sv_no (the dualvar
you get from !1).  But since boolkeys’ return value is never directly
visible to perl code, we can safely change that.

commit | commitdiff | tree

Father Chrysostomos [Sat, 25 Aug 2012 06:03:44 +0000 (23:03 -0700)]

pp.c: pp_boolkeys does not need to pop

If it’s going to consume and return exactly one item, it doesn’t need
to decrement and increment the stack pointer.
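
The point can be illustrated with a toy stack. `mini_stack` and both functions are hypothetical stand-ins, not perl's stack macros or the real pp_boolkeys.

```c
#include <assert.h>

/* Hypothetical miniature value stack; sp points at the top item. */
typedef struct {
    int items[8];
    int *sp;
} mini_stack;

/* Pop the operand, then push the result: the stack pointer moves down
 * and straight back up. Requires at least one slot below the top. */
static void to_bool_pop_push(mini_stack *s)
{
    int v = *s->sp--;    /* pop the operand */
    *++s->sp = !!v;      /* push the boolean result */
}

/* Overwrite the top item in place: same final stack, and the stack
 * pointer is never touched. */
static void to_bool_in_place(mini_stack *s)
{
    *s->sp = !!*s->sp;
}
```

Both functions leave the stack in the same state; the second simply omits the paired decrement and increment.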

commit | commitdiff | tree

Daniel Dragan [Fri, 24 Aug 2012 21:07:59 +0000 (17:07 -0400)]

[perl #114572] perl.c: fix locality/rmv redundant nulls in call_sv/eval_sv

Small tweaks to improve locality and give the C compiler more
opportunity to optimize. Also remove redundant nulls, since the OP
structs are null filled a line or 2 before.

commit | commitdiff | tree

Father Chrysostomos [Fri, 24 Aug 2012 19:39:40 +0000 (12:39 -0700)]

parser.t: Move a test above ‘Add new tests here’

commit | commitdiff | tree

Father Chrysostomos [Fri, 24 Aug 2012 16:33:51 +0000 (09:33 -0700)]

pad.h: Rename PadnameSTATE; make it a proper boolean

I used PadnameIs* for OUR, because I was copying
PAD_COMPNAME_FLAGS_isOUR. STATE should be consistent with it. And it
was missing the double bang, making the docs wrong.
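
The effect of the double bang can be shown with a pair of toy macros. The flag value and macro names are hypothetical, not the definitions in pad.h.

```c
#include <assert.h>

#define FLAG_STATE 0x40   /* hypothetical flag bit */

/* Without !!, the macro yields the raw masked bit: 0 or 0x40. */
#define IS_STATE_RAW(flags)   ((flags) & FLAG_STATE)

/* With !!, the result is normalised to exactly 0 or 1, so it can be
 * documented and used as a proper boolean. */
#define IS_STATE_BOOL(flags)  (!!((flags) & FLAG_STATE))
```

The two forms behave the same inside an `if`, but differ when the result is compared with 1, stored in a narrow type, or described as a boolean in documentation.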
