Smylers [Thu, 5 Sep 2013 01:39:33 +0000 (03:39 +0200)]
When sending an email manually so it can have multiple patches, point out
that Mutt can construct the email for you.
Obviously this isn't as generally relevant as the rest of the Guide, since
patchers will use many different mail clients. But it's a significant boon
for those who do use Mutt, and a very short addition to the Guide.
Mutt is singled out simply because it has this functionality; I suspect
that most other widely used mail clients don't.
Committer: Removed trailing whitespace. Applied patch manually because other
lines in the file had been rebroken and patch no longer applied cleanly.
For: RT #119599
Smylers [Wed, 4 Sep 2013 11:37:36 +0000 (12:37 +0100)]
Multiple commits in Super Quick Patch Guide
How to use perlbug when a change is a series of commits, not a single
commit.
This is the advice RJBS gave me over IRC. Including it in the guide should
avoid him having to repeat the advice to others.
Committer: Added single quotes around one keyboard command.
For: RT #119599
Smylers [Wed, 4 Sep 2013 10:58:34 +0000 (11:58 +0100)]
Resetting a check out in Super Quick Patch Guide
Add advice so that somebody wishing to submit a second patch doesn't need
to throw away their perl check-out and start again.
Not knowing the 'git clean' step caught me out, and meant perl wouldn't
build for me. Nicholas helped me out. Adding this to the guide will
hopefully save Nicholas from having to repeat that for others (especially
since others may not be fortunate enough to have Nicholas handily seated
next to them at the point they encounter it).
(The non-building was because some things in the repository had been
re-arranged since my previous patch (several months earlier), and the
latest build was getting confused by some files left over from a
pre-re-arranged build.)
The 'git clean' step will also remove the first 0001-*.patch file, avoiding
the problem of there being two files matching that glob when attaching the
second patch.
Committer: Removed trailing whitespace.
For: RT #119599
Smylers [Wed, 4 Sep 2013 10:37:58 +0000 (11:37 +0100)]
Suggest reading blead's Super Quick Patch Guide
The Super Quick Patch Guide has been improved several times. Suggest that
a patcher looks at the latest version in the checkout of blead they've
just made, in case that's been improved since whichever released version
they were reading.
This has caught me out before: I've done something sub-optimal (for me or
those reviewing my patches) by diligently following out-of-date
instructions.
Remove trailing whitespace.
For: RT #119599
Smylers [Wed, 4 Sep 2013 09:54:15 +0000 (10:54 +0100)]
perlbug command wrapped to fit in 79 columns
To pass t/porting/podcheck.t --pedantic
The line-break is inside a $(...), so the lines can be copied-and-pasted
as they are, complete with line-break and extra spaces, and still give the
same output.
Remove trailing whitespace.
For: RT #119599
Smylers [Wed, 4 Sep 2013 09:49:41 +0000 (10:49 +0100)]
Have perlbug report version being patched
In the Super Quick Patch Guide, run the perlbug and perl from the working
copy that the patch is against, so the bug report contains relevant
version and configuration data, rather than that of whichever system perl
the reporter happens to have installed.
For: RT #119599
Smylers [Wed, 4 Sep 2013 10:32:24 +0000 (11:32 +0100)]
Consistent indent on shell commands
Most verbatim lines with shell commands had 2 spaces before the % prompt.
A few had 1 or 4 spaces. Make them all 2.
Remove trailing whitespace.
For: RT #119599
Smylers [Wed, 4 Sep 2013 09:31:25 +0000 (10:31 +0100)]
perlhack.pod tidied
In accordance with the comment at the top of the file, before I make other
changes to it.
Remove trailing whitespace.
For: RT #119599
Steve Hay [Wed, 4 Sep 2013 23:28:56 +0000 (00:28 +0100)]
Upgrade Socket from version 2.011 to 2.012
Nicholas Clark [Wed, 4 Sep 2013 11:20:37 +0000 (13:20 +0200)]
The bisect tool now takes test scripts as targets, and runs them with t/TEST
This makes it easier to bisect when test scripts started failing.
Reini Urban [Tue, 27 Aug 2013 16:52:28 +0000 (11:52 -0500)]
[perl #119481] Check SvVALID for !SvSCREAM, skip PAD
SVpad_NAME = SVp_SCREAM|SVpbm_VALID
Subsequently OUR, TYPED and STATE pads all have SVp_SCREAM set.
SVpad_NAME shares the same bit with SVpbm_VALID, so avoid checking
PADs for SVpbm_VALID.
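The hazard here is a flag bit that carries two meanings. A minimal C sketch of the guarded check follows; the flag values and the names FLAG_SCREAM, FLAG_BM_VALID and FLAG_PAD_NAME are hypothetical stand-ins, not the real definitions from sv.h:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only; the real definitions
 * live in Perl's sv.h and differ from these. */
#define FLAG_SCREAM   0x01u                         /* stands in for SVp_SCREAM  */
#define FLAG_BM_VALID 0x02u                         /* stands in for SVpbm_VALID */
#define FLAG_PAD_NAME (FLAG_SCREAM | FLAG_BM_VALID) /* stands in for SVpad_NAME  */

/* Naive check: also reports true for pad names, because they carry the
 * same bit as part of FLAG_PAD_NAME. */
static bool is_valid_naive(uint32_t flags)
{
    return (flags & FLAG_BM_VALID) != 0;
}

/* Guarded check: the VALID bit only counts when SCREAM is absent. */
static bool is_valid_guarded(uint32_t flags)
{
    return (flags & FLAG_BM_VALID) != 0 && (flags & FLAG_SCREAM) == 0;
}
```

With these stand-ins, a value flagged FLAG_PAD_NAME passes the naive check but is rejected by the guarded one.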
Smylers [Tue, 3 Sep 2013 13:52:12 +0000 (14:52 +0100)]
Restore perlrepository.pod in stub form
Give Perl doc sites a sane ‘latest’ version to display, directing readers
to current information, rather than showing the Perl 5.12 version in
perpetuity.
And help anybody typing man perlrepository find where the docs have moved
to.
Suggested by Father Chrysostomos in:
http://www.nntp.perl.org/group/perl.perl5.porters/2013/09/msg207079.html
Zefram [Tue, 3 Sep 2013 19:00:22 +0000 (20:00 +0100)]
Carp-1.32 has been released to CPAN
Steve Hay [Tue, 3 Sep 2013 07:48:47 +0000 (08:48 +0100)]
version has been upgraded from version 0.9903 to 0.9904
John Peacock [Mon, 2 Sep 2013 22:49:50 +0000 (18:49 -0400)]
Sync core with CPAN version.pm release
Remove pointless diag lines, which were more trouble than they were
worth. Add code to ensure that SV's with magic are handled properly,
and include a test for it as well. A couple of whitespace changes and
one last set of I32 -> SSize_t upgrade for array indices.
Craig A. Berry [Mon, 2 Sep 2013 21:19:41 +0000 (16:19 -0500)]
Another reason for home-grown kill() on VMS.
For some time now Perl has provided its own kill() function on VMS
due to various problems with the system-supplied one, notably that
when called from within a signal handler, the second signal never
got delivered. This has at long last been corrected in the OS
as of the VMS84I_SYS V3.0 ECO.
But this exposes another problem with the CRTL's kill(), which is
that when called with a signal value of 0 it actually kills the
running program instead of restricting itself to error checking
as the standard requires. This turns out to be documented behavior
and the documented workaround is to define the _POSIX_EXIT macro.
However, universally changing the behavior of the exit() function
in order to prevent
kill(getpid(),0);
from bringing down the program that calls it doesn't seem like the
right trade-off. So just add one more condition to the list of
conditions under which we'll use our own kill().
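For reference, this is the standard-conforming behaviour the CRTL's kill() fails to provide. The sketch below is plain POSIX C, not the VMS code, and the helper name process_is_signalable is made up for illustration:

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns true if a signal could be sent to `pid`. With signal 0, POSIX
 * kill() performs only the existence and permission checks and delivers
 * nothing, so the target keeps running. */
static bool process_is_signalable(pid_t pid)
{
    return kill(pid, 0) == 0;
}
```

On a conforming system, process_is_signalable(getpid()) returns true and the calling program carries on.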
Karl Williamson [Mon, 2 Sep 2013 17:17:13 +0000 (11:17 -0600)]
toke.c: Clarify comment
Brian Fraser [Mon, 2 Sep 2013 17:07:53 +0000 (14:07 -0300)]
t/op/for.t: Skip a test if the require for XS::APItest fails
Father Chrysostomos [Mon, 2 Sep 2013 15:29:50 +0000 (08:29 -0700)]
Don’t assume targs are contiguous for ‘my $x; my $y’
In commit 18c931a3, the padrange optimisation was prevented from making
this assumption for ‘my ($x,$y)’, but the assumption was still
there in the code that combines multiple statements into one.
This would lead to assertion failures (or, as of ce0d59f, crashes
under non-debugging builds) if a keyword plugin declined to handle
the second ‘my’, but only after creating a padop.
This fixes a regression from 5.16 affecting Devel::CallParser under
threaded builds.
Nicholas Clark [Mon, 2 Sep 2013 14:04:18 +0000 (16:04 +0200)]
Merge the changes to the internals of match variables.
Nicholas Clark [Thu, 29 Aug 2013 11:12:00 +0000 (13:12 +0200)]
Add a perldelta entry for the changes to the internals of match variables.
Nicholas Clark [Thu, 29 Aug 2013 10:56:27 +0000 (12:56 +0200)]
Simplify some code in Perl_magic_get() and Perl_magic_set().
Remove the checks that avoided confusing $^P, ${^PREMATCH} and ${^POSTMATCH}
now that the latter two do not take that code path. Remove a similar
check for $^S added by commit 4ffa73a366885f68 (Feb 2003). (This commit did
not add any other variable starting with a control-S.) This eliminates all
uses of the variable remaining.
Move the goto target do_numbuf_fetch inside the checks for PL_curpm, as
both its comefroms have already made the same check.
Nicholas Clark [Thu, 29 Aug 2013 10:33:46 +0000 (12:33 +0200)]
Remove now unused $` $' ${^MATCH} ${^PREMATCH} ${^POSTMATCH} code.
The previous commit's changes to Perl_gv_fetchpvn_flags() rendered this
code in Perl_magic_get() and Perl_magic_set() unreachable.
Nicholas Clark [Thu, 29 Aug 2013 10:16:11 +0000 (12:16 +0200)]
Store all other match vars in mg_len instead of mg_ptr/mg_len.
Perl_gv_fetchpvn_flags() now stores the appropriate RX_BUFF_IDX_* constant
in mg_len for $` $' ${^MATCH} ${^PREMATCH} and ${^POSTMATCH}
This makes some code in mg.c unreachable and hence unnecessary; the next
commit will remove it.
Nicholas Clark [Wed, 28 Aug 2013 20:00:54 +0000 (22:00 +0200)]
Store the match vars in mg_len instead of calling atoi() on mg_ptr.
The match variables $1, $2 etc, along with many other special scalars, have
magic type PERL_MAGIC_sv, with the variable's name stored in mg_ptr. The
look up in mg.c involved calling atoi() on the string in mg_ptr to get the
capture buffer as an integer, which is passed to the regex API.
To avoid this repeated use of atoi() at runtime, change the storage in the
MAGIC structure for $1, $2 etc and $&. Set mg_ptr to NULL, and store the
capture buffer in mg_len. Other code which manipulates magic ignores mg_len
if mg_ptr is NULL, so this representation does not require changes outside
of the routines which set up, read and write these variables.
(Perl_gv_fetchpvn_flags(), Perl_magic_get() and Perl_magic_set())
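A rough sketch of the before and after, using a cut-down stand-in for the MAGIC structure. The struct and function names are hypothetical, and parsing the name with atoi() once at set-up is an assumption made for the sketch, not necessarily how Perl_gv_fetchpvn_flags() derives the number:

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the two MAGIC fields discussed above; the real
 * struct in Perl's mg.h has more members. */
struct magic_sketch {
    const char *ptr;  /* plays the role of mg_ptr */
    long        len;  /* plays the role of mg_len */
};

/* Old scheme: the name "1", "2", ... is stored and parsed on every access. */
static int capture_index_old(const struct magic_sketch *mg)
{
    return atoi(mg->ptr);
}

/* New scheme: parse once at set-up (an assumption of this sketch), leave
 * ptr NULL, and keep the number in len. */
static struct magic_sketch magic_for_capture(const char *name)
{
    struct magic_sketch mg = { NULL, atoi(name) };
    return mg;
}

/* New scheme: reading the capture index is a plain field access. */
static int capture_index_new(const struct magic_sketch *mg)
{
    return (int)mg->len;
}
```

The runtime path in the new scheme does no string parsing at all.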
Steve Hay [Mon, 2 Sep 2013 07:41:49 +0000 (08:41 +0100)]
perldelta - Fill in some TODOs, wrap lines etc.
Father Chrysostomos [Sun, 1 Sep 2013 21:51:29 +0000 (14:51 -0700)]
toke.c:scan_const: Don’t use PL_bufptr
PL_bufptr is passed in as an argument, yet scan_const was sometimes
looking at its argument (start) and sometimes using PL_bufptr
directly. This is just confusing.
Father Chrysostomos [Sun, 1 Sep 2013 21:47:38 +0000 (14:47 -0700)]
Teach mro code about null array elements
This is part of ticket #119433.
Commit ce0d59f changed AVs to use NULL for nonexistent elements. The
mro lookup code was not accounting for that, causing Class::Contract’s
tests to crash (and perhaps other modules, too).
Father Chrysostomos [Sun, 1 Sep 2013 21:26:29 +0000 (14:26 -0700)]
Refactor some parser.t line number tests
The check() function is designed to check the file set by #line, but
the last half dozen tests have no need to test for that. (I know
because I wrote them.) So make a new check_line function that just
checks the line number, and have them use that.
Father Chrysostomos [Sun, 1 Sep 2013 20:49:33 +0000 (13:49 -0700)]
Fix debugger lines with keyword <newline> =>
Commit 2179133 (in 5.19.2) modified the parser to look past newlines
when searching for => after a keyword. In doing so, it stopped the
parser from saving lines correctly for the debugger:
$ PERL5DB='sub DB::DB{}' perl5.18.1 -detime -e'=>;' -e 'print @{"_<-e"}'
sub DB::DB{};
time
=>;
print @{"_<-e"}
$ PERL5DB='sub DB::DB{}' perl5.19.3 -detime -e'=>;' -e 'print @{"_<-e"}'
sub DB::DB{};
=>;
print @{"_<-e"}
Notice how line 1 is missing in 5.19.3.
When peeking ahead past the end of the line, lex_read_space does need
to avoid incrementing the line number from the caller’s (yylex’s)
perspective, but it still needs to increment it for lex_next_chunk to put
the lines for the debugger in the right slot. So this commit changes
lex_read_space to increment the line number but set it back again
after calling lex_next_chunk.
Another problem was that the buffer pointer was being restored for a
keyword followed by a line break, but only if there was *no* fat arrow
on the following line.
Father Chrysostomos [Sun, 1 Sep 2013 20:33:49 +0000 (13:33 -0700)]
line_debug.t: Add diagnostics
Father Chrysostomos [Sun, 1 Sep 2013 07:30:59 +0000 (00:30 -0700)]
Fix two line numbers bugs involving quote-like ops
I was going to try and fix #line directives in quote-like operators,
but I found myself fixing bug #3643 at the same time.
Before this commit, the #line directive would last until the end of
the quote-like operator, but not beyond:
qq{${
print __LINE__,"\n"; # 43
}};
print __LINE__,"\n"; # 5
The old method:
The lexer would scan to find the closing delimiter after seeing qq{,
incrementing the line number (CopLINE(PL_curcop)) as it went.
Then it would enter a scope for parsing the contents of the string,
with the line number localised and reset to the line of the qq{.
When it finished parsing the contents of qq{...}, it would then pop
the scope, restoring the previous value of the line number.
According to the new method:
When scanning to find the ending delimiter for qq{, the lexer still
increments CopLINE(PL_curcop), but then sets it back immediately to
the line of the first delimiter.
When parsing the contents of qq{...}, the line number is *not*
localised. Instead, that’s when we increment CopLINE(PL_curcop) for real.
Hence, scan_str no longer increments the line number (except before
the starting delimiter). It is up to callers to handle that *or* call
sublex_push.
There is some special handling for here-docs. Here-docs’ line numbers
have to increase while the body of the here-doc is being parsed, but
then rewound back to the here-doc marker (<<END) when the code after
it on the same line is parsed. Then when the next line break is
reached, the line number is incremented by the appropriate number for
it to hop over the here-doc body. We already have a mechanism for
that, storing the number of lines in lex_shared->herelines.
Parsing of here-docs still happens the old way, with line numbers
localised to the right scope. But now we have to move
lex_shared->herelines into the inner scope’s lex_shared struct when
parsing a multiline quote other than a here-doc.
One thing this commit does not handle yet is #line inside a here-doc.
Bug #3643 was one symptom of a larger problem:
During the parsing of the contents of a quote-like operator, the
(localised) line number was incremented only in embedded code
snippets, not in constant parts of the string. So
"${ warn __LINE__,
__LINE__,
__LINE__ }"
would correctly give ‘123’. But this would produce the same
incorrectly:
"
foo
bar
baz
${ warn __LINE__,
__LINE__,
__LINE__ }"
Now the parsing of the contents of the string increments the line
number in constant parts, too.
Father Chrysostomos [Sun, 1 Sep 2013 00:47:23 +0000 (17:47 -0700)]
[perl #115768] improve (caller)[2] line numbers
warn and die have special code (closest_cop) to find a nulled
nextstate op closest to the warn or die op, to get the line number
from it. This commit extends that capability to caller, so that
if (1) {
foo();
}
sub foo { warn +(caller)[2] }
shows the right line number.
Father Chrysostomos [Sat, 31 Aug 2013 13:44:12 +0000 (06:44 -0700)]
test.pl:runperl: more portability warnings
VMS treats initial < > | 2> and trailing & as special in command
line arguments, so we should avoid them in tests.
Father Chrysostomos [Fri, 30 Aug 2013 21:50:43 +0000 (14:50 -0700)]
toke.c: Reorder checks around deprecate_escaped_meta
ckWARN_d involves a function call, so put faster checks first.
Father Chrysostomos [Fri, 30 Aug 2013 03:34:04 +0000 (20:34 -0700)]
perl5200delta: Remove Function::Parameters
1.0202 works with bleadperl.
Father Chrysostomos [Thu, 29 Aug 2013 07:49:10 +0000 (00:49 -0700)]
Mention Tk in perl5200delta
See:
https://rt.cpan.org/Ticket/Display.html?id=88210
https://rt.perl.org/rt3//Public/Bug/Display.html?id=118189
Chris 'BinGOs' Williams [Sun, 1 Sep 2013 14:38:07 +0000 (15:38 +0100)]
Update parent to CPAN version 0.227
[DELTA]
0.227
20130901
. Fix RT #88320, restore tests passing for 5.17.5+
Thanks to Zefram for the report and contributing the fix
Steve Hay [Sun, 1 Sep 2013 14:01:54 +0000 (15:01 +0100)]
perldelta - CPAN::Meta::Requirements has been upgraded
Steve Hay [Sun, 1 Sep 2013 13:59:01 +0000 (14:59 +0100)]
Upgrade Unicode::Collate from version 0.98 to 0.99
Steve Hay [Sun, 1 Sep 2013 13:57:33 +0000 (14:57 +0100)]
Upgrade Scalar-List-Utils from version 1.31 to 1.32
Chris 'BinGOs' Williams [Sun, 1 Sep 2013 10:45:24 +0000 (11:45 +0100)]
Update Module-Load-Conditional to CPAN version 0.58
[DELTA]
Changes for 0.58 Sun Sep 1 11:21:59 BST 2013
=================================================
* RT#83728 make quoting work portably and remove
prototypes from one-liner in requires()
Karl Williamson [Sat, 31 Aug 2013 18:42:46 +0000 (12:42 -0600)]
lib/locale.t: Refactor some tests
The tests were to make sure that UTF-8 is returned when it should be.
This makes these somewhat cleaner.
Karl Williamson [Sat, 24 Aug 2013 18:59:46 +0000 (12:59 -0600)]
Make printf, sprintf respect 'use locale' for radix
When called from outside the lexical scope of 'use locale', these now
always print a dot for the decimal point character.
This change is actually done in Perl_sv_vcatpvfn_flags, which is common
to many things, but the principal external effect that I could determine
is on printf and sprintf.
Without this change, unrelated code can change the locale, thus
affecting what an unsuspecting application prints.
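The underlying hazard is easy to reproduce in plain C, where the radix character printf() emits follows LC_NUMERIC. The sketch below is an analogy only and not how Perl_sv_vcatpvfn_flags does it; format_with_dot is a hypothetical helper that pins the radix to a dot and then restores the caller's locale:

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats `value` with two decimals and a '.' radix regardless of the
 * current LC_NUMERIC setting. Returns 0 on success, -1 on failure. */
static int format_with_dot(char *buf, size_t size, double value)
{
    char *saved = NULL;
    const char *current = setlocale(LC_NUMERIC, NULL);

    /* Copy the name: later setlocale() calls may overwrite the string. */
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved == NULL)
            return -1;
        strcpy(saved, current);
    }

    setlocale(LC_NUMERIC, "C");           /* the "C" locale's radix is '.' */
    int n = snprintf(buf, size, "%.2f", value);

    if (saved != NULL) {
        setlocale(LC_NUMERIC, saved);     /* put the caller's locale back */
        free(saved);
    }
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Even if some other code has switched LC_NUMERIC to a locale with a comma radix, the buffer still receives a dot.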
Karl Williamson [Sat, 24 Aug 2013 18:42:28 +0000 (12:42 -0600)]
lib/locale.t: Add a bunch of tests
I looked through the online standard and added as many conformance tests
as I could think of.
Karl Williamson [Sat, 24 Aug 2013 18:30:34 +0000 (12:30 -0600)]
lib/locale.t: Display unassigned chars
This adds debug output to list the characters that aren't matched by any
posix class.
Karl Williamson [Sat, 24 Aug 2013 18:28:19 +0000 (12:28 -0600)]
lib/locale.t: Change debug output
This combines the output so that all characters are shown in code point
order, with only ASCII alphanumerics displayed literally. This is
clearer.
Karl Williamson [Sat, 24 Aug 2013 18:25:01 +0000 (12:25 -0600)]
lib/locale.t: Display :punct: characters under debug mode
This class of characters was previously omitted
Karl Williamson [Sat, 24 Aug 2013 18:14:43 +0000 (12:14 -0600)]
lib/locale.t: Use hash keys instead of many arrays
This implementation detail allows easier handling of things as a whole.
We could save code by adding evals, but I'm trying to not add the added
complexity of evals to the tests here.
Karl Williamson [Mon, 15 Jul 2013 02:38:17 +0000 (20:38 -0600)]
More changes to perllocale and POSIX.pod setlocale
These address some concerns from John Peacock.
Chris 'BinGOs' Williams [Sat, 31 Aug 2013 10:21:15 +0000 (11:21 +0100)]
Update CPAN-Meta-Requirements to CPAN version 2.123
[DELTA]
2.123 2013-08-30 12:17:14 America/New_York
[Fixed]
- On Perls prior to v5.12, CPAN::Meta::Requirements will be installed
into the 'core' library path to avoid an older version bundled with
ExtUtils::MakeMaker and installed there taking precedence.
Craig A. Berry [Fri, 30 Aug 2013 15:23:41 +0000 (10:23 -0500)]
Use explicit glob in concise.t.
This was sending a Perl program consisting entirely of '<.>' to
runperl, which on VMS does:
$ perl -e "<.>"
Can't open input file .> as stdin
%RMS-E-FNF, file not found
because the CLI strips the quotes and then the home-grown
redirection code sees the '<' as an invitation to redirect '.>'
to stdin. That's not readily fixable, so just dodge it here.
Steve Hay [Fri, 30 Aug 2013 07:19:58 +0000 (08:19 +0100)]
Upgrade Module::Load::Conditional from version 0.54 to 0.56
Steve Hay [Thu, 29 Aug 2013 21:12:31 +0000 (22:12 +0100)]
perldelta - Note upgrades to Encode and ExtUtils::ParseXS
Zefram [Thu, 29 Aug 2013 21:00:07 +0000 (22:00 +0100)]
preserve $! and $^E in Carp
Carp::longmess and Carp::shortmess now explicitly localise these status
variables, for the reason described in the new paragraph of documentation.
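The same save-and-restore idea, shown in C with errno standing in for $!. This is an analogy rather than Carp's implementation; build_message and the path it tries to open are invented for the sketch:

```c
#include <errno.h>
#include <stdio.h>

/* Builds a diagnostic message. The fopen() failure inside may overwrite
 * errno, so the caller's value is saved on entry and restored on exit. */
static void build_message(char *buf, size_t size, const char *text)
{
    int saved_errno = errno;

    /* Stand-in for incidental work that fails and sets errno. */
    FILE *fh = fopen("/nonexistent/hypothetical/path", "r");
    if (fh != NULL)
        fclose(fh);

    snprintf(buf, size, "%s at sketch.c", text);
    errno = saved_errno;
}
```

A caller that sets errno, builds a message and then reports errno sees the value it started with.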
Chris 'BinGOs' Williams [Thu, 29 Aug 2013 18:30:07 +0000 (19:30 +0100)]
Update Encode to CPAN version 2.54
[DELTA]
$Revision: 2.54 $ $Date: 2013/08/29 16:47:39 $
! Encode.xs
+ t/cow.t
Addressed: COW breakage with _utf8_on()
https://rt.cpan.org/Ticket/Display.html?id=88230
! Encode.pm
Reverted the document accordingly to #11
https://github.com/dankogai/p5-encode/pull/10
+ t/decode.t
Unit test for decoding behavior change in #11
https://github.com/dankogai/p5-encode/pull/12
2.53 2013/08/29 15:20:31
! Encode.pm
Merged: Do not short-circuit decode_utf8 with utf8 flags
https://github.com/dankogai/p5-encode/pull/11
Merged: document decode_utf8 behaviour more precise
https://github.com/dankogai/p5-encode/pull/10
! Makefile.PL
Added repository cpan metadata
https://github.com/dankogai/p5-encode/pull/9
Chris 'BinGOs' Williams [Thu, 29 Aug 2013 18:27:59 +0000 (19:27 +0100)]
Update ExtUtils-ParseXS to CPAN version 3.22
[DELTA]
3.22 - Thu Aug 29 19:30:00 CET 2013
- Fix parallel testing crashes.
- Explicitly require new-enough Exporter.
Steve Hay [Thu, 29 Aug 2013 17:09:59 +0000 (18:09 +0100)]
Better check for the fork emulation in t/win32/signal.t
The d_pseudofork Configure variable hasn't been around all that long so
isn't suitable for use in dual-lived module tests, but is good for use in
core tests.
(t/win32/runenv.t doesn't do this since it is actually PERL_IMPLICIT_SYS
rather than the fork emulation which that test requires.)
Karl Williamson [Wed, 21 Aug 2013 03:51:23 +0000 (21:51 -0600)]
Allow trie use for /iaa matching
This adds code so that tries can be formed under /iaa, which formerly
weren't handled. A problem occurs when the string contains the LATIN
SMALL LETTER SHARP S when the regex pattern is not UTF-8 encoded. I
tried several ways to get this to work easily, but ended up deciding it
was too hard, so in this one situation, a new regnode is created to
prevent the trie code from even trying to turn it into a trie.
Karl Williamson [Wed, 21 Aug 2013 03:43:03 +0000 (21:43 -0600)]
Remove no longer necessary constants
These character constants were used only for a special edge case in trie
construction that has been removed -- except for one instance in
regexec.c which could just as well be some other character.
Karl Williamson [Wed, 21 Aug 2013 03:23:59 +0000 (21:23 -0600)]
Remove newly unnecessary regnode, code
The previous commit fixed things up so that this work-around regnode
doesn't have to exist, nor does the work-around for the EXACTFU_SS regnode.
Karl Williamson [Wed, 21 Aug 2013 02:55:44 +0000 (20:55 -0600)]
regcomp.c: Create better estimate of trie match lengths
This commit improves the estimate of the length of a string that a trie
matches. Before this, the estimate gave more false positives, and
required some workarounds which are no longer necessary, and future
commits will remove.
The ultimate answer is to know precisely what will be matched. As noted
in the comments, this information is already largely available in a
global variable. But more work there needs to be done to complete it,
and make it conveniently accessible.
Karl Williamson [Tue, 20 Aug 2013 04:55:14 +0000 (22:55 -0600)]
regcomp.c: Split count variable into two: min, max
This is in preparation for the min and max diverging in later commits.
This also renames the two variables to emphasize that bytes are what are
being counted, not characters.
Karl Williamson [Tue, 20 Aug 2013 04:38:43 +0000 (22:38 -0600)]
fold_grind.t: Modify trie test
The trie tests just add an alternation of a fixed string. This commit
makes that string the same number of bytes as the first alternative, in
an effort to not bias the test. Otherwise, something that might
otherwise appear to be too short might be long enough to match the fixed
string, defeating properly testing the length.
Karl Williamson [Mon, 19 Aug 2013 19:34:23 +0000 (13:34 -0600)]
regcomp.c: White-space, comment only
Fit into 79 columns, add comment
Karl Williamson [Mon, 19 Aug 2013 18:15:56 +0000 (12:15 -0600)]
regcomp.c: Remove unreachable code
This code no longer gets executed, as the single multi-char fold in the
Latin1 range is pre-folded before this code sees it. The surrounding
code didn't properly handle multi-char folds either.
Not having to deal with this allows us to not have to call the general
purpose function, but we do have to deal with one edge case
Karl Williamson [Mon, 19 Aug 2013 18:09:54 +0000 (12:09 -0600)]
regexec.c: Add comments, assertions
Karl Williamson [Mon, 19 Aug 2013 18:01:37 +0000 (12:01 -0600)]
regcomp.c: White-space only
Reflow comment to fit in 79 columns
Karl Williamson [Mon, 19 Aug 2013 17:57:52 +0000 (11:57 -0600)]
utf8.c: Add comment
Karl Williamson [Sun, 18 Aug 2013 15:00:11 +0000 (09:00 -0600)]
utf8.c: Add omitted fold cases
The LATIN SMALL LETTER SHARP S can't fold to 'ss' under /iaa because the
definition of /aa prohibits it, but it can fold to two consecutive
instances of LATIN SMALL LETTER LONG S. A capital sharp s can do the
same, and that was fixed in 1ca267a5, but this one was overlooked then.
It turns out that another possibility was also overlooked in 1ca267a5.
Both U+FB05 (LATIN SMALL LIGATURE LONG S T) and U+FB06 (LATIN SMALL
LIGATURE ST) fold to the string 'st', except under /iaa these folds are
prohibited. But U+FB05 and U+FB06 are equivalent to each other under
/iaa. This wasn't working until now. This commit changes things so
both fold to FB06.
This bug would only surface during /iaa matching, and I don't believe
there are any current code paths which lead to it, hence no tests are
added by this commit. However, a future commit will lead to this bug,
and existing tests find it then.
Karl Williamson [Sun, 18 Aug 2013 14:51:42 +0000 (08:51 -0600)]
utf8.h: White space only
Vertically align the definitions of a few #defines
Karl Williamson [Sun, 18 Aug 2013 14:50:34 +0000 (08:50 -0600)]
utf8.h, unicode_constants.h: Add some #defines.
These will be used in a future commit
Nicholas Clark [Wed, 28 Aug 2013 14:14:21 +0000 (16:14 +0200)]
In Perl_magic_setdbline, replace the use of atoi() with sv_2iv().
The value on which atoi() is called is actually always the buffer of an SV.
Hence we can use sv_2iv() instead.
Karl Williamson [Thu, 29 Aug 2013 15:57:59 +0000 (09:57 -0600)]
Merge branch 'ebcdic' into blead
Work on getting EBCDIC to work again in blead has slowed to a standstill
due to lack of summertime tuits. I've gotten concerned about bit rot,
and with the Pumpking's permission am merging in this commit the portion
most subject to bit rot. This includes almost all the changes in the
core C language files and a few modules. Omitted from this merge are
most test changes, a very few C language changes that for various
reasons aren't ready for merge, and most module changes, as well as any
totally new files. These will be merged sometime in the future.
With this merge and regenerating some tables, Perl mostly works with
EBCDIC on z/OS, even if many tests fail because they are testing for
ASCII-specific behavior.
This branch also isolates into just a few files the need to
differentiate between running on an ASCII versus an EBCDIC platform.
This will allow easier ripping out of EBCDIC code should we decide to do
so in the future, as well as making it easier to decide to leave it in,
as it now affects only a small amount of code.
One of the major reasons that ASCII and EBCDIC had to be distinguished
in code is that there were two sets of functions, one that worked on
native code points; the other on Unicode code points; the latter was
used when working with Unicode properties. To use the latter, one had
to convert to/from Unicode. This branch collapses those functions by
changing mktables to generate the Unicode property tables in terms of
the native character set. (This was a fairly simple change). Now, only
one set of functions is needed (the other is deprecated in this merge or
will be deprecated in later commits), and the conversions are almost
entirely avoided. Fortunately, most CPAN code did not bother with
distinguishing the two function sets, and so the deprecation affects
only a few modules.
Most of the "#ifdef EBCDIC" lines are removed, retained in only a few
files, most notably toke.c. These are required there for dealing with the
discontinuities in EBCDIC of the A-Z range, specifically in parsing
tr/// commands. (There are also some in utf8.[ch] for the differences
between UTF-8 and UTF-EBCDIC.) And a few smattered in other files,
mostly for performance.
The other major reason for ASCII/EBCDIC differences was due to UTF-8 vs
UTF-EBCDIC. New macros are created and used to hide more of those
differences from code than before.
Quite a few bugs that were only on EBCDIC platforms are now fixed.
These escaped earlier detection because we had no such platform to test
on.
Karl Williamson [Wed, 26 Jun 2013 21:30:59 +0000 (15:30 -0600)]
utf8.c: Move some code around for speed
This is a micro optimization. We now check for a common case and return
if found, before checking for a relatively uncommon case.
Karl Williamson [Wed, 26 Jun 2013 18:05:24 +0000 (12:05 -0600)]
utf8.h: Fix UTF8_IS_SUPER defn for EBCDIC
The parentheses were misplaced, so it wasn't looking at the second byte
of the input string properly.
Karl Williamson [Sat, 4 May 2013 19:29:15 +0000 (13:29 -0600)]
pp.c, regexec.c: Declare buffers large enough
These three buffers are not declared with the proper size. There is
a #define available to use, so use it. These matter only on EBCDIC
platforms, where the one in pp.c prior to this commit could cause a
buffer overrun there.
The others shouldn't, because what is being used there is of known (smaller) size.
Karl Williamson [Sun, 28 Apr 2013 04:14:02 +0000 (22:14 -0600)]
utf8.c: Remove wrapper functions.
Now that the Unicode data is stored in native character set order, it is
rare to need to work with the Unicode order. Traditionally, the real
work was done in functions that worked with the Unicode order, and
wrapper functions (or macros) were used to translate to/from native.
There are two groups of functions: one that translates from code point
to UTF-8, and the other group goes the opposite direction.
This commit changes the base function that translates from UTF-8 to code
point to output native instead of Unicode. Those extremely rare
instances where Unicode output is needed instead will have to hand-wrap
calls to this function with a translation macro, as now described in the
API pod. Prior to this, it was the other way, the native was wrapped,
and the rare, strict Unicode wasn't. This eliminates a layer of
function call overhead for a common case.
The base function that translates from code point to UTF-8 retains its
Unicode input, as that is more natural to process. However, it is
de-emphasized in the pod, with the functionality description moved to
the pod for a native input wrapper function. And, those wrappers are
now macros in all cases; previously there was function call overhead
sometimes. (Equivalent exported functions are retained, however, for XS
code that uses the Perl_foo() form.)
I had hoped to rebase this commit, squashing it with an earlier commit
in this series, eliminating the use of a temporary function name change,
but the work involved turns out to be large, with no real payoff.
Karl Williamson [Tue, 30 Apr 2013 15:13:35 +0000 (09:13 -0600)]
perlapi vis utf8.c: Nits
Karl Williamson [Sat, 20 Apr 2013 23:04:08 +0000 (17:04 -0600)]
gv.c: Add comment
Karl Williamson [Tue, 30 Apr 2013 14:04:45 +0000 (08:04 -0600)]
utf8.c: Move 2 functions to earlier in file
This moves these two functions to be adjacent to the function they each
call, thus keeping like things together.
Karl Williamson [Fri, 19 Apr 2013 19:18:20 +0000 (13:18 -0600)]
regcomp.c: Add missing (parens) to expression
A pair of parentheses were missing leading to this 'if' not acting as
intended.
Karl Williamson [Sat, 13 Apr 2013 19:16:00 +0000 (13:16 -0600)]
toke.c: Fix EBCDIC bugs with single char variable names
Latin1 single character variable names should all be legal,
but the test was not for non-ASCII, it was for variant characters. On
EBCDIC platforms, this isn't the same as non-ASCII.
The legal control character variable names are not the same as the C0
and DEL controls, but are \001 .. \037, minus those that traditionally
match \s on ASCII platforms, plus \c?.
Karl Williamson [Sat, 13 Apr 2013 18:55:09 +0000 (12:55 -0600)]
toke.c: An EBCDIC fix
toCTRL(0..31) yields a printing character. This is different from
toCTRL(control) on EBCDIC machines.
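On ASCII platforms the mapping is its own inverse, which is why a control maps back to a printing character. The macro below is a sketch that assumes the usual ASCII definition (upper-case, then flip bit 6); TO_CTRL_ASCII is a made-up name, and the EBCDIC definition is different:

```c
#include <ctype.h>

/* ASCII-only sketch of a control-character mapping: flip bit 6 of the
 * upper-cased character. Assumed to mirror the ASCII definition of toCTRL;
 * it does not hold on EBCDIC. */
#define TO_CTRL_ASCII(c) (toupper((unsigned char)(c)) ^ 64)
```

On an ASCII machine TO_CTRL_ASCII('A') is 1 and TO_CTRL_ASCII(1) is 'A', so applying it twice to an upper-case letter returns the letter.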
Karl Williamson [Sat, 13 Apr 2013 15:18:41 +0000 (09:18 -0600)]
perlio.c: Generalize for EBCDIC
This code had the hex constants for CARRIAGE RETURN and LINE FEED
hard-coded in. It appears to me from the comments that '\r' and '\n'
are not suitable to use instead. This commit changes the constants to
use the native values instead.
Karl Williamson [Sat, 13 Apr 2013 15:51:34 +0000 (09:51 -0600)]
unicode_constants.h: Add #defines for CR, LF
Karl Williamson [Sat, 6 Apr 2013 18:56:52 +0000 (12:56 -0600)]
regcomp.c: In EBCDIC [i-j] exclude also ASCII
i and j are not adjacent in EBCDIC. This excluded any alphabetic
characters between them, but allowed other ascii ones.
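To make the gap concrete: in EBCDIC code page 037, 'i' is 0x89 and 'j' is 0x91. The sketch hard-codes those two values so it can run on an ASCII machine; gap_between and the enum names are hypothetical:

```c
/* Code points of 'i' and 'j' in EBCDIC code page 037. Hard-coded here
 * because the sketch is meant to run on an ASCII machine. */
enum { EBCDIC_I = 0x89, EBCDIC_J = 0x91 };

/* Number of code points strictly between lo and hi. */
static int gap_between(int lo, int hi)
{
    return hi - lo - 1;
}
```

In ASCII the gap between 'i' and 'j' is zero; with the EBCDIC values it is seven code points, all of which a naive [i-j] range would sweep in.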
Karl Williamson [Sat, 6 Apr 2013 18:54:42 +0000 (12:54 -0600)]
utf8.c: Don't use slower general-purpose function
There is a macro that accomplishes the same task for a two byte UTF-8
encoded character, and avoids the overhead of the general purpose
function call.
Karl Williamson [Sat, 6 Apr 2013 18:53:07 +0000 (12:53 -0600)]
utf8.c: Don't do ++ in macro parameter
The formal parameter gets evaluated multiple times on an EBCDIC
platform, thus incrementing more than the intended once.
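A small, platform-independent illustration of the same class of bug. CLAMP_TO_ASCII is a hypothetical macro, not one from utf8.c; it mentions its argument twice, as the EBCDIC definition evidently did:

```c
#include <stddef.h>

/* Hypothetical macro that mentions its argument twice. The ?: operator
 * sequences the two evaluations, so the behaviour is defined, but a side
 * effect in the argument happens twice whenever the first branch is taken. */
#define CLAMP_TO_ASCII(c) ((c) < 0x80 ? (c) : '?')

/* Buggy: `*p++` is expanded twice, so p can advance by two. */
static size_t advance_buggy(const unsigned char *s)
{
    const unsigned char *p = s;
    int c = CLAMP_TO_ASCII(*p++);
    (void)c;
    return (size_t)(p - s);
}

/* Fixed: increment outside the macro argument. */
static size_t advance_fixed(const unsigned char *s)
{
    const unsigned char *p = s;
    int c = CLAMP_TO_ASCII(*p);
    p++;
    (void)c;
    return (size_t)(p - s);
}
```

For an input starting with an ASCII byte, the buggy version consumes two bytes and the fixed version consumes one.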
Karl Williamson [Sat, 6 Apr 2013 18:50:48 +0000 (12:50 -0600)]
utf8.c: Use macro instead of duplicating code
There is a macro that accomplishes this task, and is easier to read.
Karl Williamson [Thu, 4 Apr 2013 03:59:16 +0000 (21:59 -0600)]
utf8.h: Clarify comments
Karl Williamson [Fri, 29 Mar 2013 20:56:16 +0000 (14:56 -0600)]
utf8.c: Avoid unnecessary UTF-8 conversions
This changes the code so that converting to UTF-8 is avoided unless
necessary. For such inputs, the conversion back from UTF-8 is also
avoided. The cost of doing this is that the first swatches are combined
into one that contains the values for all characters 0-255, instead of
having multiple swatches. That means when first calculating the swatch
it calculates all 256, instead of 128 (160 on EBCDIC).
This also fixes an EBCDIC bug in which characters in this range were
being translated twice.
Karl Williamson [Fri, 29 Mar 2013 19:34:59 +0000 (13:34 -0600)]
utf8.c: No need to check for UTF-8 malformations
This function assumes that the input is well-formed UTF-8, even though
until this commit, the prefatory comments didn't say so. The API does
not pass the buffer length, so there is no way it could check for
reading off the end of the buffer. One code path already calls
valid_utf8_to_uvchr(); this changes the remaining code path to correspond.
Karl Williamson [Sun, 24 Mar 2013 19:16:08 +0000 (13:16 -0600)]
utf8.c: Fix so UTF-16 to UTF-8 conversion works under EBCDIC
Karl Williamson [Sun, 24 Mar 2013 19:14:34 +0000 (13:14 -0600)]
utf8.h, utfebcdic.h: Add #define
Karl Williamson [Mon, 18 Mar 2013 17:45:06 +0000 (11:45 -0600)]
pp.c: White-space only
Make a ternary operation more clear
Karl Williamson [Mon, 18 Mar 2013 17:43:42 +0000 (11:43 -0600)]
Fix valid_utf8_to_uvchr() for EBCDIC
Karl Williamson [Mon, 18 Mar 2013 03:42:20 +0000 (21:42 -0600)]
t/test.pl: Add comment about EBCDIC
Karl Williamson [Sun, 17 Mar 2013 04:41:15 +0000 (22:41 -0600)]
Fix EBCDIC bugs in UTF8_ACCUMULATE and utf8.c
Karl Williamson [Sat, 16 Mar 2013 22:52:45 +0000 (16:52 -0600)]
regcomp.c: Fix bug in EBCDIC
The POSIXA and NPOSIXA regnodes need to set the bits on only the ASCII
code points, but under EBCDIC those code points are not 0-127.