Document unicode_eval and evalbytes

author Father Chrysostomos <sprout@cpan.org>

Fri, 4 Nov 2011 23:37:41 +0000 (16:37 -0700)

committer Father Chrysostomos <sprout@cpan.org>

Sun, 6 Nov 2011 08:13:49 +0000 (01:13 -0700)
diff --git a/lib/feature.pm b/lib/feature.pm
index dd44d7327d68b1894da6ca7fcc40487e5ab36f93..ac17b002cdbbc7d3cac4126014c2b48b0bd41959 100644
--- a/lib/feature.pm
+++ b/lib/feature.pm
@@ -129,6 +129,53 @@ C<use feature 'unicode_strings'> subpragma is B<strongly> recommended.
  This subpragma is available starting with Perl 5.11.3, but was not fully
  implemented until 5.13.8.
  
+=head2 the 'unicode_eval' and 'evalbytes' features
+
+Under the C<unicode_eval> feature, Perl's C<eval> function, when passed a
+string, will evaluate it as a string of characters, ignoring any
+C<use utf8> declarations.  C<use utf8> exists to declare the encoding of
+the script, which only makes sense for a stream of bytes, not a string of
+characters.  Source filters are forbidden, as they also really only make
+sense on strings of bytes.  Any attempt to activate a source filter will
+result in an error.
+
+The C<evalbytes> feature enables the C<evalbytes> keyword, which evaluates
+the argument passed to it as a string of bytes.  It dies if the string
+contains any characters outside the 8-bit range.  Source filters work
+within C<evalbytes>: they apply to the contents of the string being
+evaluated.
+
+Together, these two features are intended to replace the historical C<eval>
+function, which has (at least) two bugs that cannot easily be fixed
+without breaking existing programs:
+
+=over
+
+=item *
+
+C<eval> behaves differently depending on the internal encoding of the
+string, sometimes treating its argument as a string of bytes, and sometimes
+as a string of characters.
+
+=item *
+
+Source filters activated within C<eval> leak out into whichever I<file>
+scope is currently being compiled.  To give an example with the CPAN module
+L<Semi::Semicolons>:
+
+    BEGIN { eval "use Semi::Semicolons;  # not filtered here " }
+    # filtered here!
+
+C<evalbytes> fixes that to work the way one would expect:
+
+    use feature "evalbytes";
+    BEGIN { evalbytes "use Semi::Semicolons;  # filtered " }
+    # not filtered
+
+=back
+
+These two features are available starting with Perl 5.16.
+
  =head1 FEATURE BUNDLES
  
  It's possible to load a whole slew of features in one go, using
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 4ce4be3f665808684a10dd979cc5fd00c50550e8..86770fd84adee52aa706b8daeb5229d9d8d37142 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -161,7 +161,8 @@ C<umask>, C<unlink>, C<utime>
  =item Keywords related to the control flow of your Perl program
  X<control flow>
  
-C<caller>, C<continue>, C<die>, C<do>, C<dump>, C<eval>, C<exit>,
+C<caller>, C<continue>, C<die>, C<do>,
+C<dump>, C<eval>, C<evalbytes>, C<exit>,
  C<__FILE__>, C<goto>, C<last>, C<__LINE__>, C<next>, C<__PACKAGE__>,
  C<redo>, C<return>, C<sub>, C<wantarray>,
  
@@ -186,7 +187,8 @@ L<feature>.  Alternately, include a C<use v5.10> or later to the current scope.
  
  =item Miscellaneous functions
  
-C<defined>, C<dump>, C<eval>, C<formline>, C<local>, C<my>, C<our>,
+C<defined>, C<dump>, C<eval>, C<evalbytes>,
+C<formline>, C<local>, C<my>, C<our>,
  C<reset>, C<scalar>, C<state>, C<undef>, C<wantarray>
  
  =item Functions for processes and process groups
@@ -1634,6 +1636,17 @@ Note that the value is parsed every time the C<eval> executes.
  If EXPR is omitted, evaluates C<$_>.  This form is typically used to
  delay parsing and subsequent execution of the text of EXPR until run time.
  
+If the C<unicode_eval> feature is enabled (which is the default under a
+C<use v5.16> or higher declaration), EXPR or C<$_> is treated as a string of
+characters, so C<use utf8> declarations have no effect, and source filters
+are forbidden.  In the absence of the C<unicode_eval> feature, the string
+will sometimes be treated as characters and sometimes as bytes, depending
+on the internal encoding, and source filters activated within the C<eval>
+exhibit the erratic, but historical, behaviour of affecting some outer file
+scope that is still compiling.  See also the L</evalbytes> keyword, which
+always treats its input as a byte stream and works properly with source
+filters, and the L<feature> pragma.
+
  In the second form, the code within the BLOCK is parsed only once--at the
  same time the code surrounding the C<eval> itself was parsed--and executed
  within the context of the current Perl program.  This form is typically
@@ -1763,6 +1776,21 @@ surrounding lexical scope, but rather the scope of the first non-DB piece
  of code that called it.  You don't normally need to worry about this unless
  you are writing a Perl debugger.
  
+=item evalbytes EXPR
+X<evalbytes>
+
+=item evalbytes
+
+This function is like L</eval> with a string argument, except that it always
+parses its argument, or C<$_> if EXPR is omitted, as a string of bytes.  A
+string containing characters whose ordinal value exceeds 255 results in an
+error.  Source filters activated within the evaluated code apply to the
+code itself.
+
+This function is only available under the C<evalbytes> feature, a
+C<use v5.16> (or higher) declaration, or with a C<CORE::> prefix.  See
+L<feature> for more information.
+
  =item exec LIST
  X<exec> X<execute>
  
diff --git a/t/porting/known_pod_issues.dat b/t/porting/known_pod_issues.dat
index 0ee1c4325cee7d800ae683c1702bbe33f5e8f240..d8c19334ba1399a4549df693c641d197119a48cd 100644
--- a/t/porting/known_pod_issues.dat
+++ b/t/porting/known_pod_issues.dat
@@ -1,4 +1,4 @@
-# This file is the data file for porting/podcheck.t.
+# This file is the data file for t/porting/podcheck.t.
  # There are three types of lines.
  # Comment lines are white-space only or begin with a '#', like this one.  Any
  #   changes you make to the comment lines will be lost when the file is
@@ -100,6 +100,7 @@ pwd_mkdb(8)
  recvmsg(3)
  s2p
  Scalar::Readonly
+Semi::Semicolons
  sendmail(1)
  sendmsg(3)
  sha1sum(1)
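
Note for readers of this patch: the behaviour documented above can be tried out with a short script. This is only a minimal sketch, assuming Perl 5.16 or later. The variable names are illustrative and do not come from the patch. It runs a byte string through C<evalbytes>, runs a character string through C<eval> under C<unicode_eval>, and checks that C<evalbytes> refuses a string containing a character above 255. The failure is captured whether it is raised as an exception or left in C<$@>.

```perl
use strict;
use warnings;
use feature qw(evalbytes unicode_eval);

# Every character here has an ordinal value below 256,
# so evalbytes can parse the string as bytes.
my $byte_source = '6 * 7';
my $product     = evalbytes $byte_source;

# This source text contains U+263A, a character above 255.
my $char_source = "length qq{\x{263A}}";

# Under unicode_eval, eval parses its argument as characters.
my $char_count = eval $char_source;

# evalbytes cannot treat the same text as bytes. Record the error
# whether evalbytes throws or merely sets $@.
my $error;
eval { evalbytes $char_source; $error = $@; 1 } or $error = $@;

print "evalbytes on bytes: $product\n";
print "eval on characters: $char_count\n";
print "evalbytes rejected the wide string: ", ( $error ? "yes" : "no" ), "\n";
```

The script does not exercise source filters, since that would need a filter module such as Semi::Semicolons to be installed.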