Deprecate calling isFOO_utf8() with malformed

author Karl Williamson <public@khwilliamson.com>

Sun, 23 Dec 2012 17:03:16 +0000 (10:03 -0700)

committer Karl Williamson <public@khwilliamson.com>

Sun, 23 Dec 2012 17:53:36 +0000 (10:53 -0700)
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index f239e5270e596560c008c0b77e3bceea7300f920..64527f87f09d6aa1942f74da19015d4cfcd7a9b7 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -2517,6 +2517,17 @@ with 'useperlio'.
  (F) Your machine doesn't implement the sockatmark() functionality,
  neither as a system call nor an ioctl call (SIOCATMARK).
  
+=item It is deprecated to pass malformed UTF-8 to character classification macros, for "%s"
+
+(D deprecated, utf8) This message indicates a bug either in the Perl
+core or in XS code.  Such code was trying to find out if a character,
+allegedly stored internally encoded as UTF-8, was of a given type, such
+as being punctuation or a digit.  But the character was not encoded in
+legal UTF-8.  The C<%s> is replaced by a string that can be used by
+knowledgeable people to determine what the type being checked against
+was.  If C<utf8> warnings are enabled, a further message is raised,
+giving details of the malformation.
+
  =item $* is no longer supported
  
  (D deprecated, syntax) The special variable C<$*>, deprecated in older
diff --git a/utf8.c b/utf8.c
index 930b14841924d424738f355293fd815b353a6777..ec4e62799b1a89a254552b4cd276796f8011ec25 100644
--- a/utf8.c
+++ b/utf8.c
@@ -2028,16 +2028,26 @@ S_is_utf8_common(pTHX_ const U8 *const p, SV **swash,
      PERL_ARGS_ASSERT_IS_UTF8_COMMON;
  
      /* The API should have included a length for the UTF-8 character in <p>,
-     * but it doesn't.  We therefor assume that p has been validated at least
+     * but it doesn't.  We therefore assume that p has been validated at least
       * as far as there being enough bytes available in it to accommodate the
       * character without reading beyond the end, and pass that number on to the
       * validating routine */
-    if (!is_utf8_char_buf(p, p + UTF8SKIP(p)))
-       return FALSE;
+    if (! is_utf8_char_buf(p, p + UTF8SKIP(p))) {
+        if (ckWARN_d(WARN_UTF8)) {
+            Perl_warner(aTHX_ packWARN2(WARN_DEPRECATED,WARN_UTF8),
+                   "It is deprecated to pass malformed UTF-8 to character classification macros, for \"%s\"", swashname);
+            if (ckWARN(WARN_UTF8)) {    /* This will output details as to the
+                                           what the malformation is */
+                utf8_to_uvchr_buf(p, p + UTF8SKIP(p), NULL);
+            }
+        }
+        return FALSE;
+    }
      if (!*swash) {
          U8 flags = _CORE_SWASH_INIT_ACCEPT_INVLIST;
          *swash = _core_swash_init("utf8", swashname, &PL_sv_undef, 1, 0, NULL, &flags);
      }
+
      return swash_fetch(*swash, p, TRUE) != 0;
  }
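
The pattern in the utf8.c hunk above (validate the bytes first, warn and return false on malformation, classify only afterwards) can be sketched outside the Perl core roughly as follows. This is an illustrative sketch, not Perl's implementation: `utf8_char_len` and `is_ascii_digit_utf8` are hypothetical names, the check follows RFC 3629 only (Perl's own UTF-8 handling is more permissive), the caller passes an explicit end pointer instead of relying on UTF8SKIP(), and the warning is a plain fprintf() rather than Perl_warner().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the Perl API.  Returns the byte length
 * (1..4) of the well-formed UTF-8 character starting at p, or 0 if the bytes
 * in [p, end) do not start with one.  Assumes strict RFC 3629 rules. */
static size_t utf8_char_len(const unsigned char *p, const unsigned char *end)
{
    size_t len, i;
    unsigned long cp;

    assert(p != NULL && end != NULL);
    if (p >= end)
        return 0;

    if (p[0] < 0x80)
        return 1;                                   /* ASCII */
    else if (p[0] >= 0xC2 && p[0] <= 0xDF) { len = 2; cp = p[0] & 0x1F; }
    else if (p[0] >= 0xE0 && p[0] <= 0xEF) { len = 3; cp = p[0] & 0x0F; }
    else if (p[0] >= 0xF0 && p[0] <= 0xF4) { len = 4; cp = p[0] & 0x07; }
    else
        return 0;   /* stray continuation byte, overlong C0/C1, or F5..FF */

    if ((size_t)(end - p) < len)
        return 0;                                   /* truncated character */

    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;                               /* not a continuation byte */
        cp = (cp << 6) | (unsigned long)(p[i] & 0x3F);
    }

    if (len == 3 && cp < 0x800)
        return 0;                                   /* overlong 3-byte form */
    if (len == 4 && (cp < 0x10000UL || cp > 0x10FFFFUL))
        return 0;                                   /* overlong or out of range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                   /* UTF-16 surrogate */
    return len;
}

/* Hypothetical classifier mirroring the shape of the patched check: on
 * malformed input it warns, naming the class being tested, and returns 0
 * instead of classifying.  Only ASCII digits count as digits here. */
static int is_ascii_digit_utf8(const unsigned char *p, const unsigned char *end)
{
    if (utf8_char_len(p, end) == 0) {
        fprintf(stderr,
                "malformed UTF-8 passed to classification check, for \"%s\"\n",
                "digit");
        return 0;
    }
    return p[0] >= '0' && p[0] <= '9';
}
```

With this sketch, the single byte 0xC3 (a lead byte with no continuation) produces the warning and a false result, while the two bytes 0xC3 0xA9 are accepted as well-formed and simply classified as not a digit.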