[Clang] Fix invalid utf-8 detection
author Corentin Jabot <corentinjabot@gmail.com>
Wed, 6 Jul 2022 20:16:22 +0000 (22:16 +0200)
committer Corentin Jabot <corentinjabot@gmail.com>
Wed, 6 Jul 2022 20:20:04 +0000 (22:20 +0200)
The length of valid code points was calculated incorrectly
because the bounds comparison in getUTF8SequenceSize was
inverted. This was not caught earlier because there were no
tests for the valid code point scenario.

Differential Revision: https://reviews.llvm.org/D129223

clang/test/Lexer/comment-invalid-utf8.c
llvm/lib/Support/ConvertUTF.cpp

diff --git a/clang/test/Lexer/comment-invalid-utf8.c b/clang/test/Lexer/comment-invalid-utf8.c
index b8bf551..ed7405a 100644
 // abcd
 // \80abcd
 // expected-warning@-1 {{invalid UTF-8 in comment}}
+
+
+//§ § § 😀 你好 ©
+
+/*§ § § 😀 你好 ©*/
+
+/*
+§ § § 😀 你好 ©
+*/
+
+/* § § § 😀 你好 © */
diff --git a/llvm/lib/Support/ConvertUTF.cpp b/llvm/lib/Support/ConvertUTF.cpp
index c494110..25875d4 100644
@@ -423,7 +423,7 @@ Boolean isLegalUTF8Sequence(const UTF8 *source, const UTF8 *sourceEnd) {
  */
 unsigned getUTF8SequenceSize(const UTF8 *source, const UTF8 *sourceEnd) {
   int length = trailingBytesForUTF8[*source] + 1;
-  return (length > sourceEnd - source && isLegalUTF8(source, length)) ? length
+  return (length < sourceEnd - source && isLegalUTF8(source, length)) ? length
                                                                       : 0;
 }
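
To make the effect of the one-character change concrete, here is a minimal sketch that isolates the bounds comparison from the hunk above. It is not the LLVM implementation: `impliedLength`, `sizeBefore` and `sizeAfter` are hypothetical names, the lead-byte lookup is a simplified stand-in for `trailingBytesForUTF8[*source] + 1`, and the `isLegalUTF8` validity check is left out entirely.

```cpp
#include <cstddef>

// Hypothetical helper: sequence length implied by a UTF-8 lead byte.
// Simplified on purpose: continuation bytes and illegal lead bytes are not
// distinguished here, since the real code rejects those via isLegalUTF8.
inline int impliedLength(unsigned char lead) {
  if (lead < 0x80) return 1;  // ASCII
  if (lead < 0xE0) return 2;  // e.g. U+00A9 (C2 A9)
  if (lead < 0xF0) return 3;  // e.g. U+4F60 (E4 BD A0)
  return 4;                   // e.g. U+1F600 (F0 9F 98 80)
}

// Bounds comparison as it stood before the change (the '-' line).
inline unsigned sizeBefore(const unsigned char *source,
                           const unsigned char *sourceEnd) {
  int length = impliedLength(*source);
  return (length > sourceEnd - source) ? length : 0;
}

// Bounds comparison as written in the diff (the '+' line).
inline unsigned sizeAfter(const unsigned char *source,
                          const unsigned char *sourceEnd) {
  int length = impliedLength(*source);
  return (length < sourceEnd - source) ? length : 0;
}
```

For the three-byte buffer `C2 A9 'x'` (the character © followed by one ASCII byte), `sizeBefore` returns 0 because 2 > 3 is false, so a complete, valid sequence was reported as having no size. `sizeAfter` returns 2. With the old comparison a truncated buffer containing only `C2` yields 2, because 2 > 1 is true, which is the opposite of what a bounds check is for. Note that with the strict `<` a sequence ending exactly at `sourceEnd` (the two-byte buffer `C2 A9`) still yields 0. The commit message does not say whether callers depend on that.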