[Clang] Fix invalid utf-8 detection

author Corentin Jabot <corentinjabot@gmail.com>

Wed, 6 Jul 2022 20:16:22 +0000 (22:16 +0200)

committer Corentin Jabot <corentinjabot@gmail.com>

Wed, 6 Jul 2022 20:20:04 +0000 (22:20 +0200)
diff --git a/clang/test/Lexer/comment-invalid-utf8.c b/clang/test/Lexer/comment-invalid-utf8.c
index b8bf551..ed7405a 100644
--- a/clang/test/Lexer/comment-invalid-utf8.c
+++ b/clang/test/Lexer/comment-invalid-utf8.c
@@ -25,3 +25,14 @@
  // abcd
  // \80abcd
  // expected-warning@-1 {{invalid UTF-8 in comment}}
+
+
+//§ § § 😀 你好 ©
+
+/*§ § § 😀 你好 ©*/
+
+/*
+§ § § 😀 你好 ©
+*/
+
+/* § § § 😀 你好 © */
diff --git a/llvm/lib/Support/ConvertUTF.cpp b/llvm/lib/Support/ConvertUTF.cpp
index c494110..25875d4 100644
--- a/llvm/lib/Support/ConvertUTF.cpp
+++ b/llvm/lib/Support/ConvertUTF.cpp
@@ -423,7 +423,7 @@ Boolean isLegalUTF8Sequence(const UTF8 *source, const UTF8 *sourceEnd) {
   */
  unsigned getUTF8SequenceSize(const UTF8 *source, const UTF8 *sourceEnd) {
    int length = trailingBytesForUTF8[*source] + 1;
-  return (length > sourceEnd - source && isLegalUTF8(source, length)) ? length
+  return (length <= sourceEnd - source && isLegalUTF8(source, length)) ? length
                                                                        : 0;
  }
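
Note on the bounds check: the removed line accepted a sequence only when its announced length exceeded the bytes remaining before sourceEnd, so complete, valid sequences returned 0 and were reported as invalid UTF-8. The following is a minimal, self-contained sketch of the intended behaviour, not the LLVM implementation. The names announcedLength, continuationsOk, sequenceSize and checkExamples are hypothetical, and the legality check is deliberately simplified (it only tests continuation bytes, unlike isLegalUTF8). It assumes the caller passes p < end, and it uses an inclusive comparison so that a sequence ending exactly at the end of the buffer is still accepted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the lead-byte table lookup: total length of the
// sequence announced by `lead`, or 0 if `lead` cannot start a sequence.
static int announcedLength(unsigned char lead) {
  if (lead < 0x80) return 1;                    // ASCII
  if (lead >= 0xC2 && lead <= 0xDF) return 2;   // two-byte lead
  if (lead >= 0xE0 && lead <= 0xEF) return 3;   // three-byte lead
  if (lead >= 0xF0 && lead <= 0xF4) return 4;   // four-byte lead
  return 0;                                     // continuation or invalid lead
}

// Simplified legality check (assumption: only verifies that the trailing
// bytes look like 10xxxxxx; overlong forms and surrogates are not rejected).
static bool continuationsOk(const unsigned char *p, int length) {
  for (int i = 1; i < length; ++i)
    if ((p[i] & 0xC0) != 0x80) return false;
  return true;
}

// Size of the sequence starting at `p`, or 0 if it is invalid or truncated.
// Precondition: p < end.
static unsigned sequenceSize(const unsigned char *p, const unsigned char *end) {
  int length = announcedLength(*p);
  if (length == 0) return 0;
  // Bounds check first, so the legality check never reads past `end`.
  // Equivalent to requiring length <= end - p.
  if (length > end - p) return 0;
  return continuationsOk(p, length) ? static_cast<unsigned>(length) : 0;
}

// A few cases mirroring the comment-invalid-utf8.c test inputs.
static void checkExamples() {
  const unsigned char copyright[] = {0xC2, 0xA9};          // U+00A9
  const unsigned char emoji[] = {0xF0, 0x9F, 0x98, 0x80};  // U+1F600
  const unsigned char stray[] = {0x80, 'a'};               // lone continuation

  assert(sequenceSize(copyright, copyright + 2) == 2);  // fits exactly
  assert(sequenceSize(copyright, copyright + 1) == 0);  // truncated
  assert(sequenceSize(emoji, emoji + 4) == 4);
  assert(sequenceSize(stray, stray + 2) == 0);
}
```

Doing the bounds check before the legality check keeps the helper from inspecting bytes beyond the buffer, which is the same ordering the short-circuiting && gives in getUTF8SequenceSize.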