[Clang] Add a warning on invalid UTF-8 in comments.
author Corentin Jabot <corentinjabot@gmail.com>
Fri, 17 Jun 2022 14:23:41 +0000 (16:23 +0200)
committer Corentin Jabot <corentinjabot@gmail.com>
Sat, 9 Jul 2022 09:26:45 +0000 (11:26 +0200)
commit 355532a1499aa9b13a89fb5b5caaba2344d57cd7
tree 7d6c1c30b30e73e854206b69ea9ac325e055f3d2
parent fb89c4126904e4d82f235e492042c16c87cc8e3d

Introduce an off-by-default `-Winvalid-utf8` warning
that detects invalid UTF-8 code unit sequences in comments.
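
For readers unfamiliar with what makes a code unit sequence invalid, the
following is a minimal, standalone sketch of a well-formedness check over the
bytes of a comment. It is an illustration only and is not the code added to
Lexer.cpp or ConvertUTF.cpp by this commit; the names `wellFormedUtf8Length`
and `firstInvalidUtf8` are hypothetical. It follows the Unicode definition of
well-formed UTF-8, so it rejects stray continuation bytes, overlong encodings,
surrogates, truncated sequences, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, for illustration only.
// Returns the length (1 to 4) of the well-formed UTF-8 sequence that starts
// at data[pos], or 0 if the bytes there do not form one.
std::size_t wellFormedUtf8Length(const std::string &data, std::size_t pos) {
  if (pos >= data.size())
    return 0;
  auto byteAt = [&](std::size_t i) {
    return static_cast<std::uint8_t>(data[i]);
  };

  const std::uint8_t lead = byteAt(pos);
  if (lead < 0x80)
    return 1; // ASCII

  std::size_t len = 0;
  std::uint8_t lo = 0x80, hi = 0xBF; // allowed range for the second byte
  if (lead >= 0xC2 && lead <= 0xDF) {
    len = 2;
  } else if (lead >= 0xE0 && lead <= 0xEF) {
    len = 3;
    if (lead == 0xE0)
      lo = 0xA0; // rejects overlong 3-byte forms
    else if (lead == 0xED)
      hi = 0x9F; // rejects UTF-16 surrogates
  } else if (lead >= 0xF0 && lead <= 0xF4) {
    len = 4;
    if (lead == 0xF0)
      lo = 0x90; // rejects overlong 4-byte forms
    else if (lead == 0xF4)
      hi = 0x8F; // rejects values above U+10FFFF
  } else {
    return 0; // stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF
  }

  if (pos + len > data.size())
    return 0; // sequence is truncated
  const std::uint8_t second = byteAt(pos + 1);
  if (second < lo || second > hi)
    return 0;
  for (std::size_t i = 2; i < len; ++i) {
    const std::uint8_t b = byteAt(pos + i);
    if (b < 0x80 || b > 0xBF)
      return 0; // not a continuation byte
  }
  return len;
}

// Hypothetical helper, for illustration only.
// Returns the offset of the first byte that does not begin a well-formed
// sequence, or std::string::npos if the whole text is valid UTF-8.
std::size_t firstInvalidUtf8(const std::string &text) {
  std::size_t pos = 0;
  while (pos < text.size()) {
    const std::size_t len = wellFormedUtf8Length(text, pos);
    if (len == 0)
      return pos;
    pos += len;
  }
  return std::string::npos;
}
```

Under these assumptions, a comment whose bytes include a lone `0xFF` or a
truncated multi-byte sequence would be reported at the offset of the
offending byte, while ASCII and correctly encoded non-ASCII text pass.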

Invalid UTF-8 in other places is already diagnosed,
as it cannot appear in identifiers or other grammar constructs.

The warning is off by default as it is likely to be somewhat disruptive
otherwise.

This warning allows Clang to conform to the yet-to-be-approved WG21
paper "P2295R5 Support for UTF-8 as a portable source file encoding".

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059
clang/docs/ReleaseNotes.rst
clang/include/clang/Basic/DiagnosticLexKinds.td
clang/lib/Lex/Lexer.cpp
clang/test/Lexer/comment-invalid-utf8.c [new file with mode: 0644]
clang/test/Lexer/comment-utf8.c [new file with mode: 0644]
clang/test/SemaCXX/static-assert.cpp
llvm/include/llvm/Support/ConvertUTF.h
llvm/lib/Support/ConvertUTF.cpp