[lldb] Print unprintable characters as unsigned
author Jonas Devlieghere <jonas@devlieghere.com>
Fri, 23 Jun 2023 17:55:17 +0000 (10:55 -0700)
committer Jonas Devlieghere <jonas@devlieghere.com>
Fri, 23 Jun 2023 18:00:21 +0000 (11:00 -0700)
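When dumping memory in the C-string format, we printed unprintable
characters as if they were signed. Whether plain char is signed is
implementation-defined, but every printable character has the same
non-negative value either way, so only unprintable characters are
affected. Where char is signed, a byte of 0x80 or above is negative and
is sign-extended when promoted to int for Printf. It is therefore safe
to treat unprintable characters as unsigned.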

Before this patch, "\xcf\xfa\xed\xfe\f" would be printed as
"\xffffffcf\xfffffffa\xffffffed\xfffffffe\f". Now we correctly print the
original string.
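
The sign extension described above can be reproduced outside LLDB. The
following is a minimal sketch, not LLDB code: HexEscape is a hypothetical
helper, and the "before" branch forces signed char so the old behavior
shows up even on platforms where plain char is unsigned. It assumes a
32-bit int.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper for illustration only; not part of LLDB.
// Formats one byte as a "\xNN" escape using the same "%2.2x" format
// string as DumpCharacter.
std::string HexEscape(char c, bool as_unsigned) {
  char buf[16];
  if (as_unsigned) {
    // After the patch: the value stays in 0..255, so two hex digits.
    std::snprintf(buf, sizeof(buf), "\\x%2.2x",
                  static_cast<unsigned char>(c));
  } else {
    // Before the patch (on signed-char platforms): the negative value is
    // sign-extended to int, so %x prints all eight hex digits.
    int promoted = static_cast<signed char>(c);
    std::snprintf(buf, sizeof(buf), "\\x%2.2x",
                  static_cast<unsigned int>(promoted));
  }
  return buf;
}
```

Under those assumptions, HexEscape('\xcf', false) yields "\xffffffcf" and
HexEscape('\xcf', true) yields "\xcf", matching the before and after
output quoted above.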

rdar://111126134

Differential revision: https://reviews.llvm.org/D153644

lldb/source/Core/DumpDataExtractor.cpp
lldb/unittests/Core/DumpDataExtractorTest.cpp

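diff --git a/lldb/source/Core/DumpDataExtractor.cpp b/lldb/source/Core/DumpDataExtractor.cpp
index 7f6108f..728297a 100644
--- a/lldb/source/Core/DumpDataExtractor.cpp
+++ b/lldb/source/Core/DumpDataExtractor.cpp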
@@ -212,7 +212,8 @@ static void DumpCharacter(Stream &s, const char c) {
     s.PutChar(c);
     return;
   }
-  s.Printf("\\x%2.2x", c);
+  // Non-print characters can be assumed to be unsigned.
+  s.Printf("\\x%2.2x", static_cast<unsigned char>(c));
 }
 
 /// Dump a floating point type.
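diff --git a/lldb/unittests/Core/DumpDataExtractorTest.cpp b/lldb/unittests/Core/DumpDataExtractorTest.cpp
index 226f2b7..bbe5e9e 100644
--- a/lldb/unittests/Core/DumpDataExtractorTest.cpp
+++ b/lldb/unittests/Core/DumpDataExtractorTest.cpp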
@@ -131,8 +131,16 @@ TEST(DumpDataExtractorTest, Formats) {
   // set of bytes to match the 10 byte format but then if the test runs on a
   // machine where we don't use 10 it'll break.
 
+  // Test printable characters.
   TestDump(llvm::StringRef("aardvark"), lldb::Format::eFormatCString,
            "\"aardvark\"");
+  // Test unprintable characters.
+  TestDump(llvm::StringRef("\xcf\xfa\xed\xfe\f"), lldb::Format::eFormatCString,
+           "\"\\xcf\\xfa\\xed\\xfe\\f\"");
+  // Test a mix of printable and unprintable characters.
+  TestDump(llvm::StringRef("\xcf\xfa\ffoo"), lldb::Format::eFormatCString,
+           "\"\\xcf\\xfa\\ffoo\"");
+
   TestDump<uint16_t>(99, lldb::Format::eFormatDecimal, "99");
   // Just prints as a signed integer.
   TestDump(-1, lldb::Format::eFormatEnum, "-1");