[Serialization] Delta-encode consecutive SourceLocations in TypeLoc
author Sam McCall <sam.mccall@gmail.com>
Wed, 11 May 2022 13:42:31 +0000 (15:42 +0200)
committer Sam McCall <sam.mccall@gmail.com>
Thu, 19 May 2022 07:40:44 +0000 (09:40 +0200)
commit 4df795bff75289941508d07bbe9105b93b098105
tree e3fa4b93fe774841d9aaad76345866121b1a37a6
parent bbc6834e2635a60230cd410d3f32b7c6d11ad208

Much of the size of PCH/PCM files comes from stored SourceLocations.
These are encoded using (almost) their raw value, VBR-encoded. Absolute
SourceLocations can be relatively large numbers, so this commonly takes
20-30 bits per location.
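
To make the 20-30 bit figure concrete, here is a rough sketch of how the cost of a VBR field grows with the magnitude of the value. It assumes 6-bit chunks (5 payload bits plus 1 continuation bit), which is the width the LLVM bitstream format uses for operands of unabbreviated records. The helper name vbrBits is hypothetical and is not part of clang.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bits used to emit Value as a VBR field.
// Each chunk is Width bits wide: Width-1 payload bits and 1 continuation bit.
inline unsigned vbrBits(uint64_t Value, unsigned Width = 6) {
  const unsigned Payload = Width - 1;
  unsigned Chunks = 1;          // even zero occupies one chunk
  while (Value >> Payload) {    // payload bits remain beyond this chunk
    Value >>= Payload;
    ++Chunks;
  }
  return Chunks * Width;
}
```

Under these assumptions, an offset that needs 20 significant bits costs 24 bits and one that needs 25 significant bits costs 30 bits, while any value below 32 costs only 6 bits.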

We can reduce this by exploiting redundancy: many of the SourceLocations
stored together are "nearby", differing only slightly, and can be
delta-encoded.
Random-access loading of AST nodes constrains how long these sequences
can be, but we can do it at least within a node that always gets
deserialized as an atomic unit.
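
The idea can be sketched as follows. This is only an illustration of delta-encoding a short sequence of raw location values, not the code in SourceLocationEncoding.h: the function names are hypothetical, locations are modelled as plain uint32_t, and the use of zigzag coding for signed deltas is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Map signed deltas to small unsigned values: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
inline uint64_t zigzag(int64_t V) {
  return (static_cast<uint64_t>(V) << 1) ^ static_cast<uint64_t>(V >> 63);
}
inline int64_t unzigzag(uint64_t V) {
  return static_cast<int64_t>(V >> 1) ^ -static_cast<int64_t>(V & 1);
}

// Hypothetical encoder: the first location is stored as-is, each later one
// as the zigzag-coded difference from its predecessor.
inline std::vector<uint64_t> encodeSequence(const std::vector<uint32_t> &Locs) {
  std::vector<uint64_t> Out;
  uint32_t Prev = 0;
  bool First = true;
  for (uint32_t Loc : Locs) {
    if (First)
      Out.push_back(Loc);
    else
      Out.push_back(zigzag(int64_t(Loc) - int64_t(Prev)));
    First = false;
    Prev = Loc;
  }
  return Out;
}

// Inverse of encodeSequence: rebuilds absolute values from the deltas.
inline std::vector<uint32_t> decodeSequence(const std::vector<uint64_t> &Enc) {
  std::vector<uint32_t> Out;
  uint32_t Prev = 0;
  bool First = true;
  for (uint64_t E : Enc) {
    uint32_t Loc = First ? uint32_t(E) : uint32_t(int64_t(Prev) + unzigzag(E));
    First = false;
    Out.push_back(Loc);
    Prev = Loc;
  }
  return Out;
}
```

For the locations 20000000, 20000004, 20000001, 20000030 this sketch emits 20000000, 8, 5, 58: only the first value is large. The sequence has to be decoded from its start, which is why it must stay within a unit that is always deserialized as a whole.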

TypeLoc is implemented in this patch as it's a relatively small change
that shows most of the API.
This saves ~3.5% of PCH size. I have local changes applying this technique
further that save another 3%, and I think it's possible to get to 10% total.

Differential Revision: https://reviews.llvm.org/D125403
clang/include/clang/Serialization/ASTReader.h
clang/include/clang/Serialization/ASTRecordReader.h
clang/include/clang/Serialization/ASTRecordWriter.h
clang/include/clang/Serialization/ASTWriter.h
clang/include/clang/Serialization/SourceLocationEncoding.h [new file with mode: 0644]
clang/lib/Serialization/ASTReader.cpp
clang/lib/Serialization/ASTWriter.cpp
clang/unittests/Serialization/CMakeLists.txt
clang/unittests/Serialization/SourceLocationEncodingTest.cpp [new file with mode: 0644]