[clangd] Define a compact binary serialization format for symbol slab/index.
author Sam McCall <sam.mccall@gmail.com>
Tue, 4 Sep 2018 16:16:50 +0000 (16:16 +0000)
committer Sam McCall <sam.mccall@gmail.com>
Tue, 4 Sep 2018 16:16:50 +0000 (16:16 +0000)
commit 50f3631057f717448ba34b4175daaa81215fbd5e
tree 918408ccfd12bfc7187889ec77d341d17ae386a7
parent cc8b507a60677b79fe180681834929e4764e6ece

Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
  llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.

The format is a RIFF container (chunks of (type, size, data)) with:
 - a compressed string table
 - simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
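
To make the layout above concrete, here is a minimal sketch of RIFF-style chunk framing and a varint codec. It is not the code from RIFF.cpp or Serialization.cpp: the names (Chunk, writeFile, readFile, writeVar, readVar) and the example tags ("CdIx", "stri", "symb") are assumptions for illustration. It assumes the usual RIFF conventions of 32-bit little-endian sizes and padding of odd-sized chunks to an even length, and a varint that stores 7 bits per byte with the high bit as a continuation flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One chunk: a four-byte type tag plus an opaque payload (illustrative type).
struct Chunk {
  std::string Id;   // exactly 4 bytes, e.g. "stri" (hypothetical tag)
  std::string Data; // payload bytes
};

// Appends a 32-bit little-endian integer.
void writeU32(uint32_t V, std::string &Out) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<char>((V >> (8 * I)) & 0xFF));
}

// Reads a 32-bit little-endian integer; the caller guarantees 4 bytes at Pos.
uint32_t readU32(const std::string &In, size_t Pos) {
  uint32_t V = 0;
  for (int I = 0; I < 4; ++I)
    V |= static_cast<uint32_t>(static_cast<unsigned char>(In[Pos + I])) << (8 * I);
  return V;
}

// Varint: 7 payload bits per byte, high bit set on all but the last byte.
void writeVar(uint32_t V, std::string &Out) {
  while (V >= 0x80) {
    Out.push_back(static_cast<char>(0x80 | (V & 0x7F)));
    V >>= 7;
  }
  Out.push_back(static_cast<char>(V));
}

// Decodes a varint at Pos and advances Pos; false if truncated or too long.
bool readVar(const std::string &In, size_t &Pos, uint32_t &V) {
  V = 0;
  for (unsigned Shift = 0; Shift < 35; Shift += 7) {
    if (Pos >= In.size())
      return false;
    unsigned char B = static_cast<unsigned char>(In[Pos++]);
    V |= static_cast<uint32_t>(B & 0x7F) << Shift;
    if (!(B & 0x80))
      return true;
  }
  return false;
}

// Layout: "RIFF" <size> <form type> { <id> <size> <data> [pad byte] }*
std::string writeFile(const std::string &FormType, const std::vector<Chunk> &Chunks) {
  assert(FormType.size() == 4 && "form type must be four bytes");
  std::string Body = FormType;
  for (const Chunk &C : Chunks) {
    assert(C.Id.size() == 4 && "chunk id must be four bytes");
    Body += C.Id;
    writeU32(static_cast<uint32_t>(C.Data.size()), Body);
    Body += C.Data;
    if (C.Data.size() % 2)
      Body.push_back('\0'); // pad odd-sized payloads to an even length
  }
  std::string Out = "RIFF";
  writeU32(static_cast<uint32_t>(Body.size()), Out);
  return Out + Body;
}

// Parses a file written by writeFile; returns false on malformed input.
bool readFile(const std::string &In, std::string &FormType, std::vector<Chunk> &Chunks) {
  if (In.size() < 12 || In.compare(0, 4, "RIFF") != 0)
    return false;
  if (readU32(In, 4) != In.size() - 8)
    return false;
  FormType = In.substr(8, 4);
  size_t Pos = 12;
  while (Pos < In.size()) {
    if (In.size() - Pos < 8)
      return false; // truncated chunk header
    Chunk C;
    C.Id = In.substr(Pos, 4);
    uint32_t Len = readU32(In, Pos + 4);
    Pos += 8;
    if (In.size() - Pos < Len)
      return false; // truncated payload
    C.Data = In.substr(Pos, Len);
    Pos += Len;
    if (Len % 2) { // skip the pad byte
      if (Pos >= In.size())
        return false;
      ++Pos;
    }
    Chunks.push_back(std::move(C));
  }
  return true;
}
```

Because each chunk carries its own tag and size, a reader can skip chunks it does not understand, which is what would let occurrences or posting lists be added later as optional chunks.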

There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
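
A version check of this kind could look roughly like the sketch below. It assumes, purely for illustration, that the version is stored as a 32-bit little-endian integer at the start of a dedicated chunk's payload; SupportedVersion, VersionCheck and checkVersion are hypothetical names, not the API added by this commit.

```cpp
#include <cstdint>
#include <string>

// Hypothetical: the single format version this reader understands.
constexpr uint32_t SupportedVersion = 1;

enum class VersionCheck { OK, Truncated, Unsupported };

// Inspects the payload of the (assumed) version chunk.
VersionCheck checkVersion(const std::string &Payload) {
  if (Payload.size() < 4)
    return VersionCheck::Truncated;
  uint32_t V = 0;
  for (int I = 0; I < 4; ++I) // decode 32-bit little-endian
    V |= static_cast<uint32_t>(static_cast<unsigned char>(Payload[I])) << (8 * I);
  // Any ad-hoc back-compat for older versions would branch on V here.
  return V == SupportedVersion ? VersionCheck::OK : VersionCheck::Unsupported;
}
```

A loader would run this before touching any other chunk and refuse the file (or fall back to an older decoding path) on anything but OK.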

Alternatives considered:
 - compressed YAML or JSON: bulky and slow to load
 - LLVM bitstream: the model is confusing and the libraries are hard to use.
   My attempt produced slightly larger files, and the code was longer and slower.
 - protobuf or similar: would be really nice (esp for back-compat) but the
   dependency is a big hassle
 - ad-hoc binary format without a container: it seems clear we're going
   to add posting lists and occurrences here, and that they will benefit
   from sharing a string table. The container makes it easy to debug
   these pieces in isolation and to make them optional.

Reviewers: ioeric

Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51585

llvm-svn: 341375
14 files changed:
clang-tools-extra/clangd/CMakeLists.txt
clang-tools-extra/clangd/RIFF.cpp [new file with mode: 0644]
clang-tools-extra/clangd/RIFF.h [new file with mode: 0644]
clang-tools-extra/clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
clang-tools-extra/clangd/index/Index.cpp
clang-tools-extra/clangd/index/Index.h
clang-tools-extra/clangd/index/Serialization.cpp [new file with mode: 0644]
clang-tools-extra/clangd/index/Serialization.h [new file with mode: 0644]
clang-tools-extra/clangd/index/SymbolYAML.cpp
clang-tools-extra/clangd/tool/ClangdMain.cpp
clang-tools-extra/unittests/clangd/CMakeLists.txt
clang-tools-extra/unittests/clangd/RIFFTests.cpp [new file with mode: 0644]
clang-tools-extra/unittests/clangd/SerializationTests.cpp [new file with mode: 0644]
clang-tools-extra/unittests/clangd/SymbolCollectorTests.cpp