[clangd] New CC Ranking Model to fix bad inference due to overflow.
author Utkarsh Saxena <usx@google.com>
Thu, 8 Oct 2020 11:13:40 +0000 (13:13 +0200)
committer Utkarsh Saxena <usx@google.com>
Thu, 8 Oct 2020 13:30:00 +0000 (15:30 +0200)
commit a0a6fd435c6066d40d9c20835f1e52aad1e8cc65
tree 4cdc65588ee7f551041d93105cffeb4da87edfb9
parent 80bf29f00cc84374a0c2081a25b7c75e527eecb5

Unreachable file distances are represented as
`std::numeric_limits<unsigned>::max()`.
The previous dataset recorded the signals as `signed int`, so this sentinel
value was captured as `-1`.
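
The effect described above can be reproduced in isolation. The following is a minimal sketch, not code from clangd: the names `kUnreachable`, `recordAsSigned`, and `recordPreservingValue` are hypothetical, and it assumes a platform where `unsigned` and `int` are 32 bits wide with two's-complement representation (the out-of-range conversion is implementation-defined before C++20).

```cpp
#include <cstdint>
#include <limits>

// Sentinel for an unreachable distance, as described in the commit message.
// With a 32-bit `unsigned` this is 4294967295.
constexpr unsigned kUnreachable = std::numeric_limits<unsigned>::max();

// Hypothetical helper: mimics a dataset writer that stores the signal in a
// `signed int`. The sentinel does not fit, so on common platforms it wraps
// around to -1 (implementation-defined before C++20, modular since C++20).
inline int recordAsSigned(unsigned Distance) {
  return static_cast<int>(Distance);
}

// Hypothetical helper: stores the signal in a wider signed type, so the
// sentinel keeps its numeric value.
inline std::int64_t recordPreservingValue(unsigned Distance) {
  return static_cast<std::int64_t>(Distance);
}
```

Under these assumptions, `recordAsSigned(kUnreachable)` yields `-1`, while `recordPreservingValue(kUnreachable)` yields `4294967295`, which matches the value that appears in the distributions listed in this message. Ordinary small distances such as `6` are unaffected by either path.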

The dataset was regenerated and a new model was trained on it, so that the
model interprets the unreachable distance as its intended value.

Distribution of `SymbolScopeDistance`:
Value         Normalised Frequency
0             46.6184
4294967295    29.5342
6             14.5666
4              6.4433
2              1.4534
8              0.5760
10             0.3581
....

Distribution of `FileProximityDistance`:
Value         Normalised Frequency
4294967295    39.9378
12             5.1997
14             4.9828
15             4.4221
16             4.3820
13             4.2765
17             3.8957
11             3.6387
19             3.4799
18             3.4076
....

Differential Revision: https://reviews.llvm.org/D89035
clang-tools-extra/clangd/quality/model/forest.json