Better de-duplicate classes, unions, enums in non-ODR contexts
When reading an ELF/DWARF binary whose source language doesn't support
the One Definition Rule[1] (aka ODR), type de-duplication is done on a
per-translation unit basis only. In that case, a type definition is
only assumed to be unique within a given translation unit.
[1]: https://en.wikipedia.org/wiki/One_Definition_Rule
When handling big C binaries like the Linux Kernel, this still leaves
a lot of type duplication, as the same type can be re-defined in many
different translation units.
This patch tries to de-duplicate types on a per-corpus basis, even in
cases where the ODR doesn't apply. It does so by noting that if two
types of the same kind and name are seen in two different translation
units and yet have the same source location (are defined in the same
spot) then they are the same type.
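
The rule above can be sketched as a corpus-wide table keyed by kind,
name and source location. This is only an illustration of the idea,
not the code in src/abg-dwarf-reader.cc: type_kind, source_location,
type_decl and corpus_type_table are hypothetical stand-ins, and the
choice to never share types whose location is unknown is an assumption
of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical stand-ins for illustration; not libabigail's real types.
enum class type_kind { class_type, union_type, enum_type };

struct source_location {
  std::string file;
  unsigned line;
  unsigned column;
  // Assumption of this sketch: an empty file name means "no location".
  bool is_known() const { return !file.empty(); }
};

struct type_decl {
  type_kind kind;
  std::string name;
  source_location loc;
};

// Corpus-wide table: two types with the same kind, name and source
// location are considered to be the same type, whatever translation
// unit they were read from.
class corpus_type_table {
  using key =
    std::tuple<type_kind, std::string, std::string, unsigned, unsigned>;
  std::map<key, std::shared_ptr<type_decl>> types_;

public:
  std::shared_ptr<type_decl>
  add_or_reuse(type_kind kind, const std::string& name,
               const source_location& loc) {
    auto fresh = std::make_shared<type_decl>(type_decl{kind, name, loc});
    // Without a location there is nothing to compare, so the type is
    // never shared across translation units.
    if (!loc.is_known())
      return fresh;
    key k{kind, name, loc.file, loc.line, loc.column};
    // emplace() keeps the existing entry if the key is already there,
    // so the returned pointer is the first type seen for that key.
    return types_.emplace(k, fresh).first->second;
  }

  std::size_t size() const { return types_.size(); }
};
```

With such a table, reading the same struct from two translation units
yields one shared type, while a type of the same name defined at a
different location stays distinct.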
The patch does this for class, union and enum types as these seem to
be the kinds of types which are duplicated the most and thus consume
the most memory and later take the most time to canonicalize. In a
subsequent patch, we might want to try to de-duplicate typedef types
too, and see if we gain anything.
Comparing two Linux kernels shows that with this patch, we come back
to a speed (and memory consumption) comparable to what we had when we
were treating C-based binaries as being suited to our (too aggressive)
ODR-based type de-duplication algorithm.
Below is the output of the comparison measured with /usr/bin/time,
first with the aggressive ODR-based type de-duplication algorithm and
then with this patch. We compare the vmlinux binary coming from the package
kernel-3.10.0-515 against the one from kernel-3.10.0-327.el7.x86_64.
We use the kabi whitelists provided by the relevant
kernel-abi-whitelists packages for these kernels to restrict the
comparison to a meaningful subset of interfaces.
52.66user 0.64system 0:53.34elapsed 99%CPU (0avgtext+0avgdata 944416maxresident)k
0inputs+48outputs (0major+250750minor)pagefaults 0swaps
vs:
51.04user 0.87system 0:51.91elapsed 100%CPU (0avgtext+0avgdata 972560maxresident)k
0inputs+232outputs (0major+279983minor)pagefaults 0swaps
The full invocation and results are available at:
http://people.redhat.com/~dseketel/kabidiff/vmlinuz-3.10.0-327.el7.x86_64--vmlinuz-3.10.0-515.el7.x86_64.diff.whitelisted.with-unions.txt
vs
http://people.redhat.com/~dseketel/kabidiff/vmlinuz-3.10.0-327.el7.x86_64--vmlinuz-3.10.0-515.el7.x86_64.diff.whitelisted.with-unions.with-non-odr-support.txt
Note that to be able to compare the two kernels, the current tree must
contain the necessary patches that make libabigail understand Linux
Kernel binaries.
* src/abg-dwarf-reader.cc (build_enum_type)
(add_or_update_class_type, add_or_update_union_type): When the ODR
is not relevant, use the location of the type to detect if two
enum, class or union types of the same name actually represent the
same type.
Signed-off-by: Dodji Seketeli <dodji@redhat.com>