dwarf-reader: Leverage ODR & DWZ
authorDodji Seketeli <dodji@redhat.com>
Wed, 23 Nov 2022 10:54:08 +0000 (11:54 +0100)
committerDodji Seketeli <dodji@redhat.com>
Wed, 30 Nov 2022 16:17:15 +0000 (17:17 +0100)
When DWARF debug info has been preprocessed with the DWZ tool, I think
we can assume that if two DIEs originating from the .gnu_debugaltlink
section have different offset, it means they are different, even if
they represent two types of the same nature and of the same name.
This is the whole point of DWZ.

When we process two DIEs originating from C++, it's possible "in
general" to assume that the One Definition Rule is in effect, meaning
that if two types of the same nature have the same name, they ought to
represent the same entity, meaning, they are the same type.

These two observations can lead to faster comparison of two aggregate
types of the same nature and of the same name.

This patch implements these two optimizations and use them by
default.

The first one is used by default on binaries that contains a
.gnu_debugaltlink section, which is the hint we use to detect that DWZ
was used to factorize DWARF DIEs.

The second of is used by default on DWARF DIEs originating from C++.

These optimizations can be de-activated on abidw and abidiff using the
--no-leverage-dwarf-factorization and --no-assume-odr-for-cplusplus.

* doc/manuals/abidiff.rst: Add documentation for
--no-leverage-dwarf-factorization and
--no-assume-odr-for-cplusplus
* doc/manuals/abidw.rst: Likewise.
* doc/manuals/abipkgdiff.rst: Likewise.
* include/abg-fe-iface.h (options::{leverage_dwarf_factorization,
assume_odr_for_cplusplus}): New data members.
* src/abg-dwarf-reader.cc (reader::leverage_dwarf_factorization_):
New data member.
(reader::leverage_dwarf_factorization): New accessor.
(compare_dies): If we are allowed to leverage the DWARF
factorization and if two type DIEs coming from the
.gnu_debugaltlink DWARF section have different offset, then they
are different.  Also, if we are allowed to assume ODR, use it to
speed up class/struct/unions comparisons.
* tools/abidiff.cc (options::{assume_odr_for_cplusplus,
leverage_dwarf_factorization}): Define new data members.
(options::options): Initialize them.
(display_usage): Add new help strings for
--no-leverage-dwarf-factorization and
--no-assume-odr-for-cplusplus.
(parse_command_line): Parse these new options.
(set_generic_options): New function.
(main): Use the new set_generic_options function.
* tools/abidw.cc (options::{assume_odr_for_cplusplus,
leverage_dwarf_factorization}): Define new data members.
(options::options): Initialize them.
(display_usage): Add new help strings for
--no-leverage-dwarf-factorization and
--no-assume-odr-for-cplusplus.
(parse_command_line): Parse these new options.
(set_generic_options): New function.
(load_corpus_and_write_abixml): Use the new set_generic_options
function.
* tools/abipkgdiff.cc (options::{assume_odr_for_cplusplus,
leverage_dwarf_factorization}): Define new data members.
(options::options): Initialize them.
(display_usage): Add new help strings for
--no-leverage-dwarf-factorization and
--no-assume-odr-for-cplusplus.
(parse_command_line): Parse these new options.
(set_generic_options): New function.
(compare): Use it.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
doc/manuals/abidiff.rst
doc/manuals/abidw.rst
doc/manuals/abipkgdiff.rst
include/abg-fe-iface.h
src/abg-dwarf-reader.cc
tests/test-diff-pkg.cc
tools/abidiff.cc
tools/abidw.cc
tools/abipkgdiff.cc

index a8878d2c2ded7d51c1addfabfc7eaebbc116cb18..dd357fba3395cbc6c89f5bc0ccd7713f4c8c6382 100644 (file)
@@ -580,6 +580,26 @@ Options
     changes.  Added or removed functions and variables do not have any
     diff nodes tree associated to them.
 
+  * ``--no-assume-odr-for-cplusplus``
+
+    When analysing a binary originating from C++ code using `DWARF`_
+    debug information, libabigail assumes the `One Definition Rule`_
+    to speed-up the analysis.  In that case, when several types have
+    the same name in the binary, they are assumed to all be equal.
+
+    This option disables that assumption and instructs libabigail to
+    actually actually compare the types to determine if they are
+    equal.
+
+  * ``--no-leverage-dwarf-factorization``
+
+    When analysing a binary which `DWARF`_ debug information was
+    processed with the `DWZ`_ tool, the type information is supposed
+    to be already factorized.  That context is used by libabigail to
+    perform some speed optimizations.
+
+    This option disables those optimizations.
+
   * ``--ctf``
 
     When comparing binaries, extract ABI information from `CTF`_ debug
@@ -810,3 +830,6 @@ Usage examples
 .. _ELF: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
 .. _DWARF: http://www.dwarfstd.org
 .. _CTF: https://raw.githubusercontent.com/wiki/oracle/binutils-gdb/files/ctf-spec.pdf
+.. _ODR: https://en.wikipedia.org/wiki/One_Definition_Rule
+.. _One Definition Rule: https://en.wikipedia.org/wiki/One_Definition_Rule
+.. _DWZ: https://sourceware.org/dwz
index 20948805efb6fc5e5e62dbb9394ab181e13066cc..87db18904666c21ef29971bfee2164c22d41959a 100644 (file)
@@ -327,6 +327,25 @@ Options
     This option is available only if the package was configured with
     the --enable-debug-type-canonicalization option.
 
+  * ``--no-assume-odr-for-cplusplus``
+
+    When analysing a binary originating from C++ code using `DWARF`_
+    debug information, libabigail assumes the `One Definition Rule`_
+    to speed-up the analysis.  In that case, when several types have
+    the same name in the binary, they are assumed to all be equal.
+
+    This option disables that assumption and instructs libabigail to
+    actually actually compare the types to determine if they are
+    equal.
+
+  * ``--no-leverage-dwarf-factorization``
+
+    When analysing a binary which `DWARF`_ debug information was
+    processed with the `DWZ`_ tool, the type information is supposed
+    to be already factorized.  That context is used by libabigail to
+    perform some speed optimizations.
+
+    This option disables those optimizations.
 
   * ``--ctf``
 
@@ -370,3 +389,6 @@ standard `here
 .. _GNU: http://www.gnu.org
 .. _Linux Kernel: https://kernel.org/
 .. _CTF: https://raw.githubusercontent.com/wiki/oracle/binutils-gdb/files/ctf-spec.pdf
+.. _ODR: https://en.wikipedia.org/wiki/One_Definition_Rule
+.. _One Definition Rule: https://en.wikipedia.org/wiki/One_Definition_Rule
+.. _DWZ: https://sourceware.org/dwz
index 771bb034ec4642a4e5b6c5e72a8ce5e190d64c77..9d7a3973b6f3693932545ddf2d19018864fc4864 100644 (file)
@@ -528,6 +528,26 @@ Options
       $ abipkgdiff --self-check --d1 mesa-libGLU-debuginfo-9.0.1-3.fc33.x86_64.rpm  mesa-libGLU-9.0.1-3.fc33.x86_64.rpm
        ==== SELF CHECK SUCCEEDED for 'libGLU.so.1.3.1' ====
       $
+  * ``--no-assume-odr-for-cplusplus``
+
+    When analysing a binary originating from C++ code using `DWARF`_
+    debug information, libabigail assumes the `One Definition Rule`_
+    to speed-up the analysis.  In that case, when several types have
+    the same name in the binary, they are assumed to all be equal.
+
+    This option disables that assumption and instructs libabigail to
+    actually actually compare the types to determine if they are
+    equal.
+
+  * ``--no-leverage-dwarf-factorization``
+
+    When analysing a binary which `DWARF`_ debug information was
+    processed with the `DWZ`_ tool, the type information is supposed
+    to be already factorized.  That context is used by libabigail to
+    perform some speed optimizations.
+
+    This option disables those optimizations.
+
 
   * ``--ctf``
 
@@ -554,3 +574,6 @@ In the later case, the value of the exit code is the same as for the
 .. _DWARF: http://www.dwarfstd.org
 .. _CTF: https://raw.githubusercontent.com/wiki/oracle/binutils-gdb/files/ctf-spec.pdf
 .. _Development Package: https://fedoraproject.org/wiki/Packaging:Guidelines?rd=Packaging/Guidelines#Devel_Packages
+.. _ODR: https://en.wikipedia.org/wiki/One_Definition_Rule
+.. _One Definition Rule: https://en.wikipedia.org/wiki/One_Definition_Rule
+.. _DWZ: https://sourceware.org/dwz
index 5e6cf26f548fcf41c717a4aa1d9fb11eb1c2a8ab..0b5a8e19486cdd6621f2a8ec17c4cad68cf0d9af 100644 (file)
@@ -63,7 +63,9 @@ protected:
     bool               load_all_types                  = false;
     bool               drop_undefined_syms             = false;
     bool               show_stats                      = false;
-    bool               do_log                          = false;;
+    bool               do_log                          = false;
+    bool               leverage_dwarf_factorization    = true;
+    bool               assume_odr_for_cplusplus        = true;
     options_type(environment&);
 
   };// font_end_iface::options_type
index 38f37ae2fdf06374406187097976b1409e56918b..bed8f4fdbdc725d65686ed004db635e3a201d5ec 100644 (file)
@@ -74,6 +74,7 @@ using std::stack;
 using std::deque;
 using std::list;
 using std::map;
+using abg_compat::optional;
 
 using namespace elf_helpers; // TODO: avoid using namespace
 
@@ -1857,6 +1858,7 @@ public:
   mutable size_t               compare_count_;
   mutable size_t               canonical_propagated_count_;
   mutable size_t               cancelled_propagation_count_;
+  mutable optional<bool>       leverage_dwarf_factorization_;
 
 protected:
 
@@ -5114,6 +5116,29 @@ public:
   load_in_linux_kernel_mode(bool f)
   {options().load_in_linux_kernel_mode = f;}
 
+  /// Test if it's allowed to assume that the DWARF debug info has
+  /// been factorized (for instance, with the DWZ tool) so that if two
+  /// type DIEs originating from the .gnu_debugaltlink section have
+  /// different offsets, they represent different types.
+  ///
+  /// @return true iff we can assume that the DWARF debug info has
+  /// been factorized.
+  bool
+  leverage_dwarf_factorization() const
+  {
+    if (!leverage_dwarf_factorization_.has_value())
+      {
+       if (options().leverage_dwarf_factorization
+           && elf_helpers::find_section_by_name(elf_handle(),
+                                                ".gnu_debugaltlink"))
+         leverage_dwarf_factorization_ = true;
+       else
+         leverage_dwarf_factorization_ = false;
+      }
+    ABG_ASSERT(leverage_dwarf_factorization_.has_value());
+
+    return *leverage_dwarf_factorization_;
+  }
   /// Getter of the "show_stats" flag.
   ///
   /// This flag tells if we should emit statistics about various
@@ -10556,6 +10581,12 @@ compare_dies(const reader& rdr,
   if (l_offset == r_offset)
     return COMPARISON_RESULT_EQUAL;
 
+  if (rdr.leverage_dwarf_factorization()
+      && (l_die_source == ALT_DEBUG_INFO_DIE_SOURCE
+         && r_die_source == ALT_DEBUG_INFO_DIE_SOURCE))
+    if (l_offset != r_offset)
+      return COMPARISON_RESULT_DIFFERENT;
+
   comparison_result result = COMPARISON_RESULT_EQUAL;
   if (maybe_get_cached_type_comparison_result(rdr, l_tag,
                                              dies_being_compared,
@@ -10706,6 +10737,12 @@ compare_dies(const reader& rdr,
 
        if (!compare_as_decl_and_type_dies(rdr, l, r))
          SET_RESULT_TO_FALSE(result, l, r);
+       else if (rdr.options().assume_odr_for_cplusplus
+                && rdr.odr_is_relevant(l)
+                && rdr.odr_is_relevant(r)
+                && !die_is_anonymous(l)
+                && !die_is_anonymous(r))
+         result = COMPARISON_RESULT_EQUAL;
        else
          {
            aggregates_being_compared.add(dies_being_compared);
index 8e510ca2b46775eebe0529a983cb10d3b42bf05e..56043a6f724ee63ed563d79a49ff61d2910da929 100644 (file)
@@ -346,7 +346,7 @@ static InOutSpec in_out_specs[] =
   {
     "data/test-diff-pkg/tbb-4.1-9.20130314.fc22.x86_64.rpm",
     "data/test-diff-pkg/tbb-4.3-3.20141204.fc23.x86_64.rpm",
-    "--no-default-suppression",
+    "--no-default-suppression --no-assume-odr-for-cplusplus",
     "",
     "data/test-diff-pkg/tbb-debuginfo-4.1-9.20130314.fc22.x86_64.rpm",
     "data/test-diff-pkg/tbb-debuginfo-4.3-3.20141204.fc23.x86_64.rpm",
index 5ffe47a3030159afa5cf2aae8dcf512bf18112b5..cd564a63d72d75c3176e39c8f86e5786c80e2dda 100644 (file)
@@ -108,6 +108,8 @@ struct options
   bool                 show_redundant_changes;
   bool                 show_symbols_not_referenced_by_debug_info;
   bool                 show_impacted_interfaces;
+  bool                 assume_odr_for_cplusplus;
+  bool                 leverage_dwarf_factorization;
   bool                 dump_diff_tree;
   bool                 show_stats;
   bool                 do_log;
@@ -159,6 +161,8 @@ struct options
       show_redundant_changes(),
       show_symbols_not_referenced_by_debug_info(true),
       show_impacted_interfaces(),
+      assume_odr_for_cplusplus(true),
+      leverage_dwarf_factorization(true),
       dump_diff_tree(),
       show_stats(),
       do_log()
@@ -259,6 +263,10 @@ display_usage(const string& prog_name, ostream& out)
     << " --no-redundant  do not display redundant changes "
     "(this is the default)\n"
     << " --impacted-interfaces  display interfaces impacted by leaf changes\n"
+    << " --no-leverage-dwarf-factorization  do not use DWZ optimisations to "
+    "speed-up the analysis of the binary\n"
+    << " --no-assume-odr-for-cplusplus  do not assume the ODR to speed-up the "
+    "analysis of the binary\n"
     << " --dump-diff-tree  emit a debug dump of the internal diff tree to "
     "the error output stream\n"
     <<  " --stats  show statistics about various internal stuff\n"
@@ -615,6 +623,10 @@ parse_command_line(int argc, char* argv[], options& opts)
        opts.show_redundant_changes = false;
       else if (!strcmp(argv[i], "--impacted-interfaces"))
        opts.show_impacted_interfaces = true;
+      else if (!strcmp(argv[i], "--no-leverage-dwarf-factorization"))
+       opts.leverage_dwarf_factorization = false;
+      else if (!strcmp(argv[i], "--no-assume-odr-for-cplusplus"))
+       opts.leverage_dwarf_factorization = false;
       else if (!strcmp(argv[i], "--dump-diff-tree"))
        opts.dump_diff_tree = true;
       else if (!strcmp(argv[i], "--stats"))
@@ -800,6 +812,23 @@ set_diff_context_from_opts(diff_context_sptr ctxt,
   ctxt->dump_diff_tree(opts.dump_diff_tree);
 }
 
+/// Set a bunch of tunable buttons on the ELF-based reader from the
+/// command-line options.
+///
+/// @param rdr the reader to tune.
+///
+/// @param opts the command line options.
+static void
+set_generic_options(abigail::elf_based_reader& rdr, options& opts)
+{
+  rdr.options().show_stats = opts.show_stats;
+  rdr.options().do_log = opts.do_log;
+  rdr.options().leverage_dwarf_factorization =
+    opts.leverage_dwarf_factorization;
+  rdr.options().assume_odr_for_cplusplus =
+    opts.assume_odr_for_cplusplus;
+}
+
 /// Set suppression specifications to the @p read_context used to load
 /// the ABI corpus from the ELF/DWARF file.
 ///
@@ -1208,9 +1237,7 @@ main(int argc, char* argv[])
                                           env, requested_fe_kind,
                                           opts.show_all_types);
             ABG_ASSERT(rdr);
-
-           rdr->options().show_stats = opts.show_stats;
-           rdr->options().do_log = opts.do_log;
+           set_generic_options(*rdr, opts);
            set_suppressions(*rdr, opts);
            c1 = rdr->read_corpus(c1_status);
 
@@ -1283,8 +1310,7 @@ main(int argc, char* argv[])
                                           opts.show_all_types);
             ABG_ASSERT(rdr);
 
-           rdr->options().show_stats = opts.show_stats;
-           rdr->options().do_log = opts.do_log;
+           set_generic_options(*rdr, opts);
            set_suppressions(*rdr, opts);
 
            c2 = rdr->read_corpus(c2_status);
index 3b1a1bd595d3aab67b147f892dba9f27ab14e490..01e88807e8a70b979bbbb9e6808931a66a0f8128 100644 (file)
@@ -117,6 +117,8 @@ struct options
   bool                 do_log;
   bool                 drop_private_types;
   bool                 drop_undefined_syms;
+  bool                 assume_odr_for_cplusplus;
+  bool                 leverage_dwarf_factorization;
   optional<bool>       exported_interfaces_only;
   type_id_style_kind   type_id_style;
 #ifdef WITH_DEBUG_SELF_COMPARISON
@@ -156,6 +158,8 @@ struct options
       do_log(),
       drop_private_types(false),
       drop_undefined_syms(false),
+      assume_odr_for_cplusplus(true),
+      leverage_dwarf_factorization(true),
       type_id_style(SEQUENCE_TYPE_ID_STYLE)
   {}
 
@@ -226,6 +230,10 @@ display_usage(const string& prog_name, ostream& out)
 #ifdef WITH_CTF
     << "  --ctf use CTF instead of DWARF in ELF files\n"
 #endif
+    << "  --no-leverage-dwarf-factorization  do not use DWZ optimisations to "
+    "speed-up the analysis of the binary\n"
+    << "  --no-assume-odr-for-cplusplus  do not assume the ODR to speed-up the "
+    "analysis of the binary\n"
     << "  --annotate  annotate the ABI artifacts emitted in the output\n"
     << "  --stats  show statistics about various internal stuff\n"
     << "  --verbose show verbose messages about internal stuff\n";
@@ -398,6 +406,10 @@ parse_command_line(int argc, char* argv[], options& opts)
               || !strcmp(argv[i], "debug-die-canonicalization"))
        opts.debug_die_canonicalization = true;
 #endif
+      else if (!strcmp (argv[i], "--no-assume-odr-for-cplusplus"))
+       opts.assume_odr_for_cplusplus = false;
+      else if (!strcmp (argv[i], "--no-leverage-dwarf-factorization"))
+       opts.leverage_dwarf_factorization = false;
       else if (!strcmp(argv[i], "--annotate"))
        opts.annotate = true;
       else if (!strcmp(argv[i], "--stats"))
@@ -520,6 +532,24 @@ set_suppressions(abigail::elf_based_reader& rdr, options& opts)
   rdr.add_suppressions(opts.kabi_whitelist_supprs);
 }
 
+/// Set a bunch of tunable buttons on the ELF-based reader from the
+/// command-line options.
+///
+/// @param rdr the reader to tune.
+///
+/// @param opts the command line options.
+static void
+set_generic_options(abigail::elf_based_reader& rdr, options& opts)
+{
+  rdr.options().drop_undefined_syms = opts.drop_undefined_syms;
+  rdr.options().show_stats = opts.show_stats;
+  rdr.options().do_log = opts.do_log;
+  rdr.options().leverage_dwarf_factorization =
+    opts.leverage_dwarf_factorization;
+  rdr.options().assume_odr_for_cplusplus =
+    opts.assume_odr_for_cplusplus;
+}
+
 /// Load an ABI @ref corpus (the internal representation of the ABI of
 /// a binary) and write it out as an abixml.
 ///
@@ -569,10 +599,9 @@ load_corpus_and_write_abixml(char* argv[],
                                 opts.linux_kernel_mode);
   ABG_ASSERT(reader);
 
-  // ... then tune a bunch of "buttons" on the newly created reader ...
-  reader->options().drop_undefined_syms = opts.drop_undefined_syms;
-  reader->options().show_stats = opts.show_stats;
-  reader->options().do_log = opts.do_log;
+  // ... then tune a bunch of "buttons" on the newly created reader
+  // ...
+  set_generic_options(*reader, opts);
   set_suppressions(*reader, opts);
 
   // If the user asked us to check if we found the "alternate debug
index 1feb3d9e78bf5e8f2fb84c481a8cf22ddf9e7b9f..3c7b6d025ddfd1ee5cd6a6dac1ec7bcd793629fb 100644 (file)
@@ -205,6 +205,8 @@ public:
   bool         show_added_binaries;
   bool         fail_if_no_debug_info;
   bool         show_identical_binaries;
+  bool         leverage_dwarf_factorization;
+  bool         assume_odr_for_cplusplus;
   bool         self_check;
   optional<bool> exported_interfaces_only;
 #ifdef WITH_CTF
@@ -248,6 +250,8 @@ public:
       show_added_binaries(true),
       fail_if_no_debug_info(),
       show_identical_binaries(),
+      leverage_dwarf_factorization(true),
+      assume_odr_for_cplusplus(true),
       self_check()
 #ifdef WITH_CTF
       ,
@@ -892,6 +896,10 @@ display_usage(const string& prog_name, ostream& out)
     << " --no-parallel                  do not execute in parallel\n"
     << " --fail-no-dbg                  fail if no debug info was found\n"
     << " --show-identical-binaries      show the names of identical binaries\n"
+    << " --no-leverage-dwarf-factorization  do not use DWZ optimisations to "
+    "speed-up the analysis of the binary\n"
+    << " --no-assume-odr-for-cplusplus  do not assume the ODR to speed-up the"
+    "analysis of the binary\n"
     << " --verbose                      emit verbose progress messages\n"
     << " --self-check                   perform a sanity check by comparing "
     "binaries inside the input package against their ABIXML representation\n"
@@ -1224,6 +1232,24 @@ set_diff_context_from_opts(diff_context_sptr ctxt,
   ctxt->add_suppressions(supprs);
 }
 
+/// Set a bunch of tunable buttons on the ELF-based reader from the
+/// command-line options.
+///
+/// @param rdr the reader to tune.
+///
+/// @param opts the command line options.
+static void
+set_generic_options(abigail::elf_based_reader& rdr, const options& opts)
+{
+  if (!opts.kabi_suppressions.empty())
+    rdr.add_suppressions(opts.kabi_suppressions);
+
+  rdr.options().leverage_dwarf_factorization =
+    opts.leverage_dwarf_factorization;
+  rdr.options().assume_odr_for_cplusplus =
+    opts.assume_odr_for_cplusplus;
+}
+
 /// Compare the ABI two elf files, using their associated debug info.
 ///
 /// The result of the comparison is emitted to standard output.
@@ -1336,8 +1362,7 @@ compare(const elf_file&           elf1,
     ABG_ASSERT(reader);
 
     reader->add_suppressions(priv_types_supprs1);
-    if (!opts.kabi_suppressions.empty())
-      reader->add_suppressions(opts.kabi_suppressions);
+    set_generic_options(*reader, opts);
 
     corpus1 = reader->read_corpus(c1_status);
 
@@ -1436,8 +1461,7 @@ compare(const elf_file&           elf1,
     ABG_ASSERT(reader);
 
     reader->add_suppressions(priv_types_supprs2);
-    if (!opts.kabi_suppressions.empty())
-      reader->add_suppressions(opts.kabi_suppressions);
+    set_generic_options(*reader, opts);
 
     corpus2 = reader->read_corpus(c2_status);
 
@@ -3350,6 +3374,10 @@ parse_command_line(int argc, char* argv[], options& opts)
        opts.show_added_binaries = false;
       else if (!strcmp(argv[i], "--fail-no-dbg"))
        opts.fail_if_no_debug_info = true;
+      else if (!strcmp(argv[i], "--no-leverage-dwarf-factorization"))
+       opts.leverage_dwarf_factorization = false;
+      else if (!strcmp(argv[i], "--no-assume-odr-for-cplusplus"))
+       opts.assume_odr_for_cplusplus = false;
       else if (!strcmp(argv[i], "--verbose"))
        opts.verbose = true;
       else if (!strcmp(argv[i], "--no-abignore"))