[DebugInfo][NFC] Add instr-ref documentation, migration guide

author J. Ryan Stinnett <jryans@gmail.com>

Mon, 9 May 2022 14:45:56 +0000 (15:45 +0100)

committer J. Ryan Stinnett <jryans@gmail.com>

Fri, 20 May 2022 13:13:46 +0000 (14:13 +0100)
author J. Ryan Stinnett <jryans@gmail.com>
Mon, 9 May 2022 14:45:56 +0000 (15:45 +0100)
committer J. Ryan Stinnett <jryans@gmail.com>
Fri, 20 May 2022 13:13:46 +0000 (14:13 +0100)
diff --git a/llvm/docs/InstrRefDebugInfo.md b/llvm/docs/InstrRefDebugInfo.md

new file mode 100644 (file)

index 0000000..b7cd0f8
--- /dev/null
+++ b/llvm/docs/InstrRefDebugInfo.md
@@ -0,0 +1,180 @@
+# Instruction referencing for debug info
+
+This document explains how LLVM uses value tracking, or instruction
+referencing, to determine variable locations for debug info in the code
+generation stage of compilation. This content is aimed at those working on code
+generation targets and optimisation passes. It may also be of interest to anyone
+curious about low-level debug info handling.
+
+# Problem statement
+
+At the end of compilation, LLVM must produce a DWARF location list (or similar)
+describing what register or stack location a variable can be found in, for each
+instruction in that variable's lexical scope. We could track the virtual
+register that the variable resides in through compilation, however this is
+vulnerable to register optimisations during regalloc, and instruction
+movements.
+
+# Solution: instruction referencing
+
+Rather than identify the virtual register that a variable value resides in,
+instead in instruction referencing mode, LLVM refers to the machine instruction
+and operand position that the value is defined in. Consider the LLVM IR way of
+referring to instruction values:
+
+  %2 = add i32 %0, %1
+  call void @llvm.dbg.value(metadata i32 %2,
+
+In LLVM IR, the IR Value is synonymous with the instruction that computes the
+value, to the extent that in memory a Value is a pointer to the computing
+instruction. Instruction referencing implements this relationship in the
+codegen backend of LLVM, after instruction selection. Consider the X86 assembly
+below and instruction referencing debug info, corresponding to the earlier
+LLVM IR:
+
+  %2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
+  DBG_INSTR_REF 1, 0, !123, !456, debug-location !789
+
+While the function remains in SSA form, virtual register %2 is sufficient to
+identify the value computed by the instruction -- however the function
+eventually leaves SSA form, and register optimisations will obscure which
+register the desired value is in. Instead, a more consistent way of identifying
+the instruction's value is to refer to the MachineOperand where the value is
+defined: independently of which register is defined by that MachineOperand. In
+the code above, the DBG_INSTR_REF instruction refers to instruction number one,
+operand zero, while the ADD32rr has a debug-instr-number attribute attached
+indicating that it is instruction number one.
+
+De-coupling variable locations from registers avoids difficulties involving
+register allocation and optimisation, but requires additional instrumentation
+when the instructions are optimised instead. Optimisations that replace
+instructions with optimised versions that compute the same value must either
+preserve the instruction number, or record a substitution from the old
+instruction / operand number pair to the new instruction / operand pair -- see
+MachineFunction::substituteDebugValuesForInst. If debug info maintenance is not
+performed, or an instruction is eliminated as dead code, the variable location
+is safely dropped and marked "optimised out". The exception is instructions
+that are mutated rather than replaced, which always need debug info
+maintenance.
+
+# Register allocator considerations
+
+When the register allocator runs, debugging instructions do not directly refer
+to any virtual registers, and thus there is no need for expensive location
+maintenance during regalloc (i.e., LiveDebugVariables). Debug instructions are
+unlinked from the function, then linked back in after register allocation
+completes.
+
+The exception is PHI instructions: these become implicit definitions at control
+flow merges once regalloc finishes, and any debug numbers attached to PHI
+instructions are lost. To circumvent this, debug numbers of PHIs are recorded
+at the start of register allocation (phi-node-elimination), then DBG_PHI
+instructions are inserted after regalloc finishes. This requires some
+maintenance of which register a variable is located in during regalloc, but at
+single positions (block entry points) rather than ranges of instructions.
+
+An example, before regalloc:
+
+  bb.2:
+    %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1
+
+After:
+
+  bb.2:
+    DBG_PHI $rax, 1
+
+# LiveDebugValues
+
+After optimisations and code layout complete, information about variable
+values must be translated into variable locations, i.e. registers and stack
+slots. This is performed in the [LiveDebugValues pass][LiveDebugValues], where
+the debug instructions and machine code are separated out into two independent
+functions:
+ * One that assigns values to variable names,
+ * One that assigns values to machine registers and stack slots.
+
+LLVM's existing SSA tools are used to place PHIs for each function, between
+variable values and the values contained in machine locations, with value
+propagation eliminating any un-necessary PHIs. The two can then be joined up
+to map variables to values, then values to locations, for each instruction in
+the function.
+
+Key to this process is being able to identify the movement of values between
+registers and stack locations, so that the location of values can be preserved
+for the full time that they are resident in the machine.
+
+# Required target support and transition guide
+
+Instruction referencing will work on any target, but likely with poor coverage.
+Supporting instruction referencing well requires:
+ * Target hooks to be implemented to allow LiveDebugValues to follow values through the machine,
+ * Target-specific optimisations to be instrumented, to preserve instruction numbers.
+
+## Target hooks
+
+TargetInstrInfo::isCopyInstrImpl must be implemented to recognise any
+instructions that are copy-like -- LiveDebugValues uses this to identify when
+values move between registers.
+
+TargetInstrInfo::isLoadFromStackSlotPostFE and
+TargetInstrInfo::isStoreToStackSlotPostFE are needed to identify spill and
+restore instructions. Each should return the destination or source register
+respectively. LiveDebugValues will track the movement of a value from / to
+the stack slot. In addition, any instruction that writes to a stack spill
+should have a MachineMemoryOperand attached, so that LiveDebugValues can
+recognise that a slot has been clobbered.
+
+## Target-specific optimisation instrumentation
+
+Optimisations come in two flavours: those that mutate a MachineInstr to make
+it do something different, and those that create a new instruction to replace
+the operation of the old.
+
+The former _must_ be instrumented -- the relevant question is whether any
+register def in any operand will produce a different value, as a result of the
+mutation. If the answer is yes, then there is a risk that a DBG_INSTR_REF
+instruction referring to that operand will end up assigning the different
+value to a variable, presenting the debugging developer with an unexpected
+variable value. In such scenarios, call MachineInstr::dropDebugNumber() on the
+mutated instruction to erase its instruction number. Any DBG_INSTR_REF
+referring to it will produce an empty variable location instead, that appears
+as "optimised out" in the debugger.
+
+For the latter flavour of optimisation, to increase coverage you should record
+an instruction number substitution: a mapping from the old instruction number /
+operand pair to new instruction number / operand pair. Consider if we replace
+a three-address add instruction with a two-address add:
+
+  %2:gr32 = ADD32rr %0, %1, debug-instr-number 1
+
+becomes
+
+  %2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2
+
+With a substitution from "instruction number 1 operand 0" to "instruction number
+2 operand 0" recorded in the MachineFunction. In LiveDebugValues, DBG_INSTR_REFs
+will be mapped through the substitution table to find the most recent
+instruction number / operand number of the value it refers to.
+
+Use MachineFunction::substituteDebugValuesForInst to automatically produce
+substitutions between an old and new instruction. It assumes that any operand
+that is a def in the old instruction is a def in the new instruction at the
+same operand position. This works most of the time, for example in the example
+above.
+
+If operand numbers do not line up between the old and new instruction, use
+MachineInstr::getDebugInstrNum to acquire the instruction number for the new
+instruction, and MachineFunction::makeDebugValueSubstitution to record the
+mapping between register definitions in the old and new instructions. If some
+values computed by the old instruction are no longer computed by the new
+instruction, record no substitution -- LiveDebugValues will safely drop the
+now unavailable variable value.
+
+Should your target clone instructions, much the same as the TailDuplicator
+optimisation pass, do not attempt to preserve the instruction numbers or
+record any substitutions. MachineFunction::CloneMachineInstr should drop the
+instruction number of any cloned instruction, to avoid duplicate numbers
+appearing to LiveDebugValues. Dealing with duplicated instructions is a
+natural extension to instruction referencing that's currently unimplemented.
+
+[LiveDebugValues]: SourceLevelDebugging.html#livedebugvalues-expansion-of-variable-locations
diff --git a/llvm/docs/MIRLangRef.rst b/llvm/docs/MIRLangRef.rst

index e977b3b..e7b368d 100644 (file)
--- a/llvm/docs/MIRLangRef.rst
+++ b/llvm/docs/MIRLangRef.rst
@@ -911,5 +911,6 @@ The additional operands to ``DBG_INSTR_REF`` are identical to ``DBG_VALUE``,
  and the ``DBG_INSTR_REF`` s position records where the variable takes on the
  designated value in the same way.
  
-More information about how these constructs are used will appear on the source
-level debugging page in due course, see also :doc:`SourceLevelDebugging` and :doc:`HowToUpdateDebugInfo`.
+More information about how these constructs are used is available in
+:doc:`InstrRefDebugInfo`. The related documents :doc:`SourceLevelDebugging` and
+:doc:`HowToUpdateDebugInfo` may be useful as well.
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst

index c344c6e..efee7d3 100644 (file)
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -38,6 +38,7 @@ intermediate LLVM representation.
     HowToCrossCompileBuiltinsOnArm\r
     HowToCrossCompileLLVM\r
     HowToUpdateDebugInfo\r
+   InstrRefDebugInfo\r
     LinkTimeOptimization\r
     LoopTerminology\r
     MarkdownQuickstartTemplate\r
@@ -160,6 +161,15 @@ Optimizations
     This document describes the design and philosophy behind the LLVM\r
     source-level debugger.\r
  \r
+:doc:`How to Update Debug Info <HowToUpdateDebugInfo>`\r
+   This document specifies how to correctly update debug info in various kinds\r
+   of code transformations.\r
+\r
+:doc:`InstrRefDebugInfo`\r
+   This document explains how LLVM uses value tracking, or instruction\r
+   referencing, to determine variable locations for debug info in the final\r
+   stages of compilation.\r
+\r
  Code Generation\r
  ---------------\r
  \r
author	J. Ryan Stinnett <jryans@gmail.com>
	Mon, 9 May 2022 14:45:56 +0000 (15:45 +0100)
committer	J. Ryan Stinnett <jryans@gmail.com>
	Fri, 20 May 2022 13:13:46 +0000 (14:13 +0100)
llvm/docs/InstrRefDebugInfo.md	[new file with mode: 0644]	patch \| blob
llvm/docs/MIRLangRef.rst		patch \| blob \| history
llvm/docs/UserGuides.rst		patch \| blob \| history