// HLO instructions are the IR used by the high-level compiler.
class HloInstruction {
public:
+ // A fusion node computes the same value a call to its fusion computation
+ // would compute. However, the choice of fusion kind dictates codegen
+ // strategy for the backend.
+ //
+ // To generate code for a kFusion HloInstruction, most backends do something
+ // like the following:
+ //
+ // 1) Identify the "primary" HloInstruction of the fused computation.
+ // 2) Emit code that does the work of the primary node, creating its inputs
+ // and transforming its outputs as specified by the fused computation.
+ //
+ // In step (2), the code emitted is usually similar to the code that would be
+ // emitted for an *unfused* version of the primary node, except that
+ //
+ // - when the primary node reads an element of one of its operands, instead
+ // of loading the value from memory, it *computes* the value based on the
+ // contents of the fused computation.
+ // - when the primary node outputs a value, instead of storing it to memory,
+ // it forwards the value to its users, which then perform additional
+ // computations before the value is finally stored to memory at the root of
+ // the fusion node.
+ //
+ // An HloInstruction's FusionKind helps us find the kFusion instruction's
+ // primary node, and can also affect how we generate code in step (2).
+ //
+ // - kInput: The primary node is the root of the fused instruction.
+ //
+ // - kOutput: The primary node is not the root of the fused instruction.
+ // This fusion kind requires that one operand buffer of the fusion
+ // instruction be able to alias the output buffer. This constraint is
+ // usually enough to let backends find the primary node unambiguously.
+ //
+ // - kLoop: The primary node is the root of the fused computation, but,
+ // unlike in input fusion, we prescribe a specific implementation for
+ // codegen. Rather than generating code that looks like the code we'd emit
+ // for an unfused version of the primary/root node, we emit code that
+ // generates one element of the root at a time.
+ //
+ // - kCustom: Custom category for backend-specific fusions that don't fit
+ // into the above patterns.
+ //
+ // Not all backends support all fusion kinds, and given a particular fused
+ // computation, it's not in general safe to change its fusion kind. Creation
+ // of fusion nodes is always backend-specific.
+ //
+ // For elementwise ops (e.g. kAdd), most backends would emit a
+ // one-element-at-a-time implementation for the unfused version, so loop
+ // fusion and input fusion are probably equivalent if the root node is
+ // elementwise. They're not necessarily equivalent e.g. for kReduce, where an
+ // implementation might emit something more sophisticated for an unfused or
+ // input-fusion reduce, but will emit the naive code that reduces one element
+ // at a time for loop fusion with a reduce as the root.
+ //
+ // Another way to think of loop fusion is that it's equivalent to input
+ // fusion, but where the root node is an implicit identity node, whose
+ // unfused implementation is "read one element, write one element".
+ //
+ // TODO(b/79869434): This categorization scheme is not great. For one thing,
+ // input and loop fusion are basically the same thing: There is no reason for
+ // the HLO to encode backend-specific decisions about how e.g. a reduce that's
+ // the root of a fusion should be lowered. In addition, this scheme as
+ // written doesn't work for multi-output fusion, where the primary node is
+ // never actually the root (which is a kTuple instruction that gathers the
+ // multiple outputs of the fusion).
enum class FusionKind {
- kLoop, // Fused into a loop.
- kInput, // Op's input is fused into the op itself.
- kOutput, // Op's output is fused into the op itself.
- // REQUIRES: At least one operand buffer must be able
- // to alias the output buffer.
- kCustom, // Custom category for backend-specific fusions that
- // do not match any of the more specific ones.
+ kLoop,
+ kInput,
+ kOutput,
+ kCustom,
};
~HloInstruction();