From 705b185f90a67b15922f7537f13944e74321f344 Mon Sep 17 00:00:00 2001
From: Chandler Carruth
Date: Sat, 31 Jan 2015 03:43:40 +0000
Subject: [PATCH] [PM] Change the core design of the TTI analysis to use a
 polymorphic type erased interface and a single analysis pass rather than an
 extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build one
from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes, including
CRTP base classes to facilitate calling back up to the most specialized form
when delegating horizontally across the surface. These aren't as clean as I
would like, and I'm planning to clean some of this up, but I wanted to start
by putting it into the right form.

There are a number of reasons for this change, and for this particular design.
The first and foremost reason is that an analysis group is complete overkill,
and the chaining delegation strategy was so opaque, confusing, and high
overhead that TTI was suffering greatly for it. Several of the TTI functions
had not been implemented everywhere because the chaining-based delegation
provided no checking that they were. A few other functions were implemented
with incorrect delegation. Working on this, the message was very clear to me
-- the delegation and analysis group structure was too confusing to be useful
here.

The other reason, of course, is that this is a *much* more natural fit for the
new pass manager. This will lay the groundwork for a type-erased per-function
info object that can look up the correct subtarget and even cache it.

Yet another benefit is that this will significantly simplify the interaction
of the pass managers and the TargetMachine. See the future work below.

The downside of this change is that it is very, very verbose. I'm going to
work to improve that, but it is somewhat of a necessity when doing type
erasure in C++. =/ I discussed this design really extensively with Eric and
Hal prior to going down this path, and afterward showed them the result. No
one was really thrilled with it, but there doesn't seem to be a substantially
better alternative. Using a base class and virtual method dispatch would make
the code much shorter, but as discussed in the update to the programmer's
manual and elsewhere, a polymorphic interface feels like the more principled
approach even if this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the huge
chunk that I couldn't really split anything out of, because it was the
interface change to TTI itself. I've tried to minimize all the other parts of
this. The follow-up work should include at least:

1) Improve the TargetMachine interface by having it directly return a TTI
   object. Because we have a non-pass object with value semantics and an
   internal type erasure mechanism, we can narrow the interface of the
   TargetMachine to *just* do what we need: build and return a TTI object
   that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function. This
   will include splitting off a minimal form of it which is sufficient for
   the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the target
   machine for each function. This may actually be done as part of #2 in
   order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is easier
   to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is easier
   to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is just a
   bit messy and exacerbates the complexity of implementing TTI in each
   target.
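For readers unfamiliar with the pattern, here is a minimal, self-contained
sketch of the Concept/Model type erasure idiom described above. It is not the
TTI code from this patch: the names (CostInfo, FakeTargetImpl) and the single
getOperationCost query are hypothetical stand-ins, and the real interface
erases the full set of TTI queries rather than one.

  #include <iostream>
  #include <memory>
  #include <utility>

  class CostInfo {
  public:
    // Wrap any type that provides getOperationCost(unsigned); the wrapped
    // implementation does not need to inherit from anything.
    template <typename T>
    CostInfo(T Impl) : ImplPtr(new Model<T>(std::move(Impl))) {}

    unsigned getOperationCost(unsigned Opcode) const {
      return ImplPtr->getOperationCost(Opcode);
    }

  private:
    // The abstract "Concept": the type erased interface.
    struct Concept {
      virtual ~Concept() {}
      virtual unsigned getOperationCost(unsigned Opcode) = 0;
    };

    // The "Model" template: wraps a concrete implementation and forwards
    // each virtual call to the non-virtual method of the wrapped object.
    template <typename T> struct Model final : Concept {
      T Impl;
      Model(T Impl) : Impl(std::move(Impl)) {}
      unsigned getOperationCost(unsigned Opcode) override {
        return Impl.getOperationCost(Opcode);
      }
    };

    std::unique_ptr<Concept> ImplPtr;
  };

  // A stand-in for a target-specific implementation.
  struct FakeTargetImpl {
    unsigned getOperationCost(unsigned Opcode) { return Opcode == 0 ? 0 : 1; }
  };

  int main() {
    CostInfo TTI(FakeTargetImpl{});
    std::cout << TTI.getOperationCost(1) << "\n"; // prints 1
    return 0;
  }

The actual TargetTransformInfo class in this patch follows the same shape: a
private Concept base class, a Model<T> template, and a std::unique_ptr<Concept>
member (TTIImpl) populated by the templated constructor.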
Many thanks to Eric and Hal for their help here. I ended up blocked on this
somewhat more abruptly than I expected, and so I appreciate getting it sorted
out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

llvm-svn: 227669
---
 llvm/include/llvm/Analysis/TargetTransformInfo.h | 443 +++++++++++----
 .../llvm/Analysis/TargetTransformInfoImpl.h | 431 ++++++++++++++
 llvm/include/llvm/CodeGen/BasicTTIImpl.h | 626 ++++++++++++++++++++
 llvm/include/llvm/InitializePasses.h | 4 +-
 llvm/include/llvm/Target/TargetMachine.h | 2 +-
 llvm/lib/Analysis/Analysis.cpp | 2 +-
 llvm/lib/Analysis/CostModel.cpp | 3 +-
 llvm/lib/Analysis/FunctionTargetTransformInfo.cpp | 6 +-
 llvm/lib/Analysis/IPA/InlineCost.cpp | 6 +-
 llvm/lib/Analysis/TargetTransformInfo.cpp | 627 ++++-----------------
 llvm/lib/CodeGen/BasicTargetTransformInfo.cpp | 625 +-------------------
 llvm/lib/CodeGen/CodeGen.cpp | 1 -
 llvm/lib/CodeGen/CodeGenPrepare.cpp | 4 +-
 llvm/lib/Target/AArch64/AArch64TargetMachine.cpp | 4 -
 .../Target/AArch64/AArch64TargetTransformInfo.cpp | 224 ++++----
 llvm/lib/Target/ARM/ARMTargetMachine.cpp | 4 -
 llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | 165 +++---
 llvm/lib/Target/Mips/MipsTargetMachine.cpp | 2 +-
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp | 4 -
 llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp | 88 ++-
 llvm/lib/Target/PowerPC/PPCTargetMachine.cpp | 4 -
 llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp | 231 ++++----
 llvm/lib/Target/R600/AMDGPUTargetMachine.cpp | 4 -
 llvm/lib/Target/R600/AMDGPUTargetTransformInfo.cpp | 99 ++--
 llvm/lib/Target/Target.cpp | 1 +
 llvm/lib/Target/TargetMachine.cpp | 6 +
 llvm/lib/Target/X86/X86TargetMachine.cpp | 4 -
 llvm/lib/Target/X86/X86TargetTransformInfo.cpp | 289 +++++-----
 llvm/lib/Target/XCore/XCoreTargetMachine.cpp | 4 -
 llvm/lib/Target/XCore/XCoreTargetTransformInfo.cpp | 56 +-
 llvm/lib/Transforms/Scalar/ConstantHoisting.cpp | 6 +-
 llvm/lib/Transforms/Scalar/EarlyCSE.cpp | 6 +-
 llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | 3 +-
 llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp | 7 +-
 llvm/lib/Transforms/Scalar/LoopRotation.cpp | 6 +-
 llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp | 10 +-
 llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp | 7 +-
 llvm/lib/Transforms/Scalar/LoopUnswitch.cpp | 9 +-
 .../Transforms/Scalar/PartiallyInlineLibCalls.cpp | 5 +-
 .../Scalar/SeparateConstOffsetFromGEP.cpp | 7 +-
 llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp | 7 +-
 .../Transforms/Scalar/TailRecursionElimination.cpp | 6 +-
 llvm/lib/Transforms/Vectorize/BBVectorize.cpp | 12 +-
 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 6 +-
 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | 6 +-
 llvm/test/Analysis/CostModel/no_info.ll | 7 +-
 llvm/tools/opt/opt.cpp | 5 +
 47 files changed, 2141 insertions(+), 1943 deletions(-)
 create mode 100644 llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
 create mode 100644 llvm/include/llvm/CodeGen/BasicTTIImpl.h

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index f949447..b0811ab 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++
b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -53,34 +53,21 @@ struct MemIntrinsicInfo { /// \brief This pass provides access to the codegen interfaces that are needed /// for IR-level transformations. class TargetTransformInfo { -protected: - /// \brief The TTI instance one level down the stack. - /// - /// This is used to implement the default behavior all of the methods which - /// is to delegate up through the stack of TTIs until one can answer the - /// query. - TargetTransformInfo *PrevTTI; - - /// \brief The top of the stack of TTI analyses available. +public: + /// \brief Construct a TTI object using a type implementing the \c Concept + /// API below. /// - /// This is a convenience routine maintained as TTI analyses become available - /// that complements the PrevTTI delegation chain. When one part of an - /// analysis pass wants to query another part of the analysis pass it can use - /// this to start back at the top of the stack. - TargetTransformInfo *TopTTI; + /// This is used by targets to construct a TTI wrapping their target-specific + /// implementaion that encodes appropriate costs for their target. + template TargetTransformInfo(T Impl); - /// All pass subclasses must in their initializePass routine call - /// pushTTIStack with themselves to update the pointers tracking the previous - /// TTI instance in the analysis group's stack, and the top of the analysis - /// group's stack. - void pushTTIStack(Pass *P); + // Provide move semantics. + TargetTransformInfo(TargetTransformInfo &&Arg); + TargetTransformInfo &operator=(TargetTransformInfo &&RHS); - /// All pass subclasses must call TargetTransformInfo::getAnalysisUsage. - virtual void getAnalysisUsage(AnalysisUsage &AU) const; - -public: - /// This class is intended to be subclassed by real implementations. - virtual ~TargetTransformInfo() = 0; + // We need to define the destructor out-of-line to define our sub-classes + // out-of-line. + ~TargetTransformInfo(); /// \name Generic Target Information /// @{ @@ -120,16 +107,15 @@ public: /// /// The returned cost is defined in terms of \c TargetCostConstants, see its /// comments for a detailed explanation of the cost values. - virtual unsigned getOperationCost(unsigned Opcode, Type *Ty, - Type *OpTy = nullptr) const; + unsigned getOperationCost(unsigned Opcode, Type *Ty, + Type *OpTy = nullptr) const; /// \brief Estimate the cost of a GEP operation when lowered. /// /// The contract for this function is the same as \c getOperationCost except /// that it supports an interface that provides extra information specific to /// the GEP operation. - virtual unsigned getGEPCost(const Value *Ptr, - ArrayRef Operands) const; + unsigned getGEPCost(const Value *Ptr, ArrayRef Operands) const; /// \brief Estimate the cost of a function call when lowered. /// @@ -140,31 +126,31 @@ public: /// This is the most basic query for estimating call cost: it only knows the /// function type and (potentially) the number of arguments at the call site. /// The latter is only interesting for varargs function types. - virtual unsigned getCallCost(FunctionType *FTy, int NumArgs = -1) const; + unsigned getCallCost(FunctionType *FTy, int NumArgs = -1) const; /// \brief Estimate the cost of calling a specific function when lowered. /// /// This overload adds the ability to reason about the particular function /// being called in the event it is a library call with special lowering. 
- virtual unsigned getCallCost(const Function *F, int NumArgs = -1) const; + unsigned getCallCost(const Function *F, int NumArgs = -1) const; /// \brief Estimate the cost of calling a specific function when lowered. /// /// This overload allows specifying a set of candidate argument values. - virtual unsigned getCallCost(const Function *F, - ArrayRef Arguments) const; + unsigned getCallCost(const Function *F, + ArrayRef Arguments) const; /// \brief Estimate the cost of an intrinsic when lowered. /// /// Mirrors the \c getCallCost method but uses an intrinsic identifier. - virtual unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, - ArrayRef ParamTys) const; + unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, + ArrayRef ParamTys) const; /// \brief Estimate the cost of an intrinsic when lowered. /// /// Mirrors the \c getCallCost method but uses an intrinsic identifier. - virtual unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, - ArrayRef Arguments) const; + unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, + ArrayRef Arguments) const; /// \brief Estimate the cost of a given IR user when lowered. /// @@ -181,13 +167,13 @@ public: /// /// The returned cost is defined in terms of \c TargetCostConstants, see its /// comments for a detailed explanation of the cost values. - virtual unsigned getUserCost(const User *U) const; + unsigned getUserCost(const User *U) const; /// \brief hasBranchDivergence - Return true if branch divergence exists. /// Branch divergence has a significantly negative impact on GPU performance /// when threads in the same wavefront take different paths due to conditional /// branches. - virtual bool hasBranchDivergence() const; + bool hasBranchDivergence() const; /// \brief Test whether calls to a function lower to actual program function /// calls. @@ -201,7 +187,7 @@ public: /// and execution-speed costs. This would allow modelling the core of this /// query more accurately as a call is a single small instruction, but /// incurs significant execution cost. - virtual bool isLoweredToCall(const Function *F) const; + bool isLoweredToCall(const Function *F) const; /// Parameters that control the generic loop unrolling transformation. struct UnrollingPreferences { @@ -243,8 +229,8 @@ public: /// \brief Get target-customized preferences for the generic loop unrolling /// transformation. The caller will initialize UP with the current /// target-independent defaults. - virtual void getUnrollingPreferences(const Function *F, Loop *L, - UnrollingPreferences &UP) const; + void getUnrollingPreferences(const Function *F, Loop *L, + UnrollingPreferences &UP) const; /// @} @@ -264,29 +250,28 @@ public: /// \brief Return true if the specified immediate is legal add immediate, that /// is the target has add instructions which can add a register with the /// immediate without having to materialize the immediate into a register. - virtual bool isLegalAddImmediate(int64_t Imm) const; + bool isLegalAddImmediate(int64_t Imm) const; /// \brief Return true if the specified immediate is legal icmp immediate, /// that is the target has icmp instructions which can compare a register /// against the immediate without having to materialize the immediate into a /// register. - virtual bool isLegalICmpImmediate(int64_t Imm) const; + bool isLegalICmpImmediate(int64_t Imm) const; /// \brief Return true if the addressing mode represented by AM is legal for /// this target, for a load/store of the specified type. 
/// The type may be VoidTy, in which case only return true if the addressing /// mode is legal for a load/store of any legal type. /// TODO: Handle pre/postinc as well. - virtual bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, - int64_t BaseOffset, bool HasBaseReg, - int64_t Scale) const; + bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, + bool HasBaseReg, int64_t Scale) const; /// \brief Return true if the target works with masked instruction /// AVX2 allows masks for consecutive load and store for i32 and i64 elements. /// AVX-512 architecture will also allow masks for non-consecutive memory /// accesses. - virtual bool isLegalMaskedStore(Type *DataType, int Consecutive) const; - virtual bool isLegalMaskedLoad(Type *DataType, int Consecutive) const; + bool isLegalMaskedStore(Type *DataType, int Consecutive) const; + bool isLegalMaskedLoad(Type *DataType, int Consecutive) const; /// \brief Return the cost of the scaling factor used in the addressing /// mode represented by AM for this target, for a load/store @@ -294,45 +279,44 @@ public: /// If the AM is supported, the return value must be >= 0. /// If the AM is not supported, it returns a negative value. /// TODO: Handle pre/postinc as well. - virtual int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, - int64_t BaseOffset, bool HasBaseReg, - int64_t Scale) const; + int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, + bool HasBaseReg, int64_t Scale) const; /// \brief Return true if it's free to truncate a value of type Ty1 to type /// Ty2. e.g. On x86 it's free to truncate a i32 value in register EAX to i16 /// by referencing its sub-register AX. - virtual bool isTruncateFree(Type *Ty1, Type *Ty2) const; + bool isTruncateFree(Type *Ty1, Type *Ty2) const; /// \brief Return true if this type is legal. - virtual bool isTypeLegal(Type *Ty) const; + bool isTypeLegal(Type *Ty) const; /// \brief Returns the target's jmp_buf alignment in bytes. - virtual unsigned getJumpBufAlignment() const; + unsigned getJumpBufAlignment() const; /// \brief Returns the target's jmp_buf size in bytes. - virtual unsigned getJumpBufSize() const; + unsigned getJumpBufSize() const; /// \brief Return true if switches should be turned into lookup tables for the /// target. - virtual bool shouldBuildLookupTables() const; + bool shouldBuildLookupTables() const; /// \brief Return hardware support for population count. - virtual PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) const; + PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) const; /// \brief Return true if the hardware has a fast square-root instruction. - virtual bool haveFastSqrt(Type *Ty) const; + bool haveFastSqrt(Type *Ty) const; /// \brief Return the expected cost of materializing for the given integer /// immediate of the specified type. - virtual unsigned getIntImmCost(const APInt &Imm, Type *Ty) const; + unsigned getIntImmCost(const APInt &Imm, Type *Ty) const; /// \brief Return the expected cost of materialization for the given integer /// immediate of the specified type for a given instruction. The cost can be /// zero if the immediate can be folded into the specified instruction. 
- virtual unsigned getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm, - Type *Ty) const; - virtual unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, - const APInt &Imm, Type *Ty) const; + unsigned getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm, + Type *Ty) const; + unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, + Type *Ty) const; /// @} /// \name Vector Target Information @@ -361,18 +345,18 @@ public: /// \return The number of scalar or vector registers that the target has. /// If 'Vectors' is true, it returns the number of vector registers. If it is /// set to false, it returns the number of scalar registers. - virtual unsigned getNumberOfRegisters(bool Vector) const; + unsigned getNumberOfRegisters(bool Vector) const; /// \return The width of the largest scalar or vector register type. - virtual unsigned getRegisterBitWidth(bool Vector) const; + unsigned getRegisterBitWidth(bool Vector) const; /// \return The maximum interleave factor that any transform should try to /// perform for this target. This number depends on the level of parallelism /// and the number of execution units in the CPU. - virtual unsigned getMaxInterleaveFactor() const; + unsigned getMaxInterleaveFactor() const; /// \return The expected cost of arithmetic ops, such as mul, xor, fsub, etc. - virtual unsigned + unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind Opd1Info = OK_AnyValue, OperandValueKind Opd2Info = OK_AnyValue, @@ -382,36 +366,33 @@ public: /// \return The cost of a shuffle instruction of kind Kind and of type Tp. /// The index and subtype parameters are used by the subvector insertion and /// extraction shuffle kinds. - virtual unsigned getShuffleCost(ShuffleKind Kind, Type *Tp, int Index = 0, - Type *SubTp = nullptr) const; + unsigned getShuffleCost(ShuffleKind Kind, Type *Tp, int Index = 0, + Type *SubTp = nullptr) const; /// \return The expected cost of cast instructions, such as bitcast, trunc, /// zext, etc. - virtual unsigned getCastInstrCost(unsigned Opcode, Type *Dst, - Type *Src) const; + unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const; /// \return The expected cost of control-flow related instructions such as /// Phi, Ret, Br. - virtual unsigned getCFInstrCost(unsigned Opcode) const; + unsigned getCFInstrCost(unsigned Opcode) const; /// \returns The expected cost of compare and select instructions. - virtual unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy = nullptr) const; + unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, + Type *CondTy = nullptr) const; /// \return The expected cost of vector Insert and Extract. /// Use -1 to indicate that there is no information on the index value. - virtual unsigned getVectorInstrCost(unsigned Opcode, Type *Val, - unsigned Index = -1) const; + unsigned getVectorInstrCost(unsigned Opcode, Type *Val, + unsigned Index = -1) const; /// \return The cost of Load and Store instructions. - virtual unsigned getMemoryOpCost(unsigned Opcode, Type *Src, - unsigned Alignment, - unsigned AddressSpace) const; + unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, + unsigned AddressSpace) const; /// \return The cost of masked Load and Store instructions. 
- virtual unsigned getMaskedMemoryOpCost(unsigned Opcode, Type *Src, - unsigned Alignment, - unsigned AddressSpace) const; + unsigned getMaskedMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, + unsigned AddressSpace) const; /// \brief Calculate the cost of performing a vector reduction. /// @@ -426,16 +407,16 @@ public: /// Split: /// (v0, v1, v2, v3) /// ((v0+v2), (v1+v3), undef, undef) - virtual unsigned getReductionCost(unsigned Opcode, Type *Ty, - bool IsPairwiseForm) const; + unsigned getReductionCost(unsigned Opcode, Type *Ty, + bool IsPairwiseForm) const; /// \returns The cost of Intrinsic instructions. - virtual unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, - ArrayRef Tys) const; + unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, + ArrayRef Tys) const; /// \returns The number of pieces into which the provided type must be /// split during legalization. Zero is returned when the answer is unknown. - virtual unsigned getNumberOfParts(Type *Tp) const; + unsigned getNumberOfParts(Type *Tp) const; /// \returns The cost of the address computation. For most targets this can be /// merged into the instruction indexing mode. Some targets might want to @@ -444,34 +425,302 @@ public: /// The 'IsComplex' parameter is a hint that the address computation is likely /// to involve multiple instructions and as such unlikely to be merged into /// the address indexing mode. - virtual unsigned getAddressComputationCost(Type *Ty, - bool IsComplex = false) const; + unsigned getAddressComputationCost(Type *Ty, bool IsComplex = false) const; /// \returns The cost, if any, of keeping values of the given types alive /// over a callsite. /// /// Some types may require the use of register classes that do not have /// any callee-saved registers, so would require a spill and fill. - virtual unsigned getCostOfKeepingLiveOverCall(ArrayRef Tys) const; + unsigned getCostOfKeepingLiveOverCall(ArrayRef Tys) const; /// \returns True if the intrinsic is a supported memory intrinsic. Info /// will contain additional information - whether the intrinsic may write /// or read to memory, volatility and the pointer. Info is undefined /// if false is returned. - virtual bool getTgtMemIntrinsic(IntrinsicInst *Inst, - MemIntrinsicInfo &Info) const; + bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) const; /// \returns A value which is the result of the given memory intrinsic. New /// instructions may be created to extract the result from the given intrinsic /// memory operation. Returns nullptr if the target cannot create a result /// from the given intrinsic. - virtual Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, - Type *ExpectedType) const; + Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, + Type *ExpectedType) const; /// @} - /// Analysis group identification. +private: + /// \brief The abstract base class used to type erase specific TTI + /// implementations. + class Concept; + + /// \brief The template model for the base class which wraps a concrete + /// implementation in a type erased interface. 
+ template class Model; + + std::unique_ptr TTIImpl; +}; + +class TargetTransformInfo::Concept { +public: + virtual ~Concept() = 0; + + virtual unsigned getOperationCost(unsigned Opcode, Type *Ty, Type *OpTy) = 0; + virtual unsigned getGEPCost(const Value *Ptr, + ArrayRef Operands) = 0; + virtual unsigned getCallCost(FunctionType *FTy, int NumArgs) = 0; + virtual unsigned getCallCost(const Function *F, int NumArgs) = 0; + virtual unsigned getCallCost(const Function *F, + ArrayRef Arguments) = 0; + virtual unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, + ArrayRef ParamTys) = 0; + virtual unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, + ArrayRef Arguments) = 0; + virtual unsigned getUserCost(const User *U) = 0; + virtual bool hasBranchDivergence() = 0; + virtual bool isLoweredToCall(const Function *F) = 0; + virtual void getUnrollingPreferences(const Function *F, Loop *L, + UnrollingPreferences &UP) = 0; + virtual bool isLegalAddImmediate(int64_t Imm) = 0; + virtual bool isLegalICmpImmediate(int64_t Imm) = 0; + virtual bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, + int64_t BaseOffset, bool HasBaseReg, + int64_t Scale) = 0; + virtual bool isLegalMaskedStore(Type *DataType, int Consecutive) = 0; + virtual bool isLegalMaskedLoad(Type *DataType, int Consecutive) = 0; + virtual int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, + int64_t BaseOffset, bool HasBaseReg, + int64_t Scale) = 0; + virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0; + virtual bool isTypeLegal(Type *Ty) = 0; + virtual unsigned getJumpBufAlignment() = 0; + virtual unsigned getJumpBufSize() = 0; + virtual bool shouldBuildLookupTables() = 0; + virtual PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) = 0; + virtual bool haveFastSqrt(Type *Ty) = 0; + virtual unsigned getIntImmCost(const APInt &Imm, Type *Ty) = 0; + virtual unsigned getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm, + Type *Ty) = 0; + virtual unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, + const APInt &Imm, Type *Ty) = 0; + virtual unsigned getNumberOfRegisters(bool Vector) = 0; + virtual unsigned getRegisterBitWidth(bool Vector) = 0; + virtual unsigned getMaxInterleaveFactor() = 0; + virtual unsigned + getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind Opd1Info, + OperandValueKind Opd2Info, + OperandValueProperties Opd1PropInfo, + OperandValueProperties Opd2PropInfo) = 0; + virtual unsigned getShuffleCost(ShuffleKind Kind, Type *Tp, int Index, + Type *SubTp) = 0; + virtual unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) = 0; + virtual unsigned getCFInstrCost(unsigned Opcode) = 0; + virtual unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, + Type *CondTy) = 0; + virtual unsigned getVectorInstrCost(unsigned Opcode, Type *Val, + unsigned Index) = 0; + virtual unsigned getMemoryOpCost(unsigned Opcode, Type *Src, + unsigned Alignment, + unsigned AddressSpace) = 0; + virtual unsigned getMaskedMemoryOpCost(unsigned Opcode, Type *Src, + unsigned Alignment, + unsigned AddressSpace) = 0; + virtual unsigned getReductionCost(unsigned Opcode, Type *Ty, + bool IsPairwiseForm) = 0; + virtual unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, + ArrayRef Tys) = 0; + virtual unsigned getNumberOfParts(Type *Tp) = 0; + virtual unsigned getAddressComputationCost(Type *Ty, bool IsComplex) = 0; + virtual unsigned getCostOfKeepingLiveOverCall(ArrayRef Tys) = 0; + virtual bool getTgtMemIntrinsic(IntrinsicInst *Inst, + MemIntrinsicInfo &Info) = 0; + virtual Value 
*getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, + Type *ExpectedType) = 0; +}; + +template +class TargetTransformInfo::Model final : public TargetTransformInfo::Concept { + T Impl; + +public: + Model(T Impl) : Impl(std::move(Impl)) {} + ~Model() override {} + + unsigned getOperationCost(unsigned Opcode, Type *Ty, Type *OpTy) override { + return Impl.getOperationCost(Opcode, Ty, OpTy); + } + unsigned getGEPCost(const Value *Ptr, + ArrayRef Operands) override { + return Impl.getGEPCost(Ptr, Operands); + } + unsigned getCallCost(FunctionType *FTy, int NumArgs) override { + return Impl.getCallCost(FTy, NumArgs); + } + unsigned getCallCost(const Function *F, int NumArgs) override { + return Impl.getCallCost(F, NumArgs); + } + unsigned getCallCost(const Function *F, + ArrayRef Arguments) override { + return Impl.getCallCost(F, Arguments); + } + unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, + ArrayRef ParamTys) override { + return Impl.getIntrinsicCost(IID, RetTy, ParamTys); + } + unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, + ArrayRef Arguments) override { + return Impl.getIntrinsicCost(IID, RetTy, Arguments); + } + unsigned getUserCost(const User *U) override { return Impl.getUserCost(U); } + bool hasBranchDivergence() override { return Impl.hasBranchDivergence(); } + bool isLoweredToCall(const Function *F) override { + return Impl.isLoweredToCall(F); + } + void getUnrollingPreferences(const Function *F, Loop *L, + UnrollingPreferences &UP) override { + return Impl.getUnrollingPreferences(F, L, UP); + } + bool isLegalAddImmediate(int64_t Imm) override { + return Impl.isLegalAddImmediate(Imm); + } + bool isLegalICmpImmediate(int64_t Imm) override { + return Impl.isLegalICmpImmediate(Imm); + } + bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, + bool HasBaseReg, int64_t Scale) override { + return Impl.isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg, + Scale); + } + bool isLegalMaskedStore(Type *DataType, int Consecutive) override { + return Impl.isLegalMaskedStore(DataType, Consecutive); + } + bool isLegalMaskedLoad(Type *DataType, int Consecutive) override { + return Impl.isLegalMaskedLoad(DataType, Consecutive); + } + int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, + bool HasBaseReg, int64_t Scale) override { + return Impl.getScalingFactorCost(Ty, BaseGV, BaseOffset, HasBaseReg, Scale); + } + bool isTruncateFree(Type *Ty1, Type *Ty2) override { + return Impl.isTruncateFree(Ty1, Ty2); + } + bool isTypeLegal(Type *Ty) override { return Impl.isTypeLegal(Ty); } + unsigned getJumpBufAlignment() override { return Impl.getJumpBufAlignment(); } + unsigned getJumpBufSize() override { return Impl.getJumpBufSize(); } + bool shouldBuildLookupTables() override { + return Impl.shouldBuildLookupTables(); + } + PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) override { + return Impl.getPopcntSupport(IntTyWidthInBit); + } + bool haveFastSqrt(Type *Ty) override { return Impl.haveFastSqrt(Ty); } + unsigned getIntImmCost(const APInt &Imm, Type *Ty) override { + return Impl.getIntImmCost(Imm, Ty); + } + unsigned getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm, + Type *Ty) override { + return Impl.getIntImmCost(Opc, Idx, Imm, Ty); + } + unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, + Type *Ty) override { + return Impl.getIntImmCost(IID, Idx, Imm, Ty); + } + unsigned getNumberOfRegisters(bool Vector) override { + return Impl.getNumberOfRegisters(Vector); + } + unsigned 
getRegisterBitWidth(bool Vector) override { + return Impl.getRegisterBitWidth(Vector); + } + unsigned getMaxInterleaveFactor() override { + return Impl.getMaxInterleaveFactor(); + } + unsigned + getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind Opd1Info, + OperandValueKind Opd2Info, + OperandValueProperties Opd1PropInfo, + OperandValueProperties Opd2PropInfo) override { + return Impl.getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info, + Opd1PropInfo, Opd2PropInfo); + } + unsigned getShuffleCost(ShuffleKind Kind, Type *Tp, int Index, + Type *SubTp) override { + return Impl.getShuffleCost(Kind, Tp, Index, SubTp); + } + unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) override { + return Impl.getCastInstrCost(Opcode, Dst, Src); + } + unsigned getCFInstrCost(unsigned Opcode) override { + return Impl.getCFInstrCost(Opcode); + } + unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, + Type *CondTy) override { + return Impl.getCmpSelInstrCost(Opcode, ValTy, CondTy); + } + unsigned getVectorInstrCost(unsigned Opcode, Type *Val, + unsigned Index) override { + return Impl.getVectorInstrCost(Opcode, Val, Index); + } + unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, + unsigned AddressSpace) override { + return Impl.getMemoryOpCost(Opcode, Src, Alignment, AddressSpace); + } + unsigned getMaskedMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, + unsigned AddressSpace) override { + return Impl.getMaskedMemoryOpCost(Opcode, Src, Alignment, AddressSpace); + } + unsigned getReductionCost(unsigned Opcode, Type *Ty, + bool IsPairwiseForm) override { + return Impl.getReductionCost(Opcode, Ty, IsPairwiseForm); + } + unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, + ArrayRef Tys) override { + return Impl.getIntrinsicInstrCost(ID, RetTy, Tys); + } + unsigned getNumberOfParts(Type *Tp) override { + return Impl.getNumberOfParts(Tp); + } + unsigned getAddressComputationCost(Type *Ty, bool IsComplex) override { + return Impl.getAddressComputationCost(Ty, IsComplex); + } + unsigned getCostOfKeepingLiveOverCall(ArrayRef Tys) override { + return Impl.getCostOfKeepingLiveOverCall(Tys); + } + bool getTgtMemIntrinsic(IntrinsicInst *Inst, + MemIntrinsicInfo &Info) override { + return Impl.getTgtMemIntrinsic(Inst, Info); + } + Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, + Type *ExpectedType) override { + return Impl.getOrCreateResultFromMemIntrinsic(Inst, ExpectedType); + } +}; + +template +TargetTransformInfo::TargetTransformInfo(T Impl) + : TTIImpl(new Model(Impl)) {} + +/// \brief Wrapper pass for TargetTransformInfo. +/// +/// This pass can be constructed from a TTI object which it stores internally +/// and is queried by passes. +class TargetTransformInfoWrapperPass : public ImmutablePass { + TargetTransformInfo TTI; + + virtual void anchor(); + +public: static char ID; + + /// \brief We must provide a default constructor for the pass but it should + /// never be used. + /// + /// Use the constructor below or call one of the creation routines. + TargetTransformInfoWrapperPass(); + + explicit TargetTransformInfoWrapperPass(TargetTransformInfo TTI); + + TargetTransformInfo &getTTI() { return TTI; } + const TargetTransformInfo &getTTI() const { return TTI; } }; /// \brief Create the base case instance of a pass in the TTI analysis group. @@ -479,7 +728,7 @@ public: /// This class provides the base case for the stack of TTI analyzes. 
It doesn't /// delegate to anything and uses the STTI and VTTI objects passed in to /// satisfy the queries. -ImmutablePass *createNoTargetTransformInfoPass(); +ImmutablePass *createNoTargetTransformInfoPass(const DataLayout *DL); } // End llvm namespace diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h new file mode 100644 index 0000000..01fb8b6 --- /dev/null +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -0,0 +1,431 @@ +//===- TargetTransformInfoImpl.h --------------------------------*- C++ -*-===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===----------------------------------------------------------------------===// +/// \file +/// This file provides helpers for the implementation of +/// a TargetTransformInfo-conforming class. +/// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_ANALYSIS_TARGETTRANSFORMINFOIMPL_H +#define LLVM_ANALYSIS_TARGETTRANSFORMINFOIMPL_H + +#include "llvm/Analysis/TargetTransformInfo.h" +#include "llvm/IR/CallSite.h" +#include "llvm/IR/DataLayout.h" +#include "llvm/IR/Operator.h" +#include "llvm/IR/Function.h" +#include "llvm/IR/Type.h" + +namespace llvm { + +/// \brief Base class for use as a mix-in that aids implementing +/// a TargetTransformInfo-compatible class. +class TargetTransformInfoImplBase { +protected: + typedef TargetTransformInfo TTI; + + const DataLayout *DL; + + explicit TargetTransformInfoImplBase(const DataLayout *DL) + : DL(DL) {} + +public: + // Provide value semantics. MSVC requires that we spell all of these out. + TargetTransformInfoImplBase(const TargetTransformInfoImplBase &Arg) + : DL(Arg.DL) {} + TargetTransformInfoImplBase(TargetTransformInfoImplBase &&Arg) + : DL(std::move(Arg.DL)) {} + TargetTransformInfoImplBase & + operator=(const TargetTransformInfoImplBase &RHS) { + DL = RHS.DL; + return *this; + } + TargetTransformInfoImplBase &operator=(TargetTransformInfoImplBase &&RHS) { + DL = std::move(RHS.DL); + return *this; + } + + unsigned getOperationCost(unsigned Opcode, Type *Ty, Type *OpTy) { + switch (Opcode) { + default: + // By default, just classify everything as 'basic'. + return TTI::TCC_Basic; + + case Instruction::GetElementPtr: + llvm_unreachable("Use getGEPCost for GEP operations!"); + + case Instruction::BitCast: + assert(OpTy && "Cast instructions must provide the operand type"); + if (Ty == OpTy || (Ty->isPointerTy() && OpTy->isPointerTy())) + // Identity and pointer-to-pointer casts are free. + return TTI::TCC_Free; + + // Otherwise, the default basic cost is used. + return TTI::TCC_Basic; + + case Instruction::IntToPtr: { + if (!DL) + return TTI::TCC_Basic; + + // An inttoptr cast is free so long as the input is a legal integer type + // which doesn't contain values outside the range of a pointer. + unsigned OpSize = OpTy->getScalarSizeInBits(); + if (DL->isLegalInteger(OpSize) && + OpSize <= DL->getPointerTypeSizeInBits(Ty)) + return TTI::TCC_Free; + + // Otherwise it's not a no-op. + return TTI::TCC_Basic; + } + case Instruction::PtrToInt: { + if (!DL) + return TTI::TCC_Basic; + + // A ptrtoint cast is free so long as the result is large enough to store + // the pointer, and a legal integer type. 
+ unsigned DestSize = Ty->getScalarSizeInBits(); + if (DL->isLegalInteger(DestSize) && + DestSize >= DL->getPointerTypeSizeInBits(OpTy)) + return TTI::TCC_Free; + + // Otherwise it's not a no-op. + return TTI::TCC_Basic; + } + case Instruction::Trunc: + // trunc to a native type is free (assuming the target has compare and + // shift-right of the same width). + if (DL && DL->isLegalInteger(DL->getTypeSizeInBits(Ty))) + return TTI::TCC_Free; + + return TTI::TCC_Basic; + } + } + + unsigned getGEPCost(const Value *Ptr, ArrayRef Operands) { + // In the basic model, we just assume that all-constant GEPs will be folded + // into their uses via addressing modes. + for (unsigned Idx = 0, Size = Operands.size(); Idx != Size; ++Idx) + if (!isa(Operands[Idx])) + return TTI::TCC_Basic; + + return TTI::TCC_Free; + } + + unsigned getCallCost(FunctionType *FTy, int NumArgs) { + assert(FTy && "FunctionType must be provided to this routine."); + + // The target-independent implementation just measures the size of the + // function by approximating that each argument will take on average one + // instruction to prepare. + + if (NumArgs < 0) + // Set the argument number to the number of explicit arguments in the + // function. + NumArgs = FTy->getNumParams(); + + return TTI::TCC_Basic * (NumArgs + 1); + } + + unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, + ArrayRef ParamTys) { + switch (IID) { + default: + // Intrinsics rarely (if ever) have normal argument setup constraints. + // Model them as having a basic instruction cost. + // FIXME: This is wrong for libc intrinsics. + return TTI::TCC_Basic; + + case Intrinsic::annotation: + case Intrinsic::assume: + case Intrinsic::dbg_declare: + case Intrinsic::dbg_value: + case Intrinsic::invariant_start: + case Intrinsic::invariant_end: + case Intrinsic::lifetime_start: + case Intrinsic::lifetime_end: + case Intrinsic::objectsize: + case Intrinsic::ptr_annotation: + case Intrinsic::var_annotation: + case Intrinsic::experimental_gc_result_int: + case Intrinsic::experimental_gc_result_float: + case Intrinsic::experimental_gc_result_ptr: + case Intrinsic::experimental_gc_result: + case Intrinsic::experimental_gc_relocate: + // These intrinsics don't actually represent code after lowering. + return TTI::TCC_Free; + } + } + + bool hasBranchDivergence() { return false; } + + bool isLoweredToCall(const Function *F) { + // FIXME: These should almost certainly not be handled here, and instead + // handled with the help of TLI or the target itself. This was largely + // ported from existing analysis heuristics here so that such refactorings + // can take place in the future. + + if (F->isIntrinsic()) + return false; + + if (F->hasLocalLinkage() || !F->hasName()) + return true; + + StringRef Name = F->getName(); + + // These will all likely lower to a single selection DAG node. + if (Name == "copysign" || Name == "copysignf" || Name == "copysignl" || + Name == "fabs" || Name == "fabsf" || Name == "fabsl" || Name == "sin" || + Name == "fmin" || Name == "fminf" || Name == "fminl" || + Name == "fmax" || Name == "fmaxf" || Name == "fmaxl" || + Name == "sinf" || Name == "sinl" || Name == "cos" || Name == "cosf" || + Name == "cosl" || Name == "sqrt" || Name == "sqrtf" || Name == "sqrtl") + return false; + + // These are all likely to be optimized into something smaller. 
+ if (Name == "pow" || Name == "powf" || Name == "powl" || Name == "exp2" || + Name == "exp2l" || Name == "exp2f" || Name == "floor" || + Name == "floorf" || Name == "ceil" || Name == "round" || + Name == "ffs" || Name == "ffsl" || Name == "abs" || Name == "labs" || + Name == "llabs") + return false; + + return true; + } + + void getUnrollingPreferences(const Function *, Loop *, + TTI::UnrollingPreferences &) {} + + bool isLegalAddImmediate(int64_t Imm) { return false; } + + bool isLegalICmpImmediate(int64_t Imm) { return false; } + + bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, + bool HasBaseReg, int64_t Scale) { + // Guess that reg+reg addressing is allowed. This heuristic is taken from + // the implementation of LSR. + return !BaseGV && BaseOffset == 0 && Scale <= 1; + } + + bool isLegalMaskedStore(Type *DataType, int Consecutive) { return false; } + + bool isLegalMaskedLoad(Type *DataType, int Consecutive) { return false; } + + int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, + bool HasBaseReg, int64_t Scale) { + // Guess that all legal addressing mode are free. + if (isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg, Scale)) + return 0; + return -1; + } + + bool isTruncateFree(Type *Ty1, Type *Ty2) { return false; } + + bool isTypeLegal(Type *Ty) { return false; } + + unsigned getJumpBufAlignment() { return 0; } + + unsigned getJumpBufSize() { return 0; } + + bool shouldBuildLookupTables() { return true; } + + TTI::PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) { + return TTI::PSK_Software; + } + + bool haveFastSqrt(Type *Ty) { return false; } + + unsigned getIntImmCost(const APInt &Imm, Type *Ty) { return TTI::TCC_Basic; } + + unsigned getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, + Type *Ty) { + return TTI::TCC_Free; + } + + unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, + Type *Ty) { + return TTI::TCC_Free; + } + + unsigned getNumberOfRegisters(bool Vector) { return 8; } + + unsigned getRegisterBitWidth(bool Vector) { return 32; } + + unsigned getMaxInterleaveFactor() { return 1; } + + unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty, + TTI::OperandValueKind Opd1Info, + TTI::OperandValueKind Opd2Info, + TTI::OperandValueProperties Opd1PropInfo, + TTI::OperandValueProperties Opd2PropInfo) { + return 1; + } + + unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Ty, int Index, + Type *SubTp) { + return 1; + } + + unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) { return 1; } + + unsigned getCFInstrCost(unsigned Opcode) { return 1; } + + unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy) { + return 1; + } + + unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) { + return 1; + } + + unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, + unsigned AddressSpace) { + return 1; + } + + unsigned getMaskedMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, + unsigned AddressSpace) { + return 1; + } + + unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, + ArrayRef Tys) { + return 1; + } + + unsigned getNumberOfParts(Type *Tp) { return 0; } + + unsigned getAddressComputationCost(Type *Tp, bool) { return 0; } + + unsigned getReductionCost(unsigned, Type *, bool) { return 1; } + + unsigned getCostOfKeepingLiveOverCall(ArrayRef Tys) { return 0; } + + bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) { + return false; + } + + Value 
*getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, + Type *ExpectedType) { + return nullptr; + } +}; + +/// \brief CRTP base class for use as a mix-in that aids implementing +/// a TargetTransformInfo-compatible class. +template +class TargetTransformInfoImplCRTPBase : public TargetTransformInfoImplBase { +private: + typedef TargetTransformInfoImplBase BaseT; + +protected: + explicit TargetTransformInfoImplCRTPBase(const DataLayout *DL) + : BaseT(DL) {} + +public: + // Provide value semantics. MSVC requires that we spell all of these out. + TargetTransformInfoImplCRTPBase(const TargetTransformInfoImplCRTPBase &Arg) + : BaseT(static_cast(Arg)) {} + TargetTransformInfoImplCRTPBase(TargetTransformInfoImplCRTPBase &&Arg) + : BaseT(std::move(static_cast(Arg))) {} + TargetTransformInfoImplCRTPBase & + operator=(const TargetTransformInfoImplCRTPBase &RHS) { + BaseT::operator=(static_cast(RHS)); + return *this; + } + TargetTransformInfoImplCRTPBase & + operator=(TargetTransformInfoImplCRTPBase &&RHS) { + BaseT::operator=(std::move(static_cast(RHS))); + return *this; + } + + using BaseT::getCallCost; + + unsigned getCallCost(const Function *F, int NumArgs) { + assert(F && "A concrete function must be provided to this routine."); + + if (NumArgs < 0) + // Set the argument number to the number of explicit arguments in the + // function. + NumArgs = F->arg_size(); + + if (Intrinsic::ID IID = (Intrinsic::ID)F->getIntrinsicID()) { + FunctionType *FTy = F->getFunctionType(); + SmallVector ParamTys(FTy->param_begin(), FTy->param_end()); + return static_cast(this) + ->getIntrinsicCost(IID, FTy->getReturnType(), ParamTys); + } + + if (!static_cast(this)->isLoweredToCall(F)) + return TTI::TCC_Basic; // Give a basic cost if it will be lowered + // directly. + + return static_cast(this)->getCallCost(F->getFunctionType(), NumArgs); + } + + unsigned getCallCost(const Function *F, ArrayRef Arguments) { + // Simply delegate to generic handling of the call. + // FIXME: We should use instsimplify or something else to catch calls which + // will constant fold with these arguments. + return static_cast(this)->getCallCost(F, Arguments.size()); + } + + using BaseT::getIntrinsicCost; + + unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, + ArrayRef Arguments) { + // Delegate to the generic intrinsic handling code. This mostly provides an + // opportunity for targets to (for example) special case the cost of + // certain intrinsics based on constants used as arguments. + SmallVector ParamTys; + ParamTys.reserve(Arguments.size()); + for (unsigned Idx = 0, Size = Arguments.size(); Idx != Size; ++Idx) + ParamTys.push_back(Arguments[Idx]->getType()); + return static_cast(this)->getIntrinsicCost(IID, RetTy, ParamTys); + } + + unsigned getUserCost(const User *U) { + if (isa(U)) + return TTI::TCC_Free; // Model all PHI nodes as free. + + if (const GEPOperator *GEP = dyn_cast(U)) { + SmallVector Indices(GEP->idx_begin(), GEP->idx_end()); + return static_cast(this) + ->getGEPCost(GEP->getPointerOperand(), Indices); + } + + if (ImmutableCallSite CS = U) { + const Function *F = CS.getCalledFunction(); + if (!F) { + // Just use the called value type. 
+ Type *FTy = CS.getCalledValue()->getType()->getPointerElementType(); + return static_cast(this) + ->getCallCost(cast(FTy), CS.arg_size()); + } + + SmallVector Arguments(CS.arg_begin(), CS.arg_end()); + return static_cast(this)->getCallCost(F, Arguments); + } + + if (const CastInst *CI = dyn_cast(U)) { + // Result of a cmp instruction is often extended (to be used by other + // cmp instructions, logical or return instructions). These are usually + // nop on most sane targets. + if (isa(CI->getOperand(0))) + return TTI::TCC_Free; + } + + // Otherwise delegate to the fully generic implementations. + return getOperationCost( + Operator::getOpcode(U), U->getType(), + U->getNumOperands() == 1 ? U->getOperand(0)->getType() : nullptr); + } +}; +} + +#endif diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h new file mode 100644 index 0000000..7d0aeb4 --- /dev/null +++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h @@ -0,0 +1,626 @@ +//===- BasicTTIImpl.h -------------------------------------------*- C++ -*-===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===----------------------------------------------------------------------===// +/// \file +/// This file provides a helper that implements much of the TTI interface in +/// terms of the target-independent code generator and TargetLowering +/// interfaces. +/// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_CODEGEN_BASICTTIIMPL_H +#define LLVM_CODEGEN_BASICTTIIMPL_H + +#include "llvm/Analysis/LoopInfo.h" +#include "llvm/Analysis/TargetTransformInfoImpl.h" +#include "llvm/Support/CommandLine.h" +#include "llvm/Target/TargetLowering.h" +#include "llvm/Target/TargetSubtargetInfo.h" + +namespace llvm { + +extern cl::opt PartialUnrollingThreshold; + +template +class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { +private: + typedef TargetTransformInfoImplCRTPBase BaseT; + typedef TargetTransformInfo TTI; + + /// Estimate the overhead of scalarizing an instruction. Insert and Extract + /// are set if the result needs to be inserted and/or extracted from vectors. + unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) { + assert(Ty->isVectorTy() && "Can only scalarize vectors"); + unsigned Cost = 0; + + for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) { + if (Insert) + Cost += static_cast(this) + ->getVectorInstrCost(Instruction::InsertElement, Ty, i); + if (Extract) + Cost += static_cast(this) + ->getVectorInstrCost(Instruction::ExtractElement, Ty, i); + } + + return Cost; + } + + /// Estimate the cost overhead of SK_Alternate shuffle. + unsigned getAltShuffleOverhead(Type *Ty) { + assert(Ty->isVectorTy() && "Can only shuffle vectors"); + unsigned Cost = 0; + // Shuffle cost is equal to the cost of extracting element from its argument + // plus the cost of inserting them onto the result vector. + + // e.g. <4 x float> has a mask of <0,5,2,7> i.e we need to extract from + // index 0 of first vector, index 1 of second vector,index 2 of first + // vector and finally index 3 of second vector and insert them at index + // <0,1,2,3> of result vector. 
+ for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) { + Cost += static_cast(this) + ->getVectorInstrCost(Instruction::InsertElement, Ty, i); + Cost += static_cast(this) + ->getVectorInstrCost(Instruction::ExtractElement, Ty, i); + } + return Cost; + } + + const TargetLoweringBase *getTLI() const { + return TM->getSubtargetImpl()->getTargetLowering(); + } + +protected: + const TargetMachine *TM; + + explicit BasicTTIImplBase(const TargetMachine *TM = nullptr) + : BaseT(TM ? TM->getDataLayout() : nullptr), TM(TM) {} + +public: + // Provide value semantics. MSVC requires that we spell all of these out. + BasicTTIImplBase(const BasicTTIImplBase &Arg) + : BaseT(static_cast(Arg)), TM(Arg.TM) {} + BasicTTIImplBase(BasicTTIImplBase &&Arg) + : BaseT(std::move(static_cast(Arg))), TM(std::move(Arg.TM)) {} + BasicTTIImplBase &operator=(const BasicTTIImplBase &RHS) { + BaseT::operator=(static_cast(RHS)); + TM = RHS.TM; + return *this; + } + BasicTTIImplBase &operator=(BasicTTIImplBase &&RHS) { + BaseT::operator=(std::move(static_cast(RHS))); + TM = std::move(RHS.TM); + return *this; + } + + /// \name Scalar TTI Implementations + /// @{ + + bool hasBranchDivergence() { return false; } + + bool isLegalAddImmediate(int64_t imm) { + return getTLI()->isLegalAddImmediate(imm); + } + + bool isLegalICmpImmediate(int64_t imm) { + return getTLI()->isLegalICmpImmediate(imm); + } + + bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, + bool HasBaseReg, int64_t Scale) { + TargetLoweringBase::AddrMode AM; + AM.BaseGV = BaseGV; + AM.BaseOffs = BaseOffset; + AM.HasBaseReg = HasBaseReg; + AM.Scale = Scale; + return getTLI()->isLegalAddressingMode(AM, Ty); + } + + int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, + bool HasBaseReg, int64_t Scale) { + TargetLoweringBase::AddrMode AM; + AM.BaseGV = BaseGV; + AM.BaseOffs = BaseOffset; + AM.HasBaseReg = HasBaseReg; + AM.Scale = Scale; + return getTLI()->getScalingFactorCost(AM, Ty); + } + + bool isTruncateFree(Type *Ty1, Type *Ty2) { + return getTLI()->isTruncateFree(Ty1, Ty2); + } + + bool isTypeLegal(Type *Ty) { + EVT VT = getTLI()->getValueType(Ty); + return getTLI()->isTypeLegal(VT); + } + + unsigned getJumpBufAlignment() { return getTLI()->getJumpBufAlignment(); } + + unsigned getJumpBufSize() { return getTLI()->getJumpBufSize(); } + + bool shouldBuildLookupTables() { + const TargetLoweringBase *TLI = getTLI(); + return TLI->isOperationLegalOrCustom(ISD::BR_JT, MVT::Other) || + TLI->isOperationLegalOrCustom(ISD::BRIND, MVT::Other); + } + + bool haveFastSqrt(Type *Ty) { + const TargetLoweringBase *TLI = getTLI(); + EVT VT = TLI->getValueType(Ty); + return TLI->isTypeLegal(VT) && + TLI->isOperationLegalOrCustom(ISD::FSQRT, VT); + } + + void getUnrollingPreferences(const Function *F, Loop *L, + TTI::UnrollingPreferences &UP) { + // This unrolling functionality is target independent, but to provide some + // motivation for its intended use, for x86: + + // According to the Intel 64 and IA-32 Architectures Optimization Reference + // Manual, Intel Core models and later have a loop stream detector (and + // associated uop queue) that can benefit from partial unrolling. + // The relevant requirements are: + // - The loop must have no more than 4 (8 for Nehalem and later) branches + // taken, and none of them may be calls. + // - The loop can have no more than 18 (28 for Nehalem and later) uops. 
+ + // According to the Software Optimization Guide for AMD Family 15h + // Processors, models 30h-4fh (Steamroller and later) have a loop predictor + // and loop buffer which can benefit from partial unrolling. + // The relevant requirements are: + // - The loop must have fewer than 16 branches + // - The loop must have less than 40 uops in all executed loop branches + + // The number of taken branches in a loop is hard to estimate here, and + // benchmarking has revealed that it is better not to be conservative when + // estimating the branch count. As a result, we'll ignore the branch limits + // until someone finds a case where it matters in practice. + + unsigned MaxOps; + const TargetSubtargetInfo *ST = TM->getSubtargetImpl(*F); + if (PartialUnrollingThreshold.getNumOccurrences() > 0) + MaxOps = PartialUnrollingThreshold; + else if (ST->getSchedModel().LoopMicroOpBufferSize > 0) + MaxOps = ST->getSchedModel().LoopMicroOpBufferSize; + else + return; + + // Scan the loop: don't unroll loops with calls. + for (Loop::block_iterator I = L->block_begin(), E = L->block_end(); I != E; + ++I) { + BasicBlock *BB = *I; + + for (BasicBlock::iterator J = BB->begin(), JE = BB->end(); J != JE; ++J) + if (isa(J) || isa(J)) { + ImmutableCallSite CS(J); + if (const Function *F = CS.getCalledFunction()) { + if (!static_cast(this)->isLoweredToCall(F)) + continue; + } + + return; + } + } + + // Enable runtime and partial unrolling up to the specified size. + UP.Partial = UP.Runtime = true; + UP.PartialThreshold = UP.PartialOptSizeThreshold = MaxOps; + } + + /// @} + + /// \name Vector TTI Implementations + /// @{ + + unsigned getNumberOfRegisters(bool Vector) { return 1; } + + unsigned getRegisterBitWidth(bool Vector) { return 32; } + + unsigned getMaxInterleaveFactor() { return 1; } + + unsigned getArithmeticInstrCost( + unsigned Opcode, Type *Ty, + TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue, + TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue, + TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None, + TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None) { + // Check if any of the operands are vector operands. + const TargetLoweringBase *TLI = getTLI(); + int ISD = TLI->InstructionOpcodeToISD(Opcode); + assert(ISD && "Invalid opcode"); + + std::pair LT = TLI->getTypeLegalizationCost(Ty); + + bool IsFloat = Ty->getScalarType()->isFloatingPointTy(); + // Assume that floating point arithmetic operations cost twice as much as + // integer operations. + unsigned OpCost = (IsFloat ? 2 : 1); + + if (TLI->isOperationLegalOrPromote(ISD, LT.second)) { + // The operation is legal. Assume it costs 1. + // If the type is split to multiple registers, assume that there is some + // overhead to this. + // TODO: Once we have extract/insert subvector cost we need to use them. + if (LT.first > 1) + return LT.first * 2 * OpCost; + return LT.first * 1 * OpCost; + } + + if (!TLI->isOperationExpand(ISD, LT.second)) { + // If the operation is custom lowered then assume + // thare the code is twice as expensive. + return LT.first * 2 * OpCost; + } + + // Else, assume that we need to scalarize this op. + if (Ty->isVectorTy()) { + unsigned Num = Ty->getVectorNumElements(); + unsigned Cost = static_cast(this) + ->getArithmeticInstrCost(Opcode, Ty->getScalarType()); + // return the cost of multiple scalar invocation plus the cost of + // inserting + // and extracting the values. + return getScalarizationOverhead(Ty, true, true) + Num * Cost; + } + + // We don't know anything about this scalar instruction. 
+ return OpCost; + } + + unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, + Type *SubTp) { + if (Kind == TTI::SK_Alternate) { + return getAltShuffleOverhead(Tp); + } + return 1; + } + + unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) { + const TargetLoweringBase *TLI = getTLI(); + int ISD = TLI->InstructionOpcodeToISD(Opcode); + assert(ISD && "Invalid opcode"); + + std::pair SrcLT = TLI->getTypeLegalizationCost(Src); + std::pair DstLT = TLI->getTypeLegalizationCost(Dst); + + // Check for NOOP conversions. + if (SrcLT.first == DstLT.first && + SrcLT.second.getSizeInBits() == DstLT.second.getSizeInBits()) { + + // Bitcast between types that are legalized to the same type are free. + if (Opcode == Instruction::BitCast || Opcode == Instruction::Trunc) + return 0; + } + + if (Opcode == Instruction::Trunc && + TLI->isTruncateFree(SrcLT.second, DstLT.second)) + return 0; + + if (Opcode == Instruction::ZExt && + TLI->isZExtFree(SrcLT.second, DstLT.second)) + return 0; + + // If the cast is marked as legal (or promote) then assume low cost. + if (SrcLT.first == DstLT.first && + TLI->isOperationLegalOrPromote(ISD, DstLT.second)) + return 1; + + // Handle scalar conversions. + if (!Src->isVectorTy() && !Dst->isVectorTy()) { + + // Scalar bitcasts are usually free. + if (Opcode == Instruction::BitCast) + return 0; + + // Just check the op cost. If the operation is legal then assume it costs + // 1. + if (!TLI->isOperationExpand(ISD, DstLT.second)) + return 1; + + // Assume that illegal scalar instruction are expensive. + return 4; + } + + // Check vector-to-vector casts. + if (Dst->isVectorTy() && Src->isVectorTy()) { + + // If the cast is between same-sized registers, then the check is simple. + if (SrcLT.first == DstLT.first && + SrcLT.second.getSizeInBits() == DstLT.second.getSizeInBits()) { + + // Assume that Zext is done using AND. + if (Opcode == Instruction::ZExt) + return 1; + + // Assume that sext is done using SHL and SRA. + if (Opcode == Instruction::SExt) + return 2; + + // Just check the op cost. If the operation is legal then assume it + // costs + // 1 and multiply by the type-legalization overhead. + if (!TLI->isOperationExpand(ISD, DstLT.second)) + return SrcLT.first * 1; + } + + // If we are converting vectors and the operation is illegal, or + // if the vectors are legalized to different types, estimate the + // scalarization costs. + unsigned Num = Dst->getVectorNumElements(); + unsigned Cost = static_cast(this)->getCastInstrCost( + Opcode, Dst->getScalarType(), Src->getScalarType()); + + // Return the cost of multiple scalar invocation plus the cost of + // inserting and extracting the values. + return getScalarizationOverhead(Dst, true, true) + Num * Cost; + } + + // We already handled vector-to-vector and scalar-to-scalar conversions. + // This + // is where we handle bitcast between vectors and scalars. We need to assume + // that the conversion is scalarized in one way or another. + if (Opcode == Instruction::BitCast) + // Illegal bitcasts are done by storing and loading from a stack slot. + return (Src->isVectorTy() ? getScalarizationOverhead(Src, false, true) + : 0) + + (Dst->isVectorTy() ? getScalarizationOverhead(Dst, true, false) + : 0); + + llvm_unreachable("Unhandled cast"); + } + + unsigned getCFInstrCost(unsigned Opcode) { + // Branches are assumed to be predicted. 
+ return 0; + } + + unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy) { + const TargetLoweringBase *TLI = getTLI(); + int ISD = TLI->InstructionOpcodeToISD(Opcode); + assert(ISD && "Invalid opcode"); + + // Selects on vectors are actually vector selects. + if (ISD == ISD::SELECT) { + assert(CondTy && "CondTy must exist"); + if (CondTy->isVectorTy()) + ISD = ISD::VSELECT; + } + + std::pair LT = TLI->getTypeLegalizationCost(ValTy); + + if (!(ValTy->isVectorTy() && !LT.second.isVector()) && + !TLI->isOperationExpand(ISD, LT.second)) { + // The operation is legal. Assume it costs 1. Multiply + // by the type-legalization overhead. + return LT.first * 1; + } + + // Otherwise, assume that the cast is scalarized. + if (ValTy->isVectorTy()) { + unsigned Num = ValTy->getVectorNumElements(); + if (CondTy) + CondTy = CondTy->getScalarType(); + unsigned Cost = static_cast(this)->getCmpSelInstrCost( + Opcode, ValTy->getScalarType(), CondTy); + + // Return the cost of multiple scalar invocation plus the cost of + // inserting + // and extracting the values. + return getScalarizationOverhead(ValTy, true, false) + Num * Cost; + } + + // Unknown scalar opcode. + return 1; + } + + unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) { + std::pair LT = + getTLI()->getTypeLegalizationCost(Val->getScalarType()); + + return LT.first; + } + + unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, + unsigned AddressSpace) { + assert(!Src->isVoidTy() && "Invalid type"); + std::pair LT = getTLI()->getTypeLegalizationCost(Src); + + // Assuming that all loads of legal types cost 1. + unsigned Cost = LT.first; + + if (Src->isVectorTy() && + Src->getPrimitiveSizeInBits() < LT.second.getSizeInBits()) { + // This is a vector load that legalizes to a larger type than the vector + // itself. Unless the corresponding extending load or truncating store is + // legal, then this will scalarize. + TargetLowering::LegalizeAction LA = TargetLowering::Expand; + EVT MemVT = getTLI()->getValueType(Src, true); + if (MemVT.isSimple() && MemVT != MVT::Other) { + if (Opcode == Instruction::Store) + LA = getTLI()->getTruncStoreAction(LT.second, MemVT.getSimpleVT()); + else + LA = getTLI()->getLoadExtAction(ISD::EXTLOAD, LT.second, MemVT); + } + + if (LA != TargetLowering::Legal && LA != TargetLowering::Custom) { + // This is a vector load/store for some illegal type that is scalarized. + // We must account for the cost of building or decomposing the vector. + Cost += getScalarizationOverhead(Src, Opcode != Instruction::Store, + Opcode == Instruction::Store); + } + } + + return Cost; + } + + unsigned getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy, + ArrayRef Tys) { + unsigned ISD = 0; + switch (IID) { + default: { + // Assume that we need to scalarize this intrinsic. + unsigned ScalarizationCost = 0; + unsigned ScalarCalls = 1; + if (RetTy->isVectorTy()) { + ScalarizationCost = getScalarizationOverhead(RetTy, true, false); + ScalarCalls = std::max(ScalarCalls, RetTy->getVectorNumElements()); + } + for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) { + if (Tys[i]->isVectorTy()) { + ScalarizationCost += getScalarizationOverhead(Tys[i], false, true); + ScalarCalls = std::max(ScalarCalls, RetTy->getVectorNumElements()); + } + } + + return ScalarCalls + ScalarizationCost; + } + // Look for intrinsics that can be lowered directly or turned into a scalar + // intrinsic call. 
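+    // Each case below maps the IR intrinsic to the ISD opcode that the
+    // legality checks after the switch are keyed on.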
+ case Intrinsic::sqrt: + ISD = ISD::FSQRT; + break; + case Intrinsic::sin: + ISD = ISD::FSIN; + break; + case Intrinsic::cos: + ISD = ISD::FCOS; + break; + case Intrinsic::exp: + ISD = ISD::FEXP; + break; + case Intrinsic::exp2: + ISD = ISD::FEXP2; + break; + case Intrinsic::log: + ISD = ISD::FLOG; + break; + case Intrinsic::log10: + ISD = ISD::FLOG10; + break; + case Intrinsic::log2: + ISD = ISD::FLOG2; + break; + case Intrinsic::fabs: + ISD = ISD::FABS; + break; + case Intrinsic::minnum: + ISD = ISD::FMINNUM; + break; + case Intrinsic::maxnum: + ISD = ISD::FMAXNUM; + break; + case Intrinsic::copysign: + ISD = ISD::FCOPYSIGN; + break; + case Intrinsic::floor: + ISD = ISD::FFLOOR; + break; + case Intrinsic::ceil: + ISD = ISD::FCEIL; + break; + case Intrinsic::trunc: + ISD = ISD::FTRUNC; + break; + case Intrinsic::nearbyint: + ISD = ISD::FNEARBYINT; + break; + case Intrinsic::rint: + ISD = ISD::FRINT; + break; + case Intrinsic::round: + ISD = ISD::FROUND; + break; + case Intrinsic::pow: + ISD = ISD::FPOW; + break; + case Intrinsic::fma: + ISD = ISD::FMA; + break; + case Intrinsic::fmuladd: + ISD = ISD::FMA; + break; + // FIXME: We should return 0 whenever getIntrinsicCost == TCC_Free. + case Intrinsic::lifetime_start: + case Intrinsic::lifetime_end: + return 0; + case Intrinsic::masked_store: + return static_cast(this) + ->getMaskedMemoryOpCost(Instruction::Store, Tys[0], 0, 0); + case Intrinsic::masked_load: + return static_cast(this) + ->getMaskedMemoryOpCost(Instruction::Load, RetTy, 0, 0); + } + + const TargetLoweringBase *TLI = getTLI(); + std::pair LT = TLI->getTypeLegalizationCost(RetTy); + + if (TLI->isOperationLegalOrPromote(ISD, LT.second)) { + // The operation is legal. Assume it costs 1. + // If the type is split to multiple registers, assume that there is some + // overhead to this. + // TODO: Once we have extract/insert subvector cost we need to use them. + if (LT.first > 1) + return LT.first * 2; + return LT.first * 1; + } + + if (!TLI->isOperationExpand(ISD, LT.second)) { + // If the operation is custom lowered then assume + // thare the code is twice as expensive. + return LT.first * 2; + } + + // If we can't lower fmuladd into an FMA estimate the cost as a floating + // point mul followed by an add. + if (IID == Intrinsic::fmuladd) + return static_cast(this) + ->getArithmeticInstrCost(BinaryOperator::FMul, RetTy) + + static_cast(this) + ->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy); + + // Else, assume that we need to scalarize this intrinsic. For math builtins + // this will emit a costly libcall, adding call overhead and spills. Make it + // very expensive. + if (RetTy->isVectorTy()) { + unsigned Num = RetTy->getVectorNumElements(); + unsigned Cost = static_cast(this)->getIntrinsicInstrCost( + IID, RetTy->getScalarType(), Tys); + return 10 * Cost * Num; + } + + // This is going to be turned into a library call, make it expensive. + return 10; + } + + unsigned getNumberOfParts(Type *Tp) { + std::pair LT = getTLI()->getTypeLegalizationCost(Tp); + return LT.first; + } + + unsigned getAddressComputationCost(Type *Ty, bool IsComplex) { return 0; } + + unsigned getReductionCost(unsigned Opcode, Type *Ty, bool IsPairwise) { + assert(Ty->isVectorTy() && "Expect a vector type"); + unsigned NumVecElts = Ty->getVectorNumElements(); + unsigned NumReduxLevels = Log2_32(NumVecElts); + unsigned ArithCost = + NumReduxLevels * + static_cast(this)->getArithmeticInstrCost(Opcode, Ty); + // Assume the pairwise shuffles add a cost. 
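+    // The reduction is modeled as a log2(NumVecElts)-deep tree: each level
+    // pays for one subvector-extract shuffle (two in the pairwise form, hence
+    // IsPairwise + 1) and one arithmetic op on the vector type, and the final
+    // result is charged via the extract half of the scalarization overhead.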
+ unsigned ShuffleCost = + NumReduxLevels * (IsPairwise + 1) * + static_cast(this) + ->getShuffleCost(TTI::SK_ExtractSubvector, Ty, NumVecElts / 2, Ty); + return ShuffleCost + ArithCost + getScalarizationOverhead(Ty, false, true); + } + + /// @} +}; +} + +#endif diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 6a6d48c..ff6ecb1 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -77,7 +77,6 @@ void initializeAlignmentFromAssumptionsPass(PassRegistry&); void initializeBarrierNoopPass(PassRegistry&); void initializeBasicAliasAnalysisPass(PassRegistry&); void initializeCallGraphWrapperPassPass(PassRegistry &); -void initializeBasicTTIPass(PassRegistry&); void initializeBlockExtractorPassPass(PassRegistry&); void initializeBlockFrequencyInfoPass(PassRegistry&); void initializeBoundsCheckingPass(PassRegistry&); @@ -264,9 +263,8 @@ void initializeTailCallElimPass(PassRegistry&); void initializeTailDuplicatePassPass(PassRegistry&); void initializeTargetPassConfigPass(PassRegistry&); void initializeDataLayoutPassPass(PassRegistry &); -void initializeTargetTransformInfoAnalysisGroup(PassRegistry&); +void initializeTargetTransformInfoWrapperPassPass(PassRegistry &); void initializeFunctionTargetTransformInfoPass(PassRegistry &); -void initializeNoTTIPass(PassRegistry&); void initializeTargetLibraryInfoWrapperPassPass(PassRegistry &); void initializeAssumptionCacheTrackerPass(PassRegistry &); void initializeTwoAddressInstructionPassPass(PassRegistry&); diff --git a/llvm/include/llvm/Target/TargetMachine.h b/llvm/include/llvm/Target/TargetMachine.h index 2f004e2..0e3997c 100644 --- a/llvm/include/llvm/Target/TargetMachine.h +++ b/llvm/include/llvm/Target/TargetMachine.h @@ -187,7 +187,7 @@ public: void setFunctionSections(bool); /// \brief Register analysis passes for this target with a pass manager. - virtual void addAnalysisPasses(PassManagerBase &) {} + virtual void addAnalysisPasses(PassManagerBase &); /// CodeGenFileType - These enums are meant to be passed into /// addPassesToEmitFile to indicate what type of file to emit, and returned by diff --git a/llvm/lib/Analysis/Analysis.cpp b/llvm/lib/Analysis/Analysis.cpp index e56ff61..92858e7 100644 --- a/llvm/lib/Analysis/Analysis.cpp +++ b/llvm/lib/Analysis/Analysis.cpp @@ -65,7 +65,7 @@ void llvm::initializeAnalysis(PassRegistry &Registry) { initializeRegionOnlyPrinterPass(Registry); initializeScalarEvolutionPass(Registry); initializeScalarEvolutionAliasAnalysisPass(Registry); - initializeTargetTransformInfoAnalysisGroup(Registry); + initializeTargetTransformInfoWrapperPassPass(Registry); initializeTypeBasedAliasAnalysisPass(Registry); initializeScopedNoAliasAAPass(Registry); } diff --git a/llvm/lib/Analysis/CostModel.cpp b/llvm/lib/Analysis/CostModel.cpp index 1b74f8c1..7b5dfa2 100644 --- a/llvm/lib/Analysis/CostModel.cpp +++ b/llvm/lib/Analysis/CostModel.cpp @@ -83,7 +83,8 @@ CostModelAnalysis::getAnalysisUsage(AnalysisUsage &AU) const { bool CostModelAnalysis::runOnFunction(Function &F) { this->F = &F; - TTI = getAnalysisIfAvailable(); + auto *TTIWP = getAnalysisIfAvailable(); + TTI = TTIWP ? 
&TTIWP->getTTI() : nullptr; return false; } diff --git a/llvm/lib/Analysis/FunctionTargetTransformInfo.cpp b/llvm/lib/Analysis/FunctionTargetTransformInfo.cpp index a686bec..36f1820 100644 --- a/llvm/lib/Analysis/FunctionTargetTransformInfo.cpp +++ b/llvm/lib/Analysis/FunctionTargetTransformInfo.cpp @@ -21,7 +21,7 @@ using namespace llvm; #define DEBUG_TYPE "function-tti" static const char ftti_name[] = "Function TargetTransformInfo"; INITIALIZE_PASS_BEGIN(FunctionTargetTransformInfo, "function_tti", ftti_name, false, true) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_END(FunctionTargetTransformInfo, "function_tti", ftti_name, false, true) char FunctionTargetTransformInfo::ID = 0; @@ -38,13 +38,13 @@ FunctionTargetTransformInfo::FunctionTargetTransformInfo() void FunctionTargetTransformInfo::getAnalysisUsage(AnalysisUsage &AU) const { AU.setPreservesAll(); - AU.addRequired(); + AU.addRequired(); } void FunctionTargetTransformInfo::releaseMemory() {} bool FunctionTargetTransformInfo::runOnFunction(Function &F) { Fn = &F; - TTI = &getAnalysis(); + TTI = &getAnalysis().getTTI(); return false; } diff --git a/llvm/lib/Analysis/IPA/InlineCost.cpp b/llvm/lib/Analysis/IPA/InlineCost.cpp index 58ac5d3..bbae253 100644 --- a/llvm/lib/Analysis/IPA/InlineCost.cpp +++ b/llvm/lib/Analysis/IPA/InlineCost.cpp @@ -1232,7 +1232,7 @@ void CallAnalyzer::dump() { INITIALIZE_PASS_BEGIN(InlineCostAnalysis, "inline-cost", "Inline Cost Analysis", true, true) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker) INITIALIZE_PASS_END(InlineCostAnalysis, "inline-cost", "Inline Cost Analysis", true, true) @@ -1246,12 +1246,12 @@ InlineCostAnalysis::~InlineCostAnalysis() {} void InlineCostAnalysis::getAnalysisUsage(AnalysisUsage &AU) const { AU.setPreservesAll(); AU.addRequired(); - AU.addRequired(); + AU.addRequired(); CallGraphSCCPass::getAnalysisUsage(AU); } bool InlineCostAnalysis::runOnSCC(CallGraphSCC &SCC) { - TTI = &getAnalysis(); + TTI = &getAnalysis().getTTI(); ACT = &getAnalysis(); return false; } diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index ebc56bb..cdbce0d 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -8,6 +8,7 @@ //===----------------------------------------------------------------------===// #include "llvm/Analysis/TargetTransformInfo.h" +#include "llvm/Analysis/TargetTransformInfoImpl.h" #include "llvm/IR/CallSite.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/Instruction.h" @@ -20,669 +21,257 @@ using namespace llvm; #define DEBUG_TYPE "tti" -// Setup the analysis group to manage the TargetTransformInfo passes. -INITIALIZE_ANALYSIS_GROUP(TargetTransformInfo, "Target Information", NoTTI) -char TargetTransformInfo::ID = 0; +TargetTransformInfo::~TargetTransformInfo() {} -TargetTransformInfo::~TargetTransformInfo() { -} - -void TargetTransformInfo::pushTTIStack(Pass *P) { - TopTTI = this; - PrevTTI = &P->getAnalysis(); - - // Walk up the chain and update the top TTI pointer. 
- for (TargetTransformInfo *PTTI = PrevTTI; PTTI; PTTI = PTTI->PrevTTI) - PTTI->TopTTI = this; -} +TargetTransformInfo::TargetTransformInfo(TargetTransformInfo &&Arg) + : TTIImpl(std::move(Arg.TTIImpl)) {} -void TargetTransformInfo::getAnalysisUsage(AnalysisUsage &AU) const { - AU.addRequired(); +TargetTransformInfo &TargetTransformInfo::operator=(TargetTransformInfo &&RHS) { + TTIImpl = std::move(RHS.TTIImpl); + return *this; } unsigned TargetTransformInfo::getOperationCost(unsigned Opcode, Type *Ty, Type *OpTy) const { - return PrevTTI->getOperationCost(Opcode, Ty, OpTy); -} - -unsigned TargetTransformInfo::getGEPCost( - const Value *Ptr, ArrayRef Operands) const { - return PrevTTI->getGEPCost(Ptr, Operands); + return TTIImpl->getOperationCost(Opcode, Ty, OpTy); } unsigned TargetTransformInfo::getCallCost(FunctionType *FTy, int NumArgs) const { - return PrevTTI->getCallCost(FTy, NumArgs); + return TTIImpl->getCallCost(FTy, NumArgs); } -unsigned TargetTransformInfo::getCallCost(const Function *F, - int NumArgs) const { - return PrevTTI->getCallCost(F, NumArgs); -} - -unsigned TargetTransformInfo::getCallCost( - const Function *F, ArrayRef Arguments) const { - return PrevTTI->getCallCost(F, Arguments); -} - -unsigned TargetTransformInfo::getIntrinsicCost( - Intrinsic::ID IID, Type *RetTy, ArrayRef ParamTys) const { - return PrevTTI->getIntrinsicCost(IID, RetTy, ParamTys); +unsigned +TargetTransformInfo::getCallCost(const Function *F, + ArrayRef Arguments) const { + return TTIImpl->getCallCost(F, Arguments); } -unsigned TargetTransformInfo::getIntrinsicCost( - Intrinsic::ID IID, Type *RetTy, ArrayRef Arguments) const { - return PrevTTI->getIntrinsicCost(IID, RetTy, Arguments); +unsigned +TargetTransformInfo::getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, + ArrayRef Arguments) const { + return TTIImpl->getIntrinsicCost(IID, RetTy, Arguments); } unsigned TargetTransformInfo::getUserCost(const User *U) const { - return PrevTTI->getUserCost(U); + return TTIImpl->getUserCost(U); } bool TargetTransformInfo::hasBranchDivergence() const { - return PrevTTI->hasBranchDivergence(); + return TTIImpl->hasBranchDivergence(); } bool TargetTransformInfo::isLoweredToCall(const Function *F) const { - return PrevTTI->isLoweredToCall(F); + return TTIImpl->isLoweredToCall(F); } -void -TargetTransformInfo::getUnrollingPreferences(const Function *F, Loop *L, - UnrollingPreferences &UP) const { - PrevTTI->getUnrollingPreferences(F, L, UP); +void TargetTransformInfo::getUnrollingPreferences( + const Function *F, Loop *L, UnrollingPreferences &UP) const { + return TTIImpl->getUnrollingPreferences(F, L, UP); } bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const { - return PrevTTI->isLegalAddImmediate(Imm); + return TTIImpl->isLegalAddImmediate(Imm); } bool TargetTransformInfo::isLegalICmpImmediate(int64_t Imm) const { - return PrevTTI->isLegalICmpImmediate(Imm); + return TTIImpl->isLegalICmpImmediate(Imm); } -bool TargetTransformInfo::isLegalMaskedLoad(Type *DataType, - int Consecutive) const { - return false; +bool TargetTransformInfo::isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, + int64_t BaseOffset, + bool HasBaseReg, + int64_t Scale) const { + return TTIImpl->isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg, + Scale); } bool TargetTransformInfo::isLegalMaskedStore(Type *DataType, int Consecutive) const { - return false; + return TTIImpl->isLegalMaskedStore(DataType, Consecutive); } - -bool TargetTransformInfo::isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, - int64_t 
BaseOffset, - bool HasBaseReg, - int64_t Scale) const { - return PrevTTI->isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg, - Scale); +bool TargetTransformInfo::isLegalMaskedLoad(Type *DataType, + int Consecutive) const { + return TTIImpl->isLegalMaskedLoad(DataType, Consecutive); } int TargetTransformInfo::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, bool HasBaseReg, int64_t Scale) const { - return PrevTTI->getScalingFactorCost(Ty, BaseGV, BaseOffset, HasBaseReg, + return TTIImpl->getScalingFactorCost(Ty, BaseGV, BaseOffset, HasBaseReg, Scale); } bool TargetTransformInfo::isTruncateFree(Type *Ty1, Type *Ty2) const { - return PrevTTI->isTruncateFree(Ty1, Ty2); + return TTIImpl->isTruncateFree(Ty1, Ty2); } bool TargetTransformInfo::isTypeLegal(Type *Ty) const { - return PrevTTI->isTypeLegal(Ty); + return TTIImpl->isTypeLegal(Ty); } unsigned TargetTransformInfo::getJumpBufAlignment() const { - return PrevTTI->getJumpBufAlignment(); + return TTIImpl->getJumpBufAlignment(); } unsigned TargetTransformInfo::getJumpBufSize() const { - return PrevTTI->getJumpBufSize(); + return TTIImpl->getJumpBufSize(); } bool TargetTransformInfo::shouldBuildLookupTables() const { - return PrevTTI->shouldBuildLookupTables(); + return TTIImpl->shouldBuildLookupTables(); } TargetTransformInfo::PopcntSupportKind TargetTransformInfo::getPopcntSupport(unsigned IntTyWidthInBit) const { - return PrevTTI->getPopcntSupport(IntTyWidthInBit); + return TTIImpl->getPopcntSupport(IntTyWidthInBit); } bool TargetTransformInfo::haveFastSqrt(Type *Ty) const { - return PrevTTI->haveFastSqrt(Ty); + return TTIImpl->haveFastSqrt(Ty); } unsigned TargetTransformInfo::getIntImmCost(const APInt &Imm, Type *Ty) const { - return PrevTTI->getIntImmCost(Imm, Ty); + return TTIImpl->getIntImmCost(Imm, Ty); } -unsigned TargetTransformInfo::getIntImmCost(unsigned Opc, unsigned Idx, +unsigned TargetTransformInfo::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty) const { - return PrevTTI->getIntImmCost(Opc, Idx, Imm, Ty); + return TTIImpl->getIntImmCost(Opcode, Idx, Imm, Ty); } unsigned TargetTransformInfo::getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, Type *Ty) const { - return PrevTTI->getIntImmCost(IID, Idx, Imm, Ty); + return TTIImpl->getIntImmCost(IID, Idx, Imm, Ty); } unsigned TargetTransformInfo::getNumberOfRegisters(bool Vector) const { - return PrevTTI->getNumberOfRegisters(Vector); + return TTIImpl->getNumberOfRegisters(Vector); } unsigned TargetTransformInfo::getRegisterBitWidth(bool Vector) const { - return PrevTTI->getRegisterBitWidth(Vector); + return TTIImpl->getRegisterBitWidth(Vector); } unsigned TargetTransformInfo::getMaxInterleaveFactor() const { - return PrevTTI->getMaxInterleaveFactor(); + return TTIImpl->getMaxInterleaveFactor(); } unsigned TargetTransformInfo::getArithmeticInstrCost( - unsigned Opcode, Type *Ty, OperandValueKind Op1Info, - OperandValueKind Op2Info, OperandValueProperties Opd1PropInfo, + unsigned Opcode, Type *Ty, OperandValueKind Opd1Info, + OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo, OperandValueProperties Opd2PropInfo) const { - return PrevTTI->getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info, + return TTIImpl->getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info, Opd1PropInfo, Opd2PropInfo); } -unsigned TargetTransformInfo::getShuffleCost(ShuffleKind Kind, Type *Tp, +unsigned TargetTransformInfo::getShuffleCost(ShuffleKind Kind, Type *Ty, int Index, Type *SubTp) const { - return 
PrevTTI->getShuffleCost(Kind, Tp, Index, SubTp); + return TTIImpl->getShuffleCost(Kind, Ty, Index, SubTp); } unsigned TargetTransformInfo::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const { - return PrevTTI->getCastInstrCost(Opcode, Dst, Src); + return TTIImpl->getCastInstrCost(Opcode, Dst, Src); } unsigned TargetTransformInfo::getCFInstrCost(unsigned Opcode) const { - return PrevTTI->getCFInstrCost(Opcode); + return TTIImpl->getCFInstrCost(Opcode); } unsigned TargetTransformInfo::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy) const { - return PrevTTI->getCmpSelInstrCost(Opcode, ValTy, CondTy); + return TTIImpl->getCmpSelInstrCost(Opcode, ValTy, CondTy); } unsigned TargetTransformInfo::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) const { - return PrevTTI->getVectorInstrCost(Opcode, Val, Index); + return TTIImpl->getVectorInstrCost(Opcode, Val, Index); } unsigned TargetTransformInfo::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, unsigned AddressSpace) const { - return PrevTTI->getMemoryOpCost(Opcode, Src, Alignment, AddressSpace); + return TTIImpl->getMemoryOpCost(Opcode, Src, Alignment, AddressSpace); } -unsigned +unsigned TargetTransformInfo::getMaskedMemoryOpCost(unsigned Opcode, Type *Src, - unsigned Alignment, - unsigned AddressSpace) const { - return PrevTTI->getMaskedMemoryOpCost(Opcode, Src, Alignment, AddressSpace); + unsigned Alignment, + unsigned AddressSpace) const { + return TTIImpl->getMaskedMemoryOpCost(Opcode, Src, Alignment, AddressSpace); } unsigned -TargetTransformInfo::getIntrinsicInstrCost(Intrinsic::ID ID, - Type *RetTy, +TargetTransformInfo::getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, ArrayRef Tys) const { - return PrevTTI->getIntrinsicInstrCost(ID, RetTy, Tys); + return TTIImpl->getIntrinsicInstrCost(ID, RetTy, Tys); } unsigned TargetTransformInfo::getNumberOfParts(Type *Tp) const { - return PrevTTI->getNumberOfParts(Tp); + return TTIImpl->getNumberOfParts(Tp); } unsigned TargetTransformInfo::getAddressComputationCost(Type *Tp, bool IsComplex) const { - return PrevTTI->getAddressComputationCost(Tp, IsComplex); + return TTIImpl->getAddressComputationCost(Tp, IsComplex); } unsigned TargetTransformInfo::getReductionCost(unsigned Opcode, Type *Ty, - bool IsPairwise) const { - return PrevTTI->getReductionCost(Opcode, Ty, IsPairwise); + bool IsPairwiseForm) const { + return TTIImpl->getReductionCost(Opcode, Ty, IsPairwiseForm); +} + +unsigned +TargetTransformInfo::getCostOfKeepingLiveOverCall(ArrayRef Tys) const { + return TTIImpl->getCostOfKeepingLiveOverCall(Tys); } -unsigned TargetTransformInfo::getCostOfKeepingLiveOverCall(ArrayRef Tys) - const { - return PrevTTI->getCostOfKeepingLiveOverCall(Tys); +bool TargetTransformInfo::getTgtMemIntrinsic(IntrinsicInst *Inst, + MemIntrinsicInfo &Info) const { + return TTIImpl->getTgtMemIntrinsic(Inst, Info); } Value *TargetTransformInfo::getOrCreateResultFromMemIntrinsic( IntrinsicInst *Inst, Type *ExpectedType) const { - return PrevTTI->getOrCreateResultFromMemIntrinsic(Inst, ExpectedType); + return TTIImpl->getOrCreateResultFromMemIntrinsic(Inst, ExpectedType); } -bool TargetTransformInfo::getTgtMemIntrinsic(IntrinsicInst *Inst, - MemIntrinsicInfo &Info) const { - return PrevTTI->getTgtMemIntrinsic(Inst, Info); -} +TargetTransformInfo::Concept::~Concept() {} namespace { - -struct NoTTI final : ImmutablePass, TargetTransformInfo { - const DataLayout *DL; - - NoTTI() : ImmutablePass(ID), DL(nullptr) { - initializeNoTTIPass(*PassRegistry::getPassRegistry()); 
- } - - void initializePass() override { - // Note that this subclass is special, and must *not* call initializeTTI as - // it does not chain. - TopTTI = this; - PrevTTI = nullptr; - DataLayoutPass *DLP = getAnalysisIfAvailable(); - DL = DLP ? &DLP->getDataLayout() : nullptr; - } - - void getAnalysisUsage(AnalysisUsage &AU) const override { - // Note that this subclass is special, and must *not* call - // TTI::getAnalysisUsage as it breaks the recursion. - } - - /// Pass identification. - static char ID; - - /// Provide necessary pointer adjustments for the two base classes. - void *getAdjustedAnalysisPointer(const void *ID) override { - if (ID == &TargetTransformInfo::ID) - return (TargetTransformInfo*)this; - return this; - } - - unsigned getOperationCost(unsigned Opcode, Type *Ty, - Type *OpTy) const override { - switch (Opcode) { - default: - // By default, just classify everything as 'basic'. - return TCC_Basic; - - case Instruction::GetElementPtr: - llvm_unreachable("Use getGEPCost for GEP operations!"); - - case Instruction::BitCast: - assert(OpTy && "Cast instructions must provide the operand type"); - if (Ty == OpTy || (Ty->isPointerTy() && OpTy->isPointerTy())) - // Identity and pointer-to-pointer casts are free. - return TCC_Free; - - // Otherwise, the default basic cost is used. - return TCC_Basic; - - case Instruction::IntToPtr: { - if (!DL) - return TCC_Basic; - - // An inttoptr cast is free so long as the input is a legal integer type - // which doesn't contain values outside the range of a pointer. - unsigned OpSize = OpTy->getScalarSizeInBits(); - if (DL->isLegalInteger(OpSize) && - OpSize <= DL->getPointerTypeSizeInBits(Ty)) - return TCC_Free; - - // Otherwise it's not a no-op. - return TCC_Basic; - } - case Instruction::PtrToInt: { - if (!DL) - return TCC_Basic; - - // A ptrtoint cast is free so long as the result is large enough to store - // the pointer, and a legal integer type. - unsigned DestSize = Ty->getScalarSizeInBits(); - if (DL->isLegalInteger(DestSize) && - DestSize >= DL->getPointerTypeSizeInBits(OpTy)) - return TCC_Free; - - // Otherwise it's not a no-op. - return TCC_Basic; - } - case Instruction::Trunc: - // trunc to a native type is free (assuming the target has compare and - // shift-right of the same width). - if (DL && DL->isLegalInteger(DL->getTypeSizeInBits(Ty))) - return TCC_Free; - - return TCC_Basic; - } - } - - unsigned getGEPCost(const Value *Ptr, - ArrayRef Operands) const override { - // In the basic model, we just assume that all-constant GEPs will be folded - // into their uses via addressing modes. - for (unsigned Idx = 0, Size = Operands.size(); Idx != Size; ++Idx) - if (!isa(Operands[Idx])) - return TCC_Basic; - - return TCC_Free; - } - - unsigned getCallCost(FunctionType *FTy, int NumArgs = -1) const override - { - assert(FTy && "FunctionType must be provided to this routine."); - - // The target-independent implementation just measures the size of the - // function by approximating that each argument will take on average one - // instruction to prepare. - - if (NumArgs < 0) - // Set the argument number to the number of explicit arguments in the - // function. - NumArgs = FTy->getNumParams(); - - return TCC_Basic * (NumArgs + 1); - } - - unsigned getCallCost(const Function *F, int NumArgs = -1) const override - { - assert(F && "A concrete function must be provided to this routine."); - - if (NumArgs < 0) - // Set the argument number to the number of explicit arguments in the - // function. 
- NumArgs = F->arg_size(); - - if (Intrinsic::ID IID = (Intrinsic::ID)F->getIntrinsicID()) { - FunctionType *FTy = F->getFunctionType(); - SmallVector ParamTys(FTy->param_begin(), FTy->param_end()); - return TopTTI->getIntrinsicCost(IID, FTy->getReturnType(), ParamTys); - } - - if (!TopTTI->isLoweredToCall(F)) - return TCC_Basic; // Give a basic cost if it will be lowered directly. - - return TopTTI->getCallCost(F->getFunctionType(), NumArgs); - } - - unsigned getCallCost(const Function *F, - ArrayRef Arguments) const override { - // Simply delegate to generic handling of the call. - // FIXME: We should use instsimplify or something else to catch calls which - // will constant fold with these arguments. - return TopTTI->getCallCost(F, Arguments.size()); - } - - unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, - ArrayRef ParamTys) const override { - switch (IID) { - default: - // Intrinsics rarely (if ever) have normal argument setup constraints. - // Model them as having a basic instruction cost. - // FIXME: This is wrong for libc intrinsics. - return TCC_Basic; - - case Intrinsic::annotation: - case Intrinsic::assume: - case Intrinsic::dbg_declare: - case Intrinsic::dbg_value: - case Intrinsic::invariant_start: - case Intrinsic::invariant_end: - case Intrinsic::lifetime_start: - case Intrinsic::lifetime_end: - case Intrinsic::objectsize: - case Intrinsic::ptr_annotation: - case Intrinsic::var_annotation: - case Intrinsic::experimental_gc_result_int: - case Intrinsic::experimental_gc_result_float: - case Intrinsic::experimental_gc_result_ptr: - case Intrinsic::experimental_gc_result: - case Intrinsic::experimental_gc_relocate: - // These intrinsics don't actually represent code after lowering. - return TCC_Free; - } - } - - unsigned - getIntrinsicCost(Intrinsic::ID IID, Type *RetTy, - ArrayRef Arguments) const override { - // Delegate to the generic intrinsic handling code. This mostly provides an - // opportunity for targets to (for example) special case the cost of - // certain intrinsics based on constants used as arguments. - SmallVector ParamTys; - ParamTys.reserve(Arguments.size()); - for (unsigned Idx = 0, Size = Arguments.size(); Idx != Size; ++Idx) - ParamTys.push_back(Arguments[Idx]->getType()); - return TopTTI->getIntrinsicCost(IID, RetTy, ParamTys); - } - - unsigned getUserCost(const User *U) const override { - if (isa(U)) - return TCC_Free; // Model all PHI nodes as free. - - if (const GEPOperator *GEP = dyn_cast(U)) { - SmallVector Indices(GEP->idx_begin(), GEP->idx_end()); - return TopTTI->getGEPCost(GEP->getPointerOperand(), Indices); - } - - if (ImmutableCallSite CS = U) { - const Function *F = CS.getCalledFunction(); - if (!F) { - // Just use the called value type. - Type *FTy = CS.getCalledValue()->getType()->getPointerElementType(); - return TopTTI->getCallCost(cast(FTy), CS.arg_size()); - } - - SmallVector Arguments(CS.arg_begin(), CS.arg_end()); - return TopTTI->getCallCost(F, Arguments); - } - - if (const CastInst *CI = dyn_cast(U)) { - // Result of a cmp instruction is often extended (to be used by other - // cmp instructions, logical or return instructions). These are usually - // nop on most sane targets. - if (isa(CI->getOperand(0))) - return TCC_Free; - } - - // Otherwise delegate to the fully generic implementations. - return getOperationCost(Operator::getOpcode(U), U->getType(), - U->getNumOperands() == 1 ? 
- U->getOperand(0)->getType() : nullptr); - } - - bool hasBranchDivergence() const override { return false; } - - bool isLoweredToCall(const Function *F) const override { - // FIXME: These should almost certainly not be handled here, and instead - // handled with the help of TLI or the target itself. This was largely - // ported from existing analysis heuristics here so that such refactorings - // can take place in the future. - - if (F->isIntrinsic()) - return false; - - if (F->hasLocalLinkage() || !F->hasName()) - return true; - - StringRef Name = F->getName(); - - // These will all likely lower to a single selection DAG node. - if (Name == "copysign" || Name == "copysignf" || Name == "copysignl" || - Name == "fabs" || Name == "fabsf" || Name == "fabsl" || Name == "sin" || - Name == "fmin" || Name == "fminf" || Name == "fminl" || - Name == "fmax" || Name == "fmaxf" || Name == "fmaxl" || - Name == "sinf" || Name == "sinl" || Name == "cos" || Name == "cosf" || - Name == "cosl" || Name == "sqrt" || Name == "sqrtf" || Name == "sqrtl") - return false; - - // These are all likely to be optimized into something smaller. - if (Name == "pow" || Name == "powf" || Name == "powl" || Name == "exp2" || - Name == "exp2l" || Name == "exp2f" || Name == "floor" || Name == - "floorf" || Name == "ceil" || Name == "round" || Name == "ffs" || - Name == "ffsl" || Name == "abs" || Name == "labs" || Name == "llabs") - return false; - - return true; - } - - void getUnrollingPreferences(const Function *, Loop *, - UnrollingPreferences &) const override {} - - bool isLegalAddImmediate(int64_t Imm) const override { - return false; - } - - bool isLegalICmpImmediate(int64_t Imm) const override { - return false; - } - - bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, - bool HasBaseReg, int64_t Scale) const override - { - // Guess that reg+reg addressing is allowed. This heuristic is taken from - // the implementation of LSR. - return !BaseGV && BaseOffset == 0 && Scale <= 1; - } - - int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset, - bool HasBaseReg, int64_t Scale) const override { - // Guess that all legal addressing mode are free. 
- if(isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg, Scale)) - return 0; - return -1; - } - - bool isTruncateFree(Type *Ty1, Type *Ty2) const override { - return false; - } - - bool isTypeLegal(Type *Ty) const override { - return false; - } - - unsigned getJumpBufAlignment() const override { - return 0; - } - - unsigned getJumpBufSize() const override { - return 0; - } - - bool shouldBuildLookupTables() const override { - return true; - } - - PopcntSupportKind - getPopcntSupport(unsigned IntTyWidthInBit) const override { - return PSK_Software; - } - - bool haveFastSqrt(Type *Ty) const override { - return false; - } - - unsigned getIntImmCost(const APInt &Imm, Type *Ty) const override { - return TCC_Basic; - } - - unsigned getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, - Type *Ty) const override { - return TCC_Free; - } - - unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, - Type *Ty) const override { - return TCC_Free; - } - - unsigned getNumberOfRegisters(bool Vector) const override { - return 8; - } - - unsigned getRegisterBitWidth(bool Vector) const override { - return 32; - } - - unsigned getMaxInterleaveFactor() const override { - return 1; - } - - unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind, - OperandValueKind, OperandValueProperties, - OperandValueProperties) const override { - return 1; - } - - unsigned getShuffleCost(ShuffleKind Kind, Type *Ty, - int Index = 0, Type *SubTp = nullptr) const override { - return 1; - } - - unsigned getCastInstrCost(unsigned Opcode, Type *Dst, - Type *Src) const override { - return 1; - } - - unsigned getCFInstrCost(unsigned Opcode) const override { - return 1; - } - - unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy = nullptr) const override { - return 1; - } - - unsigned getVectorInstrCost(unsigned Opcode, Type *Val, - unsigned Index = -1) const override { - return 1; - } - - unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, - unsigned AddressSpace) const override { - return 1; - } - - unsigned getMaskedMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, - unsigned AddressSpace) const override { - return 1; - } - - unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, - ArrayRef Tys) const override { - return 1; - } - - unsigned getNumberOfParts(Type *Tp) const override { - return 0; - } - - unsigned getAddressComputationCost(Type *Tp, bool) const override { - return 0; - } - - unsigned getReductionCost(unsigned, Type *, bool) const override { - return 1; - } - - unsigned getCostOfKeepingLiveOverCall(ArrayRef Tys) const override { - return 0; - } - - bool getTgtMemIntrinsic(IntrinsicInst *Inst, - MemIntrinsicInfo &Info) const override { - return false; - } - - Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, - Type *ExpectedType) const override { - return nullptr; - } +/// \brief No-op implementation of the TTI interface using the utility base +/// classes. +/// +/// This is used when no target specific information is available. +struct NoTTIImpl : TargetTransformInfoImplCRTPBase { + explicit NoTTIImpl(const DataLayout *DL) + : TargetTransformInfoImplCRTPBase(DL) {} }; +} + +// Register the basic pass. 
+INITIALIZE_PASS(TargetTransformInfoWrapperPass, "tti", + "Target Transform Information", false, true) +char TargetTransformInfoWrapperPass::ID = 0; -} // end anonymous namespace +void TargetTransformInfoWrapperPass::anchor() {} -INITIALIZE_AG_PASS(NoTTI, TargetTransformInfo, "notti", - "No target information", true, true, true) -char NoTTI::ID = 0; +TargetTransformInfoWrapperPass::TargetTransformInfoWrapperPass() + : ImmutablePass(ID), TTI(NoTTIImpl(/*DataLayout*/ nullptr)) { + initializeTargetTransformInfoWrapperPassPass( + *PassRegistry::getPassRegistry()); +} + +TargetTransformInfoWrapperPass::TargetTransformInfoWrapperPass( + TargetTransformInfo TTI) + : ImmutablePass(ID), TTI(std::move(TTI)) { + initializeTargetTransformInfoWrapperPassPass( + *PassRegistry::getPassRegistry()); +} -ImmutablePass *llvm::createNoTargetTransformInfoPass() { - return new NoTTI(); +ImmutablePass *llvm::createNoTargetTransformInfoPass(const DataLayout *DL) { + return new TargetTransformInfoWrapperPass(NoTTIImpl(DL)); } diff --git a/llvm/lib/CodeGen/BasicTargetTransformInfo.cpp b/llvm/lib/CodeGen/BasicTargetTransformInfo.cpp index 20b994e..5cd0ed3 100644 --- a/llvm/lib/CodeGen/BasicTargetTransformInfo.cpp +++ b/llvm/lib/CodeGen/BasicTargetTransformInfo.cpp @@ -15,245 +15,50 @@ /// //===----------------------------------------------------------------------===// +#include "llvm/CodeGen/BasicTTIImpl.h" #include "llvm/CodeGen/Passes.h" #include "llvm/Analysis/LoopInfo.h" #include "llvm/Analysis/TargetTransformInfo.h" +#include "llvm/Analysis/TargetTransformInfoImpl.h" #include "llvm/Support/CommandLine.h" -#include "llvm/Target/TargetLowering.h" -#include "llvm/Target/TargetSubtargetInfo.h" #include using namespace llvm; -static cl::opt -PartialUnrollingThreshold("partial-unrolling-threshold", cl::init(0), - cl::desc("Threshold for partial unrolling"), cl::Hidden); +cl::opt + llvm::PartialUnrollingThreshold("partial-unrolling-threshold", cl::init(0), + cl::desc("Threshold for partial unrolling"), + cl::Hidden); #define DEBUG_TYPE "basictti" namespace { - -class BasicTTI final : public ImmutablePass, public TargetTransformInfo { - const TargetMachine *TM; - - /// Estimate the overhead of scalarizing an instruction. Insert and Extract - /// are set if the result needs to be inserted and/or extracted from vectors. - unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const; - - /// Estimate the cost overhead of SK_Alternate shuffle. - unsigned getAltShuffleOverhead(Type *Ty) const; - - const TargetLoweringBase *getTLI() const { - return TM->getSubtargetImpl()->getTargetLowering(); - } +class BasicTTIImpl : public BasicTTIImplBase { + typedef BasicTTIImplBase BaseT; public: - BasicTTI() : ImmutablePass(ID), TM(nullptr) { - llvm_unreachable("This pass cannot be directly constructed"); - } + explicit BasicTTIImpl(const TargetMachine *TM = nullptr) : BaseT(TM) {} - BasicTTI(const TargetMachine *TM) : ImmutablePass(ID), TM(TM) { - initializeBasicTTIPass(*PassRegistry::getPassRegistry()); + // Provide value semantics. MSVC requires that we spell all of these out. 
+ BasicTTIImpl(const BasicTTIImpl &Arg) + : BaseT(static_cast(Arg)) {} + BasicTTIImpl(BasicTTIImpl &&Arg) + : BaseT(std::move(static_cast(Arg))) {} + BasicTTIImpl &operator=(const BasicTTIImpl &RHS) { + BaseT::operator=(static_cast(RHS)); + return *this; } - - void initializePass() override { - pushTTIStack(this); + BasicTTIImpl &operator=(BasicTTIImpl &&RHS) { + BaseT::operator=(std::move(static_cast(RHS))); + return *this; } - - void getAnalysisUsage(AnalysisUsage &AU) const override { - TargetTransformInfo::getAnalysisUsage(AU); - } - - /// Pass identification. - static char ID; - - /// Provide necessary pointer adjustments for the two base classes. - void *getAdjustedAnalysisPointer(const void *ID) override { - if (ID == &TargetTransformInfo::ID) - return (TargetTransformInfo*)this; - return this; - } - - bool hasBranchDivergence() const override; - - /// \name Scalar TTI Implementations - /// @{ - - bool isLegalAddImmediate(int64_t imm) const override; - bool isLegalICmpImmediate(int64_t imm) const override; - bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, - int64_t BaseOffset, bool HasBaseReg, - int64_t Scale) const override; - int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, - int64_t BaseOffset, bool HasBaseReg, - int64_t Scale) const override; - bool isTruncateFree(Type *Ty1, Type *Ty2) const override; - bool isTypeLegal(Type *Ty) const override; - unsigned getJumpBufAlignment() const override; - unsigned getJumpBufSize() const override; - bool shouldBuildLookupTables() const override; - bool haveFastSqrt(Type *Ty) const override; - void getUnrollingPreferences(const Function *F, Loop *L, - UnrollingPreferences &UP) const override; - - /// @} - - /// \name Vector TTI Implementations - /// @{ - - unsigned getNumberOfRegisters(bool Vector) const override; - unsigned getMaxInterleaveFactor() const override; - unsigned getRegisterBitWidth(bool Vector) const override; - unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind, - OperandValueKind, OperandValueProperties, - OperandValueProperties) const override; - unsigned getShuffleCost(ShuffleKind Kind, Type *Tp, - int Index, Type *SubTp) const override; - unsigned getCastInstrCost(unsigned Opcode, Type *Dst, - Type *Src) const override; - unsigned getCFInstrCost(unsigned Opcode) const override; - unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy) const override; - unsigned getVectorInstrCost(unsigned Opcode, Type *Val, - unsigned Index) const override; - unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, - unsigned AddressSpace) const override; - unsigned getIntrinsicInstrCost(Intrinsic::ID, Type *RetTy, - ArrayRef Tys) const override; - unsigned getNumberOfParts(Type *Tp) const override; - unsigned getAddressComputationCost( Type *Ty, bool IsComplex) const override; - unsigned getReductionCost(unsigned Opcode, Type *Ty, - bool IsPairwise) const override; - - /// @} }; - } -INITIALIZE_AG_PASS(BasicTTI, TargetTransformInfo, "basictti", - "Target independent code generator's TTI", true, true, false) -char BasicTTI::ID = 0; - ImmutablePass * llvm::createBasicTargetTransformInfoPass(const TargetMachine *TM) { - return new BasicTTI(TM); -} - -bool BasicTTI::hasBranchDivergence() const { return false; } - -bool BasicTTI::isLegalAddImmediate(int64_t imm) const { - return getTLI()->isLegalAddImmediate(imm); -} - -bool BasicTTI::isLegalICmpImmediate(int64_t imm) const { - return getTLI()->isLegalICmpImmediate(imm); -} - -bool BasicTTI::isLegalAddressingMode(Type 
*Ty, GlobalValue *BaseGV, - int64_t BaseOffset, bool HasBaseReg, - int64_t Scale) const { - TargetLoweringBase::AddrMode AM; - AM.BaseGV = BaseGV; - AM.BaseOffs = BaseOffset; - AM.HasBaseReg = HasBaseReg; - AM.Scale = Scale; - return getTLI()->isLegalAddressingMode(AM, Ty); + return new TargetTransformInfoWrapperPass(BasicTTIImpl(TM)); } -int BasicTTI::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, - int64_t BaseOffset, bool HasBaseReg, - int64_t Scale) const { - TargetLoweringBase::AddrMode AM; - AM.BaseGV = BaseGV; - AM.BaseOffs = BaseOffset; - AM.HasBaseReg = HasBaseReg; - AM.Scale = Scale; - return getTLI()->getScalingFactorCost(AM, Ty); -} - -bool BasicTTI::isTruncateFree(Type *Ty1, Type *Ty2) const { - return getTLI()->isTruncateFree(Ty1, Ty2); -} - -bool BasicTTI::isTypeLegal(Type *Ty) const { - EVT T = getTLI()->getValueType(Ty); - return getTLI()->isTypeLegal(T); -} - -unsigned BasicTTI::getJumpBufAlignment() const { - return getTLI()->getJumpBufAlignment(); -} - -unsigned BasicTTI::getJumpBufSize() const { - return getTLI()->getJumpBufSize(); -} - -bool BasicTTI::shouldBuildLookupTables() const { - const TargetLoweringBase *TLI = getTLI(); - return TLI->isOperationLegalOrCustom(ISD::BR_JT, MVT::Other) || - TLI->isOperationLegalOrCustom(ISD::BRIND, MVT::Other); -} - -bool BasicTTI::haveFastSqrt(Type *Ty) const { - const TargetLoweringBase *TLI = getTLI(); - EVT VT = TLI->getValueType(Ty); - return TLI->isTypeLegal(VT) && TLI->isOperationLegalOrCustom(ISD::FSQRT, VT); -} - -void BasicTTI::getUnrollingPreferences(const Function *F, Loop *L, - UnrollingPreferences &UP) const { - // This unrolling functionality is target independent, but to provide some - // motivation for its intended use, for x86: - - // According to the Intel 64 and IA-32 Architectures Optimization Reference - // Manual, Intel Core models and later have a loop stream detector - // (and associated uop queue) that can benefit from partial unrolling. - // The relevant requirements are: - // - The loop must have no more than 4 (8 for Nehalem and later) branches - // taken, and none of them may be calls. - // - The loop can have no more than 18 (28 for Nehalem and later) uops. - - // According to the Software Optimization Guide for AMD Family 15h Processors, - // models 30h-4fh (Steamroller and later) have a loop predictor and loop - // buffer which can benefit from partial unrolling. - // The relevant requirements are: - // - The loop must have fewer than 16 branches - // - The loop must have less than 40 uops in all executed loop branches - - // The number of taken branches in a loop is hard to estimate here, and - // benchmarking has revealed that it is better not to be conservative when - // estimating the branch count. As a result, we'll ignore the branch limits - // until someone finds a case where it matters in practice. - - unsigned MaxOps; - const TargetSubtargetInfo *ST = TM->getSubtargetImpl(*F); - if (PartialUnrollingThreshold.getNumOccurrences() > 0) - MaxOps = PartialUnrollingThreshold; - else if (ST->getSchedModel().LoopMicroOpBufferSize > 0) - MaxOps = ST->getSchedModel().LoopMicroOpBufferSize; - else - return; - - // Scan the loop: don't unroll loops with calls. 
- for (Loop::block_iterator I = L->block_begin(), E = L->block_end(); - I != E; ++I) { - BasicBlock *BB = *I; - - for (BasicBlock::iterator J = BB->begin(), JE = BB->end(); J != JE; ++J) - if (isa(J) || isa(J)) { - ImmutableCallSite CS(J); - if (const Function *F = CS.getCalledFunction()) { - if (!TopTTI->isLoweredToCall(F)) - continue; - } - - return; - } - } - - // Enable runtime and partial unrolling up to the specified size. - UP.Partial = UP.Runtime = true; - UP.PartialThreshold = UP.PartialOptSizeThreshold = MaxOps; -} //===----------------------------------------------------------------------===// // @@ -261,391 +66,3 @@ void BasicTTI::getUnrollingPreferences(const Function *F, Loop *L, // //===----------------------------------------------------------------------===// -unsigned BasicTTI::getScalarizationOverhead(Type *Ty, bool Insert, - bool Extract) const { - assert (Ty->isVectorTy() && "Can only scalarize vectors"); - unsigned Cost = 0; - - for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) { - if (Insert) - Cost += TopTTI->getVectorInstrCost(Instruction::InsertElement, Ty, i); - if (Extract) - Cost += TopTTI->getVectorInstrCost(Instruction::ExtractElement, Ty, i); - } - - return Cost; -} - -unsigned BasicTTI::getNumberOfRegisters(bool Vector) const { - return 1; -} - -unsigned BasicTTI::getRegisterBitWidth(bool Vector) const { - return 32; -} - -unsigned BasicTTI::getMaxInterleaveFactor() const { - return 1; -} - -unsigned BasicTTI::getArithmeticInstrCost(unsigned Opcode, Type *Ty, - OperandValueKind, OperandValueKind, - OperandValueProperties, - OperandValueProperties) const { - // Check if any of the operands are vector operands. - const TargetLoweringBase *TLI = getTLI(); - int ISD = TLI->InstructionOpcodeToISD(Opcode); - assert(ISD && "Invalid opcode"); - - std::pair LT = TLI->getTypeLegalizationCost(Ty); - - bool IsFloat = Ty->getScalarType()->isFloatingPointTy(); - // Assume that floating point arithmetic operations cost twice as much as - // integer operations. - unsigned OpCost = (IsFloat ? 2 : 1); - - if (TLI->isOperationLegalOrPromote(ISD, LT.second)) { - // The operation is legal. Assume it costs 1. - // If the type is split to multiple registers, assume that there is some - // overhead to this. - // TODO: Once we have extract/insert subvector cost we need to use them. - if (LT.first > 1) - return LT.first * 2 * OpCost; - return LT.first * 1 * OpCost; - } - - if (!TLI->isOperationExpand(ISD, LT.second)) { - // If the operation is custom lowered then assume - // thare the code is twice as expensive. - return LT.first * 2 * OpCost; - } - - // Else, assume that we need to scalarize this op. - if (Ty->isVectorTy()) { - unsigned Num = Ty->getVectorNumElements(); - unsigned Cost = TopTTI->getArithmeticInstrCost(Opcode, Ty->getScalarType()); - // return the cost of multiple scalar invocation plus the cost of inserting - // and extracting the values. - return getScalarizationOverhead(Ty, true, true) + Num * Cost; - } - - // We don't know anything about this scalar instruction. - return OpCost; -} - -unsigned BasicTTI::getAltShuffleOverhead(Type *Ty) const { - assert(Ty->isVectorTy() && "Can only shuffle vectors"); - unsigned Cost = 0; - // Shuffle cost is equal to the cost of extracting element from its argument - // plus the cost of inserting them onto the result vector. - - // e.g. 
<4 x float> has a mask of <0,5,2,7> i.e we need to extract from index - // 0 of first vector, index 1 of second vector,index 2 of first vector and - // finally index 3 of second vector and insert them at index <0,1,2,3> of - // result vector. - for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) { - Cost += TopTTI->getVectorInstrCost(Instruction::InsertElement, Ty, i); - Cost += TopTTI->getVectorInstrCost(Instruction::ExtractElement, Ty, i); - } - return Cost; -} - -unsigned BasicTTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index, - Type *SubTp) const { - if (Kind == SK_Alternate) { - return getAltShuffleOverhead(Tp); - } - return 1; -} - -unsigned BasicTTI::getCastInstrCost(unsigned Opcode, Type *Dst, - Type *Src) const { - const TargetLoweringBase *TLI = getTLI(); - int ISD = TLI->InstructionOpcodeToISD(Opcode); - assert(ISD && "Invalid opcode"); - - std::pair SrcLT = TLI->getTypeLegalizationCost(Src); - std::pair DstLT = TLI->getTypeLegalizationCost(Dst); - - // Check for NOOP conversions. - if (SrcLT.first == DstLT.first && - SrcLT.second.getSizeInBits() == DstLT.second.getSizeInBits()) { - - // Bitcast between types that are legalized to the same type are free. - if (Opcode == Instruction::BitCast || Opcode == Instruction::Trunc) - return 0; - } - - if (Opcode == Instruction::Trunc && - TLI->isTruncateFree(SrcLT.second, DstLT.second)) - return 0; - - if (Opcode == Instruction::ZExt && - TLI->isZExtFree(SrcLT.second, DstLT.second)) - return 0; - - // If the cast is marked as legal (or promote) then assume low cost. - if (SrcLT.first == DstLT.first && - TLI->isOperationLegalOrPromote(ISD, DstLT.second)) - return 1; - - // Handle scalar conversions. - if (!Src->isVectorTy() && !Dst->isVectorTy()) { - - // Scalar bitcasts are usually free. - if (Opcode == Instruction::BitCast) - return 0; - - // Just check the op cost. If the operation is legal then assume it costs 1. - if (!TLI->isOperationExpand(ISD, DstLT.second)) - return 1; - - // Assume that illegal scalar instruction are expensive. - return 4; - } - - // Check vector-to-vector casts. - if (Dst->isVectorTy() && Src->isVectorTy()) { - - // If the cast is between same-sized registers, then the check is simple. - if (SrcLT.first == DstLT.first && - SrcLT.second.getSizeInBits() == DstLT.second.getSizeInBits()) { - - // Assume that Zext is done using AND. - if (Opcode == Instruction::ZExt) - return 1; - - // Assume that sext is done using SHL and SRA. - if (Opcode == Instruction::SExt) - return 2; - - // Just check the op cost. If the operation is legal then assume it costs - // 1 and multiply by the type-legalization overhead. - if (!TLI->isOperationExpand(ISD, DstLT.second)) - return SrcLT.first * 1; - } - - // If we are converting vectors and the operation is illegal, or - // if the vectors are legalized to different types, estimate the - // scalarization costs. - unsigned Num = Dst->getVectorNumElements(); - unsigned Cost = TopTTI->getCastInstrCost(Opcode, Dst->getScalarType(), - Src->getScalarType()); - - // Return the cost of multiple scalar invocation plus the cost of - // inserting and extracting the values. - return getScalarizationOverhead(Dst, true, true) + Num * Cost; - } - - // We already handled vector-to-vector and scalar-to-scalar conversions. This - // is where we handle bitcast between vectors and scalars. We need to assume - // that the conversion is scalarized in one way or another. - if (Opcode == Instruction::BitCast) - // Illegal bitcasts are done by storing and loading from a stack slot. 
- return (Src->isVectorTy()? getScalarizationOverhead(Src, false, true):0) + - (Dst->isVectorTy()? getScalarizationOverhead(Dst, true, false):0); - - llvm_unreachable("Unhandled cast"); - } - -unsigned BasicTTI::getCFInstrCost(unsigned Opcode) const { - // Branches are assumed to be predicted. - return 0; -} - -unsigned BasicTTI::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy) const { - const TargetLoweringBase *TLI = getTLI(); - int ISD = TLI->InstructionOpcodeToISD(Opcode); - assert(ISD && "Invalid opcode"); - - // Selects on vectors are actually vector selects. - if (ISD == ISD::SELECT) { - assert(CondTy && "CondTy must exist"); - if (CondTy->isVectorTy()) - ISD = ISD::VSELECT; - } - - std::pair LT = TLI->getTypeLegalizationCost(ValTy); - - if (!(ValTy->isVectorTy() && !LT.second.isVector()) && - !TLI->isOperationExpand(ISD, LT.second)) { - // The operation is legal. Assume it costs 1. Multiply - // by the type-legalization overhead. - return LT.first * 1; - } - - // Otherwise, assume that the cast is scalarized. - if (ValTy->isVectorTy()) { - unsigned Num = ValTy->getVectorNumElements(); - if (CondTy) - CondTy = CondTy->getScalarType(); - unsigned Cost = TopTTI->getCmpSelInstrCost(Opcode, ValTy->getScalarType(), - CondTy); - - // Return the cost of multiple scalar invocation plus the cost of inserting - // and extracting the values. - return getScalarizationOverhead(ValTy, true, false) + Num * Cost; - } - - // Unknown scalar opcode. - return 1; -} - -unsigned BasicTTI::getVectorInstrCost(unsigned Opcode, Type *Val, - unsigned Index) const { - std::pair LT = getTLI()->getTypeLegalizationCost(Val->getScalarType()); - - return LT.first; -} - -unsigned BasicTTI::getMemoryOpCost(unsigned Opcode, Type *Src, - unsigned Alignment, - unsigned AddressSpace) const { - assert(!Src->isVoidTy() && "Invalid type"); - std::pair LT = getTLI()->getTypeLegalizationCost(Src); - - // Assuming that all loads of legal types cost 1. - unsigned Cost = LT.first; - - if (Src->isVectorTy() && - Src->getPrimitiveSizeInBits() < LT.second.getSizeInBits()) { - // This is a vector load that legalizes to a larger type than the vector - // itself. Unless the corresponding extending load or truncating store is - // legal, then this will scalarize. - TargetLowering::LegalizeAction LA = TargetLowering::Expand; - EVT MemVT = getTLI()->getValueType(Src, true); - if (MemVT.isSimple() && MemVT != MVT::Other) { - if (Opcode == Instruction::Store) - LA = getTLI()->getTruncStoreAction(LT.second, MemVT.getSimpleVT()); - else - LA = getTLI()->getLoadExtAction(ISD::EXTLOAD, LT.second, MemVT); - } - - if (LA != TargetLowering::Legal && LA != TargetLowering::Custom) { - // This is a vector load/store for some illegal type that is scalarized. - // We must account for the cost of building or decomposing the vector. - Cost += getScalarizationOverhead(Src, Opcode != Instruction::Store, - Opcode == Instruction::Store); - } - } - - return Cost; -} - -unsigned BasicTTI::getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy, - ArrayRef Tys) const { - unsigned ISD = 0; - switch (IID) { - default: { - // Assume that we need to scalarize this intrinsic. 
- unsigned ScalarizationCost = 0; - unsigned ScalarCalls = 1; - if (RetTy->isVectorTy()) { - ScalarizationCost = getScalarizationOverhead(RetTy, true, false); - ScalarCalls = std::max(ScalarCalls, RetTy->getVectorNumElements()); - } - for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) { - if (Tys[i]->isVectorTy()) { - ScalarizationCost += getScalarizationOverhead(Tys[i], false, true); - ScalarCalls = std::max(ScalarCalls, RetTy->getVectorNumElements()); - } - } - - return ScalarCalls + ScalarizationCost; - } - // Look for intrinsics that can be lowered directly or turned into a scalar - // intrinsic call. - case Intrinsic::sqrt: ISD = ISD::FSQRT; break; - case Intrinsic::sin: ISD = ISD::FSIN; break; - case Intrinsic::cos: ISD = ISD::FCOS; break; - case Intrinsic::exp: ISD = ISD::FEXP; break; - case Intrinsic::exp2: ISD = ISD::FEXP2; break; - case Intrinsic::log: ISD = ISD::FLOG; break; - case Intrinsic::log10: ISD = ISD::FLOG10; break; - case Intrinsic::log2: ISD = ISD::FLOG2; break; - case Intrinsic::fabs: ISD = ISD::FABS; break; - case Intrinsic::minnum: ISD = ISD::FMINNUM; break; - case Intrinsic::maxnum: ISD = ISD::FMAXNUM; break; - case Intrinsic::copysign: ISD = ISD::FCOPYSIGN; break; - case Intrinsic::floor: ISD = ISD::FFLOOR; break; - case Intrinsic::ceil: ISD = ISD::FCEIL; break; - case Intrinsic::trunc: ISD = ISD::FTRUNC; break; - case Intrinsic::nearbyint: - ISD = ISD::FNEARBYINT; break; - case Intrinsic::rint: ISD = ISD::FRINT; break; - case Intrinsic::round: ISD = ISD::FROUND; break; - case Intrinsic::pow: ISD = ISD::FPOW; break; - case Intrinsic::fma: ISD = ISD::FMA; break; - case Intrinsic::fmuladd: ISD = ISD::FMA; break; - // FIXME: We should return 0 whenever getIntrinsicCost == TCC_Free. - case Intrinsic::lifetime_start: - case Intrinsic::lifetime_end: - return 0; - case Intrinsic::masked_store: - return TopTTI->getMaskedMemoryOpCost(Instruction::Store, Tys[0], 0, 0); - case Intrinsic::masked_load: - return TopTTI->getMaskedMemoryOpCost(Instruction::Load, RetTy, 0, 0); - } - - const TargetLoweringBase *TLI = getTLI(); - std::pair LT = TLI->getTypeLegalizationCost(RetTy); - - if (TLI->isOperationLegalOrPromote(ISD, LT.second)) { - // The operation is legal. Assume it costs 1. - // If the type is split to multiple registers, assume that there is some - // overhead to this. - // TODO: Once we have extract/insert subvector cost we need to use them. - if (LT.first > 1) - return LT.first * 2; - return LT.first * 1; - } - - if (!TLI->isOperationExpand(ISD, LT.second)) { - // If the operation is custom lowered then assume - // thare the code is twice as expensive. - return LT.first * 2; - } - - // If we can't lower fmuladd into an FMA estimate the cost as a floating - // point mul followed by an add. - if (IID == Intrinsic::fmuladd) - return TopTTI->getArithmeticInstrCost(BinaryOperator::FMul, RetTy) + - TopTTI->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy); - - // Else, assume that we need to scalarize this intrinsic. For math builtins - // this will emit a costly libcall, adding call overhead and spills. Make it - // very expensive. - if (RetTy->isVectorTy()) { - unsigned Num = RetTy->getVectorNumElements(); - unsigned Cost = TopTTI->getIntrinsicInstrCost(IID, RetTy->getScalarType(), - Tys); - return 10 * Cost * Num; - } - - // This is going to be turned into a library call, make it expensive. 
-  return 10;
-}
-
-unsigned BasicTTI::getNumberOfParts(Type *Tp) const {
-  std::pair LT = getTLI()->getTypeLegalizationCost(Tp);
-  return LT.first;
-}
-
-unsigned BasicTTI::getAddressComputationCost(Type *Ty, bool IsComplex) const {
-  return 0;
-}
-
-unsigned BasicTTI::getReductionCost(unsigned Opcode, Type *Ty,
-                                    bool IsPairwise) const {
-  assert(Ty->isVectorTy() && "Expect a vector type");
-  unsigned NumVecElts = Ty->getVectorNumElements();
-  unsigned NumReduxLevels = Log2_32(NumVecElts);
-  unsigned ArithCost = NumReduxLevels *
-    TopTTI->getArithmeticInstrCost(Opcode, Ty);
-  // Assume the pairwise shuffles add a cost.
-  unsigned ShuffleCost =
-      NumReduxLevels * (IsPairwise + 1) *
-      TopTTI->getShuffleCost(SK_ExtractSubvector, Ty, NumVecElts / 2, Ty);
-  return ShuffleCost + ArithCost + getScalarizationOverhead(Ty, false, true);
-}
diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp
index 307dec5..7c0068e 100644
--- a/llvm/lib/CodeGen/CodeGen.cpp
+++ b/llvm/lib/CodeGen/CodeGen.cpp
@@ -21,7 +21,6 @@ using namespace llvm;
 /// initializeCodeGen - Initialize all passes linked into the CodeGen library.
 void llvm::initializeCodeGen(PassRegistry &Registry) {
   initializeAtomicExpandPass(Registry);
-  initializeBasicTTIPass(Registry);
   initializeBranchFolderPassPass(Registry);
   initializeCodeGenPreparePass(Registry);
   initializeDeadMachineInstructionElimPass(Registry);
diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp
index 7b673ba..184a6a4 100644
--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -162,7 +162,7 @@ class TypePromotionTransaction;
     void getAnalysisUsage(AnalysisUsage &AU) const override {
       AU.addPreserved();
       AU.addRequired();
-      AU.addRequired();
+      AU.addRequired();
     }

   private:
@@ -213,7 +213,7 @@ bool CodeGenPrepare::runOnFunction(Function &F) {
   if (TM)
     TLI = TM->getSubtargetImpl(F)->getTargetLowering();
   TLInfo = &getAnalysis().getTLI();
-  TTI = &getAnalysis();
+  TTI = &getAnalysis().getTTI();
   DominatorTreeWrapperPass *DTWP =
       getAnalysisIfAvailable();
   DT = DTWP ? &DTWP->getDomTree() : nullptr;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
index 6a800a3..a0d1b84 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -196,10 +196,6 @@ public:
 } // namespace

 void AArch64TargetMachine::addAnalysisPasses(PassManagerBase &PM) {
-  // Add first the target-independent BasicTTI pass, then our AArch64 pass. This
-  // allows the AArch64 pass to delegate to the target independent layer when
-  // appropriate.
- PM.add(createBasicTargetTransformInfoPass(this)); PM.add(createAArch64TargetTransformInfoPass(this)); } diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index 653ba83..f1e9c6a 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -18,6 +18,7 @@ #include "AArch64TargetMachine.h" #include "MCTargetDesc/AArch64AddressingModes.h" #include "llvm/Analysis/TargetTransformInfo.h" +#include "llvm/CodeGen/BasicTTIImpl.h" #include "llvm/Support/Debug.h" #include "llvm/Target/CostTable.h" #include "llvm/Target/TargetLowering.h" @@ -26,23 +27,18 @@ using namespace llvm; #define DEBUG_TYPE "aarch64tti" -// Declare the pass initialization routine locally as target-specific passes -// don't have a target-wide initialization entry point, and so we rely on the -// pass constructor initialization. -namespace llvm { -void initializeAArch64TTIPass(PassRegistry &); -} - namespace { -class AArch64TTI final : public ImmutablePass, public TargetTransformInfo { - const AArch64TargetMachine *TM; +class AArch64TTIImpl : public BasicTTIImplBase { + typedef BasicTTIImplBase BaseT; + typedef TargetTransformInfo TTI; + const AArch64Subtarget *ST; const AArch64TargetLowering *TLI; /// Estimate the overhead of scalarizing an instruction. Insert and Extract /// are set if the result needs to be inserted and/or extracted from vectors. - unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const; + unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract); enum MemIntrinsicType { VECTOR_LDST_TWO_ELEMENTS, @@ -51,48 +47,47 @@ class AArch64TTI final : public ImmutablePass, public TargetTransformInfo { }; public: - AArch64TTI() : ImmutablePass(ID), TM(nullptr), ST(nullptr), TLI(nullptr) { - llvm_unreachable("This pass cannot be directly constructed"); - } - - AArch64TTI(const AArch64TargetMachine *TM) - : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()), - TLI(TM->getSubtargetImpl()->getTargetLowering()) { - initializeAArch64TTIPass(*PassRegistry::getPassRegistry()); - } - - void initializePass() override { pushTTIStack(this); } - - void getAnalysisUsage(AnalysisUsage &AU) const override { - TargetTransformInfo::getAnalysisUsage(AU); + explicit AArch64TTIImpl(const AArch64TargetMachine *TM = nullptr) + : BaseT(TM), ST(TM ? TM->getSubtargetImpl() : nullptr), + TLI(ST ? ST->getTargetLowering() : nullptr) {} + + // Provide value semantics. MSVC requires that we spell all of these out. + AArch64TTIImpl(const AArch64TTIImpl &Arg) + : BaseT(static_cast(Arg)), ST(Arg.ST), TLI(Arg.TLI) {} + AArch64TTIImpl(AArch64TTIImpl &&Arg) + : BaseT(std::move(static_cast(Arg))), ST(std::move(Arg.ST)), + TLI(std::move(Arg.TLI)) {} + AArch64TTIImpl &operator=(const AArch64TTIImpl &RHS) { + BaseT::operator=(static_cast(RHS)); + ST = RHS.ST; + TLI = RHS.TLI; + return *this; } - - /// Pass identification. - static char ID; - - /// Provide necessary pointer adjustments for the two base classes. 
- void *getAdjustedAnalysisPointer(const void *ID) override { - if (ID == &TargetTransformInfo::ID) - return (TargetTransformInfo *)this; - return this; + AArch64TTIImpl &operator=(AArch64TTIImpl &&RHS) { + BaseT::operator=(std::move(static_cast(RHS))); + ST = std::move(RHS.ST); + TLI = std::move(RHS.TLI); + return *this; } /// \name Scalar TTI Implementations /// @{ - unsigned getIntImmCost(int64_t Val) const; - unsigned getIntImmCost(const APInt &Imm, Type *Ty) const override; + + using BaseT::getIntImmCost; + unsigned getIntImmCost(int64_t Val); + unsigned getIntImmCost(const APInt &Imm, Type *Ty); unsigned getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, - Type *Ty) const override; + Type *Ty); unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, - Type *Ty) const override; - PopcntSupportKind getPopcntSupport(unsigned TyWidth) const override; + Type *Ty); + TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth); /// @} /// \name Vector TTI Implementations /// @{ - unsigned getNumberOfRegisters(bool Vector) const override { + unsigned getNumberOfRegisters(bool Vector) { if (Vector) { if (ST->hasNEON()) return 32; @@ -101,7 +96,7 @@ public: return 31; } - unsigned getRegisterBitWidth(bool Vector) const override { + unsigned getRegisterBitWidth(bool Vector) { if (Vector) { if (ST->hasNEON()) return 128; @@ -110,57 +105,50 @@ public: return 64; } - unsigned getMaxInterleaveFactor() const override; + unsigned getMaxInterleaveFactor(); - unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const - override; + unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src); - unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) const - override; + unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index); unsigned getArithmeticInstrCost( - unsigned Opcode, Type *Ty, OperandValueKind Opd1Info = OK_AnyValue, - OperandValueKind Opd2Info = OK_AnyValue, - OperandValueProperties Opd1PropInfo = OP_None, - OperandValueProperties Opd2PropInfo = OP_None) const override; + unsigned Opcode, Type *Ty, + TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue, + TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue, + TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None, + TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None); - unsigned getAddressComputationCost(Type *Ty, bool IsComplex) const override; + unsigned getAddressComputationCost(Type *Ty, bool IsComplex); - unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy) const - override; + unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy); unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, - unsigned AddressSpace) const override; + unsigned AddressSpace); - unsigned getCostOfKeepingLiveOverCall(ArrayRef Tys) const override; + unsigned getCostOfKeepingLiveOverCall(ArrayRef Tys); void getUnrollingPreferences(const Function *F, Loop *L, - UnrollingPreferences &UP) const override; + TTI::UnrollingPreferences &UP); Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, - Type *ExpectedType) const override; + Type *ExpectedType); - bool getTgtMemIntrinsic(IntrinsicInst *Inst, - MemIntrinsicInfo &Info) const override; + bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info); /// @} }; } // end anonymous namespace -INITIALIZE_AG_PASS(AArch64TTI, TargetTransformInfo, "aarch64tti", - "AArch64 Target Transform Info", true, true, false) -char AArch64TTI::ID = 0; - ImmutablePass * 
llvm::createAArch64TargetTransformInfoPass(const AArch64TargetMachine *TM) { - return new AArch64TTI(TM); + return new TargetTransformInfoWrapperPass(AArch64TTIImpl(TM)); } /// \brief Calculate the cost of materializing a 64-bit value. This helper /// method might only calculate a fraction of a larger immediate. Therefore it /// is valid to return a cost of ZERO. -unsigned AArch64TTI::getIntImmCost(int64_t Val) const { +unsigned AArch64TTIImpl::getIntImmCost(int64_t Val) { // Check if the immediate can be encoded within an instruction. if (Val == 0 || AArch64_AM::isLogicalImmediate(Val, 64)) return 0; @@ -174,7 +162,7 @@ unsigned AArch64TTI::getIntImmCost(int64_t Val) const { } /// \brief Calculate the cost of materializing the given constant. -unsigned AArch64TTI::getIntImmCost(const APInt &Imm, Type *Ty) const { +unsigned AArch64TTIImpl::getIntImmCost(const APInt &Imm, Type *Ty) { assert(Ty->isIntegerTy()); unsigned BitSize = Ty->getPrimitiveSizeInBits(); @@ -198,25 +186,25 @@ unsigned AArch64TTI::getIntImmCost(const APInt &Imm, Type *Ty) const { return std::max(1U, Cost); } -unsigned AArch64TTI::getIntImmCost(unsigned Opcode, unsigned Idx, - const APInt &Imm, Type *Ty) const { +unsigned AArch64TTIImpl::getIntImmCost(unsigned Opcode, unsigned Idx, + const APInt &Imm, Type *Ty) { assert(Ty->isIntegerTy()); unsigned BitSize = Ty->getPrimitiveSizeInBits(); // There is no cost model for constants with a bit size of 0. Return TCC_Free // here, so that constant hoisting will ignore this constant. if (BitSize == 0) - return TCC_Free; + return TTI::TCC_Free; unsigned ImmIdx = ~0U; switch (Opcode) { default: - return TCC_Free; + return TTI::TCC_Free; case Instruction::GetElementPtr: // Always hoist the base address of a GetElementPtr. if (Idx == 0) - return 2 * TCC_Basic; - return TCC_Free; + return 2 * TTI::TCC_Basic; + return TTI::TCC_Free; case Instruction::Store: ImmIdx = 0; break; @@ -238,7 +226,7 @@ unsigned AArch64TTI::getIntImmCost(unsigned Opcode, unsigned Idx, case Instruction::LShr: case Instruction::AShr: if (Idx == 1) - return TCC_Free; + return TTI::TCC_Free; break; case Instruction::Trunc: case Instruction::ZExt: @@ -256,26 +244,27 @@ unsigned AArch64TTI::getIntImmCost(unsigned Opcode, unsigned Idx, if (Idx == ImmIdx) { unsigned NumConstants = (BitSize + 63) / 64; - unsigned Cost = AArch64TTI::getIntImmCost(Imm, Ty); - return (Cost <= NumConstants * TCC_Basic) - ? static_cast(TCC_Free) : Cost; + unsigned Cost = AArch64TTIImpl::getIntImmCost(Imm, Ty); + return (Cost <= NumConstants * TTI::TCC_Basic) + ? static_cast(TTI::TCC_Free) + : Cost; } - return AArch64TTI::getIntImmCost(Imm, Ty); + return AArch64TTIImpl::getIntImmCost(Imm, Ty); } -unsigned AArch64TTI::getIntImmCost(Intrinsic::ID IID, unsigned Idx, - const APInt &Imm, Type *Ty) const { +unsigned AArch64TTIImpl::getIntImmCost(Intrinsic::ID IID, unsigned Idx, + const APInt &Imm, Type *Ty) { assert(Ty->isIntegerTy()); unsigned BitSize = Ty->getPrimitiveSizeInBits(); // There is no cost model for constants with a bit size of 0. Return TCC_Free // here, so that constant hoisting will ignore this constant. 
if (BitSize == 0) - return TCC_Free; + return TTI::TCC_Free; switch (IID) { default: - return TCC_Free; + return TTI::TCC_Free; case Intrinsic::sadd_with_overflow: case Intrinsic::uadd_with_overflow: case Intrinsic::ssub_with_overflow: @@ -284,35 +273,36 @@ unsigned AArch64TTI::getIntImmCost(Intrinsic::ID IID, unsigned Idx, case Intrinsic::umul_with_overflow: if (Idx == 1) { unsigned NumConstants = (BitSize + 63) / 64; - unsigned Cost = AArch64TTI::getIntImmCost(Imm, Ty); - return (Cost <= NumConstants * TCC_Basic) - ? static_cast(TCC_Free) : Cost; + unsigned Cost = AArch64TTIImpl::getIntImmCost(Imm, Ty); + return (Cost <= NumConstants * TTI::TCC_Basic) + ? static_cast(TTI::TCC_Free) + : Cost; } break; case Intrinsic::experimental_stackmap: if ((Idx < 2) || (Imm.getBitWidth() <= 64 && isInt<64>(Imm.getSExtValue()))) - return TCC_Free; + return TTI::TCC_Free; break; case Intrinsic::experimental_patchpoint_void: case Intrinsic::experimental_patchpoint_i64: if ((Idx < 4) || (Imm.getBitWidth() <= 64 && isInt<64>(Imm.getSExtValue()))) - return TCC_Free; + return TTI::TCC_Free; break; } - return AArch64TTI::getIntImmCost(Imm, Ty); + return AArch64TTIImpl::getIntImmCost(Imm, Ty); } -AArch64TTI::PopcntSupportKind -AArch64TTI::getPopcntSupport(unsigned TyWidth) const { +TargetTransformInfo::PopcntSupportKind +AArch64TTIImpl::getPopcntSupport(unsigned TyWidth) { assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2"); if (TyWidth == 32 || TyWidth == 64) - return PSK_FastHardware; + return TTI::PSK_FastHardware; // TODO: AArch64TargetLowering::LowerCTPOP() supports 128bit popcount. - return PSK_Software; + return TTI::PSK_Software; } -unsigned AArch64TTI::getCastInstrCost(unsigned Opcode, Type *Dst, - Type *Src) const { +unsigned AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, + Type *Src) { int ISD = TLI->InstructionOpcodeToISD(Opcode); assert(ISD && "Invalid opcode"); @@ -320,7 +310,7 @@ unsigned AArch64TTI::getCastInstrCost(unsigned Opcode, Type *Dst, EVT DstTy = TLI->getValueType(Dst); if (!SrcTy.isSimple() || !DstTy.isSimple()) - return TargetTransformInfo::getCastInstrCost(Opcode, Dst, Src); + return BaseT::getCastInstrCost(Opcode, Dst, Src); static const TypeConversionCostTblEntry ConversionTbl[] = { // LowerVectorINT_TO_FP: @@ -391,11 +381,11 @@ unsigned AArch64TTI::getCastInstrCost(unsigned Opcode, Type *Dst, if (Idx != -1) return ConversionTbl[Idx].Cost; - return TargetTransformInfo::getCastInstrCost(Opcode, Dst, Src); + return BaseT::getCastInstrCost(Opcode, Dst, Src); } -unsigned AArch64TTI::getVectorInstrCost(unsigned Opcode, Type *Val, - unsigned Index) const { +unsigned AArch64TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, + unsigned Index) { assert(Val->isVectorTy() && "This must be a vector type"); if (Index != -1U) { @@ -419,10 +409,10 @@ unsigned AArch64TTI::getVectorInstrCost(unsigned Opcode, Type *Val, return 2; } -unsigned AArch64TTI::getArithmeticInstrCost( - unsigned Opcode, Type *Ty, OperandValueKind Opd1Info, - OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo, - OperandValueProperties Opd2PropInfo) const { +unsigned AArch64TTIImpl::getArithmeticInstrCost( + unsigned Opcode, Type *Ty, TTI::OperandValueKind Opd1Info, + TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo, + TTI::OperandValueProperties Opd2PropInfo) { // Legalize the type. 
std::pair LT = TLI->getTypeLegalizationCost(Ty); @@ -453,8 +443,8 @@ unsigned AArch64TTI::getArithmeticInstrCost( switch (ISD) { default: - return TargetTransformInfo::getArithmeticInstrCost( - Opcode, Ty, Opd1Info, Opd2Info, Opd1PropInfo, Opd2PropInfo); + return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info, + Opd1PropInfo, Opd2PropInfo); case ISD::ADD: case ISD::MUL: case ISD::XOR: @@ -466,7 +456,7 @@ unsigned AArch64TTI::getArithmeticInstrCost( } } -unsigned AArch64TTI::getAddressComputationCost(Type *Ty, bool IsComplex) const { +unsigned AArch64TTIImpl::getAddressComputationCost(Type *Ty, bool IsComplex) { // Address computations in vectorized code with non-consecutive addresses will // likely result in more instructions compared to scalar code where the // computation can more often be merged into the index mode. The resulting @@ -481,8 +471,8 @@ unsigned AArch64TTI::getAddressComputationCost(Type *Ty, bool IsComplex) const { return 1; } -unsigned AArch64TTI::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy) const { +unsigned AArch64TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, + Type *CondTy) { int ISD = TLI->InstructionOpcodeToISD(Opcode); // We don't lower vector selects well that are wider than the register width. @@ -509,12 +499,12 @@ unsigned AArch64TTI::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, return VectorSelectTbl[Idx].Cost; } } - return TargetTransformInfo::getCmpSelInstrCost(Opcode, ValTy, CondTy); + return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy); } -unsigned AArch64TTI::getMemoryOpCost(unsigned Opcode, Type *Src, - unsigned Alignment, - unsigned AddressSpace) const { +unsigned AArch64TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src, + unsigned Alignment, + unsigned AddressSpace) { std::pair LT = TLI->getTypeLegalizationCost(Src); if (Opcode == Instruction::Store && Src->isVectorTy() && Alignment != 16 && @@ -542,7 +532,7 @@ unsigned AArch64TTI::getMemoryOpCost(unsigned Opcode, Type *Src, return LT.first; } -unsigned AArch64TTI::getCostOfKeepingLiveOverCall(ArrayRef Tys) const { +unsigned AArch64TTIImpl::getCostOfKeepingLiveOverCall(ArrayRef Tys) { unsigned Cost = 0; for (auto *I : Tys) { if (!I->isVectorTy()) @@ -554,20 +544,20 @@ unsigned AArch64TTI::getCostOfKeepingLiveOverCall(ArrayRef Tys) const { return Cost; } -unsigned AArch64TTI::getMaxInterleaveFactor() const { +unsigned AArch64TTIImpl::getMaxInterleaveFactor() { if (ST->isCortexA57()) return 4; return 2; } -void AArch64TTI::getUnrollingPreferences(const Function *F, Loop *L, - UnrollingPreferences &UP) const { +void AArch64TTIImpl::getUnrollingPreferences(const Function *F, Loop *L, + TTI::UnrollingPreferences &UP) { // Disable partial & runtime unrolling on -Os. 
UP.PartialOptSizeThreshold = 0; } -Value *AArch64TTI::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, - Type *ExpectedType) const { +Value *AArch64TTIImpl::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, + Type *ExpectedType) { switch (Inst->getIntrinsicID()) { default: return nullptr; @@ -602,8 +592,8 @@ Value *AArch64TTI::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst, } } -bool AArch64TTI::getTgtMemIntrinsic(IntrinsicInst *Inst, - MemIntrinsicInfo &Info) const { +bool AArch64TTIImpl::getTgtMemIntrinsic(IntrinsicInst *Inst, + MemIntrinsicInfo &Info) { switch (Inst->getIntrinsicID()) { default: break; diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.cpp b/llvm/lib/Target/ARM/ARMTargetMachine.cpp index 2041e68..3f91914 100644 --- a/llvm/lib/Target/ARM/ARMTargetMachine.cpp +++ b/llvm/lib/Target/ARM/ARMTargetMachine.cpp @@ -216,10 +216,6 @@ ARMBaseTargetMachine::getSubtargetImpl(const Function &F) const { } void ARMBaseTargetMachine::addAnalysisPasses(PassManagerBase &PM) { - // Add first the target-independent BasicTTI pass, then our ARM pass. This - // allows the ARM pass to delegate to the target independent layer when - // appropriate. - PM.add(createBasicTargetTransformInfoPass(this)); PM.add(createARMTargetTransformInfoPass(this)); } diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp index ec834e8..b03fa3a 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -17,6 +17,7 @@ #include "ARM.h" #include "ARMTargetMachine.h" #include "llvm/Analysis/TargetTransformInfo.h" +#include "llvm/CodeGen/BasicTTIImpl.h" #include "llvm/Support/Debug.h" #include "llvm/Target/CostTable.h" #include "llvm/Target/TargetLowering.h" @@ -24,57 +25,48 @@ using namespace llvm; #define DEBUG_TYPE "armtti" -// Declare the pass initialization routine locally as target-specific passes -// don't have a target-wide initialization entry point, and so we rely on the -// pass constructor initialization. -namespace llvm { -void initializeARMTTIPass(PassRegistry &); -} - namespace { -class ARMTTI final : public ImmutablePass, public TargetTransformInfo { - const ARMBaseTargetMachine *TM; +class ARMTTIImpl : public BasicTTIImplBase { + typedef BasicTTIImplBase BaseT; + typedef TargetTransformInfo TTI; + const ARMSubtarget *ST; const ARMTargetLowering *TLI; /// Estimate the overhead of scalarizing an instruction. Insert and Extract /// are set if the result needs to be inserted and/or extracted from vectors. - unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const; + unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract); public: - ARMTTI() : ImmutablePass(ID), TM(nullptr), ST(nullptr), TLI(nullptr) { - llvm_unreachable("This pass cannot be directly constructed"); - } - - ARMTTI(const ARMBaseTargetMachine *TM) - : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()), - TLI(TM->getSubtargetImpl()->getTargetLowering()) { - initializeARMTTIPass(*PassRegistry::getPassRegistry()); - } - - void initializePass() override { - pushTTIStack(this); + explicit ARMTTIImpl(const ARMBaseTargetMachine *TM = nullptr) + : BaseT(TM), ST(TM ? TM->getSubtargetImpl() : nullptr), + TLI(ST ? ST->getTargetLowering() : nullptr) {} + + // Provide value semantics. MSVC requires that we spell all of these out. 
+ ARMTTIImpl(const ARMTTIImpl &Arg) + : BaseT(static_cast(Arg)), ST(Arg.ST), TLI(Arg.TLI) {} + ARMTTIImpl(ARMTTIImpl &&Arg) + : BaseT(std::move(static_cast(Arg))), ST(std::move(Arg.ST)), + TLI(std::move(Arg.TLI)) {} + ARMTTIImpl &operator=(const ARMTTIImpl &RHS) { + BaseT::operator=(static_cast(RHS)); + ST = RHS.ST; + TLI = RHS.TLI; + return *this; } - - void getAnalysisUsage(AnalysisUsage &AU) const override { - TargetTransformInfo::getAnalysisUsage(AU); - } - - /// Pass identification. - static char ID; - - /// Provide necessary pointer adjustments for the two base classes. - void *getAdjustedAnalysisPointer(const void *ID) override { - if (ID == &TargetTransformInfo::ID) - return (TargetTransformInfo*)this; - return this; + ARMTTIImpl &operator=(ARMTTIImpl &&RHS) { + BaseT::operator=(std::move(static_cast(RHS))); + ST = std::move(RHS.ST); + TLI = std::move(RHS.TLI); + return *this; } /// \name Scalar TTI Implementations /// @{ - using TargetTransformInfo::getIntImmCost; - unsigned getIntImmCost(const APInt &Imm, Type *Ty) const override; + + using BaseT::getIntImmCost; + unsigned getIntImmCost(const APInt &Imm, Type *Ty); /// @} @@ -82,7 +74,7 @@ public: /// \name Vector TTI Implementations /// @{ - unsigned getNumberOfRegisters(bool Vector) const override { + unsigned getNumberOfRegisters(bool Vector) { if (Vector) { if (ST->hasNEON()) return 16; @@ -94,7 +86,7 @@ public: return 13; } - unsigned getRegisterBitWidth(bool Vector) const override { + unsigned getRegisterBitWidth(bool Vector) { if (Vector) { if (ST->hasNEON()) return 128; @@ -104,52 +96,45 @@ public: return 32; } - unsigned getMaxInterleaveFactor() const override { + unsigned getMaxInterleaveFactor() { // These are out of order CPUs: if (ST->isCortexA15() || ST->isSwift()) return 2; return 1; } - unsigned getShuffleCost(ShuffleKind Kind, Type *Tp, - int Index, Type *SubTp) const override; + unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, + Type *SubTp); - unsigned getCastInstrCost(unsigned Opcode, Type *Dst, - Type *Src) const override; + unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src); - unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy) const override; + unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy); - unsigned getVectorInstrCost(unsigned Opcode, Type *Val, - unsigned Index) const override; + unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index); - unsigned getAddressComputationCost(Type *Val, - bool IsComplex) const override; + unsigned getAddressComputationCost(Type *Val, bool IsComplex); unsigned getArithmeticInstrCost( - unsigned Opcode, Type *Ty, OperandValueKind Op1Info = OK_AnyValue, - OperandValueKind Op2Info = OK_AnyValue, - OperandValueProperties Opd1PropInfo = OP_None, - OperandValueProperties Opd2PropInfo = OP_None) const override; + unsigned Opcode, Type *Ty, + TTI::OperandValueKind Op1Info = TTI::OK_AnyValue, + TTI::OperandValueKind Op2Info = TTI::OK_AnyValue, + TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None, + TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None); unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, - unsigned AddressSpace) const override; + unsigned AddressSpace); + /// @} }; } // end anonymous namespace -INITIALIZE_AG_PASS(ARMTTI, TargetTransformInfo, "armtti", - "ARM Target Transform Info", true, true, false) -char ARMTTI::ID = 0; - ImmutablePass * llvm::createARMTargetTransformInfoPass(const ARMBaseTargetMachine *TM) { - return new ARMTTI(TM); + return 
new TargetTransformInfoWrapperPass(ARMTTIImpl(TM)); } - -unsigned ARMTTI::getIntImmCost(const APInt &Imm, Type *Ty) const { +unsigned ARMTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty) { assert(Ty->isIntegerTy()); unsigned Bits = Ty->getPrimitiveSizeInBits(); @@ -181,8 +166,7 @@ unsigned ARMTTI::getIntImmCost(const APInt &Imm, Type *Ty) const { return 3; } -unsigned ARMTTI::getCastInstrCost(unsigned Opcode, Type *Dst, - Type *Src) const { +unsigned ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) { int ISD = TLI->InstructionOpcodeToISD(Opcode); assert(ISD && "Invalid opcode"); @@ -206,7 +190,7 @@ unsigned ARMTTI::getCastInstrCost(unsigned Opcode, Type *Dst, EVT DstTy = TLI->getValueType(Dst); if (!SrcTy.isSimple() || !DstTy.isSimple()) - return TargetTransformInfo::getCastInstrCost(Opcode, Dst, Src); + return BaseT::getCastInstrCost(Opcode, Dst, Src); // Some arithmetic, load and store operations have specific instructions // to cast up/down their types automatically at no extra cost. @@ -377,11 +361,11 @@ unsigned ARMTTI::getCastInstrCost(unsigned Opcode, Type *Dst, return ARMIntegerConversionTbl[Idx].Cost; } - return TargetTransformInfo::getCastInstrCost(Opcode, Dst, Src); + return BaseT::getCastInstrCost(Opcode, Dst, Src); } -unsigned ARMTTI::getVectorInstrCost(unsigned Opcode, Type *ValTy, - unsigned Index) const { +unsigned ARMTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy, + unsigned Index) { // Penalize inserting into an D-subregister. We end up with a three times // lower estimated throughput on swift. if (ST->isSwift() && @@ -397,11 +381,11 @@ unsigned ARMTTI::getVectorInstrCost(unsigned Opcode, Type *ValTy, ValTy->getVectorElementType()->isIntegerTy()) return 3; - return TargetTransformInfo::getVectorInstrCost(Opcode, ValTy, Index); + return BaseT::getVectorInstrCost(Opcode, ValTy, Index); } -unsigned ARMTTI::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy) const { +unsigned ARMTTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, + Type *CondTy) { int ISD = TLI->InstructionOpcodeToISD(Opcode); // On NEON a a vector select gets lowered to vbsl. @@ -431,10 +415,10 @@ unsigned ARMTTI::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, return LT.first; } - return TargetTransformInfo::getCmpSelInstrCost(Opcode, ValTy, CondTy); + return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy); } -unsigned ARMTTI::getAddressComputationCost(Type *Ty, bool IsComplex) const { +unsigned ARMTTIImpl::getAddressComputationCost(Type *Ty, bool IsComplex) { // Address computations in vectorized code with non-consecutive addresses will // likely result in more instructions compared to scalar code where the // computation can more often be merged into the index mode. The resulting @@ -449,13 +433,13 @@ unsigned ARMTTI::getAddressComputationCost(Type *Ty, bool IsComplex) const { return 1; } -unsigned ARMTTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index, - Type *SubTp) const { +unsigned ARMTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, + Type *SubTp) { // We only handle costs of reverse and alternate shuffles for now. 
- if (Kind != SK_Reverse && Kind != SK_Alternate) - return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp); + if (Kind != TTI::SK_Reverse && Kind != TTI::SK_Alternate) + return BaseT::getShuffleCost(Kind, Tp, Index, SubTp); - if (Kind == SK_Reverse) { + if (Kind == TTI::SK_Reverse) { static const CostTblEntry NEONShuffleTbl[] = { // Reverse shuffle cost one instruction if we are shuffling within a // double word (vrev) or two if we shuffle a quad word (vrev, vext). @@ -473,11 +457,11 @@ unsigned ARMTTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index, int Idx = CostTableLookup(NEONShuffleTbl, ISD::VECTOR_SHUFFLE, LT.second); if (Idx == -1) - return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp); + return BaseT::getShuffleCost(Kind, Tp, Index, SubTp); return LT.first * NEONShuffleTbl[Idx].Cost; } - if (Kind == SK_Alternate) { + if (Kind == TTI::SK_Alternate) { static const CostTblEntry NEONAltShuffleTbl[] = { // Alt shuffle cost table for ARM. Cost is the number of instructions // required to create the shuffled vector. @@ -499,16 +483,16 @@ unsigned ARMTTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index, int Idx = CostTableLookup(NEONAltShuffleTbl, ISD::VECTOR_SHUFFLE, LT.second); if (Idx == -1) - return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp); + return BaseT::getShuffleCost(Kind, Tp, Index, SubTp); return LT.first * NEONAltShuffleTbl[Idx].Cost; } - return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp); + return BaseT::getShuffleCost(Kind, Tp, Index, SubTp); } -unsigned ARMTTI::getArithmeticInstrCost( - unsigned Opcode, Type *Ty, OperandValueKind Op1Info, - OperandValueKind Op2Info, OperandValueProperties Opd1PropInfo, - OperandValueProperties Opd2PropInfo) const { +unsigned ARMTTIImpl::getArithmeticInstrCost( + unsigned Opcode, Type *Ty, TTI::OperandValueKind Op1Info, + TTI::OperandValueKind Op2Info, TTI::OperandValueProperties Opd1PropInfo, + TTI::OperandValueProperties Opd2PropInfo) { int ISDOpcode = TLI->InstructionOpcodeToISD(Opcode); std::pair LT = TLI->getTypeLegalizationCost(Ty); @@ -564,8 +548,8 @@ unsigned ARMTTI::getArithmeticInstrCost( if (Idx != -1) return LT.first * CostTbl[Idx].Cost; - unsigned Cost = TargetTransformInfo::getArithmeticInstrCost( - Opcode, Ty, Op1Info, Op2Info, Opd1PropInfo, Opd2PropInfo); + unsigned Cost = BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info, + Opd1PropInfo, Opd2PropInfo); // This is somewhat of a hack. The problem that we are facing is that SROA // creates a sequence of shift, and, or instructions to construct values. @@ -581,8 +565,9 @@ unsigned ARMTTI::getArithmeticInstrCost( return Cost; } -unsigned ARMTTI::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, - unsigned AddressSpace) const { +unsigned ARMTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src, + unsigned Alignment, + unsigned AddressSpace) { std::pair LT = TLI->getTypeLegalizationCost(Src); if (Src->isVectorTy() && Alignment != 16 && diff --git a/llvm/lib/Target/Mips/MipsTargetMachine.cpp b/llvm/lib/Target/Mips/MipsTargetMachine.cpp index 1fc64b5..1a5d21e 100644 --- a/llvm/lib/Target/Mips/MipsTargetMachine.cpp +++ b/llvm/lib/Target/Mips/MipsTargetMachine.cpp @@ -245,7 +245,7 @@ void MipsTargetMachine::addAnalysisPasses(PassManagerBase &PM) { // pass needs to become a function pass instead of // being an immutable pass and then this method as it exists now // would be unnecessary. 
- PM.add(createNoTargetTransformInfoPass()); + PM.add(createNoTargetTransformInfoPass(getDataLayout())); } else LLVMTargetMachine::addAnalysisPasses(PM); DEBUG(errs() << "Target Transform Info Pass Added\n"); diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp index 9083b41..0b74372 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp @@ -137,10 +137,6 @@ TargetPassConfig *NVPTXTargetMachine::createPassConfig(PassManagerBase &PM) { } void NVPTXTargetMachine::addAnalysisPasses(PassManagerBase &PM) { - // Add first the target-independent BasicTTI pass, then our NVPTX pass. This - // allows the NVPTX pass to delegate to the target independent layer when - // appropriate. - PM.add(createBasicTargetTransformInfoPass(this)); PM.add(createNVPTXTargetTransformInfoPass(this)); } diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp index b09d0d4..c7a03c7 100644 --- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp @@ -19,6 +19,7 @@ #include "llvm/Analysis/LoopInfo.h" #include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/Analysis/ValueTracking.h" +#include "llvm/CodeGen/BasicTTIImpl.h" #include "llvm/Support/Debug.h" #include "llvm/Target/CostTable.h" #include "llvm/Target/TargetLowering.h" @@ -26,69 +27,56 @@ using namespace llvm; #define DEBUG_TYPE "NVPTXtti" -// Declare the pass initialization routine locally as target-specific passes -// don't have a target-wide initialization entry point, and so we rely on the -// pass constructor initialization. -namespace llvm { -void initializeNVPTXTTIPass(PassRegistry &); -} - namespace { -class NVPTXTTI final : public ImmutablePass, public TargetTransformInfo { - const NVPTXTargetLowering *TLI; -public: - NVPTXTTI() : ImmutablePass(ID), TLI(nullptr) { - llvm_unreachable("This pass cannot be directly constructed"); - } - - NVPTXTTI(const NVPTXTargetMachine *TM) - : ImmutablePass(ID), TLI(TM->getSubtargetImpl()->getTargetLowering()) { - initializeNVPTXTTIPass(*PassRegistry::getPassRegistry()); - } +class NVPTXTTIImpl : public BasicTTIImplBase { + typedef BasicTTIImplBase BaseT; + typedef TargetTransformInfo TTI; - void initializePass() override { pushTTIStack(this); } + const NVPTXTargetLowering *TLI; - void getAnalysisUsage(AnalysisUsage &AU) const override { - TargetTransformInfo::getAnalysisUsage(AU); +public: + explicit NVPTXTTIImpl(const NVPTXTargetMachine *TM = nullptr) + : BaseT(TM), + TLI(TM ? TM->getSubtargetImpl()->getTargetLowering() : nullptr) {} + + // Provide value semantics. MSVC requires that we spell all of these out. + NVPTXTTIImpl(const NVPTXTTIImpl &Arg) + : BaseT(static_cast(Arg)), TLI(Arg.TLI) {} + NVPTXTTIImpl(NVPTXTTIImpl &&Arg) + : BaseT(std::move(static_cast(Arg))), TLI(std::move(Arg.TLI)) {} + NVPTXTTIImpl &operator=(const NVPTXTTIImpl &RHS) { + BaseT::operator=(static_cast(RHS)); + TLI = RHS.TLI; + return *this; } - - /// Pass identification. - static char ID; - - /// Provide necessary pointer adjustments for the two base classes. 
- void *getAdjustedAnalysisPointer(const void *ID) override { - if (ID == &TargetTransformInfo::ID) - return (TargetTransformInfo *)this; - return this; + NVPTXTTIImpl &operator=(NVPTXTTIImpl &&RHS) { + BaseT::operator=(std::move(static_cast(RHS))); + TLI = std::move(RHS.TLI); + return *this; } - bool hasBranchDivergence() const override; + bool hasBranchDivergence() { return true; } unsigned getArithmeticInstrCost( - unsigned Opcode, Type *Ty, OperandValueKind Opd1Info = OK_AnyValue, - OperandValueKind Opd2Info = OK_AnyValue, - OperandValueProperties Opd1PropInfo = OP_None, - OperandValueProperties Opd2PropInfo = OP_None) const override; + unsigned Opcode, Type *Ty, + TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue, + TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue, + TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None, + TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None); }; } // end anonymous namespace -INITIALIZE_AG_PASS(NVPTXTTI, TargetTransformInfo, "NVPTXtti", - "NVPTX Target Transform Info", true, true, false) -char NVPTXTTI::ID = 0; - ImmutablePass * llvm::createNVPTXTargetTransformInfoPass(const NVPTXTargetMachine *TM) { - return new NVPTXTTI(TM); + return new TargetTransformInfoWrapperPass(NVPTXTTIImpl(TM)); } -bool NVPTXTTI::hasBranchDivergence() const { return true; } - -unsigned NVPTXTTI::getArithmeticInstrCost( - unsigned Opcode, Type *Ty, OperandValueKind Opd1Info, - OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo, - OperandValueProperties Opd2PropInfo) const { +unsigned NVPTXTTIImpl::getArithmeticInstrCost( + unsigned Opcode, Type *Ty, TTI::OperandValueKind Opd1Info, + TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo, + TTI::OperandValueProperties Opd2PropInfo) { // Legalize the type. std::pair LT = TLI->getTypeLegalizationCost(Ty); @@ -96,8 +84,8 @@ unsigned NVPTXTTI::getArithmeticInstrCost( switch (ISD) { default: - return TargetTransformInfo::getArithmeticInstrCost( - Opcode, Ty, Opd1Info, Opd2Info, Opd1PropInfo, Opd2PropInfo); + return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info, + Opd1PropInfo, Opd2PropInfo); case ISD::ADD: case ISD::MUL: case ISD::XOR: @@ -109,7 +97,7 @@ unsigned NVPTXTTI::getArithmeticInstrCost( if (LT.second.SimpleTy == MVT::i64) return 2 * LT.first; // Delegate other cases to the basic TTI. - return TargetTransformInfo::getArithmeticInstrCost( - Opcode, Ty, Opd1Info, Opd2Info, Opd1PropInfo, Opd2PropInfo); + return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info, + Opd1PropInfo, Opd2PropInfo); } } diff --git a/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp b/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp index cbcaaef..8121d7f 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp +++ b/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp @@ -275,9 +275,5 @@ void PPCPassConfig::addPreEmitPass() { } void PPCTargetMachine::addAnalysisPasses(PassManagerBase &PM) { - // Add first the target-independent BasicTTI pass, then our PPC pass. This - // allows the PPC pass to delegate to the target independent layer when - // appropriate. 
- PM.add(createBasicTargetTransformInfoPass(this)); PM.add(createPPCTargetTransformInfoPass(this)); } diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp index a7bb252..6bdee25 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp +++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp @@ -17,6 +17,7 @@ #include "PPC.h" #include "PPCTargetMachine.h" #include "llvm/Analysis/TargetTransformInfo.h" +#include "llvm/CodeGen/BasicTTIImpl.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Debug.h" #include "llvm/Target/CostTable.h" @@ -28,96 +29,83 @@ using namespace llvm; static cl::opt DisablePPCConstHoist("disable-ppc-constant-hoisting", cl::desc("disable constant hoisting on PPC"), cl::init(false), cl::Hidden); -// Declare the pass initialization routine locally as target-specific passes -// don't have a target-wide initialization entry point, and so we rely on the -// pass constructor initialization. -namespace llvm { -void initializePPCTTIPass(PassRegistry &); -} - namespace { -class PPCTTI final : public ImmutablePass, public TargetTransformInfo { - const TargetMachine *TM; +class PPCTTIImpl : public BasicTTIImplBase { + typedef BasicTTIImplBase BaseT; + typedef TargetTransformInfo TTI; + const PPCSubtarget *ST; const PPCTargetLowering *TLI; public: - PPCTTI() : ImmutablePass(ID), ST(nullptr), TLI(nullptr) { - llvm_unreachable("This pass cannot be directly constructed"); - } - - PPCTTI(const PPCTargetMachine *TM) - : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()), - TLI(TM->getSubtargetImpl()->getTargetLowering()) { - initializePPCTTIPass(*PassRegistry::getPassRegistry()); - } - - void initializePass() override { - pushTTIStack(this); + explicit PPCTTIImpl(const PPCTargetMachine *TM = nullptr) + : BaseT(TM), ST(TM->getSubtargetImpl()), TLI(ST->getTargetLowering()) {} + + // Provide value semantics. MSVC requires that we spell all of these out. + PPCTTIImpl(const PPCTTIImpl &Arg) + : BaseT(static_cast(Arg)), ST(Arg.ST), TLI(Arg.TLI) {} + PPCTTIImpl(PPCTTIImpl &&Arg) + : BaseT(std::move(static_cast(Arg))), ST(std::move(Arg.ST)), + TLI(std::move(Arg.TLI)) {} + PPCTTIImpl &operator=(const PPCTTIImpl &RHS) { + BaseT::operator=(static_cast(RHS)); + ST = RHS.ST; + TLI = RHS.TLI; + return *this; } - - void getAnalysisUsage(AnalysisUsage &AU) const override { - TargetTransformInfo::getAnalysisUsage(AU); - } - - /// Pass identification. - static char ID; - - /// Provide necessary pointer adjustments for the two base classes. 
- void *getAdjustedAnalysisPointer(const void *ID) override { - if (ID == &TargetTransformInfo::ID) - return (TargetTransformInfo*)this; - return this; + PPCTTIImpl &operator=(PPCTTIImpl &&RHS) { + BaseT::operator=(std::move(static_cast(RHS))); + ST = std::move(RHS.ST); + TLI = std::move(RHS.TLI); + return *this; } /// \name Scalar TTI Implementations /// @{ - unsigned getIntImmCost(const APInt &Imm, Type *Ty) const override; + + using BaseT::getIntImmCost; + unsigned getIntImmCost(const APInt &Imm, Type *Ty); unsigned getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, - Type *Ty) const override; + Type *Ty); unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, - Type *Ty) const override; + Type *Ty); - PopcntSupportKind getPopcntSupport(unsigned TyWidth) const override; + TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth); void getUnrollingPreferences(const Function *F, Loop *L, - UnrollingPreferences &UP) const override; + TTI::UnrollingPreferences &UP); /// @} /// \name Vector TTI Implementations /// @{ - unsigned getNumberOfRegisters(bool Vector) const override; - unsigned getRegisterBitWidth(bool Vector) const override; - unsigned getMaxInterleaveFactor() const override; - unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind, - OperandValueKind, OperandValueProperties, - OperandValueProperties) const override; - unsigned getShuffleCost(ShuffleKind Kind, Type *Tp, - int Index, Type *SubTp) const override; - unsigned getCastInstrCost(unsigned Opcode, Type *Dst, - Type *Src) const override; - unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy) const override; - unsigned getVectorInstrCost(unsigned Opcode, Type *Val, - unsigned Index) const override; + unsigned getNumberOfRegisters(bool Vector); + unsigned getRegisterBitWidth(bool Vector); + unsigned getMaxInterleaveFactor(); + unsigned getArithmeticInstrCost( + unsigned Opcode, Type *Ty, + TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue, + TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue, + TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None, + TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None); + unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, + Type *SubTp); + unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src); + unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy); + unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index); unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, - unsigned AddressSpace) const override; + unsigned AddressSpace); /// @} }; } // end anonymous namespace -INITIALIZE_AG_PASS(PPCTTI, TargetTransformInfo, "ppctti", - "PPC Target Transform Info", true, true, false) -char PPCTTI::ID = 0; - ImmutablePass * llvm::createPPCTargetTransformInfoPass(const PPCTargetMachine *TM) { - return new PPCTTI(TM); + return new TargetTransformInfoWrapperPass(PPCTTIImpl(TM)); } @@ -127,16 +115,17 @@ llvm::createPPCTargetTransformInfoPass(const PPCTargetMachine *TM) { // //===----------------------------------------------------------------------===// -PPCTTI::PopcntSupportKind PPCTTI::getPopcntSupport(unsigned TyWidth) const { +TargetTransformInfo::PopcntSupportKind +PPCTTIImpl::getPopcntSupport(unsigned TyWidth) { assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2"); if (ST->hasPOPCNTD() && TyWidth <= 64) - return PSK_FastHardware; - return PSK_Software; + return TTI::PSK_FastHardware; + return TTI::PSK_Software; } -unsigned 
PPCTTI::getIntImmCost(const APInt &Imm, Type *Ty) const { +unsigned PPCTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty) { if (DisablePPCConstHoist) - return TargetTransformInfo::getIntImmCost(Imm, Ty); + return BaseT::getIntImmCost(Imm, Ty); assert(Ty->isIntegerTy()); @@ -145,28 +134,28 @@ unsigned PPCTTI::getIntImmCost(const APInt &Imm, Type *Ty) const { return ~0U; if (Imm == 0) - return TCC_Free; + return TTI::TCC_Free; if (Imm.getBitWidth() <= 64) { if (isInt<16>(Imm.getSExtValue())) - return TCC_Basic; + return TTI::TCC_Basic; if (isInt<32>(Imm.getSExtValue())) { // A constant that can be materialized using lis. if ((Imm.getZExtValue() & 0xFFFF) == 0) - return TCC_Basic; + return TTI::TCC_Basic; - return 2 * TCC_Basic; + return 2 * TTI::TCC_Basic; } } - return 4 * TCC_Basic; + return 4 * TTI::TCC_Basic; } -unsigned PPCTTI::getIntImmCost(Intrinsic::ID IID, unsigned Idx, - const APInt &Imm, Type *Ty) const { +unsigned PPCTTIImpl::getIntImmCost(Intrinsic::ID IID, unsigned Idx, + const APInt &Imm, Type *Ty) { if (DisablePPCConstHoist) - return TargetTransformInfo::getIntImmCost(IID, Idx, Imm, Ty); + return BaseT::getIntImmCost(IID, Idx, Imm, Ty); assert(Ty->isIntegerTy()); @@ -175,31 +164,32 @@ unsigned PPCTTI::getIntImmCost(Intrinsic::ID IID, unsigned Idx, return ~0U; switch (IID) { - default: return TCC_Free; + default: + return TTI::TCC_Free; case Intrinsic::sadd_with_overflow: case Intrinsic::uadd_with_overflow: case Intrinsic::ssub_with_overflow: case Intrinsic::usub_with_overflow: if ((Idx == 1) && Imm.getBitWidth() <= 64 && isInt<16>(Imm.getSExtValue())) - return TCC_Free; + return TTI::TCC_Free; break; case Intrinsic::experimental_stackmap: if ((Idx < 2) || (Imm.getBitWidth() <= 64 && isInt<64>(Imm.getSExtValue()))) - return TCC_Free; + return TTI::TCC_Free; break; case Intrinsic::experimental_patchpoint_void: case Intrinsic::experimental_patchpoint_i64: if ((Idx < 4) || (Imm.getBitWidth() <= 64 && isInt<64>(Imm.getSExtValue()))) - return TCC_Free; + return TTI::TCC_Free; break; } - return PPCTTI::getIntImmCost(Imm, Ty); + return PPCTTIImpl::getIntImmCost(Imm, Ty); } -unsigned PPCTTI::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, - Type *Ty) const { +unsigned PPCTTIImpl::getIntImmCost(unsigned Opcode, unsigned Idx, + const APInt &Imm, Type *Ty) { if (DisablePPCConstHoist) - return TargetTransformInfo::getIntImmCost(Opcode, Idx, Imm, Ty); + return BaseT::getIntImmCost(Opcode, Idx, Imm, Ty); assert(Ty->isIntegerTy()); @@ -211,14 +201,15 @@ unsigned PPCTTI::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, bool ShiftedFree = false, RunFree = false, UnsignedFree = false, ZeroFree = false; switch (Opcode) { - default: return TCC_Free; + default: + return TTI::TCC_Free; case Instruction::GetElementPtr: // Always hoist the base address of a GetElementPtr. This prevents the // creation of new constants for every base constant that gets constant // folded with the offset. if (Idx == 0) - return 2 * TCC_Basic; - return TCC_Free; + return 2 * TTI::TCC_Basic; + return TTI::TCC_Free; case Instruction::And: RunFree = true; // (for the rotate-and-mask instructions) // Fallthrough... 
@@ -250,53 +241,52 @@ unsigned PPCTTI::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, } if (ZeroFree && Imm == 0) - return TCC_Free; + return TTI::TCC_Free; if (Idx == ImmIdx && Imm.getBitWidth() <= 64) { if (isInt<16>(Imm.getSExtValue())) - return TCC_Free; + return TTI::TCC_Free; if (RunFree) { if (Imm.getBitWidth() <= 32 && (isShiftedMask_32(Imm.getZExtValue()) || isShiftedMask_32(~Imm.getZExtValue()))) - return TCC_Free; - + return TTI::TCC_Free; if (ST->isPPC64() && (isShiftedMask_64(Imm.getZExtValue()) || isShiftedMask_64(~Imm.getZExtValue()))) - return TCC_Free; + return TTI::TCC_Free; } if (UnsignedFree && isUInt<16>(Imm.getZExtValue())) - return TCC_Free; + return TTI::TCC_Free; if (ShiftedFree && (Imm.getZExtValue() & 0xFFFF) == 0) - return TCC_Free; + return TTI::TCC_Free; } - return PPCTTI::getIntImmCost(Imm, Ty); + return PPCTTIImpl::getIntImmCost(Imm, Ty); } -void PPCTTI::getUnrollingPreferences(const Function *F, Loop *L, - UnrollingPreferences &UP) const { +void PPCTTIImpl::getUnrollingPreferences(const Function *F, Loop *L, + TTI::UnrollingPreferences &UP) { if (TM->getSubtarget(F).getDarwinDirective() == PPC::DIR_A2) { // The A2 is in-order with a deep pipeline, and concatenation unrolling // helps expose latency-hiding opportunities to the instruction scheduler. UP.Partial = UP.Runtime = true; } - TargetTransformInfo::getUnrollingPreferences(F, L, UP); + BaseT::getUnrollingPreferences(F, L, UP); } -unsigned PPCTTI::getNumberOfRegisters(bool Vector) const { +unsigned PPCTTIImpl::getNumberOfRegisters(bool Vector) { if (Vector && !ST->hasAltivec()) return 0; return ST->hasVSX() ? 64 : 32; } -unsigned PPCTTI::getRegisterBitWidth(bool Vector) const { +unsigned PPCTTIImpl::getRegisterBitWidth(bool Vector) { if (Vector) { if (ST->hasAltivec()) return 128; return 0; @@ -308,7 +298,7 @@ unsigned PPCTTI::getRegisterBitWidth(bool Vector) const { } -unsigned PPCTTI::getMaxInterleaveFactor() const { +unsigned PPCTTIImpl::getMaxInterleaveFactor() { unsigned Directive = ST->getDarwinDirective(); // The 440 has no SIMD support, but floating-point instructions // have a 5-cycle latency, so unroll by 5x for latency hiding. @@ -329,35 +319,35 @@ unsigned PPCTTI::getMaxInterleaveFactor() const { return 2; } -unsigned PPCTTI::getArithmeticInstrCost( - unsigned Opcode, Type *Ty, OperandValueKind Op1Info, - OperandValueKind Op2Info, OperandValueProperties Opd1PropInfo, - OperandValueProperties Opd2PropInfo) const { +unsigned PPCTTIImpl::getArithmeticInstrCost( + unsigned Opcode, Type *Ty, TTI::OperandValueKind Op1Info, + TTI::OperandValueKind Op2Info, TTI::OperandValueProperties Opd1PropInfo, + TTI::OperandValueProperties Opd2PropInfo) { assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode"); // Fallback to the default implementation. 
- return TargetTransformInfo::getArithmeticInstrCost( - Opcode, Ty, Op1Info, Op2Info, Opd1PropInfo, Opd2PropInfo); + return BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info, + Opd1PropInfo, Opd2PropInfo); } -unsigned PPCTTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index, - Type *SubTp) const { - return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp); +unsigned PPCTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, + Type *SubTp) { + return BaseT::getShuffleCost(Kind, Tp, Index, SubTp); } -unsigned PPCTTI::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const { +unsigned PPCTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) { assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode"); - return TargetTransformInfo::getCastInstrCost(Opcode, Dst, Src); + return BaseT::getCastInstrCost(Opcode, Dst, Src); } -unsigned PPCTTI::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy) const { - return TargetTransformInfo::getCmpSelInstrCost(Opcode, ValTy, CondTy); +unsigned PPCTTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, + Type *CondTy) { + return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy); } -unsigned PPCTTI::getVectorInstrCost(unsigned Opcode, Type *Val, - unsigned Index) const { +unsigned PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, + unsigned Index) { assert(Val->isVectorTy() && "This must be a vector type"); int ISD = TLI->InstructionOpcodeToISD(Opcode); @@ -368,7 +358,7 @@ unsigned PPCTTI::getVectorInstrCost(unsigned Opcode, Type *Val, if (Index == 0) return 0; - return TargetTransformInfo::getVectorInstrCost(Opcode, Val, Index); + return BaseT::getVectorInstrCost(Opcode, Val, Index); } // Estimated cost of a load-hit-store delay. This was obtained @@ -385,21 +375,20 @@ unsigned PPCTTI::getVectorInstrCost(unsigned Opcode, Type *Val, // these need to be estimated as very costly. if (ISD == ISD::EXTRACT_VECTOR_ELT || ISD == ISD::INSERT_VECTOR_ELT) - return LHSPenalty + - TargetTransformInfo::getVectorInstrCost(Opcode, Val, Index); + return LHSPenalty + BaseT::getVectorInstrCost(Opcode, Val, Index); - return TargetTransformInfo::getVectorInstrCost(Opcode, Val, Index); + return BaseT::getVectorInstrCost(Opcode, Val, Index); } -unsigned PPCTTI::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, - unsigned AddressSpace) const { +unsigned PPCTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src, + unsigned Alignment, + unsigned AddressSpace) { // Legalize the type. std::pair LT = TLI->getTypeLegalizationCost(Src); assert((Opcode == Instruction::Load || Opcode == Instruction::Store) && "Invalid Opcode"); - unsigned Cost = - TargetTransformInfo::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace); + unsigned Cost = BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace); // VSX loads/stores support unaligned access. if (ST->hasVSX()) { diff --git a/llvm/lib/Target/R600/AMDGPUTargetMachine.cpp b/llvm/lib/Target/R600/AMDGPUTargetMachine.cpp index a37748e..5c4ef1c 100644 --- a/llvm/lib/Target/R600/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/R600/AMDGPUTargetMachine.cpp @@ -120,10 +120,6 @@ TargetPassConfig *AMDGPUTargetMachine::createPassConfig(PassManagerBase &PM) { //===----------------------------------------------------------------------===// void AMDGPUTargetMachine::addAnalysisPasses(PassManagerBase &PM) { - // Add first the target-independent BasicTTI pass, then our AMDGPU pass. 
This - // allows the AMDGPU pass to delegate to the target independent layer when - // appropriate. - PM.add(createBasicTargetTransformInfoPass(this)); PM.add(createAMDGPUTargetTransformInfoPass(this)); } diff --git a/llvm/lib/Target/R600/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/R600/AMDGPUTargetTransformInfo.cpp index e7bc006..132765a 100644 --- a/llvm/lib/Target/R600/AMDGPUTargetTransformInfo.cpp +++ b/llvm/lib/Target/R600/AMDGPUTargetTransformInfo.cpp @@ -20,6 +20,7 @@ #include "llvm/Analysis/LoopInfo.h" #include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/Analysis/ValueTracking.h" +#include "llvm/CodeGen/BasicTTIImpl.h" #include "llvm/Support/Debug.h" #include "llvm/Target/CostTable.h" #include "llvm/Target/TargetLowering.h" @@ -27,78 +28,58 @@ using namespace llvm; #define DEBUG_TYPE "AMDGPUtti" -// Declare the pass initialization routine locally as target-specific passes -// don't have a target-wide initialization entry point, and so we rely on the -// pass constructor initialization. -namespace llvm { -void initializeAMDGPUTTIPass(PassRegistry &); -} - namespace { -class AMDGPUTTI final : public ImmutablePass, public TargetTransformInfo { - const AMDGPUTargetMachine *TM; - const AMDGPUSubtarget *ST; - const AMDGPUTargetLowering *TLI; +class AMDGPUTTIImpl : public BasicTTIImplBase { + typedef BasicTTIImplBase BaseT; + typedef TargetTransformInfo TTI; - /// Estimate the overhead of scalarizing an instruction. Insert and Extract - /// are set if the result needs to be inserted and/or extracted from vectors. - unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const; + const AMDGPUSubtarget *ST; public: - AMDGPUTTI() : ImmutablePass(ID), TM(nullptr), ST(nullptr), TLI(nullptr) { - llvm_unreachable("This pass cannot be directly constructed"); + explicit AMDGPUTTIImpl(const AMDGPUTargetMachine *TM = nullptr) + : BaseT(TM), ST(TM->getSubtargetImpl()) {} + + // Provide value semantics. MSVC requires that we spell all of these out. + AMDGPUTTIImpl(const AMDGPUTTIImpl &Arg) + : BaseT(static_cast(Arg)), ST(Arg.ST) {} + AMDGPUTTIImpl(AMDGPUTTIImpl &&Arg) + : BaseT(std::move(static_cast(Arg))), ST(std::move(Arg.ST)) {} + AMDGPUTTIImpl &operator=(const AMDGPUTTIImpl &RHS) { + BaseT::operator=(static_cast(RHS)); + ST = RHS.ST; + return *this; } - - AMDGPUTTI(const AMDGPUTargetMachine *TM) - : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()), - TLI(TM->getSubtargetImpl()->getTargetLowering()) { - initializeAMDGPUTTIPass(*PassRegistry::getPassRegistry()); + AMDGPUTTIImpl &operator=(AMDGPUTTIImpl &&RHS) { + BaseT::operator=(std::move(static_cast(RHS))); + ST = std::move(RHS.ST); + return *this; } - void initializePass() override { pushTTIStack(this); } - - void getAnalysisUsage(AnalysisUsage &AU) const override { - TargetTransformInfo::getAnalysisUsage(AU); - } - - /// Pass identification. - static char ID; - - /// Provide necessary pointer adjustments for the two base classes. 
- void *getAdjustedAnalysisPointer(const void *ID) override { - if (ID == &TargetTransformInfo::ID) - return (TargetTransformInfo *)this; - return this; - } - - bool hasBranchDivergence() const override; + bool hasBranchDivergence() { return true; } void getUnrollingPreferences(const Function *F, Loop *L, - UnrollingPreferences &UP) const override; + TTI::UnrollingPreferences &UP); - PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) const override; + TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth) { + assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2"); + return ST->hasBCNT(TyWidth) ? TTI::PSK_FastHardware : TTI::PSK_Software; + } - unsigned getNumberOfRegisters(bool Vector) const override; - unsigned getRegisterBitWidth(bool Vector) const override; - unsigned getMaxInterleaveFactor() const override; + unsigned getNumberOfRegisters(bool Vector); + unsigned getRegisterBitWidth(bool Vector); + unsigned getMaxInterleaveFactor(); }; } // end anonymous namespace -INITIALIZE_AG_PASS(AMDGPUTTI, TargetTransformInfo, "AMDGPUtti", - "AMDGPU Target Transform Info", true, true, false) -char AMDGPUTTI::ID = 0; - ImmutablePass * llvm::createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine *TM) { - return new AMDGPUTTI(TM); + return new TargetTransformInfoWrapperPass(AMDGPUTTIImpl(TM)); } -bool AMDGPUTTI::hasBranchDivergence() const { return true; } - -void AMDGPUTTI::getUnrollingPreferences(const Function *, Loop *L, - UnrollingPreferences &UP) const { +void AMDGPUTTIImpl::getUnrollingPreferences(const Function *, Loop *L, + TTI::UnrollingPreferences &UP) { UP.Threshold = 300; // Twice the default. UP.Count = UINT_MAX; UP.Partial = true; @@ -130,13 +111,7 @@ void AMDGPUTTI::getUnrollingPreferences(const Function *, Loop *L, } } -AMDGPUTTI::PopcntSupportKind -AMDGPUTTI::getPopcntSupport(unsigned TyWidth) const { - assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2"); - return ST->hasBCNT(TyWidth) ? PSK_FastHardware : PSK_Software; -} - -unsigned AMDGPUTTI::getNumberOfRegisters(bool Vec) const { +unsigned AMDGPUTTIImpl::getNumberOfRegisters(bool Vec) { if (Vec) return 0; @@ -147,11 +122,9 @@ unsigned AMDGPUTTI::getNumberOfRegisters(bool Vec) const { return 4 * 128; // XXX - 4 channels. Should these count as vector instead? } -unsigned AMDGPUTTI::getRegisterBitWidth(bool) const { - return 32; -} +unsigned AMDGPUTTIImpl::getRegisterBitWidth(bool) { return 32; } -unsigned AMDGPUTTI::getMaxInterleaveFactor() const { +unsigned AMDGPUTTIImpl::getMaxInterleaveFactor() { // Semi-arbitrary large amount. 
return 64; } diff --git a/llvm/lib/Target/Target.cpp b/llvm/lib/Target/Target.cpp index 352cdee..76de63c 100644 --- a/llvm/lib/Target/Target.cpp +++ b/llvm/lib/Target/Target.cpp @@ -36,6 +36,7 @@ inline LLVMTargetLibraryInfoRef wrap(const TargetLibraryInfoImpl *P) { void llvm::initializeTarget(PassRegistry &Registry) { initializeDataLayoutPassPass(Registry); initializeTargetLibraryInfoWrapperPassPass(Registry); + initializeTargetTransformInfoWrapperPassPass(Registry); } void LLVMInitializeTarget(LLVMPassRegistryRef R) { diff --git a/llvm/lib/Target/TargetMachine.cpp b/llvm/lib/Target/TargetMachine.cpp index b3ff001..2b683ad 100644 --- a/llvm/lib/Target/TargetMachine.cpp +++ b/llvm/lib/Target/TargetMachine.cpp @@ -12,6 +12,7 @@ //===----------------------------------------------------------------------===// #include "llvm/Target/TargetMachine.h" +#include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/IR/Function.h" #include "llvm/IR/GlobalAlias.h" @@ -24,6 +25,7 @@ #include "llvm/MC/MCSectionMachO.h" #include "llvm/MC/MCTargetOptions.h" #include "llvm/MC/SectionKind.h" +#include "llvm/PassManager.h" #include "llvm/Support/CommandLine.h" #include "llvm/Target/TargetLowering.h" #include "llvm/Target/TargetLoweringObjectFile.h" @@ -170,6 +172,10 @@ void TargetMachine::setDataSections(bool V) { Options.DataSections = V; } +void TargetMachine::addAnalysisPasses(PassManagerBase &PM) { + PM.add(createNoTargetTransformInfoPass(getDataLayout())); +} + static bool canUsePrivateLabel(const MCAsmInfo &AsmInfo, const MCSection &Section) { if (!AsmInfo.isSectionAtomizableBySymbols(Section)) diff --git a/llvm/lib/Target/X86/X86TargetMachine.cpp b/llvm/lib/Target/X86/X86TargetMachine.cpp index 11f6fd1..5988b9a 100644 --- a/llvm/lib/Target/X86/X86TargetMachine.cpp +++ b/llvm/lib/Target/X86/X86TargetMachine.cpp @@ -165,10 +165,6 @@ UseVZeroUpper("x86-use-vzeroupper", cl::Hidden, //===----------------------------------------------------------------------===// void X86TargetMachine::addAnalysisPasses(PassManagerBase &PM) { - // Add first the target-independent BasicTTI pass, then our X86 pass. This - // allows the X86 pass to delegate to the target independent layer when - // appropriate. - PM.add(createBasicTargetTransformInfoPass(this)); PM.add(createX86TargetTransformInfoPass(this)); } diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp index 9d7f123..d792f93 100644 --- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp +++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp @@ -17,6 +17,7 @@ #include "X86.h" #include "X86TargetMachine.h" #include "llvm/Analysis/TargetTransformInfo.h" +#include "llvm/CodeGen/BasicTTIImpl.h" #include "llvm/IR/IntrinsicInst.h" #include "llvm/Support/Debug.h" #include "llvm/Target/CostTable.h" @@ -25,110 +26,92 @@ using namespace llvm; #define DEBUG_TYPE "x86tti" -// Declare the pass initialization routine locally as target-specific passes -// don't have a target-wide initialization entry point, and so we rely on the -// pass constructor initialization. -namespace llvm { -void initializeX86TTIPass(PassRegistry &); -} - namespace { -class X86TTI final : public ImmutablePass, public TargetTransformInfo { +class X86TTIImpl : public BasicTTIImplBase { + typedef BasicTTIImplBase BaseT; + typedef TargetTransformInfo TTI; + const X86Subtarget *ST; const X86TargetLowering *TLI; - /// Estimate the overhead of scalarizing an instruction. 
Insert and Extract - /// are set if the result needs to be inserted and/or extracted from vectors. - unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const; + unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract); public: - X86TTI() : ImmutablePass(ID), ST(nullptr), TLI(nullptr) { - llvm_unreachable("This pass cannot be directly constructed"); - } - - X86TTI(const X86TargetMachine *TM) - : ImmutablePass(ID), ST(TM->getSubtargetImpl()), - TLI(TM->getSubtargetImpl()->getTargetLowering()) { - initializeX86TTIPass(*PassRegistry::getPassRegistry()); - } - - void initializePass() override { - pushTTIStack(this); + explicit X86TTIImpl(const X86TargetMachine *TM = nullptr) + : BaseT(TM), ST(TM ? TM->getSubtargetImpl() : nullptr), + TLI(ST ? ST->getTargetLowering() : nullptr) {} + + // Provide value semantics. MSVC requires that we spell all of these out. + X86TTIImpl(const X86TTIImpl &Arg) + : BaseT(static_cast(Arg)), ST(Arg.ST), TLI(Arg.TLI) {} + X86TTIImpl(X86TTIImpl &&Arg) + : BaseT(std::move(static_cast(Arg))), ST(std::move(Arg.ST)), + TLI(std::move(Arg.TLI)) {} + X86TTIImpl &operator=(const X86TTIImpl &RHS) { + BaseT::operator=(static_cast(RHS)); + ST = RHS.ST; + TLI = RHS.TLI; + return *this; } - - void getAnalysisUsage(AnalysisUsage &AU) const override { - TargetTransformInfo::getAnalysisUsage(AU); - } - - /// Pass identification. - static char ID; - - /// Provide necessary pointer adjustments for the two base classes. - void *getAdjustedAnalysisPointer(const void *ID) override { - if (ID == &TargetTransformInfo::ID) - return (TargetTransformInfo*)this; - return this; + X86TTIImpl &operator=(X86TTIImpl &&RHS) { + BaseT::operator=(std::move(static_cast(RHS))); + ST = std::move(RHS.ST); + TLI = std::move(RHS.TLI); + return *this; } /// \name Scalar TTI Implementations /// @{ - PopcntSupportKind getPopcntSupport(unsigned TyWidth) const override; + TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth); /// @} /// \name Vector TTI Implementations /// @{ - unsigned getNumberOfRegisters(bool Vector) const override; - unsigned getRegisterBitWidth(bool Vector) const override; - unsigned getMaxInterleaveFactor() const override; - unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind, - OperandValueKind, OperandValueProperties, - OperandValueProperties) const override; - unsigned getShuffleCost(ShuffleKind Kind, Type *Tp, - int Index, Type *SubTp) const override; - unsigned getCastInstrCost(unsigned Opcode, Type *Dst, - Type *Src) const override; - unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy) const override; - unsigned getVectorInstrCost(unsigned Opcode, Type *Val, - unsigned Index) const override; + unsigned getNumberOfRegisters(bool Vector); + unsigned getRegisterBitWidth(bool Vector); + unsigned getMaxInterleaveFactor(); + unsigned getArithmeticInstrCost( + unsigned Opcode, Type *Ty, + TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue, + TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue, + TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None, + TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None); + unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, + Type *SubTp); + unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src); + unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy); + unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index); unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, - unsigned AddressSpace) const 
override; - unsigned getMaskedMemoryOpCost(unsigned Opcode, Type *Src, - unsigned Alignment, - unsigned AddressSpace) const override; + unsigned AddressSpace); + unsigned getMaskedMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, + unsigned AddressSpace); - unsigned getAddressComputationCost(Type *PtrTy, - bool IsComplex) const override; + unsigned getAddressComputationCost(Type *PtrTy, bool IsComplex); - unsigned getReductionCost(unsigned Opcode, Type *Ty, - bool IsPairwiseForm) const override; + unsigned getReductionCost(unsigned Opcode, Type *Ty, bool IsPairwiseForm); - unsigned getIntImmCost(int64_t) const; + unsigned getIntImmCost(int64_t); - unsigned getIntImmCost(const APInt &Imm, Type *Ty) const override; + unsigned getIntImmCost(const APInt &Imm, Type *Ty); unsigned getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, - Type *Ty) const override; + Type *Ty); unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, - Type *Ty) const override; - bool isLegalMaskedLoad (Type *DataType, int Consecutive) const override; - bool isLegalMaskedStore(Type *DataType, int Consecutive) const override; + Type *Ty); + bool isLegalMaskedLoad(Type *DataType, int Consecutive); + bool isLegalMaskedStore(Type *DataType, int Consecutive); /// @} }; } // end anonymous namespace -INITIALIZE_AG_PASS(X86TTI, TargetTransformInfo, "x86tti", - "X86 Target Transform Info", true, true, false) -char X86TTI::ID = 0; - ImmutablePass * llvm::createX86TargetTransformInfoPass(const X86TargetMachine *TM) { - return new X86TTI(TM); + return new TargetTransformInfoWrapperPass(X86TTIImpl(TM)); } @@ -138,15 +121,16 @@ llvm::createX86TargetTransformInfoPass(const X86TargetMachine *TM) { // //===----------------------------------------------------------------------===// -X86TTI::PopcntSupportKind X86TTI::getPopcntSupport(unsigned TyWidth) const { +TargetTransformInfo::PopcntSupportKind +X86TTIImpl::getPopcntSupport(unsigned TyWidth) { assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2"); // TODO: Currently the __builtin_popcount() implementation using SSE3 // instructions is inefficient. Once the problem is fixed, we should // call ST->hasSSE3() instead of ST->hasPOPCNT(). - return ST->hasPOPCNT() ? PSK_FastHardware : PSK_Software; + return ST->hasPOPCNT() ? TTI::PSK_FastHardware : TTI::PSK_Software; } -unsigned X86TTI::getNumberOfRegisters(bool Vector) const { +unsigned X86TTIImpl::getNumberOfRegisters(bool Vector) { if (Vector && !ST->hasSSE1()) return 0; @@ -158,7 +142,7 @@ unsigned X86TTI::getNumberOfRegisters(bool Vector) const { return 8; } -unsigned X86TTI::getRegisterBitWidth(bool Vector) const { +unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) { if (Vector) { if (ST->hasAVX512()) return 512; if (ST->hasAVX()) return 256; @@ -172,7 +156,7 @@ unsigned X86TTI::getRegisterBitWidth(bool Vector) const { } -unsigned X86TTI::getMaxInterleaveFactor() const { +unsigned X86TTIImpl::getMaxInterleaveFactor() { if (ST->isAtom()) return 1; @@ -184,10 +168,10 @@ unsigned X86TTI::getMaxInterleaveFactor() const { return 2; } -unsigned X86TTI::getArithmeticInstrCost( - unsigned Opcode, Type *Ty, OperandValueKind Op1Info, - OperandValueKind Op2Info, OperandValueProperties Opd1PropInfo, - OperandValueProperties Opd2PropInfo) const { +unsigned X86TTIImpl::getArithmeticInstrCost( + unsigned Opcode, Type *Ty, TTI::OperandValueKind Op1Info, + TTI::OperandValueKind Op2Info, TTI::OperandValueProperties Opd1PropInfo, + TTI::OperandValueProperties Opd2PropInfo) { // Legalize the type. 
std::pair LT = TLI->getTypeLegalizationCost(Ty); @@ -442,17 +426,16 @@ unsigned X86TTI::getArithmeticInstrCost( return LT.first * 6; // Fallback to the default implementation. - return TargetTransformInfo::getArithmeticInstrCost(Opcode, Ty, Op1Info, - Op2Info); + return BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info); } -unsigned X86TTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index, - Type *SubTp) const { +unsigned X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, + Type *SubTp) { // We only estimate the cost of reverse and alternate shuffles. - if (Kind != SK_Reverse && Kind != SK_Alternate) - return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp); + if (Kind != TTI::SK_Reverse && Kind != TTI::SK_Alternate) + return BaseT::getShuffleCost(Kind, Tp, Index, SubTp); - if (Kind == SK_Reverse) { + if (Kind == TTI::SK_Reverse) { std::pair LT = TLI->getTypeLegalizationCost(Tp); unsigned Cost = 1; if (LT.second.getSizeInBits() > 128) @@ -462,7 +445,7 @@ unsigned X86TTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index, return Cost * LT.first; } - if (Kind == SK_Alternate) { + if (Kind == TTI::SK_Alternate) { // 64-bit packed float vectors (v2f32) are widened to type v4f32. // 64-bit packed integer vectors (v2i32) are promoted to type v2i64. std::pair LT = TLI->getTypeLegalizationCost(Tp); @@ -555,13 +538,13 @@ unsigned X86TTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index, int Idx = CostTableLookup(SSEAltShuffleTbl, ISD::VECTOR_SHUFFLE, LT.second); if (Idx != -1) return LT.first * SSEAltShuffleTbl[Idx].Cost; - return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp); + return BaseT::getShuffleCost(Kind, Tp, Index, SubTp); } - return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp); + return BaseT::getShuffleCost(Kind, Tp, Index, SubTp); } -unsigned X86TTI::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const { +unsigned X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) { int ISD = TLI->InstructionOpcodeToISD(Opcode); assert(ISD && "Invalid opcode"); @@ -643,7 +626,7 @@ unsigned X86TTI::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const { // The function getSimpleVT only handles simple value types. if (!SrcTy.isSimple() || !DstTy.isSimple()) - return TargetTransformInfo::getCastInstrCost(Opcode, Dst, Src); + return BaseT::getCastInstrCost(Opcode, Dst, Src); static const TypeConversionCostTblEntry AVX2ConversionTbl[] = { @@ -762,11 +745,11 @@ unsigned X86TTI::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const { return AVXConversionTbl[Idx].Cost; } - return TargetTransformInfo::getCastInstrCost(Opcode, Dst, Src); + return BaseT::getCastInstrCost(Opcode, Dst, Src); } -unsigned X86TTI::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, - Type *CondTy) const { +unsigned X86TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, + Type *CondTy) { // Legalize the type. 
std::pair LT = TLI->getTypeLegalizationCost(ValTy); @@ -832,11 +815,11 @@ unsigned X86TTI::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, return LT.first * SSE42CostTbl[Idx].Cost; } - return TargetTransformInfo::getCmpSelInstrCost(Opcode, ValTy, CondTy); + return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy); } -unsigned X86TTI::getVectorInstrCost(unsigned Opcode, Type *Val, - unsigned Index) const { +unsigned X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, + unsigned Index) { assert(Val->isVectorTy() && "This must be a vector type"); if (Index != -1U) { @@ -856,26 +839,27 @@ unsigned X86TTI::getVectorInstrCost(unsigned Opcode, Type *Val, return 0; } - return TargetTransformInfo::getVectorInstrCost(Opcode, Val, Index); + return BaseT::getVectorInstrCost(Opcode, Val, Index); } -unsigned X86TTI::getScalarizationOverhead(Type *Ty, bool Insert, - bool Extract) const { +unsigned X86TTIImpl::getScalarizationOverhead(Type *Ty, bool Insert, + bool Extract) { assert (Ty->isVectorTy() && "Can only scalarize vectors"); unsigned Cost = 0; for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) { if (Insert) - Cost += TopTTI->getVectorInstrCost(Instruction::InsertElement, Ty, i); + Cost += getVectorInstrCost(Instruction::InsertElement, Ty, i); if (Extract) - Cost += TopTTI->getVectorInstrCost(Instruction::ExtractElement, Ty, i); + Cost += getVectorInstrCost(Instruction::ExtractElement, Ty, i); } return Cost; } -unsigned X86TTI::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, - unsigned AddressSpace) const { +unsigned X86TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src, + unsigned Alignment, + unsigned AddressSpace) { // Handle non-power-of-two vectors such as <3 x float> if (VectorType *VTy = dyn_cast(Src)) { unsigned NumElem = VTy->getVectorNumElements(); @@ -893,10 +877,8 @@ unsigned X86TTI::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, // Assume that all other non-power-of-two numbers are scalarized. 
if (!isPowerOf2_32(NumElem)) { - unsigned Cost = TargetTransformInfo::getMemoryOpCost(Opcode, - VTy->getScalarType(), - Alignment, - AddressSpace); + unsigned Cost = BaseT::getMemoryOpCost(Opcode, VTy->getScalarType(), + Alignment, AddressSpace); unsigned SplitCost = getScalarizationOverhead(Src, Opcode == Instruction::Load, Opcode==Instruction::Store); @@ -920,9 +902,9 @@ unsigned X86TTI::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment, return Cost; } -unsigned X86TTI::getMaskedMemoryOpCost(unsigned Opcode, Type *SrcTy, - unsigned Alignment, - unsigned AddressSpace) const { +unsigned X86TTIImpl::getMaskedMemoryOpCost(unsigned Opcode, Type *SrcTy, + unsigned Alignment, + unsigned AddressSpace) { VectorType *SrcVTy = dyn_cast(SrcTy); if (!SrcVTy) // To calculate scalar take the regular cost, without mask @@ -945,9 +927,9 @@ unsigned X86TTI::getMaskedMemoryOpCost(unsigned Opcode, Type *SrcTy, unsigned ValueSplitCost = getScalarizationOverhead(SrcVTy, Opcode == Instruction::Load, Opcode == Instruction::Store); - unsigned MemopCost = NumElem * - TargetTransformInfo::getMemoryOpCost(Opcode, SrcVTy->getScalarType(), - Alignment, AddressSpace); + unsigned MemopCost = + NumElem * BaseT::getMemoryOpCost(Opcode, SrcVTy->getScalarType(), + Alignment, AddressSpace); return MemopCost + ValueSplitCost + MaskSplitCost + MaskCmpCost; } @@ -957,15 +939,14 @@ unsigned X86TTI::getMaskedMemoryOpCost(unsigned Opcode, Type *SrcTy, if (LT.second != TLI->getValueType(SrcVTy).getSimpleVT() && LT.second.getVectorNumElements() == NumElem) // Promotion requires expand/truncate for data and a shuffle for mask. - Cost += getShuffleCost(TargetTransformInfo::SK_Alternate, SrcVTy, 0, 0) + - getShuffleCost(TargetTransformInfo::SK_Alternate, MaskTy, 0, 0); - + Cost += getShuffleCost(TTI::SK_Alternate, SrcVTy, 0, 0) + + getShuffleCost(TTI::SK_Alternate, MaskTy, 0, 0); + else if (LT.second.getVectorNumElements() > NumElem) { VectorType *NewMaskTy = VectorType::get(MaskTy->getVectorElementType(), LT.second.getVectorNumElements()); // Expanding requires fill mask with zeroes - Cost += getShuffleCost(TargetTransformInfo::SK_InsertSubvector, - NewMaskTy, 0, MaskTy); + Cost += getShuffleCost(TTI::SK_InsertSubvector, NewMaskTy, 0, MaskTy); } if (!ST->hasAVX512()) return Cost + LT.first*4; // Each maskmov costs 4 @@ -974,7 +955,7 @@ unsigned X86TTI::getMaskedMemoryOpCost(unsigned Opcode, Type *SrcTy, return Cost+LT.first; } -unsigned X86TTI::getAddressComputationCost(Type *Ty, bool IsComplex) const { +unsigned X86TTIImpl::getAddressComputationCost(Type *Ty, bool IsComplex) { // Address computations in vectorized code with non-consecutive addresses will // likely result in more instructions compared to scalar code where the // computation can more often be merged into the index mode. 
The resulting @@ -984,11 +965,11 @@ unsigned X86TTI::getAddressComputationCost(Type *Ty, bool IsComplex) const { if (Ty->isVectorTy() && IsComplex) return NumVectorInstToHideOverhead; - return TargetTransformInfo::getAddressComputationCost(Ty, IsComplex); + return BaseT::getAddressComputationCost(Ty, IsComplex); } -unsigned X86TTI::getReductionCost(unsigned Opcode, Type *ValTy, - bool IsPairwise) const { +unsigned X86TTIImpl::getReductionCost(unsigned Opcode, Type *ValTy, + bool IsPairwise) { std::pair LT = TLI->getTypeLegalizationCost(ValTy); @@ -1064,23 +1045,23 @@ unsigned X86TTI::getReductionCost(unsigned Opcode, Type *ValTy, } } - return TargetTransformInfo::getReductionCost(Opcode, ValTy, IsPairwise); + return BaseT::getReductionCost(Opcode, ValTy, IsPairwise); } /// \brief Calculate the cost of materializing a 64-bit value. This helper /// method might only calculate a fraction of a larger immediate. Therefore it /// is valid to return a cost of ZERO. -unsigned X86TTI::getIntImmCost(int64_t Val) const { +unsigned X86TTIImpl::getIntImmCost(int64_t Val) { if (Val == 0) - return TCC_Free; + return TTI::TCC_Free; if (isInt<32>(Val)) - return TCC_Basic; + return TTI::TCC_Basic; - return 2 * TCC_Basic; + return 2 * TTI::TCC_Basic; } -unsigned X86TTI::getIntImmCost(const APInt &Imm, Type *Ty) const { +unsigned X86TTIImpl::getIntImmCost(const APInt &Imm, Type *Ty) { assert(Ty->isIntegerTy()); unsigned BitSize = Ty->getPrimitiveSizeInBits(); @@ -1092,10 +1073,10 @@ unsigned X86TTI::getIntImmCost(const APInt &Imm, Type *Ty) const { // Fixme: Create a cost model for types larger than i128 once the codegen // issues have been fixed. if (BitSize > 128) - return TCC_Free; + return TTI::TCC_Free; if (Imm == 0) - return TCC_Free; + return TTI::TCC_Free; // Sign-extend all constants to a multiple of 64-bit. APInt ImmVal = Imm; @@ -1114,26 +1095,27 @@ unsigned X86TTI::getIntImmCost(const APInt &Imm, Type *Ty) const { return std::max(1U, Cost); } -unsigned X86TTI::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, - Type *Ty) const { +unsigned X86TTIImpl::getIntImmCost(unsigned Opcode, unsigned Idx, + const APInt &Imm, Type *Ty) { assert(Ty->isIntegerTy()); unsigned BitSize = Ty->getPrimitiveSizeInBits(); // There is no cost model for constants with a bit size of 0. Return TCC_Free // here, so that constant hoisting will ignore this constant. if (BitSize == 0) - return TCC_Free; + return TTI::TCC_Free; unsigned ImmIdx = ~0U; switch (Opcode) { - default: return TCC_Free; + default: + return TTI::TCC_Free; case Instruction::GetElementPtr: // Always hoist the base address of a GetElementPtr. This prevents the // creation of new constants for every base constant that gets constant // folded with the offset. if (Idx == 0) - return 2 * TCC_Basic; - return TCC_Free; + return 2 * TTI::TCC_Basic; + return TTI::TCC_Free; case Instruction::Store: ImmIdx = 0; break; @@ -1155,7 +1137,7 @@ unsigned X86TTI::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, case Instruction::LShr: case Instruction::AShr: if (Idx == 1) - return TCC_Free; + return TTI::TCC_Free; break; case Instruction::Trunc: case Instruction::ZExt: @@ -1173,27 +1155,28 @@ unsigned X86TTI::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, if (Idx == ImmIdx) { unsigned NumConstants = (BitSize + 63) / 64; - unsigned Cost = X86TTI::getIntImmCost(Imm, Ty); - return (Cost <= NumConstants * TCC_Basic) - ? 
static_cast(TCC_Free) - : Cost; + unsigned Cost = X86TTIImpl::getIntImmCost(Imm, Ty); + return (Cost <= NumConstants * TTI::TCC_Basic) + ? static_cast(TTI::TCC_Free) + : Cost; } - return X86TTI::getIntImmCost(Imm, Ty); + return X86TTIImpl::getIntImmCost(Imm, Ty); } -unsigned X86TTI::getIntImmCost(Intrinsic::ID IID, unsigned Idx, - const APInt &Imm, Type *Ty) const { +unsigned X86TTIImpl::getIntImmCost(Intrinsic::ID IID, unsigned Idx, + const APInt &Imm, Type *Ty) { assert(Ty->isIntegerTy()); unsigned BitSize = Ty->getPrimitiveSizeInBits(); // There is no cost model for constants with a bit size of 0. Return TCC_Free // here, so that constant hoisting will ignore this constant. if (BitSize == 0) - return TCC_Free; + return TTI::TCC_Free; switch (IID) { - default: return TCC_Free; + default: + return TTI::TCC_Free; case Intrinsic::sadd_with_overflow: case Intrinsic::uadd_with_overflow: case Intrinsic::ssub_with_overflow: @@ -1201,22 +1184,22 @@ unsigned X86TTI::getIntImmCost(Intrinsic::ID IID, unsigned Idx, case Intrinsic::smul_with_overflow: case Intrinsic::umul_with_overflow: if ((Idx == 1) && Imm.getBitWidth() <= 64 && isInt<32>(Imm.getSExtValue())) - return TCC_Free; + return TTI::TCC_Free; break; case Intrinsic::experimental_stackmap: if ((Idx < 2) || (Imm.getBitWidth() <= 64 && isInt<64>(Imm.getSExtValue()))) - return TCC_Free; + return TTI::TCC_Free; break; case Intrinsic::experimental_patchpoint_void: case Intrinsic::experimental_patchpoint_i64: if ((Idx < 4) || (Imm.getBitWidth() <= 64 && isInt<64>(Imm.getSExtValue()))) - return TCC_Free; + return TTI::TCC_Free; break; } - return X86TTI::getIntImmCost(Imm, Ty); + return X86TTIImpl::getIntImmCost(Imm, Ty); } -bool X86TTI::isLegalMaskedLoad(Type *DataTy, int Consecutive) const { +bool X86TTIImpl::isLegalMaskedLoad(Type *DataTy, int Consecutive) { int DataWidth = DataTy->getPrimitiveSizeInBits(); // Todo: AVX512 allows gather/scatter, works with strided and random as well @@ -1227,7 +1210,7 @@ bool X86TTI::isLegalMaskedLoad(Type *DataTy, int Consecutive) const { return false; } -bool X86TTI::isLegalMaskedStore(Type *DataType, int Consecutive) const { +bool X86TTIImpl::isLegalMaskedStore(Type *DataType, int Consecutive) { return isLegalMaskedLoad(DataType, Consecutive); } diff --git a/llvm/lib/Target/XCore/XCoreTargetMachine.cpp b/llvm/lib/Target/XCore/XCoreTargetMachine.cpp index b6cd027..82df1c9 100644 --- a/llvm/lib/Target/XCore/XCoreTargetMachine.cpp +++ b/llvm/lib/Target/XCore/XCoreTargetMachine.cpp @@ -83,9 +83,5 @@ extern "C" void LLVMInitializeXCoreTarget() { } void XCoreTargetMachine::addAnalysisPasses(PassManagerBase &PM) { - // Add first the target-independent BasicTTI pass, then our XCore pass. This - // allows the XCore pass to delegate to the target independent layer when - // appropriate. 
- PM.add(createBasicTargetTransformInfoPass(this)); PM.add(createXCoreTargetTransformInfoPass(this)); } diff --git a/llvm/lib/Target/XCore/XCoreTargetTransformInfo.cpp b/llvm/lib/Target/XCore/XCoreTargetTransformInfo.cpp index da232da..d2b152f 100644 --- a/llvm/lib/Target/XCore/XCoreTargetTransformInfo.cpp +++ b/llvm/lib/Target/XCore/XCoreTargetTransformInfo.cpp @@ -15,7 +15,9 @@ //===----------------------------------------------------------------------===// #include "XCore.h" +#include "XCoreTargetMachine.h" #include "llvm/Analysis/TargetTransformInfo.h" +#include "llvm/CodeGen/BasicTTIImpl.h" #include "llvm/Support/Debug.h" #include "llvm/Target/CostTable.h" #include "llvm/Target/TargetLowering.h" @@ -23,43 +25,30 @@ using namespace llvm; #define DEBUG_TYPE "xcoretti" -// Declare the pass initialization routine locally as target-specific passes -// don't have a target-wide initialization entry point, and so we rely on the -// pass constructor initialization. -namespace llvm { -void initializeXCoreTTIPass(PassRegistry &); -} - namespace { -class XCoreTTI final : public ImmutablePass, public TargetTransformInfo { -public: - XCoreTTI() : ImmutablePass(ID) { - llvm_unreachable("This pass cannot be directly constructed"); - } - - XCoreTTI(const XCoreTargetMachine *TM) - : ImmutablePass(ID) { - initializeXCoreTTIPass(*PassRegistry::getPassRegistry()); - } +class XCoreTTIImpl : public BasicTTIImplBase { + typedef BasicTTIImplBase BaseT; + typedef TargetTransformInfo TTI; - void initializePass() override { - pushTTIStack(this); - } +public: + explicit XCoreTTIImpl(const XCoreTargetMachine *TM = nullptr) : BaseT(TM) {} - void getAnalysisUsage(AnalysisUsage &AU) const override { - TargetTransformInfo::getAnalysisUsage(AU); + // Provide value semantics. MSVC requires that we spell all of these out. + XCoreTTIImpl(const XCoreTTIImpl &Arg) + : BaseT(static_cast(Arg)) {} + XCoreTTIImpl(XCoreTTIImpl &&Arg) + : BaseT(std::move(static_cast(Arg))) {} + XCoreTTIImpl &operator=(const XCoreTTIImpl &RHS) { + BaseT::operator=(static_cast(RHS)); + return *this; } - - static char ID; - - void *getAdjustedAnalysisPointer(const void *ID) override { - if (ID == &TargetTransformInfo::ID) - return (TargetTransformInfo*)this; - return this; + XCoreTTIImpl &operator=(XCoreTTIImpl &&RHS) { + BaseT::operator=(std::move(static_cast(RHS))); + return *this; } - unsigned getNumberOfRegisters(bool Vector) const override { + unsigned getNumberOfRegisters(bool Vector) { if (Vector) { return 0; } @@ -69,12 +58,7 @@ public: } // end anonymous namespace -INITIALIZE_AG_PASS(XCoreTTI, TargetTransformInfo, "xcoretti", - "XCore Target Transform Info", true, true, false) -char XCoreTTI::ID = 0; - - ImmutablePass * llvm::createXCoreTargetTransformInfoPass(const XCoreTargetMachine *TM) { - return new XCoreTTI(TM); + return new TargetTransformInfoWrapperPass(XCoreTTIImpl(TM)); } diff --git a/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp b/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp index 27c177a..0a02701 100644 --- a/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp +++ b/llvm/lib/Transforms/Scalar/ConstantHoisting.cpp @@ -131,14 +131,14 @@ public: void getAnalysisUsage(AnalysisUsage &AU) const override { AU.setPreservesCFG(); AU.addRequired(); - AU.addRequired(); + AU.addRequired(); } private: /// \brief Initialize the pass. 
void setup(Function &Fn) { DT = &getAnalysis().getDomTree(); - TTI = &getAnalysis(); + TTI = &getAnalysis().getTTI(); Entry = &Fn.getEntryBlock(); } @@ -176,7 +176,7 @@ char ConstantHoisting::ID = 0; INITIALIZE_PASS_BEGIN(ConstantHoisting, "consthoist", "Constant Hoisting", false, false) INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_END(ConstantHoisting, "consthoist", "Constant Hoisting", false, false) diff --git a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp index a6dbf39..11e347f 100644 --- a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp +++ b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp @@ -712,7 +712,7 @@ public: DataLayoutPass *DLP = getAnalysisIfAvailable(); auto *DL = DLP ? &DLP->getDataLayout() : nullptr; auto &TLI = getAnalysis().getTLI(); - auto &TTI = getAnalysis(); + auto &TTI = getAnalysis().getTTI(); auto &DT = getAnalysis().getDomTree(); auto &AC = getAnalysis().getAssumptionCache(F); @@ -725,7 +725,7 @@ public: AU.addRequired(); AU.addRequired(); AU.addRequired(); - AU.addRequired(); + AU.addRequired(); AU.setPreservesCFG(); } }; @@ -737,7 +737,7 @@ FunctionPass *llvm::createEarlyCSEPass() { return new EarlyCSELegacyPass(); } INITIALIZE_PASS_BEGIN(EarlyCSELegacyPass, "early-cse", "Early CSE", false, false) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker) INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass) INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass) diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp index df88f3f..1c055f6 100644 --- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp +++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp @@ -1936,7 +1936,8 @@ bool IndVarSimplify::runOnLoop(Loop *L, LPPassManager &LPM) { DL = DLP ? &DLP->getDataLayout() : nullptr; auto *TLIP = getAnalysisIfAvailable(); TLI = TLIP ? &TLIP->getTLI() : nullptr; - TTI = getAnalysisIfAvailable(); + auto *TTIP = getAnalysisIfAvailable(); + TTI = TTIP ? &TTIP->getTTI() : nullptr; DeadInsts.clear(); Changed = false; diff --git a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp index f9df43e..78d8c1c 100644 --- a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp +++ b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp @@ -176,7 +176,7 @@ namespace { AU.addPreserved(); AU.addRequired(); AU.addRequired(); - AU.addRequired(); + AU.addRequired(); } const DataLayout *getDataLayout() { @@ -204,7 +204,8 @@ namespace { } const TargetTransformInfo *getTargetTransformInfo() { - return TTI ? TTI : (TTI = &getAnalysis()); + return TTI ? 
TTI : (TTI = &getAnalysis() + .getTTI()); } Loop *getLoop() const { return CurLoop; } @@ -225,7 +226,7 @@ INITIALIZE_PASS_DEPENDENCY(LCSSA) INITIALIZE_PASS_DEPENDENCY(ScalarEvolution) INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass) INITIALIZE_AG_DEPENDENCY(AliasAnalysis) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_END(LoopIdiomRecognize, "loop-idiom", "Recognize loop idioms", false, false) diff --git a/llvm/lib/Transforms/Scalar/LoopRotation.cpp b/llvm/lib/Transforms/Scalar/LoopRotation.cpp index 02fb80c..541afa5 100644 --- a/llvm/lib/Transforms/Scalar/LoopRotation.cpp +++ b/llvm/lib/Transforms/Scalar/LoopRotation.cpp @@ -63,7 +63,7 @@ namespace { AU.addRequiredID(LCSSAID); AU.addPreservedID(LCSSAID); AU.addPreserved(); - AU.addRequired(); + AU.addRequired(); } bool runOnLoop(Loop *L, LPPassManager &LPM) override; @@ -81,7 +81,7 @@ namespace { char LoopRotate::ID = 0; INITIALIZE_PASS_BEGIN(LoopRotate, "loop-rotate", "Rotate Loops", false, false) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker) INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(LoopSimplify) @@ -102,7 +102,7 @@ bool LoopRotate::runOnLoop(Loop *L, LPPassManager &LPM) { MDNode *LoopMD = L->getLoopID(); LI = &getAnalysis().getLoopInfo(); - TTI = &getAnalysis(); + TTI = &getAnalysis().getTTI(); AC = &getAnalysis().getAssumptionCache( *L->getHeader()->getParent()); auto *DTWP = getAnalysisIfAvailable(); diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp index fdd4ff5..8325333 100644 --- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp +++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp @@ -4866,8 +4866,8 @@ LSRInstance::LSRInstance(Loop *L, Pass *P) : IU(P->getAnalysis()), SE(P->getAnalysis()), DT(P->getAnalysis().getDomTree()), LI(P->getAnalysis().getLoopInfo()), - TTI(P->getAnalysis()), L(L), Changed(false), - IVIncInsertPos(nullptr) { + TTI(P->getAnalysis().getTTI()), L(L), + Changed(false), IVIncInsertPos(nullptr) { // If LoopSimplify form is not available, stay out of trouble. 
if (!L->isLoopSimplifyForm()) return; @@ -5043,7 +5043,7 @@ private: char LoopStrengthReduce::ID = 0; INITIALIZE_PASS_BEGIN(LoopStrengthReduce, "loop-reduce", "Loop Strength Reduction", false, false) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass) INITIALIZE_PASS_DEPENDENCY(ScalarEvolution) INITIALIZE_PASS_DEPENDENCY(IVUsers) @@ -5078,7 +5078,7 @@ void LoopStrengthReduce::getAnalysisUsage(AnalysisUsage &AU) const { AU.addRequiredID(LoopSimplifyID); AU.addRequired(); AU.addPreserved(); - AU.addRequired(); + AU.addRequired(); } bool LoopStrengthReduce::runOnLoop(Loop *L, LPPassManager & /*LPM*/) { @@ -5100,7 +5100,7 @@ bool LoopStrengthReduce::runOnLoop(Loop *L, LPPassManager & /*LPM*/) { #endif unsigned numFolded = Rewriter.replaceCongruentIVs( L, &getAnalysis().getDomTree(), DeadInsts, - &getAnalysis()); + &getAnalysis().getTTI()); if (numFolded) { Changed = true; DeleteTriviallyDeadInstructions(DeadInsts); diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp index 036200b..0a6bd84 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp @@ -113,7 +113,7 @@ namespace { AU.addPreservedID(LCSSAID); AU.addRequired(); AU.addPreserved(); - AU.addRequired(); + AU.addRequired(); AU.addRequired(); // FIXME: Loop unroll requires LCSSA. And LCSSA requires dom info. // If loop unroll does not preserve dom info then LCSSA pass on next @@ -185,7 +185,7 @@ namespace { char LoopUnroll::ID = 0; INITIALIZE_PASS_BEGIN(LoopUnroll, "loop-unroll", "Unroll loops", false, false) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker) INITIALIZE_PASS_DEPENDENCY(FunctionTargetTransformInfo) INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass) @@ -365,7 +365,8 @@ bool LoopUnroll::runOnLoop(Loop *L, LPPassManager &LPM) { LoopInfo *LI = &getAnalysis().getLoopInfo(); ScalarEvolution *SE = &getAnalysis(); - const TargetTransformInfo &TTI = getAnalysis(); + const TargetTransformInfo &TTI = + getAnalysis().getTTI(); const FunctionTargetTransformInfo &FTTI = getAnalysis(); auto &AC = getAnalysis().getAssumptionCache( diff --git a/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp index c78462f..9d18615 100644 --- a/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp @@ -176,7 +176,7 @@ namespace { AU.addPreservedID(LCSSAID); AU.addPreserved(); AU.addPreserved(); - AU.addRequired(); + AU.addRequired(); } private: @@ -333,7 +333,7 @@ void LUAnalysisCache::cloneData(const Loop *NewLoop, const Loop *OldLoop, char LoopUnswitch::ID = 0; INITIALIZE_PASS_BEGIN(LoopUnswitch, "loop-unswitch", "Unswitch loops", false, false) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker) INITIALIZE_PASS_DEPENDENCY(LoopSimplify) INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass) @@ -432,8 +432,9 @@ bool LoopUnswitch::processCurrentLoop() { // Probably we reach the quota of branches for this loop. If so // stop unswitching. - if (!BranchesInfo.countLoop(currentLoop, getAnalysis(), - AC)) + if (!BranchesInfo.countLoop( + currentLoop, getAnalysis().getTTI(), + AC)) return false; // Loop over all of the basic blocks in the loop. 
If we find an interior diff --git a/llvm/lib/Transforms/Scalar/PartiallyInlineLibCalls.cpp b/llvm/lib/Transforms/Scalar/PartiallyInlineLibCalls.cpp index d6adfbe..da50fdf 100644 --- a/llvm/lib/Transforms/Scalar/PartiallyInlineLibCalls.cpp +++ b/llvm/lib/Transforms/Scalar/PartiallyInlineLibCalls.cpp @@ -53,7 +53,7 @@ INITIALIZE_PASS(PartiallyInlineLibCalls, "partially-inline-libcalls", void PartiallyInlineLibCalls::getAnalysisUsage(AnalysisUsage &AU) const { AU.addRequired(); - AU.addRequired(); + AU.addRequired(); FunctionPass::getAnalysisUsage(AU); } @@ -62,7 +62,8 @@ bool PartiallyInlineLibCalls::runOnFunction(Function &F) { Function::iterator CurrBB; TargetLibraryInfo *TLI = &getAnalysis().getTLI(); - const TargetTransformInfo *TTI = &getAnalysis(); + const TargetTransformInfo *TTI = + &getAnalysis().getTTI(); for (Function::iterator BB = F.begin(), BE = F.end(); BB != BE;) { CurrBB = BB++; diff --git a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp index 0c70d0f..34994d2 100644 --- a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp +++ b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp @@ -313,7 +313,7 @@ class SeparateConstOffsetFromGEP : public FunctionPass { void getAnalysisUsage(AnalysisUsage &AU) const override { AU.addRequired(); - AU.addRequired(); + AU.addRequired(); } bool doInitialization(Module &M) override { @@ -384,7 +384,7 @@ INITIALIZE_PASS_BEGIN( SeparateConstOffsetFromGEP, "separate-const-offset-from-gep", "Split GEPs to a variadic base and a constant offset for better CSE", false, false) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(DataLayoutPass) INITIALIZE_PASS_END( SeparateConstOffsetFromGEP, "separate-const-offset-from-gep", @@ -857,7 +857,8 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) { // of variable indices. Therefore, we don't check for addressing modes in that // case. if (!LowerGEP) { - TargetTransformInfo &TTI = getAnalysis(); + TargetTransformInfo &TTI = + getAnalysis().getTTI(); if (!TTI.isLegalAddressingMode(GEP->getType()->getElementType(), /*BaseGV=*/nullptr, AccumulativeByteOffset, /*HasBaseReg=*/true, /*Scale=*/0)) { diff --git a/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp b/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp index 2e317f9..0a4bc87 100644 --- a/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp +++ b/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp @@ -59,7 +59,7 @@ struct CFGSimplifyPass : public FunctionPass { void getAnalysisUsage(AnalysisUsage &AU) const override { AU.addRequired(); - AU.addRequired(); + AU.addRequired(); } }; } @@ -67,7 +67,7 @@ struct CFGSimplifyPass : public FunctionPass { char CFGSimplifyPass::ID = 0; INITIALIZE_PASS_BEGIN(CFGSimplifyPass, "simplifycfg", "Simplify the CFG", false, false) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker) INITIALIZE_PASS_END(CFGSimplifyPass, "simplifycfg", "Simplify the CFG", false, false) @@ -185,7 +185,8 @@ bool CFGSimplifyPass::runOnFunction(Function &F) { AssumptionCache *AC = &getAnalysis().getAssumptionCache(F); - const TargetTransformInfo &TTI = getAnalysis(); + const TargetTransformInfo &TTI = + getAnalysis().getTTI(); DataLayoutPass *DLP = getAnalysisIfAvailable(); const DataLayout *DL = DLP ? 
&DLP->getDataLayout() : nullptr; bool EverChanged = removeUnreachableBlocks(F); diff --git a/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp b/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp index f3c3e30..3fccd4f 100644 --- a/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp +++ b/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp @@ -126,7 +126,7 @@ namespace { char TailCallElim::ID = 0; INITIALIZE_PASS_BEGIN(TailCallElim, "tailcallelim", "Tail Call Elimination", false, false) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_END(TailCallElim, "tailcallelim", "Tail Call Elimination", false, false) @@ -136,7 +136,7 @@ FunctionPass *llvm::createTailCallEliminationPass() { } void TailCallElim::getAnalysisUsage(AnalysisUsage &AU) const { - AU.addRequired(); + AU.addRequired(); } /// \brief Scan the specified function for alloca instructions. @@ -386,7 +386,7 @@ bool TailCallElim::runTRE(Function &F) { // right, so don't even try to convert it... if (F.getFunctionType()->isVarArg()) return false; - TTI = &getAnalysis(); + TTI = &getAnalysis().getTTI(); BasicBlock *OldEntry = nullptr; bool TailCallsAreMarkedTail = false; SmallVector ArgumentPHIs; diff --git a/llvm/lib/Transforms/Vectorize/BBVectorize.cpp b/llvm/lib/Transforms/Vectorize/BBVectorize.cpp index e3cc288..8b541f6 100644 --- a/llvm/lib/Transforms/Vectorize/BBVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/BBVectorize.cpp @@ -208,7 +208,9 @@ namespace { SE = &P->getAnalysis(); DataLayoutPass *DLP = P->getAnalysisIfAvailable(); DL = DLP ? &DLP->getDataLayout() : nullptr; - TTI = IgnoreTargetInfo ? nullptr : &P->getAnalysis(); + TTI = IgnoreTargetInfo + ? nullptr + : &P->getAnalysis().getTTI(); } typedef std::pair ValuePair; @@ -442,7 +444,9 @@ namespace { SE = &getAnalysis(); DataLayoutPass *DLP = getAnalysisIfAvailable(); DL = DLP ? &DLP->getDataLayout() : nullptr; - TTI = IgnoreTargetInfo ? nullptr : &getAnalysis(); + TTI = IgnoreTargetInfo + ? nullptr + : &getAnalysis().getTTI(); return vectorizeBB(BB); } @@ -452,7 +456,7 @@ namespace { AU.addRequired(); AU.addRequired(); AU.addRequired(); - AU.addRequired(); + AU.addRequired(); AU.addPreserved(); AU.addPreserved(); AU.addPreserved(); @@ -3192,7 +3196,7 @@ char BBVectorize::ID = 0; static const char bb_vectorize_name[] = "Basic-Block Vectorization"; INITIALIZE_PASS_BEGIN(BBVectorize, BBV_NAME, bb_vectorize_name, false, false) INITIALIZE_AG_DEPENDENCY(AliasAnalysis) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass) INITIALIZE_PASS_DEPENDENCY(ScalarEvolution) INITIALIZE_PASS_END(BBVectorize, BBV_NAME, bb_vectorize_name, false, false) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index fa41ab9..f6b6056 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -1330,7 +1330,7 @@ struct LoopVectorize : public FunctionPass { DataLayoutPass *DLP = getAnalysisIfAvailable(); DL = DLP ? 
&DLP->getDataLayout() : nullptr; LI = &getAnalysis().getLoopInfo(); - TTI = &getAnalysis(); + TTI = &getAnalysis().getTTI(); DT = &getAnalysis().getDomTree(); BFI = &getAnalysis(); auto *TLIP = getAnalysisIfAvailable(); @@ -1550,7 +1550,7 @@ struct LoopVectorize : public FunctionPass { AU.addRequired(); AU.addRequired(); AU.addRequired(); - AU.addRequired(); + AU.addRequired(); AU.addRequired(); AU.addPreserved(); AU.addPreserved(); @@ -6152,7 +6152,7 @@ Type* LoopVectorizationCostModel::ToVectorTy(Type *Scalar, unsigned VF) { char LoopVectorize::ID = 0; static const char lv_name[] = "Loop Vectorization"; INITIALIZE_PASS_BEGIN(LoopVectorize, LV_NAME, lv_name, false, false) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_AG_DEPENDENCY(AliasAnalysis) INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker) INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfo) diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp index 4dee2d9..2a12c20 100644 --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp @@ -3052,7 +3052,7 @@ struct SLPVectorizer : public FunctionPass { SE = &getAnalysis(); DataLayoutPass *DLP = getAnalysisIfAvailable(); DL = DLP ? &DLP->getDataLayout() : nullptr; - TTI = &getAnalysis(); + TTI = &getAnalysis().getTTI(); auto *TLIP = getAnalysisIfAvailable(); TLI = TLIP ? &TLIP->getTLI() : nullptr; AA = &getAnalysis(); @@ -3114,7 +3114,7 @@ struct SLPVectorizer : public FunctionPass { AU.addRequired(); AU.addRequired(); AU.addRequired(); - AU.addRequired(); + AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addPreserved(); @@ -3999,7 +3999,7 @@ char SLPVectorizer::ID = 0; static const char lv_name[] = "SLP Vectorizer"; INITIALIZE_PASS_BEGIN(SLPVectorizer, SV_NAME, lv_name, false, false) INITIALIZE_AG_DEPENDENCY(AliasAnalysis) -INITIALIZE_AG_DEPENDENCY(TargetTransformInfo) +INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker) INITIALIZE_PASS_DEPENDENCY(ScalarEvolution) INITIALIZE_PASS_DEPENDENCY(LoopSimplify) diff --git a/llvm/test/Analysis/CostModel/no_info.ll b/llvm/test/Analysis/CostModel/no_info.ll index f3f165b..5f3b56a 100644 --- a/llvm/test/Analysis/CostModel/no_info.ll +++ b/llvm/test/Analysis/CostModel/no_info.ll @@ -1,11 +1,12 @@ ; RUN: opt < %s -cost-model -analyze | FileCheck %s -; The cost model does not have any target information so it can't make a decision. +; The cost model does not have any target information so it just makes boring +; assumptions. ; -- No triple in this module -- -;CHECK: Unknown cost {{.*}} add -;CHECK: Unknown cost {{.*}} ret +;CHECK: cost of 1 {{.*}} add +;CHECK: cost of 1 {{.*}} ret define i32 @no_info(i32 %arg) { %e = add i32 %arg, %arg ret i32 %e diff --git a/llvm/tools/opt/opt.cpp b/llvm/tools/opt/opt.cpp index 6083e7a..4f4f858 100644 --- a/llvm/tools/opt/opt.cpp +++ b/llvm/tools/opt/opt.cpp @@ -21,6 +21,7 @@ #include "llvm/Analysis/LoopPass.h" #include "llvm/Analysis/RegionPass.h" #include "llvm/Analysis/TargetLibraryInfo.h" +#include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/Bitcode/BitcodeWriterPass.h" #include "llvm/CodeGen/CommandFlags.h" #include "llvm/IR/DataLayout.h" @@ -428,6 +429,8 @@ int main(int argc, char **argv) { // Add internal analysis passes from the target machine. 
if (TM) TM->addAnalysisPasses(Passes); + else + Passes.add(createNoTargetTransformInfoPass(DL)); std::unique_ptr<FunctionPassManager> FPasses; if (OptLevelO1 || OptLevelO2 || OptLevelOs || OptLevelOz || OptLevelO3) { @@ -436,6 +439,8 @@ FPasses->add(new DataLayoutPass()); if (TM) TM->addAnalysisPasses(*FPasses); + else + FPasses->add(createNoTargetTransformInfoPass(DL)); } -- 2.7.4
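
For readers following the mechanical conversions above, the client-side pattern boils down to a few lines. The sketch below is illustrative only, written against the interfaces this patch introduces (`TargetTransformInfoWrapperPass` and its `getTTI()` accessor); `MyTTIClientPass` is a made-up example pass, not code from the commit.

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"

using namespace llvm;

namespace {
// A legacy FunctionPass that consumes TTI: instead of requiring the old
// TargetTransformInfo analysis group, it requires the single wrapper pass
// and pulls the type-erased TTI object out of it.
struct MyTTIClientPass : FunctionPass {
  static char ID;
  MyTTIClientPass() : FunctionPass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<TargetTransformInfoWrapperPass>();
    AU.setPreservesAll();
  }

  bool runOnFunction(Function &F) override {
    const TargetTransformInfo &TTI =
        getAnalysis<TargetTransformInfoWrapperPass>().getTTI();
    // Costs are queried exactly as before; only the lookup changed.
    (void)TTI.getNumberOfRegisters(/*Vector=*/false);
    return false;
  }
};
} // end anonymous namespace

char MyTTIClientPass::ID = 0;
```

Every transform touched above (ConstantHoisting, EarlyCSE, LSR, the unroll/unswitch passes, the vectorizers) is converted to exactly this shape: an `INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)` in the registration plus a `getAnalysis<TargetTransformInfoWrapperPass>().getTTI()` lookup at run time.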
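
On the producer side, each target now hands the wrapper pass a concrete implementation with value semantics. The sketch below is modeled on the XCore and X86 changes above and shows the minimal shape for a hypothetical target; `FooTTIImpl` and `createFooTargetTransformInfoPass` are placeholder names, and the `BasicTTIImplBase` constructor argument is assumed to match what the in-tree targets pass in this patch.

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/CodeGen/BasicTTIImpl.h"
#include "llvm/Pass.h"
#include "llvm/Target/TargetMachine.h"
#include <utility>

using namespace llvm;

namespace {
// Concrete TTI implementation for a hypothetical target, mixed in via the
// CRTP base class from BasicTTIImpl.h.
class FooTTIImpl : public BasicTTIImplBase<FooTTIImpl> {
  typedef BasicTTIImplBase<FooTTIImpl> BaseT;

public:
  explicit FooTTIImpl(const TargetMachine *TM = nullptr) : BaseT(TM) {}

  // Value semantics, spelled out the same way the in-tree targets do it.
  FooTTIImpl(const FooTTIImpl &Arg) : BaseT(static_cast<const BaseT &>(Arg)) {}
  FooTTIImpl(FooTTIImpl &&Arg) : BaseT(std::move(static_cast<BaseT &>(Arg))) {}
  FooTTIImpl &operator=(const FooTTIImpl &RHS) {
    BaseT::operator=(static_cast<const BaseT &>(RHS));
    return *this;
  }
  FooTTIImpl &operator=(FooTTIImpl &&RHS) {
    BaseT::operator=(std::move(static_cast<BaseT &>(RHS)));
    return *this;
  }

  // Placeholder cost hook: shadows the base-class method of the same name.
  unsigned getNumberOfRegisters(bool Vector) { return Vector ? 0 : 8; }
};
} // end anonymous namespace

// Hypothetical factory, mirroring createX86TargetTransformInfoPass above.
ImmutablePass *createFooTargetTransformInfoPass(const TargetMachine *TM) {
  return new TargetTransformInfoWrapperPass(FooTTIImpl(TM));
}
```

Because the cost hooks are plain non-virtual members, shadowing one in the derived class is enough for the wrapper's type-erased TTI object to pick it up; no `override` and no analysis-group registration is involved anymore.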