* :ref:`amdgpu-amdhsa-memory-model-gfx6-gfx9`
* :ref:`amdgpu-amdhsa-memory-model-gfx90a`
+* :ref:`amdgpu-amdhsa-memory-model-gfx940`
* :ref:`amdgpu-amdhsa-memory-model-gfx10`
.. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
- system for OpenCL.*
============ ============ ============== ========== ================================
+.. _amdgpu-amdhsa-memory-model-gfx940:
+
+Memory Model GFX940
++++++++++++++++++++
+
+For GFX940:
+
+* Each agent has multiple shader arrays (SA).
+* Each SA has multiple compute units (CU).
+* Each CU has multiple SIMDs that execute wavefronts.
+* The wavefronts for a single work-group are executed in the same CU but may be
+ executed by different SIMDs. The exception is tgsplit execution mode, in which
+ the wavefronts may be executed by different SIMDs in different CUs.
+* Each CU has a single LDS memory shared by the wavefronts of the work-groups
+ executing on it. The exception is tgsplit execution mode, in which no LDS is
+ allocated because wavefronts of the same work-group can be in different CUs.
+* All LDS operations of a CU are performed as wavefront wide operations in a
+ global order and involve no caching. Completion is reported to a wavefront in
+ execution order.
+* The LDS memory has multiple request queues shared by the SIMDs of a
+ CU. Therefore, the LDS operations performed by different wavefronts of a
+ work-group can be reordered relative to each other, which can result in
+ reordering the visibility of vector memory operations with respect to LDS
+ operations of other wavefronts in the same work-group. A ``s_waitcnt
+ lgkmcnt(0)`` is required to ensure synchronization between LDS operations and
+ vector memory operations between wavefronts of a work-group, but not between
+ operations performed by the same wavefront.
+* The vector memory operations are performed as wavefront wide operations and
+ completion is reported to a wavefront in execution order. The exception is
+ that ``flat_load/store/atomic`` instructions can report out of vector memory
+ order if they access LDS memory, and out of LDS operation order if they access
+ global memory.
+* The vector memory operations access a single vector L1 cache shared by all
+ SIMDs of a CU. Therefore:
+
+ * No special action is required for coherence between the lanes of a single
+ wavefront.
+
+ * No special action is required for coherence between wavefronts in the same
+ work-group since they execute on the same CU. The exception is tgsplit
+ execution mode: wavefronts of the same work-group can be in different CUs,
+ so a ``buffer_inv sc0`` is required to invalidate the L1 cache.
+
+ * A ``buffer_inv sc1`` is required to invalidate the L1 cache for coherence
+ between wavefronts executing in different work-groups as they may be
+ executing on different CUs.
+
+* The scalar memory operations access a scalar L1 cache shared by all wavefronts
+ on a group of CUs. The scalar and vector L1 caches are not coherent. However,
+ scalar operations are used in a restricted way so do not impact the memory
+ model. See :ref:`amdgpu-amdhsa-memory-spaces`.
+* The vector and scalar memory operations use an L2 cache.
+
+ * GFX940 can be configured as a number of smaller agents, each with a single
+ L2 shared by all CUs on the same agent, or as fewer (possibly one) larger
+ agents, with groups of CUs on each agent sharing separate L2 caches.
+ * The L2 cache has independent channels to service disjoint ranges of virtual
+ addresses.
+ * Each CU has a separate request queue per channel for its associated L2.
+ Therefore, the vector and scalar memory operations performed by wavefronts
+ executing with different L1 caches and the same L2 cache can be reordered
+ relative to each other.
+ * A ``s_waitcnt vmcnt(0)`` is required to ensure synchronization between
+ vector memory operations of different CUs. It ensures a previous vector
+ memory operation has completed before executing a subsequent vector memory
+ or LDS operation and so can be used to meet the requirements of acquire and
+ release.
+ * An L2 cache can be kept coherent with other L2 caches by using the MTYPE RW
+ (read-write) for memory local to the L2, and MTYPE NC (non-coherent) with
+ the PTE C-bit set for memory not local to the L2.
+
+ * Any local memory cache lines will be automatically invalidated by writes
+ from CUs associated with other L2 caches, or writes from the CPU, due to
+ the cache probe caused by the PTE C-bit.
+ * XGMI accesses from the CPU to local memory may be cached on the CPU.
+ Subsequent access from the GPU will automatically invalidate or writeback
+ the CPU cache due to the L2 probe filter.
+ * To ensure coherence of local memory writes of CUs with different L1 caches
+ in the same agent a ``buffer_wbl2 sc1`` is required. It does nothing if the
+ agent is configured to have a single L2, or will writeback dirty L2 cache
+ lines if configured to have multiple L2 caches.
+ * To ensure coherence of local memory writes of CUs in different agents a
+ ``buffer_wbl2 sc0 sc1`` is required. It will writeback dirty L2 cache lines.
+ * To ensure coherence of local memory reads of CUs with different L1 caches
+ in the same agent a ``buffer_inv sc1`` is required. It does nothing if the
+ agent is configured to have a single L2, or will invalidate non-local L2
+ cache lines if configured to have multiple L2 caches.
+ * To ensure coherence of local memory reads of CUs in different agents a
+ ``buffer_inv sc0 sc1`` is required. It will invalidate non-local L2 cache
+ lines if configured to have multiple L2 caches.
+
+ * PCIe access from the GPU to the CPU can be kept coherent by using the MTYPE
+ UC (uncached) which bypasses the L2.
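+
+As a reading aid, the cache maintenance rules above can be sketched as a small
+Python model. This is only an illustration, not LLVM code: the names
+``SCOPE_BITS``, ``cache_invalidate`` and ``cache_writeback`` are hypothetical,
+and the ``sc0``/``sc1`` choices per scope are taken from the code sequence
+table for GFX940 in this section.
+

```python
# Illustrative model only; these names are hypothetical and not part of LLVM.
# Scope bits follow the GFX940 code sequence table: workgroup -> sc0,
# agent -> sc1, system -> sc0 sc1.
SCOPE_BITS = {
    "singlethread": (),
    "wavefront": (),
    "workgroup": ("sc0",),
    "agent": ("sc1",),
    "system": ("sc0", "sc1"),
}


def _with_bits(mnemonic, scope):
    """Append the scope bits for `scope` to `mnemonic` in table notation."""
    return mnemonic + "".join(" %s=1" % bit for bit in SCOPE_BITS[scope])


def cache_invalidate(scope, tgsplit=False):
    """Return the buffer_inv an acquire at `scope` needs, or None."""
    if scope in ("singlethread", "wavefront"):
        return None
    # Wavefronts of a work-group share one L1 unless tgsplit mode is on.
    if scope == "workgroup" and not tgsplit:
        return None
    return _with_bits("buffer_inv", scope)


def cache_writeback(scope):
    """Return the buffer_wbl2 a release at `scope` needs, or None."""
    if scope not in ("agent", "system"):
        return None
    return _with_bits("buffer_wbl2", scope)
```

+For example, under these assumptions ``cache_invalidate("workgroup")`` yields
+no instruction, while ``cache_invalidate("workgroup", tgsplit=True)`` yields
+``buffer_inv sc0=1``.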
+
+Scalar memory operations are only used to access memory that is proven to not
+change during the execution of the kernel dispatch. This includes constant
+address space and global address space for program scope ``const`` variables.
+Therefore, the kernel machine code does not have to maintain the scalar cache to
+ensure it is coherent with the vector caches. The scalar and vector caches are
+invalidated between kernel dispatches by CP since constant address space data
+may change between kernel dispatch executions. See
+:ref:`amdgpu-amdhsa-memory-spaces`.
+
+The one exception is if scalar writes are used to spill SGPR registers. In this
+case the AMDGPU backend ensures the memory location used to spill is never
+accessed by vector memory operations at the same time. If scalar writes are used
+then a ``s_dcache_wb`` is inserted before the ``s_endpgm`` and before a function
+return since the locations may be used for vector memory instructions by a
+future wavefront that uses the same scratch area, or a function call that
+creates a frame at the same address, respectively. There is no need for a
+``s_dcache_inv`` as all scalar writes are write-before-read in the same thread.
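+
+The ``s_dcache_wb`` rule above can be sketched as a pass over a list of
+instruction strings. This is a simplified illustration and not the AMDGPU
+backend implementation: the function name ``insert_dcache_wb`` is hypothetical,
+and the sketch assumes that scalar writes can be recognized by the mnemonic
+prefixes listed in the code and that ``s_setpc_b64`` stands in for a function
+return.
+

```python
# Illustrative sketch only; not the AMDGPU backend implementation.
# Assumption: scalar writes are recognized by these mnemonic prefixes.
SCALAR_WRITE_PREFIXES = ("s_store", "s_scratch_store", "s_buffer_store")
# Assumption: s_endpgm ends a kernel and s_setpc_b64 models a function return.
EXIT_MNEMONICS = ("s_endpgm", "s_setpc_b64")


def _mnemonic(instr):
    """Return the opcode part of a non-empty instruction string."""
    return instr.split()[0]


def insert_dcache_wb(instrs):
    """Insert s_dcache_wb before each exit if any scalar write is present."""
    has_scalar_write = any(
        _mnemonic(i).startswith(SCALAR_WRITE_PREFIXES) for i in instrs
    )
    if not has_scalar_write:
        # No scalar writes: the scalar cache holds no dirty lines to write back.
        return list(instrs)
    out = []
    for instr in instrs:
        if _mnemonic(instr) in EXIT_MNEMONICS:
            out.append("s_dcache_wb")
        out.append(instr)
    return out
```

+No ``s_dcache_inv`` is modeled, matching the text: scalar spill writes are
+always write-before-read in the same thread.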
+
+For kernarg backing memory:
+
+* CP invalidates the L1 cache at the start of each kernel dispatch.
+* On dGPU over XGMI or PCIe the kernarg backing memory is allocated in host
+ memory accessed as MTYPE UC (uncached) to avoid needing to invalidate the L2
+ cache. This also causes it to be treated as non-volatile and so is not
+ invalidated by ``*_vol``.
+* On APU the kernarg backing memory is accessed as MTYPE CC (cache coherent) and
+ so the L2 cache will be coherent with the CPU and other agents.
+
+Scratch backing memory (which is used for the private address space) is accessed
+with MTYPE NC_NV (non-coherent non-volatile). Since the private address space is
+only accessed by a single thread, and is always write-before-read, there is
+never a need to invalidate these entries from the L1 cache. Hence all cache
+invalidates are done as ``*_vol`` to only invalidate the volatile cache lines.
+
+The code sequences used to implement the memory model for GFX940 are defined
+in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
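+
+To show how a row of the table is read, the following Python sketch models the
+``load atomic acquire`` rows for the global address space. It is an
+illustration under stated assumptions rather than compiler code: the function
+name ``load_acquire_global`` is hypothetical, and ``global_load`` is the
+table's shorthand rather than a complete instruction mnemonic.
+

```python
# Illustrative sketch only; models the "load atomic acquire" rows for the
# global address space. `global_load` is the table's shorthand mnemonic.
SCOPE_BITS = {
    "singlethread": (),
    "wavefront": (),
    "workgroup": ("sc0",),
    "agent": ("sc1",),
    "system": ("sc0", "sc1"),
}


def load_acquire_global(scope, tgsplit=False):
    """Return the instruction sequence for a global load atomic acquire."""
    suffix = "".join(" %s=1" % bit for bit in SCOPE_BITS[scope])
    seq = ["global_load" + suffix]
    # The wait and invalidate are needed at agent and system scope, and at
    # workgroup scope only when wavefronts may run on different CUs (tgsplit).
    needs_invalidate = scope in ("agent", "system") or (
        scope == "workgroup" and tgsplit
    )
    if needs_invalidate:
        # The load must complete before the cache is invalidated.
        seq.append("s_waitcnt vmcnt(0)")
        seq.append("buffer_inv" + suffix)
    return seq
```

+Under these assumptions, an agent-scope acquire load expands to the load with
+``sc1=1``, then ``s_waitcnt vmcnt(0)``, then ``buffer_inv sc1=1``.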
+
+ .. table:: AMDHSA Memory Model Code Sequences GFX940
+ :name: amdgpu-amdhsa-memory-model-code-sequences-gfx940-table
+
+ ============ ============ ============== ========== ================================
+ LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code
+ Ordering Sync Scope Address GFX940
+ Space
+ ============ ============ ============== ========== ================================
+ **Non-Atomic**
+ ------------------------------------------------------------------------------------
+ load *none* *none* - global - !volatile & !nontemporal
+ - generic
+ - private 1. buffer/global/flat_load
+ - constant
+ - !volatile & nontemporal
+
+ 1. buffer/global/flat_load
+ nt=1
+
+ - volatile
+
+ 1. buffer/global/flat_load
+ sc0=1 sc1=1
+ 2. s_waitcnt vmcnt(0)
+
+ - Must happen before
+ any following volatile
+ global/generic
+ load/store.
+ - Ensures that
+ volatile
+ operations to
+ different
+ addresses will not
+ be reordered by
+ hardware.
+
+ load *none* *none* - local 1. ds_load
+ store *none* *none* - global - !volatile & !nontemporal
+ - generic
+ - private 1. buffer/global/flat_store
+ - constant
+ - !volatile & nontemporal
+
+ 1. buffer/global/flat_store
+ nt=1
+
+ - volatile
+
+ 1. buffer/global/flat_store
+ sc0=1 sc1=1
+ 2. s_waitcnt vmcnt(0)
+
+ - Must happen before
+ any following volatile
+ global/generic
+ load/store.
+ - Ensures that
+ volatile
+ operations to
+ different
+ addresses will not
+ be reordered by
+ hardware.
+
+ store *none* *none* - local 1. ds_store
+ **Unordered Atomic**
+ ------------------------------------------------------------------------------------
+ load atomic unordered *any* *any* *Same as non-atomic*.
+ store atomic unordered *any* *any* *Same as non-atomic*.
+ atomicrmw unordered *any* *any* *Same as monotonic atomic*.
+ **Monotonic Atomic**
+ ------------------------------------------------------------------------------------
+ load atomic monotonic - singlethread - global 1. buffer/global/flat_load
+ - wavefront - generic
+ load atomic monotonic - workgroup - global 1. buffer/global/flat_load
+ - generic sc0=1
+ load atomic monotonic - singlethread - local *If TgSplit execution mode,
+ - wavefront local address space cannot
+ - workgroup be used.*
+
+ 1. ds_load
+ load atomic monotonic - agent - global 1. buffer/global/flat_load
+ - generic sc1=1
+ load atomic monotonic - system - global 1. buffer/global/flat_load
+ - generic sc0=1 sc1=1
+ store atomic monotonic - singlethread - global 1. buffer/global/flat_store
+ - wavefront - generic
+ store atomic monotonic - workgroup - global 1. buffer/global/flat_store
+ - generic sc0=1
+ store atomic monotonic - agent - global 1. buffer/global/flat_store
+ - generic sc1=1
+ store atomic monotonic - system - global 1. buffer/global/flat_store
+ - generic sc0=1 sc1=1
+ store atomic monotonic - singlethread - local *If TgSplit execution mode,
+ - wavefront local address space cannot
+ - workgroup be used.*
+
+ 1. ds_store
+ atomicrmw monotonic - singlethread - global 1. buffer/global/flat_atomic
+ - wavefront - generic
+ - workgroup
+ - agent
+ atomicrmw monotonic - system - global 1. buffer/global/flat_atomic
+ - generic sc1=1
+ atomicrmw monotonic - singlethread - local *If TgSplit execution mode,
+ - wavefront local address space cannot
+ - workgroup be used.*
+
+ 1. ds_atomic
+ **Acquire Atomic**
+ ------------------------------------------------------------------------------------
+ load atomic acquire - singlethread - global 1. buffer/global/ds/flat_load
+ - wavefront - local
+ - generic
+ load atomic acquire - workgroup - global 1. buffer/global_load sc0=1
+ 2. s_waitcnt vmcnt(0)
+
+ - If not TgSplit execution
+ mode, omit.
+ - Must happen before the
+ following buffer_inv.
+
+ 3. buffer_inv sc0=1
+
+ - If not TgSplit execution
+ mode, omit.
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures that
+ following
+ loads will not see
+ stale data.
+
+ load atomic acquire - workgroup - local *If TgSplit execution mode,
+ local address space cannot
+ be used.*
+
+ 1. ds_load
+ 2. s_waitcnt lgkmcnt(0)
+
+ - If OpenCL, omit.
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures any
+ following global
+ data read is no
+ older than the local load
+ atomic value being
+ acquired.
+
+ load atomic acquire - workgroup - generic 1. flat_load sc0=1
+ 2. s_waitcnt lgkm/vmcnt(0)
+
+ - Use lgkmcnt(0) if not
+ TgSplit execution mode
+ and vmcnt(0) if TgSplit
+ execution mode.
+ - If OpenCL, omit lgkmcnt(0).
+ - Must happen before
+ the following
+ buffer_inv and any
+ following global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures any
+ following global
+ data read is no
+ older than a local load
+ atomic value being
+ acquired.
+
+ 3. buffer_inv sc0=1
+
+ - If not TgSplit execution
+ mode, omit.
+ - Ensures that
+ following
+ loads will not see
+ stale data.
+
+ load atomic acquire - agent - global 1. buffer/global_load
+ sc1=1
+ 2. s_waitcnt vmcnt(0)
+
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the load
+ has completed
+ before invalidating
+ the cache.
+
+ 3. buffer_inv sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following
+ loads will not see
+ stale global data.
+
+ load atomic acquire - system - global 1. buffer/global/flat_load
+ sc0=1 sc1=1
+ 2. s_waitcnt vmcnt(0)
+
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the load
+ has completed
+ before invalidating
+ the cache.
+
+ 3. buffer_inv sc0=1 sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following
+ loads will not see
+ stale MTYPE NC global data.
+ MTYPE RW and CC memory will
+ never be stale due to the
+ memory probes.
+
+ load atomic acquire - agent - generic 1. flat_load sc1=1
+ 2. s_waitcnt vmcnt(0) &
+ lgkmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the flat_load
+ has completed
+ before invalidating
+ the cache.
+
+ 3. buffer_inv sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ global data.
+
+ load atomic acquire - system - generic 1. flat_load sc0=1 sc1=1
+ 2. s_waitcnt vmcnt(0) &
+ lgkmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Must happen before
+ the following
+ buffer_inv.
+ - Ensures the flat_load
+ has completed
+ before invalidating
+ the caches.
+
+ 3. buffer_inv sc0=1 sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following
+ loads will not see
+ stale MTYPE NC global data.
+ MTYPE RW and CC memory will
+ never be stale due to the
+ memory probes.
+
+ atomicrmw acquire - singlethread - global 1. buffer/global/flat_atomic
+ - wavefront - generic
+ atomicrmw acquire - singlethread - local *If TgSplit execution mode,
+ - wavefront local address space cannot
+ be used.*
+
+ 1. ds_atomic
+ atomicrmw acquire - workgroup - global 1. buffer/global_atomic
+ 2. s_waitcnt vmcnt(0)
+
+ - If not TgSplit execution
+ mode, omit.
+ - Must happen before the
+ following buffer_inv.
+ - Ensures the atomicrmw
+ has completed
+ before invalidating
+ the cache.
+
+ 3. buffer_inv sc0=1
+
+ - If not TgSplit execution
+ mode, omit.
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ global data.
+
+ atomicrmw acquire - workgroup - local *If TgSplit execution mode,
+ local address space cannot
+ be used.*
+
+ 1. ds_atomic
+ 2. s_waitcnt lgkmcnt(0)
+
+ - If OpenCL, omit.
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures any
+ following global
+ data read is no
+ older than the local
+ atomicrmw value
+ being acquired.
+
+ atomicrmw acquire - workgroup - generic 1. flat_atomic
+ 2. s_waitcnt lgkm/vmcnt(0)
+
+ - Use lgkmcnt(0) if not
+ TgSplit execution mode
+ and vmcnt(0) if TgSplit
+ execution mode.
+ - If OpenCL, omit lgkmcnt(0).
+ - Must happen before
+ the following
+ buffer_inv and
+ any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures any
+ following global
+ data read is no
+ older than a local
+ atomicrmw value
+ being acquired.
+
+ 3. buffer_inv sc0=1
+
+ - If not TgSplit execution
+ mode, omit.
+ - Ensures that
+ following
+ loads will not see
+ stale data.
+
+ atomicrmw acquire - agent - global 1. buffer/global_atomic
+ 2. s_waitcnt vmcnt(0)
+
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the
+ atomicrmw has
+ completed before
+ invalidating the
+ cache.
+
+ 3. buffer_inv sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ global data.
+
+ atomicrmw acquire - system - global 1. buffer/global_atomic
+ sc1=1
+ 2. s_waitcnt vmcnt(0)
+
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the
+ atomicrmw has
+ completed before
+ invalidating the
+ caches.
+
+ 3. buffer_inv sc0=1 sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following
+ loads will not see
+ stale MTYPE NC global data.
+ MTYPE RW and CC memory will
+ never be stale due to the
+ memory probes.
+
+ atomicrmw acquire - agent - generic 1. flat_atomic
+ 2. s_waitcnt vmcnt(0) &
+ lgkmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the
+ atomicrmw has
+ completed before
+ invalidating the
+ cache.
+
+ 3. buffer_inv sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ global data.
+
+ atomicrmw acquire - system - generic 1. flat_atomic sc1=1
+ 2. s_waitcnt vmcnt(0) &
+ lgkmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the
+ atomicrmw has
+ completed before
+ invalidating the
+ caches.
+
+ 3. buffer_inv sc0=1 sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following
+ loads will not see
+ stale MTYPE NC global data.
+ MTYPE RW and CC memory will
+ never be stale due to the
+ memory probes.
+
+ fence acquire - singlethread *none* *none*
+ - wavefront
+ fence acquire - workgroup *none* 1. s_waitcnt lgkm/vmcnt(0)
+
+ - Use lgkmcnt(0) if not
+ TgSplit execution mode
+ and vmcnt(0) if TgSplit
+ execution mode.
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ local, omit
+ vmcnt(0).
+ - However, since LLVM
+ currently has no
+ address space on
+ the fence need to
+ conservatively
+ always generate. If
+ fence had an
+ address space then
+ set to address
+ space of OpenCL
+ fence flag, or to
+ generic if both
+ local and global
+ flags are
+ specified.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic load
+ atomic/
+ atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ fence-paired-atomic).
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic load
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ fence-paired-atomic).
+ - Must happen before
+ the following
+ buffer_inv and
+ any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures any
+ following global
+ data read is no
+ older than the
+ value read by the
+ fence-paired-atomic.
+
+ 2. buffer_inv sc0=1
+
+ - If not TgSplit execution
+ mode, omit.
+ - Ensures that
+ following
+ loads will not see
+ stale data.
+
+ fence acquire - agent *none* 1. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - However, since LLVM
+ currently has no
+ address space on
+ the fence need to
+ conservatively
+ always generate
+ (see comment for
+ previous fence).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic load
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ fence-paired-atomic).
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic load
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ fence-paired-atomic).
+ - Must happen before
+ the following
+ buffer_inv.
+ - Ensures that the
+ fence-paired-atomic
+ has completed
+ before invalidating
+ the
+ cache. Therefore
+ any following
+ locations read must
+ be no older than
+ the value read by
+ the
+ fence-paired-atomic.
+
+ 2. buffer_inv sc1=1
+
+ - Must happen before any
+ following global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ global data.
+
+ fence acquire - system *none* 1. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - However, since LLVM
+ currently has no
+ address space on
+ the fence need to
+ conservatively
+ always generate
+ (see comment for
+ previous fence).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic load
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ fence-paired-atomic).
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic load
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ fence-paired-atomic).
+ - Must happen before
+ the following
+ buffer_inv.
+ - Ensures that the
+ fence-paired-atomic
+ has completed
+ before invalidating
+ the
+ cache. Therefore
+ any following
+ locations read must
+ be no older than
+ the value read by
+ the
+ fence-paired-atomic.
+
+ 2. buffer_inv sc0=1 sc1=1
+
+ - Must happen before any
+ following global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ global data.
+
+ **Release Atomic**
+ ------------------------------------------------------------------------------------
+ store atomic release - singlethread - global 1. buffer/global/flat_store
+ - wavefront - generic
+ store atomic release - singlethread - local *If TgSplit execution mode,
+ - wavefront local address space cannot
+ be used.*
+
+ 1. ds_store
+ store atomic release - workgroup - global 1. s_waitcnt lgkm/vmcnt(0)
+ - generic
+ - Use lgkmcnt(0) if not
+ TgSplit execution mode
+ and vmcnt(0) if TgSplit
+ execution mode.
+ - If OpenCL, omit lgkmcnt(0).
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic load/store/
+ load atomic/store atomic/
+ atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ store.
+ - Ensures that all
+ memory operations
+ have
+ completed before
+ performing the
+ store that is being
+ released.
+
+ 2. buffer/global/flat_store sc0=1
+ store atomic release - workgroup - local *If TgSplit execution mode,
+ local address space cannot
+ be used.*
+
+ 1. ds_store
+ store atomic release - agent - global 1. buffer_wbl2 sc1=1
+ - generic
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at agent scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ store.
+ - Ensures that all
+ memory operations
+ to memory have
+ completed before
+ performing the
+ store that is being
+ released.
+
+ 3. buffer/global/flat_store sc1=1
+ store atomic release - system - global 1. buffer_wbl2 sc0=1 sc1=1
+ - generic
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at system scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after any
+ preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after any
+ preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ store.
+ - Ensures that all
+ memory operations
+ to memory and the L2
+ writeback have
+ completed before
+ performing the
+ store that is being
+ released.
+
+ 3. buffer/global/flat_store
+ sc0=1 sc1=1
+ atomicrmw release - singlethread - global 1. buffer/global/flat_atomic
+ - wavefront - generic
+ atomicrmw release - singlethread - local *If TgSplit execution mode,
+ - wavefront local address space cannot
+ be used.*
+
+ 1. ds_atomic
+ atomicrmw release - workgroup - global 1. s_waitcnt lgkm/vmcnt(0)
+ - generic
+ - Use lgkmcnt(0) if not
+ TgSplit execution mode
+ and vmcnt(0) if TgSplit
+ execution mode.
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic load/store/
+ load atomic/store atomic/
+ atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ atomicrmw.
+ - Ensures that all
+ memory operations
+ have
+ completed before
+ performing the
+ atomicrmw that is
+ being released.
+
+ 2. buffer/global/flat_atomic sc0=1
+ atomicrmw release - workgroup - local *If TgSplit execution mode,
+ local address space cannot
+ be used.*
+
+ 1. ds_atomic
+ atomicrmw release - agent - global 1. buffer_wbl2 sc1=1
+ - generic
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at agent scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ atomicrmw.
+ - Ensures that all
+ memory operations
+ to global and local
+ have completed
+ before performing
+ the atomicrmw that
+ is being released.
+
+ 3. buffer/global/flat_atomic sc1=1
+ atomicrmw release - system - global 1. buffer_wbl2 sc0=1 sc1=1
+ - generic
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at system scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ atomicrmw.
+ - Ensures that all
+ memory operations
+ to memory and the L2
+ writeback have
+ completed before
+ performing the
+ atomicrmw that is
+ being released.
+
+ 3. buffer/global/flat_atomic
+ sc0=1 sc1=1
+ fence release - singlethread *none* *none*
+ - wavefront
+ fence release - workgroup *none* 1. s_waitcnt lgkm/vmcnt(0)
+
+ - Use lgkmcnt(0) if not
+ TgSplit execution mode
+ and vmcnt(0) if TgSplit
+ execution mode.
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ local, omit
+ vmcnt(0).
+ - However, since LLVM
+ currently has no
+ address space on
+ the fence need to
+ conservatively
+ always generate. If
+ fence had an
+ address space then
+ set to address
+ space of OpenCL
+ fence flag, or to
+ generic if both
+ local and global
+ flags are
+ specified.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/
+ load atomic/store atomic/
+ atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Must happen before
+ any following store
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ fence-paired-atomic).
+ - Ensures that all
+ memory operations
+ have
+ completed before
+ performing the
+ following
+ fence-paired-atomic.
+
+ fence release - agent *none* 1. buffer_wbl2 sc1=1
+
+ - If OpenCL and
+ address space is
+ local, omit.
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at agent scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ local, omit
+ vmcnt(0).
+ - However, since LLVM
+ currently has no
+ address space on
+ the fence need to
+ conservatively
+ always generate. If
+ fence had an
+ address space then
+ set to address
+ space of OpenCL
+ fence flag, or to
+ generic if both
+ local and global
+ flags are
+ specified.
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ any following store
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ fence-paired-atomic).
+ - Ensures that all
+ memory operations
+ have
+ completed before
+ performing the
+ following
+ fence-paired-atomic.
+
+ fence release - system *none* 1. buffer_wbl2 sc0=1 sc1=1
+
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at system scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ local, omit
+ vmcnt(0).
+ - However, since LLVM
+ currently has no
+ address space on
+ the fence need to
+ conservatively
+ always generate. If
+ fence had an
+ address space then
+ set to address
+ space of OpenCL
+ fence flag, or to
+ generic if both
+ local and global
+ flags are
+ specified.
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ any following store
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ fence-paired-atomic).
+ - Ensures that all
+ memory operations
+ have
+ completed before
+ performing the
+ following
+ fence-paired-atomic.
+
+ **Acquire-Release Atomic**
+ ------------------------------------------------------------------------------------
+ atomicrmw acq_rel - singlethread - global 1. buffer/global/flat_atomic
+ - wavefront - generic
+ atomicrmw acq_rel - singlethread - local *If TgSplit execution mode,
+ - wavefront local address space cannot
+ be used.*
+
+ 1. ds_atomic
+ atomicrmw acq_rel - workgroup - global 1. s_waitcnt lgkm/vmcnt(0)
+
+ - Use lgkmcnt(0) if not
+ TgSplit execution mode
+ and vmcnt(0) if TgSplit
+ execution mode.
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic load/store/
+ load atomic/store atomic/
+ atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ atomicrmw.
+ - Ensures that all
+ memory operations
+ have
+ completed before
+ performing the
+ atomicrmw that is
+ being released.
+
+ 2. buffer/global_atomic
+ 3. s_waitcnt vmcnt(0)
+
+ - If not TgSplit execution
+ mode, omit.
+ - Must happen before
+ the following
+ buffer_inv.
+ - Ensures any
+ following global
+ data read is no
+ older than the
+ atomicrmw value
+ being acquired.
+
+ 4. buffer_inv sc0=1
+
+ - If not TgSplit execution
+ mode, omit.
+ - Ensures that
+ following
+ loads will not see
+ stale data.
+
+ atomicrmw acq_rel - workgroup - local *If TgSplit execution mode,
+ local address space cannot
+ be used.*
+
+ 1. ds_atomic
+ 2. s_waitcnt lgkmcnt(0)
+
+ - If OpenCL, omit.
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures any
+ following global
+ data read is no
+ older than the local load
+ atomic value being
+ acquired.
+
+ atomicrmw acq_rel - workgroup - generic 1. s_waitcnt lgkm/vmcnt(0)
+
+ - Use lgkmcnt(0) if not
+ TgSplit execution mode
+ and vmcnt(0) if TgSplit
+ execution mode.
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic load/store/
+ load atomic/store atomic/
+ atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ atomicrmw.
+ - Ensures that all
+ memory operations
+ have
+ completed before
+ performing the
+ atomicrmw that is
+ being released.
+
+ 2. flat_atomic
+ 3. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If not TgSplit execution
+ mode, omit vmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Must happen before
+ the following
+ buffer_inv and
+ any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures any
+ following global
+ data read is no
+ older than a local load
+ atomic value being
+ acquired.
+
+ 4. buffer_inv sc0=1
+
+ - If not TgSplit execution
+ mode, omit.
+ - Ensures that
+ following
+ loads will not see
+ stale data.
+
+ atomicrmw acq_rel - agent - global 1. buffer_wbl2 sc1=1
+
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at agent scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ atomicrmw.
+ - Ensures that all
+ memory operations
+ to global have
+ completed before
+ performing the
+ atomicrmw that is
+ being released.
+
+ 3. buffer/global_atomic
+ 4. s_waitcnt vmcnt(0)
+
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the
+ atomicrmw has
+ completed before
+ invalidating the
+ cache.
+
+ 5. buffer_inv sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ global data.
+
+ atomicrmw acq_rel - system - global 1. buffer_wbl2 sc0=1 sc1=1
+
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at system scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ atomicrmw.
+ - Ensures that all
+ memory operations
+ to global and L2 writeback
+ have completed before
+ performing the
+ atomicrmw that is
+ being released.
+
+ 3. buffer/global_atomic
+ sc1=1
+ 4. s_waitcnt vmcnt(0)
+
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the
+ atomicrmw has
+ completed before
+ invalidating the
+ caches.
+
+ 5. buffer_inv sc0=1 sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ MTYPE NC global data.
+ MTYPE RW and CC memory will
+ never be stale due to the
+ memory probes.
+
+ atomicrmw acq_rel - agent - generic 1. buffer_wbl2 sc1=1
+
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at agent scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ atomicrmw.
+ - Ensures that all
+ memory operations
+ to global have
+ completed before
+ performing the
+ atomicrmw that is
+ being released.
+
+ 3. flat_atomic
+ 4. s_waitcnt vmcnt(0) &
+ lgkmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the
+ atomicrmw has
+ completed before
+ invalidating the
+ cache.
+
+ 5. buffer_inv sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ global data.
+
+ atomicrmw acq_rel - system - generic 1. buffer_wbl2 sc0=1 sc1=1
+
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at system scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ atomicrmw.
+ - Ensures that all
+ memory operations
+ to global and L2 writeback
+ have completed before
+ performing the
+ atomicrmw that is
+ being released.
+
+ 3. flat_atomic sc1=1
+ 4. s_waitcnt vmcnt(0) &
+ lgkmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL, omit
+ lgkmcnt(0).
+ - Must happen before
+ following
+ buffer_inv.
+ - Ensures the
+ atomicrmw has
+ completed before
+ invalidating the
+ caches.
+
+ 5. buffer_inv sc0=1 sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ MTYPE NC global data.
+ MTYPE RW and CC memory will
+ never be stale due to the
+ memory probes.
+
+ fence acq_rel - singlethread *none* *none*
+ - wavefront
+ fence acq_rel - workgroup *none* 1. s_waitcnt lgkm/vmcnt(0)
+
+ - Use lgkmcnt(0) if not
+ TgSplit execution mode
+ and vmcnt(0) if TgSplit
+ execution mode.
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ local, omit
+ vmcnt(0).
+ - However,
+ since LLVM
+ currently has no
+ address space on
+ the fence need to
+ conservatively
+ always generate
+ (see comment for
+ previous fence).
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/
+ load atomic/store atomic/
+ atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures that all
+ memory operations
+ have
+ completed before
+ performing any
+ following global
+ memory operations.
+ - Ensures that the
+ preceding
+ local/generic load
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ acquire-fence-paired-atomic)
+ has completed
+ before following
+ global memory
+ operations. This
+ satisfies the
+ requirements of
+ acquire.
+ - Ensures that all
+ previous memory
+ operations have
+ completed before a
+ following
+ local/generic store
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ release-fence-paired-atomic).
+ This satisfies the
+ requirements of
+ release.
+ - Must happen before
+ the following
+ buffer_inv.
+ - Ensures that the
+ acquire-fence-paired
+ atomic has completed
+ before invalidating
+ the
+ cache. Therefore
+ any following
+ locations read must
+ be no older than
+ the value read by
+ the
+ acquire-fence-paired-atomic.
+
+ 2. buffer_inv sc0=1
+
+ - If not TgSplit execution
+ mode, omit.
+ - Ensures that
+ following
+ loads will not see
+ stale data.
+
+ fence acq_rel - agent *none* 1. buffer_wbl2 sc1=1
+
+ - If OpenCL and
+ address space is
+ local, omit.
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at agent scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - However, since LLVM
+ currently has no
+ address space on
+ the fence need to
+ conservatively
+ always generate
+ (see comment for
+ previous fence).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ buffer_inv.
+ - Ensures that the
+ preceding
+ global/local/generic
+ load
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ acquire-fence-paired-atomic)
+ has completed
+ before invalidating
+ the cache. This
+ satisfies the
+ requirements of
+ acquire.
+ - Ensures that all
+ previous memory
+ operations have
+ completed before a
+ following
+ global/local/generic
+ store
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ release-fence-paired-atomic).
+ This satisfies the
+ requirements of
+ release.
+
+ 3. buffer_inv sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ global data. This
+ satisfies the
+ requirements of
+ acquire.
+
+ fence acq_rel - system *none* 1. buffer_wbl2 sc0=1 sc1=1
+
+ - If OpenCL and
+ address space is
+ local, omit.
+ - Must happen before
+ following s_waitcnt.
+ - Performs L2 writeback to
+ ensure previous
+ global/generic
+ store/atomicrmw are
+ visible at system scope.
+
+ 2. s_waitcnt lgkmcnt(0) &
+ vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - If OpenCL and
+ address space is
+ not generic, omit
+ lgkmcnt(0).
+ - However, since LLVM
+ currently has no
+ address space on
+ the fence need to
+ conservatively
+ always generate
+ (see comment for
+ previous fence).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0) and
+ s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt vmcnt(0)
+ must happen after
+ any preceding
+ global/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ any preceding
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
+ - Must happen before
+ the following
+ buffer_inv.
+ - Ensures that the
+ preceding
+ global/local/generic
+ load
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ acquire-fence-paired-atomic)
+ has completed
+ before invalidating
+ the cache. This
+ satisfies the
+ requirements of
+ acquire.
+ - Ensures that all
+ previous memory
+ operations have
+ completed before a
+ following
+ global/local/generic
+ store
+ atomic/atomicrmw
+ with an equal or
+ wider sync scope
+ and memory ordering
+ stronger than
+ unordered (this is
+ termed the
+ release-fence-paired-atomic).
+ This satisfies the
+ requirements of
+ release.
+
+ 3. buffer_inv sc0=1 sc1=1
+
+ - Must happen before
+ any following
+ global/generic
+ load/load
+ atomic/store/store
+ atomic/atomicrmw.
+ - Ensures that
+ following loads
+ will not see stale
+ MTYPE NC global data.
+ MTYPE RW and CC memory will
+ never be stale due to the
+ memory probes.
+
+ **Sequential Consistent Atomic**
+ ------------------------------------------------------------------------------------
+ load atomic seq_cst - singlethread - global *Same as corresponding
+ - wavefront - local load atomic acquire,
+ - generic except must generate
+ all instructions even
+ for OpenCL.*
+ load atomic seq_cst - workgroup - global 1. s_waitcnt lgkm/vmcnt(0)
+ - generic
+ - Use lgkmcnt(0) if not
+ TgSplit execution mode
+ and vmcnt(0) if TgSplit
+ execution mode.
+ - s_waitcnt lgkmcnt(0) must
+ happen after
+ preceding
+ local/generic load
+ atomic/store
+ atomic/atomicrmw
+ with memory
+ ordering of seq_cst
+ and with equal or
+ wider sync scope.
+ (Note that seq_cst
+ fences have their
+ own s_waitcnt
+ lgkmcnt(0) and so do
+ not need to be
+ considered.)
+ - s_waitcnt vmcnt(0)
+ must happen after
+ preceding
+ global/generic load
+ atomic/store
+ atomic/atomicrmw
+ with memory
+ ordering of seq_cst
+ and with equal or
+ wider sync scope.
+ (Note that seq_cst
+ fences have their
+ own s_waitcnt
+ vmcnt(0) and so do
+ not need to be
+ considered.)
+ - Ensures any
+ preceding
+ sequentially
+ consistent global/local
+ memory instructions
+ have completed
+ before executing
+ this sequentially
+ consistent
+ instruction. This
+ prevents reordering
+ a seq_cst store
+ followed by a
+ seq_cst load. (Note
+ that seq_cst is
+ stronger than
+ acquire/release as
+ the reordering of
+ load acquire
+ followed by a store
+ release is
+ prevented by the
+ s_waitcnt of
+ the release, but
+ there is nothing
+ preventing a store
+ release followed by
+ load acquire from
+ completing out of
+ order. The s_waitcnt
+ could be placed after
+ seq_store or before
+ the seq_load. We
+ choose the load to
+ make the s_waitcnt be
+ as late as possible
+ so that the store
+ may have already
+ completed.)
+
+ 2. *Following
+ instructions same as
+ corresponding load
+ atomic acquire,
+ except must generate
+ all instructions even
+ for OpenCL.*
+ load atomic seq_cst - workgroup - local *If TgSplit execution mode,
+ local address space cannot
+ be used.*
+
+ *Same as corresponding
+ load atomic acquire,
+ except must generate
+ all instructions even
+ for OpenCL.*
+
+ load atomic seq_cst - agent - global 1. s_waitcnt lgkmcnt(0) &
+ - system - generic vmcnt(0)
+
+ - If TgSplit execution mode,
+ omit lgkmcnt(0).
+ - Could be split into
+ separate s_waitcnt
+ vmcnt(0)
+ and s_waitcnt
+ lgkmcnt(0) to allow
+ them to be
+ independently moved
+ according to the
+ following rules.
+ - s_waitcnt lgkmcnt(0)
+ must happen after
+ preceding
+ local/generic load
+ atomic/store
+ atomic/atomicrmw
+ with memory
+ ordering of seq_cst
+ and with equal or
+ wider sync scope.
+ (Note that seq_cst
+ fences have their
+ own s_waitcnt
+ lgkmcnt(0) and so do
+ not need to be
+ considered.)
+ - s_waitcnt vmcnt(0)
+ must happen after
+ preceding
+ global/generic load
+ atomic/store
+ atomic/atomicrmw
+ with memory
+ ordering of seq_cst
+ and with equal or
+ wider sync scope.
+ (Note that seq_cst
+ fences have their
+ own s_waitcnt
+ vmcnt(0) and so do
+ not need to be
+ considered.)
+ - Ensures any
+ preceding
+ sequentially
+ consistent global
+ memory instructions
+ have completed
+ before executing
+ this sequentially
+ consistent
+ instruction. This
+ prevents reordering
+ a seq_cst store
+ followed by a
+ seq_cst load. (Note
+ that seq_cst is
+ stronger than
+ acquire/release as
+ the reordering of
+ load acquire
+ followed by a store
+ release is
+ prevented by the
+ s_waitcnt of
+ the release, but
+ there is nothing
+ preventing a store
+ release followed by
+ load acquire from
+ completing out of
+ order. The s_waitcnt
+ could be placed after
+ seq_store or before
+ the seq_load. We
+ choose the load to
+ make the s_waitcnt be
+ as late as possible
+ so that the store
+ may have already
+ completed.)
+
+ 2. *Following
+ instructions same as
+ corresponding load
+ atomic acquire,
+ except must generate
+ all instructions even
+ for OpenCL.*
+ store atomic seq_cst - singlethread - global *Same as corresponding
+ - wavefront - local store atomic release,
+ - workgroup - generic except must generate
+ - agent all instructions even
+ - system for OpenCL.*
+ atomicrmw seq_cst - singlethread - global *Same as corresponding
+ - wavefront - local atomicrmw acq_rel,
+ - workgroup - generic except must generate
+ - agent all instructions even
+ - system for OpenCL.*
+ fence seq_cst - singlethread *none* *Same as corresponding
+ - wavefront fence acq_rel,
+ - workgroup except must generate
+ - agent all instructions even
+ - system for OpenCL.*
+ ============ ============ ============== ========== ================================
+
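As a reading aid for the fence rows of the table above, the sketch below lists the instructions a non-OpenCL ``fence acq_rel`` would need at each scope. It is only an illustration under stated assumptions: `Scope` and `fenceAcqRelSequence` are hypothetical names that do not exist in LLVM, the OpenCL address-space omissions are ignored, and instructions are modelled as plain strings.

```cpp
#include <string>
#include <vector>

// Hypothetical names for illustration only; they are not part of LLVM.
enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System };

// Returns the instructions the table lists for a non-OpenCL "fence acq_rel"
// at the given scope, in emission order.
std::vector<std::string> fenceAcqRelSequence(Scope S, bool TgSplit) {
  std::vector<std::string> Seq;
  switch (S) {
  case Scope::SingleThread:
  case Scope::Wavefront:
    break; // The table lists no code for these scopes.
  case Scope::Workgroup:
    // lgkmcnt(0) outside tgsplit mode, vmcnt(0) in tgsplit mode.
    Seq.push_back(TgSplit ? "s_waitcnt vmcnt(0)" : "s_waitcnt lgkmcnt(0)");
    // The invalidate is only listed for tgsplit mode, where the wavefronts
    // of a work-group may run on different CUs.
    if (TgSplit)
      Seq.push_back("buffer_inv sc0=1");
    break;
  case Scope::Agent:
  case Scope::System: {
    // The same SC bits select the scope on both the writeback and the
    // invalidate.
    const std::string Bits = S == Scope::System ? "sc0=1 sc1=1" : "sc1=1";
    Seq.push_back("buffer_wbl2 " + Bits);
    // lgkmcnt(0) is omitted in tgsplit mode.
    Seq.push_back(TgSplit ? "s_waitcnt vmcnt(0)"
                          : "s_waitcnt vmcnt(0) lgkmcnt(0)");
    Seq.push_back("buffer_inv " + Bits);
    break;
  }
  }
  return Seq;
}
```

For example, `fenceAcqRelSequence(Scope::Agent, false)` yields the writeback, the combined wait, and the invalidate, in that order, matching the numbered steps of the agent row.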
.. _amdgpu-amdhsa-memory-model-gfx10:
Memory Model GFX10
Position Pos) const override;
};
+class SIGfx940CacheControl : public SIGfx90ACacheControl {
+protected:
+
+ /// Sets SC0 bit to "true" if present in \p MI. Returns true if \p MI
+ /// is modified, false otherwise.
+ bool enableSC0Bit(const MachineBasicBlock::iterator &MI) const {
+ return enableNamedBit(MI, AMDGPU::CPol::SC0);
+ }
+
+ /// Sets SC1 bit to "true" if present in \p MI. Returns true if \p MI
+ /// is modified, false otherwise.
+ bool enableSC1Bit(const MachineBasicBlock::iterator &MI) const {
+ return enableNamedBit(MI, AMDGPU::CPol::SC1);
+ }
+
+ /// Sets NT bit to "true" if present in \p MI. Returns true if \p MI
+ /// is modified, false otherwise.
+ bool enableNTBit(const MachineBasicBlock::iterator &MI) const {
+ return enableNamedBit(MI, AMDGPU::CPol::NT);
+ }
+
+public:
+
+ SIGfx940CacheControl(const GCNSubtarget &ST) : SIGfx90ACacheControl(ST) {}
+
+ bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
+ SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace) const override;
+
+ bool enableStoreCacheBypass(const MachineBasicBlock::iterator &MI,
+ SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace) const override;
+
+ bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
+ SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace) const override;
+
+ bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
+ SIAtomicAddrSpace AddrSpace, SIMemOp Op,
+ bool IsVolatile,
+ bool IsNonTemporal) const override;
+
+ bool insertAcquire(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace, Position Pos) const override;
+
+ bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace, bool IsCrossAddrSpaceOrdering,
+ Position Pos) const override;
+};
+
class SIGfx10CacheControl : public SIGfx7CacheControl {
protected:
/* static */
std::unique_ptr<SICacheControl> SICacheControl::create(const GCNSubtarget &ST) {
GCNSubtarget::Generation Generation = ST.getGeneration();
+ if (ST.hasGFX940Insts())
+ return std::make_unique<SIGfx940CacheControl>(ST);
if (ST.hasGFX90AInsts())
return std::make_unique<SIGfx90ACacheControl>(ST);
if (Generation <= AMDGPUSubtarget::SOUTHERN_ISLANDS)
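The hunk above places the `hasGFX940Insts()` check ahead of `hasGFX90AInsts()`. The sketch below shows why that order would matter, assuming a GFX940 subtarget also reports the GFX90A feature (which the class inheritance in this patch suggests but this excerpt does not state). `Subtarget`, `CacheControl`, and `createCacheControl` are simplified, hypothetical stand-ins, not the LLVM types.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for GCNSubtarget and the cache-control hierarchy.
struct Subtarget {
  bool HasGFX90AInsts = false;
  bool HasGFX940Insts = false;
};

struct CacheControl {
  virtual ~CacheControl() = default;
  virtual std::string name() const { return "base"; }
};

struct Gfx90ACacheControl : CacheControl {
  std::string name() const override { return "gfx90a"; }
};

struct Gfx940CacheControl : Gfx90ACacheControl {
  std::string name() const override { return "gfx940"; }
};

// Tests the most specific feature first, so a target that reports both
// features gets the most derived implementation.
std::unique_ptr<CacheControl> createCacheControl(const Subtarget &ST) {
  if (ST.HasGFX940Insts)
    return std::make_unique<Gfx940CacheControl>();
  if (ST.HasGFX90AInsts)
    return std::make_unique<Gfx90ACacheControl>();
  return std::make_unique<CacheControl>();
}
```

With the two checks swapped, a subtarget that sets both flags would silently fall back to the GFX90A behavior.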
return Changed;
}
+bool SIGfx940CacheControl::enableLoadCacheBypass(
+ const MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace) const {
+ assert(MI->mayLoad() && !MI->mayStore());
+ bool Changed = false;
+
+ if ((AddrSpace & SIAtomicAddrSpace::GLOBAL) != SIAtomicAddrSpace::NONE) {
+ switch (Scope) {
+ case SIAtomicScope::SYSTEM:
+ // Set SC bits to indicate system scope.
+ Changed |= enableSC0Bit(MI);
+ Changed |= enableSC1Bit(MI);
+ break;
+ case SIAtomicScope::AGENT:
+ // Set SC bits to indicate agent scope.
+ Changed |= enableSC1Bit(MI);
+ break;
+ case SIAtomicScope::WORKGROUP:
+ // In threadgroup split mode the waves of a work-group can be executing on
+ // different CUs. Therefore need to bypass the L1 which is per CU.
+ // Otherwise in non-threadgroup split mode all waves of a work-group are
+ // on the same CU, and so the L1 does not need to be bypassed. Setting SC
+ // bits to indicate work-group scope will do this automatically.
+ Changed |= enableSC0Bit(MI);
+ break;
+ case SIAtomicScope::WAVEFRONT:
+ case SIAtomicScope::SINGLETHREAD:
+ // Leave SC bits unset to indicate wavefront scope.
+ break;
+ default:
+ llvm_unreachable("Unsupported synchronization scope");
+ }
+ }
+
+ /// The scratch address space does not need the global memory caches
+ /// to be bypassed as all memory operations by the same thread are
+ /// sequentially consistent, and no other thread can access scratch
+ /// memory.
+
+ /// Other address spaces do not have a cache.
+
+ return Changed;
+}
+
+bool SIGfx940CacheControl::enableStoreCacheBypass(
+ const MachineBasicBlock::iterator &MI,
+ SIAtomicScope Scope, SIAtomicAddrSpace AddrSpace) const {
+ assert(!MI->mayLoad() && MI->mayStore());
+ bool Changed = false;
+
+ if ((AddrSpace & SIAtomicAddrSpace::GLOBAL) != SIAtomicAddrSpace::NONE) {
+ switch (Scope) {
+ case SIAtomicScope::SYSTEM:
+ // Set SC bits to indicate system scope.
+ Changed |= enableSC0Bit(MI);
+ Changed |= enableSC1Bit(MI);
+ break;
+ case SIAtomicScope::AGENT:
+ // Set SC bits to indicate agent scope.
+ Changed |= enableSC1Bit(MI);
+ break;
+ case SIAtomicScope::WORKGROUP:
+ // Set SC bits to indicate workgroup scope.
+ Changed |= enableSC0Bit(MI);
+ break;
+ case SIAtomicScope::WAVEFRONT:
+ case SIAtomicScope::SINGLETHREAD:
+ // Leave SC bits unset to indicate wavefront scope.
+ break;
+ default:
+ llvm_unreachable("Unsupported synchronization scope");
+ }
+ }
+
+ /// The scratch address space does not need the global memory caches
+ /// to be bypassed as all memory operations by the same thread are
+ /// sequentially consistent, and no other thread can access scratch
+ /// memory.
+
+ /// Other address spaces do not have a cache.
+
+ return Changed;
+}
+
+bool SIGfx940CacheControl::enableRMWCacheBypass(
+ const MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace) const {
+ assert(MI->mayLoad() && MI->mayStore());
+ bool Changed = false;
+
+ if ((AddrSpace & SIAtomicAddrSpace::GLOBAL) != SIAtomicAddrSpace::NONE) {
+ switch (Scope) {
+ case SIAtomicScope::SYSTEM:
+ // Set SC1 bit to indicate system scope.
+ Changed |= enableSC1Bit(MI);
+ break;
+ case SIAtomicScope::AGENT:
+ case SIAtomicScope::WORKGROUP:
+ case SIAtomicScope::WAVEFRONT:
+ case SIAtomicScope::SINGLETHREAD:
+ // RMW atomic operations implicitly bypass the L1 cache and only use SC1
+ // to indicate system or agent scope. The SC0 bit is used to indicate if
+ // they are return or no-return. Leave SC1 bit unset to indicate agent
+ // scope.
+ break;
+ default:
+ llvm_unreachable("Unsupported synchronization scope");
+ }
+ }
+
+ return Changed;
+}
+
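The three cache-bypass overrides above amount to a small lookup from operation kind and synchronization scope to SC bits. A minimal sketch of that mapping follows; `MemOp`, `Scope`, `scopeBits`, and the bit encodings are hypothetical and chosen only for illustration, not taken from `AMDGPU::CPol`.

```cpp
#include <cstdint>

// Hypothetical names and encodings, for illustration only.
enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System };
enum class MemOp { Load, Store, RMW };

constexpr std::uint32_t SC0 = 1u << 0;
constexpr std::uint32_t SC1 = 1u << 1;

// Returns the SC bits the overrides above would set for a global-address-space
// operation of the given kind and scope.
std::uint32_t scopeBits(MemOp Op, Scope S) {
  // On RMW atomics SC0 selects return versus no-return, so only SC1 is
  // available: set for system scope, clear for every narrower scope.
  if (Op == MemOp::RMW)
    return S == Scope::System ? SC1 : 0;

  // Loads and stores share one encoding.
  switch (S) {
  case Scope::System:
    return SC0 | SC1;
  case Scope::Agent:
    return SC1;
  case Scope::Workgroup:
    return SC0;
  case Scope::Wavefront:
  case Scope::SingleThread:
    return 0;
  }
  return 0;
}
```

Note that a load and an RMW at agent scope end up with different bits (`SC1` versus none), which is why the RMW case has its own override rather than sharing the load path.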
+bool SIGfx940CacheControl::enableVolatileAndOrNonTemporal(
+ MachineBasicBlock::iterator &MI, SIAtomicAddrSpace AddrSpace, SIMemOp Op,
+ bool IsVolatile, bool IsNonTemporal) const {
+ // Only handle load and store, not atomic read-modify-write instructions. The
+ // latter use the SC0 bit to indicate if the atomic returns a result, so that
+ // bit must not be used for cache control.
+ assert(MI->mayLoad() ^ MI->mayStore());
+
+ // Only update load and store, not LLVM IR atomic read-modify-write
+ // instructions. The latter are always marked as volatile, so honoring the
+ // flag for them would pessimize all atomics. They also do not support the
+ // nontemporal attribute.
+ assert(Op == SIMemOp::LOAD || Op == SIMemOp::STORE);
+
+ bool Changed = false;
+
+ if (IsVolatile) {
+ // Set SC bits to indicate system scope.
+ Changed |= enableSC0Bit(MI);
+ Changed |= enableSC1Bit(MI);
+
+ // Ensure operation has completed at system scope to cause all volatile
+ // operations to be visible outside the program in a global order. Do not
+ // request cross address space as only the global address space can be
+ // observable outside the program, so no need to cause a waitcnt for LDS
+ // address space operations.
+ Changed |= insertWait(MI, SIAtomicScope::SYSTEM, AddrSpace, Op, false,
+ Position::AFTER);
+
+ return Changed;
+ }
+
+ if (IsNonTemporal) {
+ Changed |= enableNTBit(MI);
+ return Changed;
+ }
+
+ return Changed;
+}
+
+bool SIGfx940CacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
+ SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace,
+ Position Pos) const {
+ if (!InsertCacheInv)
+ return false;
+
+ bool Changed = false;
+
+ MachineBasicBlock &MBB = *MI->getParent();
+ DebugLoc DL = MI->getDebugLoc();
+
+ if (Pos == Position::AFTER)
+ ++MI;
+
+ if ((AddrSpace & SIAtomicAddrSpace::GLOBAL) != SIAtomicAddrSpace::NONE) {
+ switch (Scope) {
+ case SIAtomicScope::SYSTEM:
+ // Ensures that following loads will not see stale remote VMEM data or
+ // stale local VMEM data with MTYPE NC. Local VMEM data with MTYPE RW and
+ // CC will never be stale due to the local memory probes.
+ BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_INV))
+ // Set SC bits to indicate system scope.
+ .addImm(AMDGPU::CPol::SC0 | AMDGPU::CPol::SC1);
+ // Inserting a "S_WAITCNT vmcnt(0)" after is not required because the
+ // hardware does not reorder memory operations by the same wave with
+ // respect to a preceding "BUFFER_INV". The invalidate is guaranteed to
+ // remove any cache lines of earlier writes by the same wave and ensures
+ // later reads by the same wave will refetch the cache lines.
+ Changed = true;
+ break;
+ case SIAtomicScope::AGENT:
+ // Ensures that following loads will not see stale remote data or local
+ // MTYPE NC global data. Local MTYPE RW and CC memory will never be stale
+ // due to the memory probes.
+ BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_INV))
+ // Set SC bits to indicate agent scope.
+ .addImm(AMDGPU::CPol::SC1);
+ // Inserting "S_WAITCNT vmcnt(0)" is not required because the hardware
+ // does not reorder memory operations with respect to a preceding buffer
+ // invalidate. The invalidate is guaranteed to remove any cache lines of
+ // earlier writes and ensures later reads will refetch the cache lines.
+ Changed = true;
+ break;
+ case SIAtomicScope::WORKGROUP:
+ // In threadgroup split mode the waves of a work-group can be executing on
+ // different CUs. Therefore need to invalidate the L1 which is per CU.
+ // Otherwise in non-threadgroup split mode all waves of a work-group are
+ // on the same CU, and so the L1 does not need to be invalidated.
+ if (ST.isTgSplitEnabled()) {
+ // Ensures L1 is invalidated if in threadgroup split mode. In
+ // non-threadgroup split mode it is a NOP, so do not generate it when
+ // threadgroup split mode is known to be disabled.
+ BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_INV))
+ // Set SC bits to indicate work-group scope.
+ .addImm(AMDGPU::CPol::SC0);
+ // Inserting "S_WAITCNT vmcnt(0)" is not required because the hardware
+ // does not reorder memory operations with respect to a preceding buffer
+ // invalidate. The invalidate is guaranteed to remove any cache lines of
+ // earlier writes and ensures later reads will refetch the cache lines.
+ Changed = true;
+ }
+ break;
+ case SIAtomicScope::WAVEFRONT:
+ case SIAtomicScope::SINGLETHREAD:
+ // Could generate "BUFFER_INV" but it would do nothing as there are no
+ // caches to invalidate.
+ break;
+ default:
+ llvm_unreachable("Unsupported synchronization scope");
+ }
+ }
+
+ /// The scratch address space does not need the global memory cache
+ /// to be flushed as all memory operations by the same thread are
+ /// sequentially consistent, and no other thread can access scratch
+ /// memory.
+
+ /// Other address spaces do not have a cache.
+
+ if (Pos == Position::AFTER)
+ --MI;
+
+ return Changed;
+}
+
+bool SIGfx940CacheControl::insertRelease(MachineBasicBlock::iterator &MI,
+ SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace,
+ bool IsCrossAddrSpaceOrdering,
+ Position Pos) const {
+ bool Changed = false;
+
+ MachineBasicBlock &MBB = *MI->getParent();
+ DebugLoc DL = MI->getDebugLoc();
+
+ if (Pos == Position::AFTER)
+ ++MI;
+
+ if ((AddrSpace & SIAtomicAddrSpace::GLOBAL) != SIAtomicAddrSpace::NONE) {
+ switch (Scope) {
+ case SIAtomicScope::SYSTEM:
+ // Inserting a "S_WAITCNT vmcnt(0)" before is not required because the
+ // hardware does not reorder memory operations by the same wave with
+ // respect to a following "BUFFER_WBL2". The "BUFFER_WBL2" is guaranteed
+ // to initiate writeback of any dirty cache lines of earlier writes by the
+ // same wave. A "S_WAITCNT vmcnt(0)" is needed after to ensure the
+ // writeback has completed.
+ BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_WBL2))
+ // Set SC bits to indicate system scope.
+ .addImm(AMDGPU::CPol::SC0 | AMDGPU::CPol::SC1);
+ // Since AddrSpace contains SIAtomicAddrSpace::GLOBAL and Scope is
+ // SIAtomicScope::SYSTEM, the following insertWait will generate the
+ // required "S_WAITCNT vmcnt(0)" needed by the "BUFFER_WBL2".
+ Changed = true;
+ break;
+ case SIAtomicScope::AGENT:
+ BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_WBL2))
+ // Set SC bits to indicate agent scope.
+ .addImm(AMDGPU::CPol::SC1);
+
+ // Since AddrSpace contains SIAtomicAddrSpace::GLOBAL and Scope is
+ // SIAtomicScope::AGENT, the following insertWait will generate the
+ // required "S_WAITCNT vmcnt(0)".
+ Changed = true;
+ break;
+ case SIAtomicScope::WORKGROUP:
+ case SIAtomicScope::WAVEFRONT:
+ case SIAtomicScope::SINGLETHREAD:
+ // Do not generate "BUFFER_WBL2" as there are no caches it would
+ // write back, and it would require an otherwise unnecessary
+ // "S_WAITCNT vmcnt(0)".
+ break;
+ default:
+ llvm_unreachable("Unsupported synchronization scope");
+ }
+ }
+
+ if (Pos == Position::AFTER)
+ --MI;
+
+ // Insert the S_WAITCNT required by any "BUFFER_WBL2", as well as any other
+ // S_WAITCNT that is needed.
+ Changed |= insertWait(MI, Scope, AddrSpace, SIMemOp::LOAD | SIMemOp::STORE,
+ IsCrossAddrSpaceOrdering, Pos);
+
+ return Changed;
+}
+
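Reviewer sketch (not part of the patch): the scope-to-SC-bit choices made by the acquire and release switches above can be summarized in a small standalone model. This is a minimal illustration, assuming placeholder values for `SC0` and `SC1` and hypothetical names (`Scope`, `CacheOp`, `acquireInv`, `releaseWb`, `render`) that do not exist in the backend. It only mirrors the cases visible in this hunk and in the GFX940 check lines.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Placeholder bit values for this sketch only; the real AMDGPU::CPol::SC0 and
// AMDGPU::CPol::SC1 encodings are defined by the backend.
const std::uint32_t SC0 = 1u << 0;
const std::uint32_t SC1 = 1u << 1;

// Hypothetical mirror of the SIAtomicScope cases handled by the switches.
enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System };

// Whether a cache-maintenance instruction is emitted, and with which SC bits.
struct CacheOp {
  bool Emit;
  std::uint32_t Bits;
};

// BUFFER_INV choice for an acquire on the global address space.
CacheOp acquireInv(Scope S, bool TgSplit) {
  switch (S) {
  case Scope::System:
    return {true, SC0 | SC1};
  case Scope::Agent:
    return {true, SC1};
  case Scope::Workgroup:
    // The per-CU L1 only needs invalidating when waves may span CUs.
    return {TgSplit, TgSplit ? SC0 : 0u};
  case Scope::Wavefront:
  case Scope::SingleThread:
    return {false, 0u};
  }
  return {false, 0u};
}

// BUFFER_WBL2 choice for a release on the global address space.
CacheOp releaseWb(Scope S) {
  switch (S) {
  case Scope::System:
    return {true, SC0 | SC1};
  case Scope::Agent:
    return {true, SC1};
  default:
    // Work-group scope and narrower: no writeback is generated.
    return {false, 0u};
  }
}

// Formats the instruction the way the check lines print it, or returns an
// empty string when nothing is emitted.
std::string render(const std::string &Mnemonic, CacheOp Op) {
  assert(!Mnemonic.empty() && "mnemonic must be non-empty");
  if (!Op.Emit)
    return "";
  std::string Out = Mnemonic;
  if (Op.Bits & SC0)
    Out += " sc0";
  if (Op.Bits & SC1)
    Out += " sc1";
  return Out;
}
```

For example, `render("buffer_inv", acquireInv(Scope::Workgroup, true))` corresponds to the `buffer_inv sc0` line expected under the GFX940-TGSPLIT prefix, while the same call with `TgSplit` set to `false` produces no instruction.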
bool SIGfx10CacheControl::enableLoadCacheBypass(
const MachineBasicBlock::iterator &MI,
SIAtomicScope Scope,
; GFX940-NEXT: v_mov_b32_e32 v2, 4.0
; GFX940-NEXT: s_waitcnt lgkmcnt(0)
; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
-; GFX940-NEXT: buffer_wbl2
+; GFX940-NEXT: buffer_wbl2 sc0 sc1
; GFX940-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX940-NEXT: flat_atomic_add_f32 v[0:1], v2
+; GFX940-NEXT: flat_atomic_add_f32 v[0:1], v2 sc1
; GFX940-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX940-NEXT: buffer_invl2
-; GFX940-NEXT: buffer_wbinvl1_vol
+; GFX940-NEXT: buffer_inv sc0 sc1
; GFX940-NEXT: s_endpgm
%ret = atomicrmw fadd float* %ptr, float 4.0 seq_cst
ret void
; GFX940-NEXT: v_mov_b32_e32 v2, 4.0
; GFX940-NEXT: s_waitcnt lgkmcnt(0)
; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
-; GFX940-NEXT: buffer_wbl2
+; GFX940-NEXT: buffer_wbl2 sc0 sc1
; GFX940-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX940-NEXT: flat_atomic_add_f32 v[0:1], v2
+; GFX940-NEXT: flat_atomic_add_f32 v[0:1], v2 sc1
; GFX940-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX940-NEXT: buffer_invl2
-; GFX940-NEXT: buffer_wbinvl1_vol
+; GFX940-NEXT: buffer_inv sc0 sc1
; GFX940-NEXT: s_endpgm
%ret = atomicrmw fadd float* %ptr, float 4.0 seq_cst
ret void
; GFX940: ; %bb.0:
; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX940-NEXT: v_mov_b32_e32 v2, 4.0
-; GFX940-NEXT: buffer_wbl2
+; GFX940-NEXT: buffer_wbl2 sc0 sc1
; GFX940-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX940-NEXT: flat_atomic_add_f32 v0, v[0:1], v2 sc0
+; GFX940-NEXT: flat_atomic_add_f32 v0, v[0:1], v2 sc0 sc1
; GFX940-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX940-NEXT: buffer_invl2
-; GFX940-NEXT: buffer_wbinvl1_vol
+; GFX940-NEXT: buffer_inv sc0 sc1
; GFX940-NEXT: s_setpc_b64 s[30:31]
%ret = atomicrmw fadd float* %ptr, float 4.0 seq_cst
ret float %ret
; GFX940-NEXT: s_waitcnt lgkmcnt(0)
; GFX940-NEXT: v_mov_b32_e32 v0, s2
; GFX940-NEXT: v_mov_b32_e32 v1, s3
-; GFX940-NEXT: buffer_wbl2
+; GFX940-NEXT: buffer_wbl2 sc0 sc1
; GFX940-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX940-NEXT: ds_pk_add_bf16 v0, v1
; GFX940-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX940-NEXT: buffer_invl2
-; GFX940-NEXT: buffer_wbinvl1_vol
+; GFX940-NEXT: buffer_inv sc0 sc1
; GFX940-NEXT: s_endpgm
%ret = call <2 x i16> @llvm.amdgcn.ds.fadd.v2bf16(<2 x i16> addrspace(3)* %ptr, <2 x i16> %data)
ret void
; GFX940-LABEL: local_atomic_fadd_v2bf16_rtn:
; GFX940: ; %bb.0:
; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX940-NEXT: buffer_wbl2
+; GFX940-NEXT: buffer_wbl2 sc0 sc1
; GFX940-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX940-NEXT: ds_pk_add_rtn_bf16 v0, v0, v1
; GFX940-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX940-NEXT: buffer_invl2
-; GFX940-NEXT: buffer_wbinvl1_vol
+; GFX940-NEXT: buffer_inv sc0 sc1
; GFX940-NEXT: s_setpc_b64 s[30:31]
%ret = call <2 x i16> @llvm.amdgcn.ds.fadd.v2bf16(<2 x i16> addrspace(3)* %ptr, <2 x i16> %data)
ret <2 x i16> %ret
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @singlethread_acquire_fence() {
; GFX6-LABEL: singlethread_acquire_fence:
; GFX90A-TGSPLIT-LABEL: singlethread_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: singlethread_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: singlethread_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("singlethread") acquire
ret void
; GFX90A-TGSPLIT-LABEL: singlethread_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: singlethread_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: singlethread_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("singlethread") release
ret void
; GFX90A-TGSPLIT-LABEL: singlethread_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: singlethread_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: singlethread_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("singlethread") acq_rel
ret void
; GFX90A-TGSPLIT-LABEL: singlethread_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: singlethread_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: singlethread_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("singlethread") seq_cst
ret void
; GFX90A-TGSPLIT-LABEL: singlethread_one_as_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: singlethread_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: singlethread_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("singlethread-one-as") acquire
ret void
; GFX90A-TGSPLIT-LABEL: singlethread_one_as_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: singlethread_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: singlethread_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("singlethread-one-as") release
ret void
; GFX90A-TGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("singlethread-one-as") acq_rel
ret void
; GFX90A-TGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("singlethread-one-as") seq_cst
ret void
; GFX90A-TGSPLIT-LABEL: wavefront_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: wavefront_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: wavefront_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("wavefront") acquire
ret void
; GFX90A-TGSPLIT-LABEL: wavefront_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: wavefront_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: wavefront_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("wavefront") release
ret void
; GFX90A-TGSPLIT-LABEL: wavefront_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: wavefront_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: wavefront_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("wavefront") acq_rel
ret void
; GFX90A-TGSPLIT-LABEL: wavefront_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: wavefront_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: wavefront_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("wavefront") seq_cst
ret void
; GFX90A-TGSPLIT-LABEL: wavefront_one_as_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: wavefront_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: wavefront_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("wavefront-one-as") acquire
ret void
; GFX90A-TGSPLIT-LABEL: wavefront_one_as_release_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: wavefront_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: wavefront_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("wavefront-one-as") release
ret void
; GFX90A-TGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("wavefront-one-as") acq_rel
ret void
; GFX90A-TGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("wavefront-one-as") seq_cst
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("workgroup") acquire
ret void
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("workgroup") release
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("workgroup") acq_rel
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("workgroup") seq_cst
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") acquire
ret void
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") release
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") acq_rel
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("workgroup-one-as") seq_cst
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("agent") acquire
ret void
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("agent") release
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("agent") acq_rel
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("agent") seq_cst
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("agent-one-as") acquire
ret void
; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("agent-one-as") release
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("agent-one-as") acq_rel
ret void
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("agent-one-as") seq_cst
ret void
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence acquire
ret void
; GFX90A-TGSPLIT-NEXT: buffer_wbl2
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence release
ret void
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence acq_rel
ret void
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence seq_cst
ret void
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_acquire_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("one-as") acquire
ret void
; GFX90A-TGSPLIT-NEXT: buffer_wbl2
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_release_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_release_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("one-as") release
ret void
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("one-as") acq_rel
ret void
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
entry:
fence syncscope("one-as") seq_cst
ret void
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @flat_agent_unordered_load(
; GFX7-LABEL: flat_agent_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("agent") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("agent") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("agent") acquire, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("agent") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("agent") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("agent") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("agent") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("agent") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent") monotonic
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent") release
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent") seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent") acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("agent-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("agent-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("agent-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("agent-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("agent-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("agent-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("agent-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("agent-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent-one-as") release
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent-one-as") acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("agent-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_agent_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_agent_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @flat_nontemporal_load_0(
; GFX7-LABEL: flat_nontemporal_load_0:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_nontemporal_load_0:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] nt
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_nontemporal_load_0:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] nt
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load i32, i32* %in, align 4, !nontemporal !0
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_nontemporal_load_1:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: v_add_co_u32_e32 v0, vcc, s0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] nt
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_nontemporal_load_1:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: v_add_co_u32_e32 v0, vcc, s0, v0
+; GFX940-TGSPLIT-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] nt
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x()
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 glc slc
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_nontemporal_store_0:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 nt
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_nontemporal_store_0:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 nt
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load i32, i32* %in, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 glc slc
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_nontemporal_store_1:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v3, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: v_add_co_u32_e32 v0, vcc, s2, v0
+; GFX940-NOTTGSPLIT-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 nt
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_nontemporal_store_1:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v3, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: v_add_co_u32_e32 v0, vcc, s2, v0
+; GFX940-TGSPLIT-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 nt
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x()
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @flat_singlethread_unordered_load(
; GFX7-LABEL: flat_singlethread_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("singlethread") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("singlethread") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("singlethread") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("singlethread") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("singlethread") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("singlethread") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("singlethread") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("singlethread") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("singlethread-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("singlethread-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("singlethread-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("singlethread-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("singlethread-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("singlethread-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("singlethread-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("singlethread-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread-one-as") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread-one-as") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("singlethread-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @flat_system_unordered_load(
; GFX7-LABEL: flat_system_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in acquire, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in monotonic
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in release
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("one-as") monotonic
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("one-as") release
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("one-as") acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_system_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_system_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @flat_wavefront_unordered_load(
; GFX7-LABEL: flat_wavefront_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("wavefront") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("wavefront") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("wavefront") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("wavefront") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("wavefront") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("wavefront") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("wavefront") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("wavefront") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("wavefront-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("wavefront-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("wavefront-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("wavefront-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("wavefront-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("wavefront-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("wavefront-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("wavefront-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront-one-as") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront-one-as") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("wavefront-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_acq_relc_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @flat_workgroup_unordered_load(
; GFX7-LABEL: flat_workgroup_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("workgroup") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("workgroup") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("workgroup") acquire, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("workgroup") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("workgroup") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("workgroup") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("workgroup") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("workgroup") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup") monotonic
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup") release
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup") seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup") acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("workgroup-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("workgroup-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("workgroup-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_load_dword v2, v[0:1] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s3
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %in, i32* %out) {
entry:
%val = load atomic i32, i32* %in syncscope("workgroup-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("workgroup-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("workgroup-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("workgroup-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32* %out) {
entry:
store atomic i32 %in, i32* %out syncscope("workgroup-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s2
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup-one-as") release
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup-one-as") acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_swap v2, v[0:1], v2 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32* %out, i32 %in syncscope("workgroup-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[2:3], s[2:3], s[2:3] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v[0:1], v[2:3] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonicmonotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_monotonicmonotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: flat_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[2:3], s[4:5]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @global_agent_unordered_load(
; GFX6-LABEL: global_agent_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("agent") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("agent") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("agent") acquire, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("agent") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("agent") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("agent") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("agent") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("agent") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent") monotonic
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent") release
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent") seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent") acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("agent-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("agent-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("agent-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("agent-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("agent-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("agent-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("agent-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("agent-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent-one-as") release
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent-one-as") acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("agent-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_agent_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_agent_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @global_nontemporal_load_0(
; GFX6-LABEL: global_nontemporal_load_0:
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_nontemporal_load_0:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s0, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_nontemporal_load_0:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: s_load_dword s0, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load i32, i32 addrspace(1)* %in, align 4, !nontemporal !0
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_nontemporal_load_1:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v0, v0, s[0:1] nt
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_nontemporal_load_1:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v0, v0, s[0:1] nt
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x()
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] glc slc
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_nontemporal_store_0:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s0, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] nt
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_nontemporal_store_0:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: s_load_dword s0, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] nt
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load i32, i32 addrspace(1)* %in, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] glc slc
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_nontemporal_store_1:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s0, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] nt
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_nontemporal_store_1:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_lshlrev_b32_e32 v0, 2, v0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: s_load_dword s0, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] nt
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x()
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @global_singlethread_unordered_load(
; GFX6-LABEL: global_singlethread_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("singlethread") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("singlethread") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("singlethread") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("singlethread") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("singlethread") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("singlethread") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("singlethread") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("singlethread") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("singlethread-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("singlethread-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("singlethread-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("singlethread-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("singlethread-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("singlethread-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("singlethread-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("singlethread-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread-one-as") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread-one-as") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("singlethread-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @global_system_unordered_load(
; GFX6-LABEL: global_system_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in monotonic, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in acquire, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in monotonic
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in release
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_relese_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_relese_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("one-as") monotonic
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("one-as") release
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3] sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("one-as") acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_system_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_system_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: buffer_wbl2 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0 sc1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0 sc1
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @global_wavefront_unordered_load(
; GFX6-LABEL: global_wavefront_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("wavefront") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("wavefront") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("wavefront") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("wavefront") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("wavefront") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("wavefront") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("wavefront") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("wavefront") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("wavefront-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("wavefront-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("wavefront-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("wavefront-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("wavefront-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("wavefront-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("wavefront-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("wavefront-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront-one-as") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront-one-as") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("wavefront-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @global_workgroup_unordered_load(
; GFX6-LABEL: global_workgroup_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("workgroup") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("workgroup") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("workgroup") acquire, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("workgroup") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("workgroup") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("workgroup") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("workgroup") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("workgroup") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup") monotonic
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup") release
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup") seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup") acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("workgroup-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("workgroup-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("workgroup-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_load_dword v1, v0, s[0:1] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(1)* %out) {
entry:
%val = load atomic i32, i32 addrspace(1)* %in syncscope("workgroup-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("workgroup-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("workgroup-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("workgroup-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(1)* %out) {
entry:
store atomic i32 %in, i32 addrspace(1)* %out syncscope("workgroup-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup-one-as") release
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup-one-as") acquire
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v0, v1, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s4
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_swap v1, v0, v1, s[2:3] sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v0, v1, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(1)* %out, i32 %in syncscope("workgroup-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[2:3] offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v2, v[0:1], s[4:5] offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: global_store_dword v2, v0, s[2:3]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: global_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: global_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b64_e32 v[0:1], s[2:3]
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[4:5] offset:16 sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: global_store_dword v2, v0, s[4:5]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(1)* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @local_agent_unordered_load(
; GFX6-LABEL: local_agent_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("agent") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("agent") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("agent") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("agent") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("agent") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("agent") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("agent") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("agent") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent") monotonic
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent") release
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent") acq_rel
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent") seq_cst
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("agent-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("agent-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("agent-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("agent-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("agent-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("agent-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("agent-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("agent-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent-one-as") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent-one-as") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("agent-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_agent_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @local_nontemporal_load_0(
; GFX6-LABEL: local_nontemporal_load_0:
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v1, v0, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_nontemporal_load_0:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_nontemporal_load_0:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(1)* %out) {
entry:
%val = load i32, i32 addrspace(3)* %in, align 4, !nontemporal !0
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v1, v0, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_nontemporal_load_1:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_lshl_add_u32 v0, v0, 2, s4
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_nontemporal_load_1:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_lshl_add_u32 v0, v0, 2, s4
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(1)* %out) {
entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x()
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_nontemporal_store_0:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s4
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s0, s[2:3], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_nontemporal_store_0:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s4
+; GFX940-TGSPLIT-NEXT: s_load_dword s0, s[2:3], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(3)* %out) {
entry:
%val = load i32, i32 addrspace(1)* %in, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_nontemporal_store_1:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_lshl_add_u32 v0, v0, 2, s4
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s0, s[2:3], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_nontemporal_store_1:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_lshl_add_u32 v0, v0, 2, s4
+; GFX940-TGSPLIT-NEXT: s_load_dword s0, s[2:3], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(3)* %out) {
entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x()
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @local_singlethread_unordered_load(
; GFX6-LABEL: local_singlethread_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("singlethread") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("singlethread") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("singlethread") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("singlethread") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("singlethread") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("singlethread") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("singlethread") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("singlethread") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread") seq_cst
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("singlethread-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("singlethread-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("singlethread-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("singlethread-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("singlethread-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("singlethread-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("singlethread-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("singlethread-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread-one-as") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread-one-as") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("singlethread-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_singlethread_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @local_system_unordered_load(
; GFX6-LABEL: local_system_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in monotonic
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in release
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in acq_rel
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in seq_cst
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("one-as") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("one-as") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("one-as") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_system_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_system_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @local_wavefront_unordered_load(
; GFX6-LABEL: local_wavefront_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("wavefront") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("wavefront") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("wavefront") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("wavefront") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("wavefront") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("wavefront") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("wavefront") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("wavefront") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront") seq_cst
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("wavefront-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("wavefront-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("wavefront-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("wavefront-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("wavefront-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("wavefront-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("wavefront-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("wavefront-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront-one-as") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront-one-as") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("wavefront-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_wavefront_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @local_workgroup_unordered_load(
; GFX6-LABEL: local_workgroup_unordered_load:
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("workgroup") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("workgroup") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("workgroup") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("workgroup") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("workgroup") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("workgroup") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("workgroup") release, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("workgroup") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup") monotonic
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup") release
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup") acq_rel
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup") seq_cst
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: buffer_inv sc0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_unordered_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_unordered_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("workgroup-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("workgroup-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acquire_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("workgroup-one-as") acquire, align 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v1, v0
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_load:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_load:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: ds_read_b32 v0, v0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v1, v0
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %in, i32 addrspace(3)* %out) {
entry:
%val = load atomic i32, i32 addrspace(3)* %in syncscope("workgroup-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_unordered_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_unordered_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("workgroup-one-as") unordered, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("workgroup-one-as") monotonic, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_release_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("workgroup-one-as") release, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_store:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_store:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s1
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 %in, i32 addrspace(3)* %out) {
entry:
store atomic i32 %in, i32 addrspace(3)* %out syncscope("workgroup-one-as") seq_cst, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup-one-as") monotonic
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acquire_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup-one-as") acquire
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_release_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup-one-as") release
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
; GFX90A-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acquire_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup-one-as") acquire
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup-one-as") acq_rel
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_ret_atomicrmw:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_ret_atomicrmw:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s1
+; GFX940-TGSPLIT-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in) {
entry:
%val = atomicrmw volatile xchg i32 addrspace(3)* %out, i32 %in syncscope("workgroup-one-as") seq_cst
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acquire_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_release_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_monotonic_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acquire_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_release_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_acquire_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acquire_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_release_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
; GFX90A-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_seq_cst_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_b32 v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acquire_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_release_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_monotonic_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acquire_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_release_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_acquire_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_monotonic_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acquire_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_release_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_acq_rel_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: ds_write_b32 v0, v1
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-NOTTGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: local_workgroup_one_as_seq_cst_seq_cst_ret_cmpxchg:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s2
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v2, s1
+; GFX940-TGSPLIT-NEXT: ds_cmpst_rtn_b32 v1, v0, v1, v2 offset:16
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: ds_write_b32 v0, v1
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(3)* %out, i32 %in, i32 %old) {
entry:
%gep = getelementptr i32, i32 addrspace(3)* %out, i32 4
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations -verify-machineinstrs < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
define amdgpu_kernel void @private_nontemporal_load_0(
; GFX6-LABEL: private_nontemporal_load_0:
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v1, v0, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: private_nontemporal_load_0:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: scratch_load_dword v0, off, s4 nt
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: private_nontemporal_load_0:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: scratch_load_dword v0, off, s4 nt
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(5)* %in, i32 addrspace(1)* %out) {
entry:
%val = load i32, i32 addrspace(5)* %in, align 4, !nontemporal !0
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
; GFX90A-TGSPLIT-NEXT: global_store_dword v1, v0, s[0:1]
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: private_nontemporal_load_1:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, 0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_lshl_add_u32 v0, v0, 2, s4
+; GFX940-NOTTGSPLIT-NEXT: scratch_load_dword v0, v0, off nt
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: private_nontemporal_load_1:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, 0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_lshl_add_u32 v0, v0, 2, s4
+; GFX940-TGSPLIT-NEXT: scratch_load_dword v0, v0, off nt
+; GFX940-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX940-TGSPLIT-NEXT: global_store_dword v1, v0, s[2:3]
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(5)* %in, i32 addrspace(1)* %out) {
entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x()
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
; GFX90A-TGSPLIT-NEXT: buffer_store_dword v0, v1, s[8:11], 0 offen glc slc
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: private_nontemporal_store_0:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s0, s[2:3], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-NOTTGSPLIT-NEXT: scratch_store_dword off, v0, s4 nt
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: private_nontemporal_store_0:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: s_load_dword s0, s[2:3], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v0, s0
+; GFX940-TGSPLIT-NEXT: scratch_store_dword off, v0, s4 nt
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(5)* %out) {
entry:
%val = load i32, i32 addrspace(1)* %in, align 4
; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
; GFX90A-TGSPLIT-NEXT: buffer_store_dword v1, v0, s[8:11], 0 offen glc slc
; GFX90A-TGSPLIT-NEXT: s_endpgm
+;
+; GFX940-NOTTGSPLIT-LABEL: private_nontemporal_store_1:
+; GFX940-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX940-NOTTGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_lshl_add_u32 v0, v0, 2, s4
+; GFX940-NOTTGSPLIT-NEXT: s_load_dword s0, s[2:3], 0x0
+; GFX940-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-NOTTGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-NOTTGSPLIT-NEXT: scratch_store_dword v0, v1, off nt
+; GFX940-NOTTGSPLIT-NEXT: s_endpgm
+;
+; GFX940-TGSPLIT-LABEL: private_nontemporal_store_1:
+; GFX940-TGSPLIT: ; %bb.0: ; %entry
+; GFX940-TGSPLIT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
+; GFX940-TGSPLIT-NEXT: s_load_dword s4, s[0:1], 0x8
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_lshl_add_u32 v0, v0, 2, s4
+; GFX940-TGSPLIT-NEXT: s_load_dword s0, s[2:3], 0x0
+; GFX940-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
+; GFX940-TGSPLIT-NEXT: v_mov_b32_e32 v1, s0
+; GFX940-TGSPLIT-NEXT: scratch_store_dword v0, v1, off nt
+; GFX940-TGSPLIT-NEXT: s_endpgm
i32 addrspace(1)* %in, i32 addrspace(5)* %out) {
entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x()