Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel...
author     Linus Torvalds <torvalds@linux-foundation.org>
           Fri, 24 Feb 2023 01:09:35 +0000 (17:09 -0800)
committer  Linus Torvalds <torvalds@linux-foundation.org>
           Fri, 24 Feb 2023 01:09:35 +0000 (17:09 -0800)
Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.
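
   As a rough, hedged userspace sketch (not taken from the series
   itself; MFD_NOEXEC_SEAL and F_SEAL_EXEC are the constants the
   series introduces, guarded below in case installed headers predate
   them):

     #define _GNU_SOURCE
     #include <sys/mman.h>
     #include <sys/stat.h>
     #include <fcntl.h>
     #include <stdio.h>

     #ifndef MFD_NOEXEC_SEAL
     #define MFD_NOEXEC_SEAL 0x0008U   /* assumed UAPI value */
     #endif
     #ifndef F_SEAL_EXEC
     #define F_SEAL_EXEC 0x0020        /* assumed UAPI value */
     #endif

     int main(void)
     {
             /* Create a non-executable memfd and seal its X bit in one go. */
             int fd = memfd_create("demo",
                             MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_NOEXEC_SEAL);

             if (fd < 0) {
                     perror("memfd_create");
                     return 1;
             }

             /* With the exec bit sealed, flipping it later is refused. */
             if (fchmod(fd, 0777) < 0)
                     perror("fchmod (expected to fail)");

             printf("seals: 0x%x\n", fcntl(fd, F_GET_SEALS));
             return 0;
     }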

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folio-ification patch series from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes.

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which performs some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".
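
   A minimal, hedged sketch of the new control (PR_SET_MDWE and
   PR_MDWE_REFUSE_EXEC_GAIN are the values the series adds; they are
   guarded below in case installed headers predate them):

     #define _GNU_SOURCE
     #include <sys/prctl.h>
     #include <sys/mman.h>
     #include <stdio.h>

     #ifndef PR_SET_MDWE
     #define PR_SET_MDWE 65                  /* assumed UAPI value */
     #define PR_MDWE_REFUSE_EXEC_GAIN 1
     #endif

     int main(void)
     {
             /* Once set, the flag sticks and is inherited by children. */
             if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L))
                     perror("prctl(PR_SET_MDWE)");

             /* New writable+executable mappings are refused ... */
             void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
             if (p == MAP_FAILED)
                     perror("mmap(W|X) (expected to fail)");

             /* ... and so is adding exec to a mapping created writable. */
             void *q = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
             if (q != MAP_FAILED && mprotect(q, 4096, PROT_READ | PROT_EXEC))
                     perror("mprotect(+X) (expected to fail)");

             return 0;
     }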

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   this series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...

108 files changed:
Documentation/admin-guide/cgroup-v1/memory.rst
Documentation/admin-guide/mm/damon/reclaim.rst
Documentation/admin-guide/mm/hugetlbpage.rst
Documentation/admin-guide/mm/idle_page_tracking.rst
Documentation/admin-guide/mm/numaperf.rst
Documentation/admin-guide/mm/pagemap.rst
Documentation/mm/balance.rst
Documentation/mm/highmem.rst
Documentation/mm/hugetlbfs_reserv.rst
Documentation/mm/page_owner.rst
Documentation/mm/slub.rst
Documentation/mm/transhuge.rst
Documentation/mm/unevictable-lru.rst
Documentation/mm/zsmalloc.rst
Documentation/translations/zh_CN/mm/hugetlbfs_reserv.rst
Documentation/translations/zh_CN/mm/page_owner.rst
MAINTAINERS
arch/arm/kernel/process.c
arch/arm64/include/asm/pgtable.h
arch/riscv/include/asm/pgtable.h
arch/s390/include/asm/pgtable.h
arch/x86/entry/vdso/vma.c
arch/x86/mm/pat/memtype.c
drivers/accel/habanalabs/common/memory.c
drivers/accel/habanalabs/gaudi/gaudi.c
drivers/accel/habanalabs/gaudi2/gaudi2.c
drivers/accel/habanalabs/goya/goya.c
drivers/accel/ivpu/ivpu_gem.c
drivers/block/brd.c
drivers/block/zram/zram_drv.c
drivers/crypto/hisilicon/qm.c
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
drivers/gpu/drm/amd/amdkfd/kfd_process.c
drivers/gpu/drm/drm_gem.c
drivers/gpu/drm/drm_gem_dma_helper.c
drivers/gpu/drm/drm_gem_shmem_helper.c
drivers/gpu/drm/gma500/framebuffer.c
drivers/gpu/drm/i915/gem/i915_gem_mman.c
drivers/gpu/drm/mediatek/mtk_drm_gem.c
drivers/gpu/drm/omapdrm/omap_gem.c
drivers/gpu/drm/ttm/ttm_bo_vm.c
drivers/infiniband/hw/hfi1/file_ops.c
drivers/infiniband/hw/mlx5/main.c
drivers/video/fbdev/core/fb_defio.c
fs/afs/write.c
fs/btrfs/extent_io.c
fs/buffer.c
fs/ceph/addr.c
fs/cifs/file.c
fs/coredump.c
fs/erofs/data.c
fs/exec.c
fs/ext4/inode.c
fs/ext4/super.c
fs/f2fs/data.c
fs/fuse/file.c
fs/gfs2/aops.c
fs/gfs2/glops.c
fs/gfs2/log.c
fs/hugetlbfs/inode.c
fs/iomap/buffered-io.c
fs/mpage.c
fs/nfs/write.c
fs/ntfs3/inode.c
fs/orangefs/file.c
fs/orangefs/inode.c
fs/ramfs/file-nommu.c
fs/udf/inode.c
fs/xfs/xfs_file.c
include/linux/blkdev.h
include/linux/fs.h
include/linux/hugetlb.h
include/linux/memcontrol.h
include/linux/mm.h
include/linux/mm_types.h
include/linux/pagemap.h
init/main.c
io_uring/io_uring.c
kernel/bpf/syscall.c
kernel/events/core.c
kernel/fork.c
kernel/pid_namespace.c
kernel/sched/fair.c
kernel/sys.c
lib/Kconfig.debug
mm/compaction.c
mm/filemap.c
mm/huge_memory.c
mm/internal.h
mm/kasan/kasan.h
mm/khugepaged.c
mm/madvise.c
mm/memcontrol.c
mm/migrate.c
mm/page_alloc.c
mm/page_io.c
mm/secretmem.c
mm/shmem.c
mm/slab.c
mm/slub.c
mm/swap.c
mm/swapfile.c
mm/vmalloc.c
net/ipv4/tcp.c
tools/objtool/check.c
tools/testing/selftests/Makefile
tools/testing/selftests/mm/Makefile

index 27d89495ac880a5acc43b97ab5f76c483373a81d,258e45cc3b2db1866e005fbe2e7da09f5ed67195..47d1d7d932a82be09b072854ee8de609823e4fe0
@@@ -725,10 -719,15 +727,17 @@@ If we want to change this to 1G, we ca
         It is recommended to set the soft limit always below the hard limit,
         otherwise the hard limit will take precedence.
  
- 8. Move charges at task migration
- =================================
 +.. _cgroup-v1-memory-move-charges:
 +
+ 8. Move charges at task migration (DEPRECATED!)
+ ===============================================
+ THIS IS DEPRECATED!
+ It's expensive and unreliable! It's better practice to launch workload
+ tasks directly from inside their target cgroup. Use dedicated workload
+ cgroups to allow fine-grained policy adjustments without having to
+ move physical pages between control domains.
  
  Users can move charges associated with a task along with task migration, that
  is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
index 24e63e740420c34c9832dcd7bd53b0de4c0dcd63,544a6d16c80152c4bd80496c021cde58d31153b0..90a12b6a8bfc0880bf526e1a39cbc62747379ea2
@@@ -1,4 -1,9 +1,7 @@@
- =============
 -.. _numaperf:
 -
+ =======================
+ NUMA Memory Performance
+ =======================
  NUMA Locality
  =============
  
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
diff --cc MAINTAINERS
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
index e6474d38afc49187f92e041f8fdff6c592e9e558,0000000000000000000000000000000000000000..761a47e89b005a80ed92a0889f5ce0a62d1850bf
mode 100644,000000..100644
--- /dev/null
@@@ -1,3003 -1,0 +1,3003 @@@
-       vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY | VM_NORESERVE;
 +// SPDX-License-Identifier: GPL-2.0
 +
 +/*
 + * Copyright 2016-2022 HabanaLabs, Ltd.
 + * All Rights Reserved.
 + */
 +
 +#include <uapi/drm/habanalabs_accel.h>
 +#include "habanalabs.h"
 +#include "../include/hw_ip/mmu/mmu_general.h"
 +
 +#include <linux/uaccess.h>
 +#include <linux/slab.h>
 +#include <linux/vmalloc.h>
 +#include <linux/pci-p2pdma.h>
 +
 +MODULE_IMPORT_NS(DMA_BUF);
 +
 +#define HL_MMU_DEBUG  0
 +
 +/* use small pages for supporting non-pow2 (32M/40M/48M) DRAM phys page sizes */
 +#define DRAM_POOL_PAGE_SIZE   SZ_8M
 +
 +#define MEM_HANDLE_INVALID    ULONG_MAX
 +
 +static int allocate_timestamps_buffers(struct hl_fpriv *hpriv,
 +                      struct hl_mem_in *args, u64 *handle);
 +
 +static int set_alloc_page_size(struct hl_device *hdev, struct hl_mem_in *args, u32 *page_size)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 psize;
 +
 +      /*
 +       * for ASIC that supports setting the allocation page size by user we will address
 +       * user's choice only if it is not 0 (as 0 means taking the default page size)
 +       */
 +      if (prop->supports_user_set_page_size && args->alloc.page_size) {
 +              psize = args->alloc.page_size;
 +
 +              if (!is_power_of_2(psize)) {
 +                      dev_err(hdev->dev, "user page size (%#llx) is not power of 2\n", psize);
 +                      return -EINVAL;
 +              }
 +      } else {
 +              psize = prop->device_mem_alloc_default_page_size;
 +      }
 +
 +      *page_size = psize;
 +
 +      return 0;
 +}
 +
 +/*
 + * The va ranges in context object contain a list with the available chunks of
 + * device virtual memory.
 + * There is one range for host allocations and one for DRAM allocations.
 + *
 + * On initialization each range contains one chunk of all of its available
 + * virtual range which is a half of the total device virtual range.
 + *
 + * On each mapping of physical pages, a suitable virtual range chunk (with a
 + * minimum size) is selected from the list. If the chunk size equals the
 + * requested size, the chunk is returned. Otherwise, the chunk is split into
 + * two chunks - one to return as result and a remainder to stay in the list.
 + *
 + * On each Unmapping of a virtual address, the relevant virtual chunk is
 + * returned to the list. The chunk is added to the list and if its edges match
 + * the edges of the adjacent chunks (means a contiguous chunk can be created),
 + * the chunks are merged.
 + *
 + * On finish, the list is checked to have only one chunk of all the relevant
 + * virtual range (which is a half of the device total virtual range).
 + * If not (means not all mappings were unmapped), a warning is printed.
 + */
 +
 +/*
 + * alloc_device_memory() - allocate device memory.
 + * @ctx: pointer to the context structure.
 + * @args: host parameters containing the requested size.
 + * @ret_handle: result handle.
 + *
 + * This function does the following:
 + * - Allocate the requested size rounded up to 'dram_page_size' pages.
 + * - Return unique handle for later map/unmap/free.
 + */
 +static int alloc_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args,
 +                              u32 *ret_handle)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_vm *vm = &hdev->vm;
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      u64 paddr = 0, total_size, num_pgs, i;
 +      u32 num_curr_pgs, page_size;
 +      bool contiguous;
 +      int handle, rc;
 +
 +      num_curr_pgs = 0;
 +
 +      rc = set_alloc_page_size(hdev, args, &page_size);
 +      if (rc)
 +              return rc;
 +
 +      num_pgs = DIV_ROUND_UP_ULL(args->alloc.mem_size, page_size);
 +      total_size = num_pgs * page_size;
 +
 +      if (!total_size) {
 +              dev_err(hdev->dev, "Cannot allocate 0 bytes\n");
 +              return -EINVAL;
 +      }
 +
 +      contiguous = args->flags & HL_MEM_CONTIGUOUS;
 +
 +      if (contiguous) {
 +              if (is_power_of_2(page_size))
 +                      paddr = (uintptr_t) gen_pool_dma_alloc_align(vm->dram_pg_pool,
 +                                                                   total_size, NULL, page_size);
 +              else
 +                      paddr = gen_pool_alloc(vm->dram_pg_pool, total_size);
 +              if (!paddr) {
 +                      dev_err(hdev->dev,
 +                              "Cannot allocate %llu contiguous pages with total size of %llu\n",
 +                              num_pgs, total_size);
 +                      return -ENOMEM;
 +              }
 +      }
 +
 +      phys_pg_pack = kzalloc(sizeof(*phys_pg_pack), GFP_KERNEL);
 +      if (!phys_pg_pack) {
 +              rc = -ENOMEM;
 +              goto pages_pack_err;
 +      }
 +
 +      phys_pg_pack->vm_type = VM_TYPE_PHYS_PACK;
 +      phys_pg_pack->asid = ctx->asid;
 +      phys_pg_pack->npages = num_pgs;
 +      phys_pg_pack->page_size = page_size;
 +      phys_pg_pack->total_size = total_size;
 +      phys_pg_pack->flags = args->flags;
 +      phys_pg_pack->contiguous = contiguous;
 +
 +      phys_pg_pack->pages = kvmalloc_array(num_pgs, sizeof(u64), GFP_KERNEL);
 +      if (ZERO_OR_NULL_PTR(phys_pg_pack->pages)) {
 +              rc = -ENOMEM;
 +              goto pages_arr_err;
 +      }
 +
 +      if (phys_pg_pack->contiguous) {
 +              for (i = 0 ; i < num_pgs ; i++)
 +                      phys_pg_pack->pages[i] = paddr + i * page_size;
 +      } else {
 +              for (i = 0 ; i < num_pgs ; i++) {
 +                      if (is_power_of_2(page_size))
 +                              phys_pg_pack->pages[i] =
 +                                      (uintptr_t)gen_pool_dma_alloc_align(vm->dram_pg_pool,
 +                                                                          page_size, NULL,
 +                                                                          page_size);
 +                      else
 +                              phys_pg_pack->pages[i] = gen_pool_alloc(vm->dram_pg_pool,
 +                                                                      page_size);
 +
 +                      if (!phys_pg_pack->pages[i]) {
 +                              dev_err(hdev->dev,
 +                                      "Cannot allocate device memory (out of memory)\n");
 +                              rc = -ENOMEM;
 +                              goto page_err;
 +                      }
 +
 +                      num_curr_pgs++;
 +              }
 +      }
 +
 +      spin_lock(&vm->idr_lock);
 +      handle = idr_alloc(&vm->phys_pg_pack_handles, phys_pg_pack, 1, 0,
 +                              GFP_ATOMIC);
 +      spin_unlock(&vm->idr_lock);
 +
 +      if (handle < 0) {
 +              dev_err(hdev->dev, "Failed to get handle for page\n");
 +              rc = -EFAULT;
 +              goto idr_err;
 +      }
 +
 +      for (i = 0 ; i < num_pgs ; i++)
 +              kref_get(&vm->dram_pg_pool_refcount);
 +
 +      phys_pg_pack->handle = handle;
 +
 +      atomic64_add(phys_pg_pack->total_size, &ctx->dram_phys_mem);
 +      atomic64_add(phys_pg_pack->total_size, &hdev->dram_used_mem);
 +
 +      *ret_handle = handle;
 +
 +      return 0;
 +
 +idr_err:
 +page_err:
 +      if (!phys_pg_pack->contiguous)
 +              for (i = 0 ; i < num_curr_pgs ; i++)
 +                      gen_pool_free(vm->dram_pg_pool, phys_pg_pack->pages[i],
 +                                      page_size);
 +
 +      kvfree(phys_pg_pack->pages);
 +pages_arr_err:
 +      kfree(phys_pg_pack);
 +pages_pack_err:
 +      if (contiguous)
 +              gen_pool_free(vm->dram_pg_pool, paddr, total_size);
 +
 +      return rc;
 +}
 +
 +/**
 + * dma_map_host_va() - DMA mapping of the given host virtual address.
 + * @hdev: habanalabs device structure.
 + * @addr: the host virtual address of the memory area.
 + * @size: the size of the memory area.
 + * @p_userptr: pointer to result userptr structure.
 + *
 + * This function does the following:
 + * - Allocate userptr structure.
 + * - Pin the given host memory using the userptr structure.
 + * - Perform DMA mapping to have the DMA addresses of the pages.
 + */
 +static int dma_map_host_va(struct hl_device *hdev, u64 addr, u64 size,
 +                              struct hl_userptr **p_userptr)
 +{
 +      struct hl_userptr *userptr;
 +      int rc;
 +
 +      userptr = kzalloc(sizeof(*userptr), GFP_KERNEL);
 +      if (!userptr) {
 +              rc = -ENOMEM;
 +              goto userptr_err;
 +      }
 +
 +      rc = hl_pin_host_memory(hdev, addr, size, userptr);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to pin host memory\n");
 +              goto pin_err;
 +      }
 +
 +      userptr->dma_mapped = true;
 +      userptr->dir = DMA_BIDIRECTIONAL;
 +      userptr->vm_type = VM_TYPE_USERPTR;
 +
 +      *p_userptr = userptr;
 +
 +      rc = hdev->asic_funcs->asic_dma_map_sgtable(hdev, userptr->sgt, DMA_BIDIRECTIONAL);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to map sgt with DMA region\n");
 +              goto dma_map_err;
 +      }
 +
 +      return 0;
 +
 +dma_map_err:
 +      hl_unpin_host_memory(hdev, userptr);
 +pin_err:
 +      kfree(userptr);
 +userptr_err:
 +
 +      return rc;
 +}
 +
 +/**
 + * dma_unmap_host_va() - DMA unmapping of the given host virtual address.
 + * @hdev: habanalabs device structure.
 + * @userptr: userptr to free.
 + *
 + * This function does the following:
 + * - Unpins the physical pages.
 + * - Frees the userptr structure.
 + */
 +static void dma_unmap_host_va(struct hl_device *hdev,
 +                              struct hl_userptr *userptr)
 +{
 +      hl_unpin_host_memory(hdev, userptr);
 +      kfree(userptr);
 +}
 +
 +/**
 + * dram_pg_pool_do_release() - free DRAM pages pool
 + * @ref: pointer to reference object.
 + *
 + * This function does the following:
 + * - Frees the idr structure of physical pages handles.
 + * - Frees the generic pool of DRAM physical pages.
 + */
 +static void dram_pg_pool_do_release(struct kref *ref)
 +{
 +      struct hl_vm *vm = container_of(ref, struct hl_vm,
 +                      dram_pg_pool_refcount);
 +
 +      /*
 +       * free the idr here as only here we know for sure that there are no
 +       * allocated physical pages and hence there are no handles in use
 +       */
 +      idr_destroy(&vm->phys_pg_pack_handles);
 +      gen_pool_destroy(vm->dram_pg_pool);
 +}
 +
 +/**
 + * free_phys_pg_pack() - free physical page pack.
 + * @hdev: habanalabs device structure.
 + * @phys_pg_pack: physical page pack to free.
 + *
 + * This function does the following:
 + * - For DRAM memory only
 + *   - iterate over the pack, free each physical block structure by
 + *     returning it to the general pool.
 + * - Free the hl_vm_phys_pg_pack structure.
 + */
 +static void free_phys_pg_pack(struct hl_device *hdev,
 +                              struct hl_vm_phys_pg_pack *phys_pg_pack)
 +{
 +      struct hl_vm *vm = &hdev->vm;
 +      u64 i;
 +
 +      if (phys_pg_pack->created_from_userptr)
 +              goto end;
 +
 +      if (phys_pg_pack->contiguous) {
 +              gen_pool_free(vm->dram_pg_pool, phys_pg_pack->pages[0],
 +                      phys_pg_pack->total_size);
 +
 +              for (i = 0; i < phys_pg_pack->npages ; i++)
 +                      kref_put(&vm->dram_pg_pool_refcount,
 +                              dram_pg_pool_do_release);
 +      } else {
 +              for (i = 0 ; i < phys_pg_pack->npages ; i++) {
 +                      gen_pool_free(vm->dram_pg_pool,
 +                              phys_pg_pack->pages[i],
 +                              phys_pg_pack->page_size);
 +                      kref_put(&vm->dram_pg_pool_refcount,
 +                              dram_pg_pool_do_release);
 +              }
 +      }
 +
 +end:
 +      kvfree(phys_pg_pack->pages);
 +      kfree(phys_pg_pack);
 +
 +      return;
 +}
 +
 +/**
 + * free_device_memory() - free device memory.
 + * @ctx: pointer to the context structure.
 + * @args: host parameters containing the requested size.
 + *
 + * This function does the following:
 + * - Free the device memory related to the given handle.
 + */
 +static int free_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_vm *vm = &hdev->vm;
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      u32 handle = args->free.handle;
 +
 +      spin_lock(&vm->idr_lock);
 +      phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, handle);
 +      if (!phys_pg_pack) {
 +              spin_unlock(&vm->idr_lock);
 +              dev_err(hdev->dev, "free device memory failed, no match for handle %u\n", handle);
 +              return -EINVAL;
 +      }
 +
 +      if (atomic_read(&phys_pg_pack->mapping_cnt) > 0) {
 +              spin_unlock(&vm->idr_lock);
 +              dev_err(hdev->dev, "handle %u is mapped, cannot free\n", handle);
 +              return -EINVAL;
 +      }
 +
 +      /* must remove from idr before the freeing of the physical pages as the refcount of the pool
 +       * is also the trigger of the idr destroy
 +       */
 +      idr_remove(&vm->phys_pg_pack_handles, handle);
 +      spin_unlock(&vm->idr_lock);
 +
 +      atomic64_sub(phys_pg_pack->total_size, &ctx->dram_phys_mem);
 +      atomic64_sub(phys_pg_pack->total_size, &hdev->dram_used_mem);
 +
 +      free_phys_pg_pack(hdev, phys_pg_pack);
 +
 +      return 0;
 +}
 +
 +/**
 + * clear_va_list_locked() - free virtual addresses list.
 + * @hdev: habanalabs device structure.
 + * @va_list: list of virtual addresses to free.
 + *
 + * This function does the following:
 + * - Iterate over the list and free each virtual addresses block.
 + *
 + * This function should be called only when va_list lock is taken.
 + */
 +static void clear_va_list_locked(struct hl_device *hdev,
 +              struct list_head *va_list)
 +{
 +      struct hl_vm_va_block *va_block, *tmp;
 +
 +      list_for_each_entry_safe(va_block, tmp, va_list, node) {
 +              list_del(&va_block->node);
 +              kfree(va_block);
 +      }
 +}
 +
 +/**
 + * print_va_list_locked() - print virtual addresses list.
 + * @hdev: habanalabs device structure.
 + * @va_list: list of virtual addresses to print.
 + *
 + * This function does the following:
 + * - Iterate over the list and print each virtual addresses block.
 + *
 + * This function should be called only when va_list lock is taken.
 + */
 +static void print_va_list_locked(struct hl_device *hdev,
 +              struct list_head *va_list)
 +{
 +#if HL_MMU_DEBUG
 +      struct hl_vm_va_block *va_block;
 +
 +      dev_dbg(hdev->dev, "print va list:\n");
 +
 +      list_for_each_entry(va_block, va_list, node)
 +              dev_dbg(hdev->dev,
 +                      "va block, start: 0x%llx, end: 0x%llx, size: %llu\n",
 +                      va_block->start, va_block->end, va_block->size);
 +#endif
 +}
 +
 +/**
 + * merge_va_blocks_locked() - merge a virtual block if possible.
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_list: pointer to the virtual addresses block list.
 + * @va_block: virtual block to merge with adjacent blocks.
 + *
 + * This function does the following:
 + * - Merge the given blocks with the adjacent blocks if their virtual ranges
 + *   create a contiguous virtual range.
 + *
 + * This function should be called only when the va_list lock is taken.
 + */
 +static void merge_va_blocks_locked(struct hl_device *hdev,
 +              struct list_head *va_list, struct hl_vm_va_block *va_block)
 +{
 +      struct hl_vm_va_block *prev, *next;
 +
 +      prev = list_prev_entry(va_block, node);
 +      if (&prev->node != va_list && prev->end + 1 == va_block->start) {
 +              prev->end = va_block->end;
 +              prev->size = prev->end - prev->start + 1;
 +              list_del(&va_block->node);
 +              kfree(va_block);
 +              va_block = prev;
 +      }
 +
 +      next = list_next_entry(va_block, node);
 +      if (&next->node != va_list && va_block->end + 1 == next->start) {
 +              next->start = va_block->start;
 +              next->size = next->end - next->start + 1;
 +              list_del(&va_block->node);
 +              kfree(va_block);
 +      }
 +}
 +
 +/**
 + * add_va_block_locked() - add a virtual block to the virtual addresses list.
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_list: pointer to the virtual addresses block list.
 + * @start: start virtual address.
 + * @end: end virtual address.
 + *
 + * This function does the following:
 + * - Add the given block to the virtual blocks list and merge with other blocks
 + *   if a contiguous virtual block can be created.
 + *
 + * This function should be called only when the va_list lock is taken.
 + */
 +static int add_va_block_locked(struct hl_device *hdev,
 +              struct list_head *va_list, u64 start, u64 end)
 +{
 +      struct hl_vm_va_block *va_block, *res = NULL;
 +      u64 size = end - start + 1;
 +
 +      print_va_list_locked(hdev, va_list);
 +
 +      list_for_each_entry(va_block, va_list, node) {
 +              /* TODO: remove upon matureness */
 +              if (hl_mem_area_crosses_range(start, size, va_block->start,
 +                              va_block->end)) {
 +                      dev_err(hdev->dev,
 +                              "block crossing ranges at start 0x%llx, end 0x%llx\n",
 +                              va_block->start, va_block->end);
 +                      return -EINVAL;
 +              }
 +
 +              if (va_block->end < start)
 +                      res = va_block;
 +      }
 +
 +      va_block = kmalloc(sizeof(*va_block), GFP_KERNEL);
 +      if (!va_block)
 +              return -ENOMEM;
 +
 +      va_block->start = start;
 +      va_block->end = end;
 +      va_block->size = size;
 +
 +      if (!res)
 +              list_add(&va_block->node, va_list);
 +      else
 +              list_add(&va_block->node, &res->node);
 +
 +      merge_va_blocks_locked(hdev, va_list, va_block);
 +
 +      print_va_list_locked(hdev, va_list);
 +
 +      return 0;
 +}
 +
 +/**
 + * add_va_block() - wrapper for add_va_block_locked.
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_range: pointer to the virtual addresses range object.
 + * @start: start virtual address.
 + * @end: end virtual address.
 + *
 + * This function does the following:
 + * - Takes the list lock and calls add_va_block_locked.
 + */
 +static inline int add_va_block(struct hl_device *hdev,
 +              struct hl_va_range *va_range, u64 start, u64 end)
 +{
 +      int rc;
 +
 +      mutex_lock(&va_range->lock);
 +      rc = add_va_block_locked(hdev, &va_range->list, start, end);
 +      mutex_unlock(&va_range->lock);
 +
 +      return rc;
 +}
 +
 +/**
 + * is_hint_crossing_range() - check if a hint address crosses the specified reserved range.
 + * @range_type: virtual space range type.
 + * @start_addr: start virtual address.
 + * @size: block size.
 + * @prop: asic properties structure to retrieve reserved ranges from.
 + */
 +static inline bool is_hint_crossing_range(enum hl_va_range_type range_type,
 +              u64 start_addr, u32 size, struct asic_fixed_properties *prop) {
 +      bool range_cross;
 +
 +      if (range_type == HL_VA_RANGE_TYPE_DRAM)
 +              range_cross =
 +                      hl_mem_area_crosses_range(start_addr, size,
 +                      prop->hints_dram_reserved_va_range.start_addr,
 +                      prop->hints_dram_reserved_va_range.end_addr);
 +      else if (range_type == HL_VA_RANGE_TYPE_HOST)
 +              range_cross =
 +                      hl_mem_area_crosses_range(start_addr,   size,
 +                      prop->hints_host_reserved_va_range.start_addr,
 +                      prop->hints_host_reserved_va_range.end_addr);
 +      else
 +              range_cross =
 +                      hl_mem_area_crosses_range(start_addr, size,
 +                      prop->hints_host_hpage_reserved_va_range.start_addr,
 +                      prop->hints_host_hpage_reserved_va_range.end_addr);
 +
 +      return range_cross;
 +}
 +
 +/**
 + * get_va_block() - get a virtual block for the given size and alignment.
 + *
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_range: pointer to the virtual addresses range.
 + * @size: requested block size.
 + * @hint_addr: hint for requested address by the user.
 + * @va_block_align: required alignment of the virtual block start address.
 + * @range_type: va range type (host, dram)
 + * @flags: additional memory flags, currently only uses HL_MEM_FORCE_HINT
 + *
 + * This function does the following:
 + * - Iterate on the virtual block list to find a suitable virtual block for the
 + *   given size, hint address and alignment.
 + * - Reserve the requested block and update the list.
 + * - Return the start address of the virtual block.
 + */
 +static u64 get_va_block(struct hl_device *hdev,
 +                              struct hl_va_range *va_range,
 +                              u64 size, u64 hint_addr, u32 va_block_align,
 +                              enum hl_va_range_type range_type,
 +                              u32 flags)
 +{
 +      struct hl_vm_va_block *va_block, *new_va_block = NULL;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 tmp_hint_addr, valid_start, valid_size, prev_start, prev_end,
 +              align_mask, reserved_valid_start = 0, reserved_valid_size = 0,
 +              dram_hint_mask = prop->dram_hints_align_mask;
 +      bool add_prev = false;
 +      bool is_align_pow_2  = is_power_of_2(va_range->page_size);
 +      bool is_hint_dram_addr = hl_is_dram_va(hdev, hint_addr);
 +      bool force_hint = flags & HL_MEM_FORCE_HINT;
 +
 +      if (is_align_pow_2)
 +              align_mask = ~((u64)va_block_align - 1);
 +      else
 +              /*
 +               * with non-power-of-2 range we work only with page granularity
 +               * and the start address is page aligned,
 +               * so no need for alignment checking.
 +               */
 +              size = DIV_ROUND_UP_ULL(size, va_range->page_size) *
 +                                                      va_range->page_size;
 +
 +      tmp_hint_addr = hint_addr & ~dram_hint_mask;
 +
 +      /* Check if we need to ignore hint address */
 +      if ((is_align_pow_2 && (hint_addr & (va_block_align - 1))) ||
 +                      (!is_align_pow_2 && is_hint_dram_addr &&
 +                      do_div(tmp_hint_addr, va_range->page_size))) {
 +
 +              if (force_hint) {
 +                      /* Hint must be respected, so here we just fail */
 +                      dev_err(hdev->dev,
 +                              "Hint address 0x%llx is not page aligned - cannot be respected\n",
 +                              hint_addr);
 +                      return 0;
 +              }
 +
 +              dev_dbg(hdev->dev,
 +                      "Hint address 0x%llx will be ignored because it is not aligned\n",
 +                      hint_addr);
 +              hint_addr = 0;
 +      }
 +
 +      mutex_lock(&va_range->lock);
 +
 +      print_va_list_locked(hdev, &va_range->list);
 +
 +      list_for_each_entry(va_block, &va_range->list, node) {
 +              /* Calc the first possible aligned addr */
 +              valid_start = va_block->start;
 +
 +              if (is_align_pow_2 && (valid_start & (va_block_align - 1))) {
 +                      valid_start &= align_mask;
 +                      valid_start += va_block_align;
 +                      if (valid_start > va_block->end)
 +                              continue;
 +              }
 +
 +              valid_size = va_block->end - valid_start + 1;
 +              if (valid_size < size)
 +                      continue;
 +
 +              /*
 +               * In case hint address is 0, and hints_range_reservation
 +               * property enabled, then avoid allocating va blocks from the
 +               * range reserved for hint addresses
 +               */
 +              if (prop->hints_range_reservation && !hint_addr)
 +                      if (is_hint_crossing_range(range_type, valid_start,
 +                                      size, prop))
 +                              continue;
 +
 +              /* Pick the minimal length block which has the required size */
 +              if (!new_va_block || (valid_size < reserved_valid_size)) {
 +                      new_va_block = va_block;
 +                      reserved_valid_start = valid_start;
 +                      reserved_valid_size = valid_size;
 +              }
 +
 +              if (hint_addr && hint_addr >= valid_start &&
 +                                      (hint_addr + size) <= va_block->end) {
 +                      new_va_block = va_block;
 +                      reserved_valid_start = hint_addr;
 +                      reserved_valid_size = valid_size;
 +                      break;
 +              }
 +      }
 +
 +      if (!new_va_block) {
 +              dev_err(hdev->dev, "no available va block for size %llu\n",
 +                                                              size);
 +              goto out;
 +      }
 +
 +      if (force_hint && reserved_valid_start != hint_addr) {
 +              /* Hint address must be respected. If we are here - this means
 +               * we could not respect it.
 +               */
 +              dev_err(hdev->dev,
 +                      "Hint address 0x%llx could not be respected\n",
 +                      hint_addr);
 +              reserved_valid_start = 0;
 +              goto out;
 +      }
 +
 +      /*
 +       * Check if there is some leftover range due to reserving the new
 +       * va block, then return it to the main virtual addresses list.
 +       */
 +      if (reserved_valid_start > new_va_block->start) {
 +              prev_start = new_va_block->start;
 +              prev_end = reserved_valid_start - 1;
 +
 +              new_va_block->start = reserved_valid_start;
 +              new_va_block->size = reserved_valid_size;
 +
 +              add_prev = true;
 +      }
 +
 +      if (new_va_block->size > size) {
 +              new_va_block->start += size;
 +              new_va_block->size = new_va_block->end - new_va_block->start + 1;
 +      } else {
 +              list_del(&new_va_block->node);
 +              kfree(new_va_block);
 +      }
 +
 +      if (add_prev)
 +              add_va_block_locked(hdev, &va_range->list, prev_start,
 +                              prev_end);
 +
 +      print_va_list_locked(hdev, &va_range->list);
 +out:
 +      mutex_unlock(&va_range->lock);
 +
 +      return reserved_valid_start;
 +}
 +
 +/*
 + * hl_reserve_va_block() - reserve a virtual block of a given size.
 + * @hdev: pointer to the habanalabs device structure.
 + * @ctx: current context
 + * @type: virtual addresses range type.
 + * @size: requested block size.
 + * @alignment: required alignment in bytes of the virtual block start address,
 + *             0 means no alignment.
 + *
 + * This function does the following:
 + * - Iterate on the virtual block list to find a suitable virtual block for the
 + *   given size and alignment.
 + * - Reserve the requested block and update the list.
 + * - Return the start address of the virtual block.
 + */
 +u64 hl_reserve_va_block(struct hl_device *hdev, struct hl_ctx *ctx,
 +              enum hl_va_range_type type, u64 size, u32 alignment)
 +{
 +      return get_va_block(hdev, ctx->va_range[type], size, 0,
 +                      max(alignment, ctx->va_range[type]->page_size),
 +                      type, 0);
 +}
 +
 +/**
 + * hl_get_va_range_type() - get va_range type for the given address and size.
 + * @ctx: context to fetch va_range from.
 + * @address: the start address of the area we want to validate.
 + * @size: the size in bytes of the area we want to validate.
 + * @type: returned va_range type.
 + *
 + * Return: 0 if the area is inside a valid range, -EINVAL otherwise.
 + */
 +static int hl_get_va_range_type(struct hl_ctx *ctx, u64 address, u64 size,
 +                      enum hl_va_range_type *type)
 +{
 +      int i;
 +
 +      for (i = 0 ; i < HL_VA_RANGE_TYPE_MAX; i++) {
 +              if (hl_mem_area_inside_range(address, size,
 +                              ctx->va_range[i]->start_addr,
 +                              ctx->va_range[i]->end_addr)) {
 +                      *type = i;
 +                      return 0;
 +              }
 +      }
 +
 +      return -EINVAL;
 +}
 +
 +/**
 + * hl_unreserve_va_block() - wrapper for add_va_block to unreserve a va block.
 + * @hdev: pointer to the habanalabs device structure
 + * @ctx: pointer to the context structure.
 + * @start_addr: start virtual address.
 + * @size: number of bytes to unreserve.
 + *
 + * This function does the following:
 + * - Takes the list lock and calls add_va_block_locked.
 + */
 +int hl_unreserve_va_block(struct hl_device *hdev, struct hl_ctx *ctx,
 +              u64 start_addr, u64 size)
 +{
 +      enum hl_va_range_type type;
 +      int rc;
 +
 +      rc = hl_get_va_range_type(ctx, start_addr, size, &type);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "cannot find va_range for va %#llx size %llu",
 +                      start_addr, size);
 +              return rc;
 +      }
 +
 +      rc = add_va_block(hdev, ctx->va_range[type], start_addr,
 +                                              start_addr + size - 1);
 +      if (rc)
 +              dev_warn(hdev->dev,
 +                      "add va block failed for vaddr: 0x%llx\n", start_addr);
 +
 +      return rc;
 +}
 +
 +/**
 + * init_phys_pg_pack_from_userptr() - initialize physical page pack from host
 + *                                    memory
 + * @ctx: pointer to the context structure.
 + * @userptr: userptr to initialize from.
 + * @pphys_pg_pack: result pointer.
 + * @force_regular_page: tell the function to ignore huge page optimization,
 + *                      even if possible. Needed for cases where the device VA
 + *                      is allocated before we know the composition of the
 + *                      physical pages
 + *
 + * This function does the following:
 + * - Pin the physical pages related to the given virtual block.
 + * - Create a physical page pack from the physical pages related to the given
 + *   virtual block.
 + */
 +static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx,
 +                              struct hl_userptr *userptr,
 +                              struct hl_vm_phys_pg_pack **pphys_pg_pack,
 +                              bool force_regular_page)
 +{
 +      u32 npages, page_size = PAGE_SIZE,
 +              huge_page_size = ctx->hdev->asic_prop.pmmu_huge.page_size;
 +      u32 pgs_in_huge_page = huge_page_size >> __ffs(page_size);
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      bool first = true, is_huge_page_opt;
 +      u64 page_mask, total_npages;
 +      struct scatterlist *sg;
 +      dma_addr_t dma_addr;
 +      int rc, i, j;
 +
 +      phys_pg_pack = kzalloc(sizeof(*phys_pg_pack), GFP_KERNEL);
 +      if (!phys_pg_pack)
 +              return -ENOMEM;
 +
 +      phys_pg_pack->vm_type = userptr->vm_type;
 +      phys_pg_pack->created_from_userptr = true;
 +      phys_pg_pack->asid = ctx->asid;
 +      atomic_set(&phys_pg_pack->mapping_cnt, 1);
 +
 +      is_huge_page_opt = (force_regular_page ? false : true);
 +
 +      /* Only if all dma_addrs are aligned to 2MB and their
 +       * sizes are at least 2MB can we use huge page mapping.
 +       * We limit the 2MB optimization to this condition,
 +       * since later on we acquire the related VA range as one
 +       * consecutive block.
 +       */
 +      total_npages = 0;
 +      for_each_sgtable_dma_sg(userptr->sgt, sg, i) {
 +              npages = hl_get_sg_info(sg, &dma_addr);
 +
 +              total_npages += npages;
 +
 +              if ((npages % pgs_in_huge_page) ||
 +                                      (dma_addr & (huge_page_size - 1)))
 +                      is_huge_page_opt = false;
 +      }
 +
 +      if (is_huge_page_opt) {
 +              page_size = huge_page_size;
 +              do_div(total_npages, pgs_in_huge_page);
 +      }
 +
 +      page_mask = ~(((u64) page_size) - 1);
 +
 +      phys_pg_pack->pages = kvmalloc_array(total_npages, sizeof(u64),
 +                                              GFP_KERNEL);
 +      if (ZERO_OR_NULL_PTR(phys_pg_pack->pages)) {
 +              rc = -ENOMEM;
 +              goto page_pack_arr_mem_err;
 +      }
 +
 +      phys_pg_pack->npages = total_npages;
 +      phys_pg_pack->page_size = page_size;
 +      phys_pg_pack->total_size = total_npages * page_size;
 +
 +      j = 0;
 +      for_each_sgtable_dma_sg(userptr->sgt, sg, i) {
 +              npages = hl_get_sg_info(sg, &dma_addr);
 +
 +              /* align down to physical page size and save the offset */
 +              if (first) {
 +                      first = false;
 +                      phys_pg_pack->offset = dma_addr & (page_size - 1);
 +                      dma_addr &= page_mask;
 +              }
 +
 +              while (npages) {
 +                      phys_pg_pack->pages[j++] = dma_addr;
 +                      dma_addr += page_size;
 +
 +                      if (is_huge_page_opt)
 +                              npages -= pgs_in_huge_page;
 +                      else
 +                              npages--;
 +              }
 +      }
 +
 +      *pphys_pg_pack = phys_pg_pack;
 +
 +      return 0;
 +
 +page_pack_arr_mem_err:
 +      kfree(phys_pg_pack);
 +
 +      return rc;
 +}
 +
 +/**
 + * map_phys_pg_pack() - maps the physical page pack.
 + * @ctx: pointer to the context structure.
 + * @vaddr: start address of the virtual area to map from.
 + * @phys_pg_pack: the pack of physical pages to map to.
 + *
 + * This function does the following:
 + * - Maps each chunk of virtual memory to matching physical chunk.
 + * - Stores number of successful mappings in the given argument.
 + * - Returns 0 on success, error code otherwise.
 + */
 +static int map_phys_pg_pack(struct hl_ctx *ctx, u64 vaddr,
 +                              struct hl_vm_phys_pg_pack *phys_pg_pack)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      u64 next_vaddr = vaddr, paddr, mapped_pg_cnt = 0, i;
 +      u32 page_size = phys_pg_pack->page_size;
 +      int rc = 0;
 +      bool is_host_addr;
 +
 +      for (i = 0 ; i < phys_pg_pack->npages ; i++) {
 +              paddr = phys_pg_pack->pages[i];
 +
 +              rc = hl_mmu_map_page(ctx, next_vaddr, paddr, page_size,
 +                              (i + 1) == phys_pg_pack->npages);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "map failed for handle %u, npages: %llu, mapped: %llu",
 +                              phys_pg_pack->handle, phys_pg_pack->npages,
 +                              mapped_pg_cnt);
 +                      goto err;
 +              }
 +
 +              mapped_pg_cnt++;
 +              next_vaddr += page_size;
 +      }
 +
 +      return 0;
 +
 +err:
 +      is_host_addr = !hl_is_dram_va(hdev, vaddr);
 +
 +      next_vaddr = vaddr;
 +      for (i = 0 ; i < mapped_pg_cnt ; i++) {
 +              if (hl_mmu_unmap_page(ctx, next_vaddr, page_size,
 +                                      (i + 1) == mapped_pg_cnt))
 +                      dev_warn_ratelimited(hdev->dev,
 +                              "failed to unmap handle %u, va: 0x%llx, pa: 0x%llx, page size: %u\n",
 +                                      phys_pg_pack->handle, next_vaddr,
 +                                      phys_pg_pack->pages[i], page_size);
 +
 +              next_vaddr += page_size;
 +
 +              /*
 +               * unmapping on Palladium can be really long, so avoid a CPU
 +               * soft lockup bug by sleeping a little between unmapping pages
 +               *
 +               * In addition, on host num of pages could be huge,
 +               * because page size could be 4KB, so when unmapping host
 +               * pages sleep every 32K pages to avoid soft lockup
 +               */
 +              if (hdev->pldm || (is_host_addr && (i & 0x7FFF) == 0))
 +                      usleep_range(50, 200);
 +      }
 +
 +      return rc;
 +}
 +
 +/**
 + * unmap_phys_pg_pack() - unmaps the physical page pack.
 + * @ctx: pointer to the context structure.
 + * @vaddr: start address of the virtual area to unmap.
 + * @phys_pg_pack: the pack of physical pages to unmap.
 + */
 +static void unmap_phys_pg_pack(struct hl_ctx *ctx, u64 vaddr,
 +                              struct hl_vm_phys_pg_pack *phys_pg_pack)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      u64 next_vaddr, i;
 +      bool is_host_addr;
 +      u32 page_size;
 +
 +      is_host_addr = !hl_is_dram_va(hdev, vaddr);
 +      page_size = phys_pg_pack->page_size;
 +      next_vaddr = vaddr;
 +
 +      for (i = 0 ; i < phys_pg_pack->npages ; i++, next_vaddr += page_size) {
 +              if (hl_mmu_unmap_page(ctx, next_vaddr, page_size,
 +                                     (i + 1) == phys_pg_pack->npages))
 +                      dev_warn_ratelimited(hdev->dev,
 +                      "unmap failed for vaddr: 0x%llx\n", next_vaddr);
 +
 +              /*
 +               * unmapping on Palladium can be really long, so avoid a CPU
 +               * soft lockup bug by sleeping a little between unmapping pages
 +               *
 +               * In addition, on host num of pages could be huge,
 +               * because page size could be 4KB, so when unmapping host
 +               * pages sleep every 32K pages to avoid soft lockup
 +               */
 +              if (hdev->pldm || (is_host_addr && (i & 0x7FFF) == 0))
 +                      usleep_range(50, 200);
 +      }
 +}
 +
 +static int get_paddr_from_handle(struct hl_ctx *ctx, struct hl_mem_in *args,
 +                                      u64 *paddr)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_vm *vm = &hdev->vm;
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      u32 handle;
 +
 +      handle = lower_32_bits(args->map_device.handle);
 +      spin_lock(&vm->idr_lock);
 +      phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, handle);
 +      if (!phys_pg_pack) {
 +              spin_unlock(&vm->idr_lock);
 +              dev_err(hdev->dev, "no match for handle %u\n", handle);
 +              return -EINVAL;
 +      }
 +
 +      *paddr = phys_pg_pack->pages[0];
 +
 +      spin_unlock(&vm->idr_lock);
 +
 +      return 0;
 +}
 +
 +/**
 + * map_device_va() - map the given memory.
 + * @ctx: pointer to the context structure.
 + * @args: host parameters with handle/host virtual address.
 + * @device_addr: pointer to result device virtual address.
 + *
 + * This function does the following:
 + * - If given a physical device memory handle, map to a device virtual block
 + *   and return the start address of this block.
 + * - If given a host virtual address and size, find the related physical pages,
 + *   map a device virtual block to these pages and return the start address of
 + *   this block.
 + */
 +static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args, u64 *device_addr)
 +{
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      enum hl_va_range_type va_range_type = 0;
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_userptr *userptr = NULL;
 +      u32 handle = 0, va_block_align;
 +      struct hl_vm_hash_node *hnode;
 +      struct hl_vm *vm = &hdev->vm;
 +      struct hl_va_range *va_range;
 +      bool is_userptr, do_prefetch;
 +      u64 ret_vaddr, hint_addr;
 +      enum vm_type *vm_type;
 +      int rc;
 +
 +      /* set map flags */
 +      is_userptr = args->flags & HL_MEM_USERPTR;
 +      do_prefetch = hdev->supports_mmu_prefetch && (args->flags & HL_MEM_PREFETCH);
 +
 +      /* Assume failure */
 +      *device_addr = 0;
 +
 +      if (is_userptr) {
 +              u64 addr = args->map_host.host_virt_addr,
 +                      size = args->map_host.mem_size;
 +              u32 page_size = hdev->asic_prop.pmmu.page_size,
 +                      huge_page_size = hdev->asic_prop.pmmu_huge.page_size;
 +
 +              rc = dma_map_host_va(hdev, addr, size, &userptr);
 +              if (rc) {
 +                      dev_err(hdev->dev, "failed to get userptr from va\n");
 +                      return rc;
 +              }
 +
 +              rc = init_phys_pg_pack_from_userptr(ctx, userptr,
 +                              &phys_pg_pack, false);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "unable to init page pack for vaddr 0x%llx\n",
 +                              addr);
 +                      goto init_page_pack_err;
 +              }
 +
 +              vm_type = (enum vm_type *) userptr;
 +              hint_addr = args->map_host.hint_addr;
 +              handle = phys_pg_pack->handle;
 +
 +              /* get required alignment */
 +              if (phys_pg_pack->page_size == page_size) {
 +                      va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST];
 +                      va_range_type = HL_VA_RANGE_TYPE_HOST;
 +                      /*
 +                       * huge page alignment may be needed in case of regular
 +                       * page mapping, depending on the host VA alignment
 +                       */
 +                      if (addr & (huge_page_size - 1))
 +                              va_block_align = page_size;
 +                      else
 +                              va_block_align = huge_page_size;
 +              } else {
 +                      /*
 +                       * huge page alignment is needed in case of huge page
 +                       * mapping
 +                       */
 +                      va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE];
 +                      va_range_type = HL_VA_RANGE_TYPE_HOST_HUGE;
 +                      va_block_align = huge_page_size;
 +              }
 +      } else {
 +              handle = lower_32_bits(args->map_device.handle);
 +
 +              spin_lock(&vm->idr_lock);
 +              phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, handle);
 +              if (!phys_pg_pack) {
 +                      spin_unlock(&vm->idr_lock);
 +                      dev_err(hdev->dev,
 +                              "no match for handle %u\n", handle);
 +                      return -EINVAL;
 +              }
 +
 +              /* increment now to avoid freeing device memory while mapping */
 +              atomic_inc(&phys_pg_pack->mapping_cnt);
 +
 +              spin_unlock(&vm->idr_lock);
 +
 +              vm_type = (enum vm_type *) phys_pg_pack;
 +
 +              hint_addr = args->map_device.hint_addr;
 +
 +              /* DRAM VA alignment is the same as the MMU page size */
 +              va_range = ctx->va_range[HL_VA_RANGE_TYPE_DRAM];
 +              va_range_type = HL_VA_RANGE_TYPE_DRAM;
 +              va_block_align = hdev->asic_prop.dmmu.page_size;
 +      }
 +
 +      /*
 +       * relevant for mapping device physical memory only, as host memory is
 +       * implicitly shared
 +       */
 +      if (!is_userptr && !(phys_pg_pack->flags & HL_MEM_SHARED) &&
 +                      phys_pg_pack->asid != ctx->asid) {
 +              dev_err(hdev->dev,
 +                      "Failed to map memory, handle %u is not shared\n",
 +                      handle);
 +              rc = -EPERM;
 +              goto shared_err;
 +      }
 +
 +      hnode = kzalloc(sizeof(*hnode), GFP_KERNEL);
 +      if (!hnode) {
 +              rc = -ENOMEM;
 +              goto hnode_err;
 +      }
 +
 +      if (hint_addr && phys_pg_pack->offset) {
 +              if (args->flags & HL_MEM_FORCE_HINT) {
 +                      /* Fail if hint must be respected but it can't be */
 +                      dev_err(hdev->dev,
 +                              "Hint address 0x%llx cannot be respected because source memory is not aligned 0x%x\n",
 +                              hint_addr, phys_pg_pack->offset);
 +                      rc = -EINVAL;
 +                      goto va_block_err;
 +              }
 +              dev_dbg(hdev->dev,
 +                      "Hint address 0x%llx will be ignored because source memory is not aligned 0x%x\n",
 +                      hint_addr, phys_pg_pack->offset);
 +      }
 +
 +      ret_vaddr = get_va_block(hdev, va_range, phys_pg_pack->total_size,
 +                                      hint_addr, va_block_align,
 +                                      va_range_type, args->flags);
 +      if (!ret_vaddr) {
 +              dev_err(hdev->dev, "no available va block for handle %u\n",
 +                              handle);
 +              rc = -ENOMEM;
 +              goto va_block_err;
 +      }
 +
 +      mutex_lock(&hdev->mmu_lock);
 +
 +      rc = map_phys_pg_pack(ctx, ret_vaddr, phys_pg_pack);
 +      if (rc) {
 +              dev_err(hdev->dev, "mapping page pack failed for handle %u\n", handle);
 +              mutex_unlock(&hdev->mmu_lock);
 +              goto map_err;
 +      }
 +
 +      rc = hl_mmu_invalidate_cache_range(hdev, false, *vm_type | MMU_OP_SKIP_LOW_CACHE_INV,
 +                              ctx->asid, ret_vaddr, phys_pg_pack->total_size);
 +      mutex_unlock(&hdev->mmu_lock);
 +      if (rc)
 +              goto map_err;
 +
 +      /*
 +       * prefetch is done upon the user's request. it is performed in a WQ and so can
 +       * be outside the MMU lock. the operation itself is already protected by the mmu lock
 +       */
 +      if (do_prefetch) {
 +              rc = hl_mmu_prefetch_cache_range(ctx, *vm_type, ctx->asid, ret_vaddr,
 +                                                      phys_pg_pack->total_size);
 +              if (rc)
 +                      goto map_err;
 +      }
 +
 +      ret_vaddr += phys_pg_pack->offset;
 +
 +      hnode->ptr = vm_type;
 +      hnode->vaddr = ret_vaddr;
 +      hnode->handle = is_userptr ? MEM_HANDLE_INVALID : handle;
 +
 +      mutex_lock(&ctx->mem_hash_lock);
 +      hash_add(ctx->mem_hash, &hnode->node, ret_vaddr);
 +      mutex_unlock(&ctx->mem_hash_lock);
 +
 +      *device_addr = ret_vaddr;
 +
 +      if (is_userptr)
 +              free_phys_pg_pack(hdev, phys_pg_pack);
 +
 +      return rc;
 +
 +map_err:
 +      if (add_va_block(hdev, va_range, ret_vaddr,
 +                              ret_vaddr + phys_pg_pack->total_size - 1))
 +              dev_warn(hdev->dev,
 +                      "release va block failed for handle 0x%x, vaddr: 0x%llx\n",
 +                              handle, ret_vaddr);
 +
 +va_block_err:
 +      kfree(hnode);
 +hnode_err:
 +shared_err:
 +      atomic_dec(&phys_pg_pack->mapping_cnt);
 +      if (is_userptr)
 +              free_phys_pg_pack(hdev, phys_pg_pack);
 +init_page_pack_err:
 +      if (is_userptr)
 +              dma_unmap_host_va(hdev, userptr);
 +
 +      return rc;
 +}
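+
+/*
+ * Note on the userptr path of map_device_va() above: for host memory the
+ * phys_pg_pack built from the pinned pages is only a temporary translation,
+ * so it is freed as soon as the mapping succeeds; only the hash node and the
+ * userptr itself are kept, and the pack is rebuilt on unmap.
+ */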
 +
 +/**
 + * unmap_device_va() - unmap the given device virtual address.
 + * @ctx: pointer to the context structure.
 + * @args: host parameters with device virtual address to unmap.
 + * @ctx_free: true if in context free flow, false otherwise.
 + *
 + * This function does the following:
 + * - unmap the physical pages related to the given virtual address.
 + * - return the device virtual block to the virtual block list.
 + */
 +static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 +                              bool ctx_free)
 +{
 +      struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
 +      u64 vaddr = args->unmap.device_virt_addr;
 +      struct hl_vm_hash_node *hnode = NULL;
 +      struct asic_fixed_properties *prop;
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_userptr *userptr = NULL;
 +      struct hl_va_range *va_range;
 +      enum vm_type *vm_type;
 +      bool is_userptr;
 +      int rc = 0;
 +
 +      prop = &hdev->asic_prop;
 +
 +      /* protect from double entrance */
 +      mutex_lock(&ctx->mem_hash_lock);
 +      hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)vaddr)
 +              if (vaddr == hnode->vaddr)
 +                      break;
 +
 +      if (!hnode) {
 +              mutex_unlock(&ctx->mem_hash_lock);
 +              dev_err(hdev->dev,
 +                      "unmap failed, no mem hnode for vaddr 0x%llx\n",
 +                      vaddr);
 +              return -EINVAL;
 +      }
 +
 +      if (hnode->export_cnt) {
 +              mutex_unlock(&ctx->mem_hash_lock);
 +              dev_err(hdev->dev, "failed to unmap %#llx, memory is exported\n", vaddr);
 +              return -EINVAL;
 +      }
 +
 +      hash_del(&hnode->node);
 +      mutex_unlock(&ctx->mem_hash_lock);
 +
 +      vm_type = hnode->ptr;
 +
 +      if (*vm_type == VM_TYPE_USERPTR) {
 +              is_userptr = true;
 +              userptr = hnode->ptr;
 +
 +              rc = init_phys_pg_pack_from_userptr(ctx, userptr, &phys_pg_pack,
 +                                                      false);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "unable to init page pack for vaddr 0x%llx\n",
 +                              vaddr);
 +                      goto vm_type_err;
 +              }
 +
 +              if (phys_pg_pack->page_size ==
 +                                      hdev->asic_prop.pmmu.page_size)
 +                      va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST];
 +              else
 +                      va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE];
 +      } else if (*vm_type == VM_TYPE_PHYS_PACK) {
 +              is_userptr = false;
 +              va_range = ctx->va_range[HL_VA_RANGE_TYPE_DRAM];
 +              phys_pg_pack = hnode->ptr;
 +      } else {
 +              dev_warn(hdev->dev,
 +                      "unmap failed, unknown vm desc for vaddr 0x%llx\n",
 +                              vaddr);
 +              rc = -EFAULT;
 +              goto vm_type_err;
 +      }
 +
 +      if (atomic_read(&phys_pg_pack->mapping_cnt) == 0) {
 +              dev_err(hdev->dev, "vaddr 0x%llx is not mapped\n", vaddr);
 +              rc = -EINVAL;
 +              goto mapping_cnt_err;
 +      }
 +
 +      if (!is_userptr && !is_power_of_2(phys_pg_pack->page_size))
 +              vaddr = prop->dram_base_address +
 +                      DIV_ROUND_DOWN_ULL(vaddr - prop->dram_base_address,
 +                                              phys_pg_pack->page_size) *
 +                                                      phys_pg_pack->page_size;
 +      else
 +              vaddr &= ~(((u64) phys_pg_pack->page_size) - 1);
 +
 +      mutex_lock(&hdev->mmu_lock);
 +
 +      unmap_phys_pg_pack(ctx, vaddr, phys_pg_pack);
 +
 +      /*
 +       * During context free this function is called in a loop to clean all
 +       * the context mappings. Hence the cache invalidation can be called once
 +       * at the loop end rather than for each iteration
 +       */
 +      if (!ctx_free)
 +              rc = hl_mmu_invalidate_cache_range(hdev, true, *vm_type, ctx->asid, vaddr,
 +                                                      phys_pg_pack->total_size);
 +
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      /*
 +       * If the context is closing we don't need to check for the MMU cache
 +       * invalidation return code and update the VA free list as in this flow
 +       * we invalidate the MMU cache outside of this unmap function and the VA
 +       * free list will be freed anyway.
 +       */
 +      if (!ctx_free) {
 +              int tmp_rc;
 +
 +              tmp_rc = add_va_block(hdev, va_range, vaddr,
 +                                      vaddr + phys_pg_pack->total_size - 1);
 +              if (tmp_rc) {
 +                      dev_warn(hdev->dev,
 +                                      "add va block failed for vaddr: 0x%llx\n",
 +                                      vaddr);
 +                      if (!rc)
 +                              rc = tmp_rc;
 +              }
 +      }
 +
 +      atomic_dec(&phys_pg_pack->mapping_cnt);
 +      kfree(hnode);
 +
 +      if (is_userptr) {
 +              free_phys_pg_pack(hdev, phys_pg_pack);
 +              dma_unmap_host_va(hdev, userptr);
 +      }
 +
 +      return rc;
 +
 +mapping_cnt_err:
 +      if (is_userptr)
 +              free_phys_pg_pack(hdev, phys_pg_pack);
 +vm_type_err:
 +      mutex_lock(&ctx->mem_hash_lock);
 +      hash_add(ctx->mem_hash, &hnode->node, vaddr);
 +      mutex_unlock(&ctx->mem_hash_lock);
 +
 +      return rc;
 +}
 +
 +static int map_block(struct hl_device *hdev, u64 address, u64 *handle, u32 *size)
 +{
 +      u32 block_id;
 +      int rc;
 +
 +      *handle = 0;
 +      if (size)
 +              *size = 0;
 +
 +      rc = hdev->asic_funcs->get_hw_block_id(hdev, address, size, &block_id);
 +      if (rc)
 +              return rc;
 +
 +      *handle = block_id | HL_MMAP_TYPE_BLOCK;
 +      *handle <<= PAGE_SHIFT;
 +
 +      return 0;
 +}
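+
+/*
+ * Note: the handle returned by map_block() packs the HW block id together
+ * with the HL_MMAP_TYPE_BLOCK marker in the bits above PAGE_SHIFT, so user
+ * space can pass it back verbatim as the mmap() offset. hl_hw_block_mmap()
+ * below then reads the block id from vma->vm_pgoff, on the assumption that
+ * the top-level mmap dispatcher has already stripped the type bits.
+ */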
 +
 +static void hw_block_vm_close(struct vm_area_struct *vma)
 +{
 +      struct hl_vm_hw_block_list_node *lnode =
 +              (struct hl_vm_hw_block_list_node *) vma->vm_private_data;
 +      struct hl_ctx *ctx = lnode->ctx;
 +      long new_mmap_size;
 +
 +      new_mmap_size = lnode->mapped_size - (vma->vm_end - vma->vm_start);
 +      if (new_mmap_size > 0) {
 +              lnode->mapped_size = new_mmap_size;
 +              return;
 +      }
 +
 +      mutex_lock(&ctx->hw_block_list_lock);
 +      list_del(&lnode->node);
 +      mutex_unlock(&ctx->hw_block_list_lock);
 +      hl_ctx_put(ctx);
 +      kfree(lnode);
 +      vma->vm_private_data = NULL;
 +}
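+
+/*
+ * Note: ->close() is invoked for every VMA piece, including partial munmap()s
+ * and VMA splits, so the bookkeeping above only drops the context reference
+ * and frees the list node once the accumulated unmapped size covers the whole
+ * original mapping.
+ */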
 +
 +static const struct vm_operations_struct hw_block_vm_ops = {
 +      .close = hw_block_vm_close
 +};
 +
 +/**
 + * hl_hw_block_mmap() - mmap a hw block to user.
 + * @hpriv: pointer to the private data of the fd
 + * @vma: pointer to vm_area_struct of the process
 + *
 + * Driver increments context reference for every HW block mapped in order
 + * to prevent user from closing FD without unmapping first
 + */
 +int hl_hw_block_mmap(struct hl_fpriv *hpriv, struct vm_area_struct *vma)
 +{
 +      struct hl_vm_hw_block_list_node *lnode;
 +      struct hl_device *hdev = hpriv->hdev;
 +      struct hl_ctx *ctx = hpriv->ctx;
 +      u32 block_id, block_size;
 +      int rc;
 +
 +      /* We use the page offset to hold the block id and thus we need to clear
 +       * it before doing the mmap itself
 +       */
 +      block_id = vma->vm_pgoff;
 +      vma->vm_pgoff = 0;
 +
 +      /* Driver only allows mapping of a complete HW block */
 +      block_size = vma->vm_end - vma->vm_start;
 +
 +      if (!access_ok((void __user *) (uintptr_t) vma->vm_start, block_size)) {
 +              dev_err(hdev->dev,
 +                      "user pointer is invalid - 0x%lx\n",
 +                      vma->vm_start);
 +
 +              return -EINVAL;
 +      }
 +
 +      lnode = kzalloc(sizeof(*lnode), GFP_KERNEL);
 +      if (!lnode)
 +              return -ENOMEM;
 +
 +      rc = hdev->asic_funcs->hw_block_mmap(hdev, vma, block_id, block_size);
 +      if (rc) {
 +              kfree(lnode);
 +              return rc;
 +      }
 +
 +      hl_ctx_get(ctx);
 +
 +      lnode->ctx = ctx;
 +      lnode->vaddr = vma->vm_start;
 +      lnode->block_size = block_size;
 +      lnode->mapped_size = lnode->block_size;
 +      lnode->id = block_id;
 +
 +      vma->vm_private_data = lnode;
 +      vma->vm_ops = &hw_block_vm_ops;
 +
 +      mutex_lock(&ctx->hw_block_list_lock);
 +      list_add_tail(&lnode->node, &ctx->hw_block_mem_list);
 +      mutex_unlock(&ctx->hw_block_list_lock);
 +
 +      vma->vm_pgoff = block_id;
 +
 +      return 0;
 +}
 +
 +static int set_dma_sg(struct scatterlist *sg, u64 bar_address, u64 chunk_size,
 +                      struct device *dev, enum dma_data_direction dir)
 +{
 +      dma_addr_t addr;
 +      int rc;
 +
 +      addr = dma_map_resource(dev, bar_address, chunk_size, dir,
 +                              DMA_ATTR_SKIP_CPU_SYNC);
 +      rc = dma_mapping_error(dev, addr);
 +      if (rc)
 +              return rc;
 +
 +      sg_set_page(sg, NULL, chunk_size, 0);
 +      sg_dma_address(sg) = addr;
 +      sg_dma_len(sg) = chunk_size;
 +
 +      return 0;
 +}
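+
+/*
+ * Note: set_dma_sg() fills only the DMA side of the scatterlist entry. The
+ * address returned by dma_map_resource() points at PCI BAR space, so there is
+ * no struct page behind it (hence sg_set_page() with a NULL page) and no CPU
+ * cache to sync (hence DMA_ATTR_SKIP_CPU_SYNC).
+ */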
 +
 +static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 *pages, u64 npages,
 +                                              u64 page_size, u64 exported_size,
 +                                              struct device *dev, enum dma_data_direction dir)
 +{
 +      u64 chunk_size, bar_address, dma_max_seg_size, cur_size_to_export, cur_npages;
 +      struct asic_fixed_properties *prop;
 +      int rc, i, j, nents, cur_page;
 +      struct scatterlist *sg;
 +      struct sg_table *sgt;
 +
 +      prop = &hdev->asic_prop;
 +
 +      dma_max_seg_size = dma_get_max_seg_size(dev);
 +
 +      /* We would like to align the max segment size to PAGE_SIZE, so the
 +       * SGL will contain aligned addresses that can be easily mapped to
 +       * an MMU
 +       */
 +      dma_max_seg_size = ALIGN_DOWN(dma_max_seg_size, PAGE_SIZE);
 +      if (dma_max_seg_size < PAGE_SIZE) {
 +              dev_err_ratelimited(hdev->dev,
 +                              "dma_max_seg_size %llu can't be smaller than PAGE_SIZE\n",
 +                              dma_max_seg_size);
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
 +      if (!sgt)
 +              return ERR_PTR(-ENOMEM);
 +
 +      /* remove export size restrictions in case not explicitly defined */
 +      cur_size_to_export = exported_size ? exported_size : (npages * page_size);
 +
+      /* If the size of each page is larger than the dma max segment size,
+       * then we can't combine pages and the number of entries in the SGL
+       * will just be <number of pages> * <chunks of max segment size in
+       * each page>.
+       */
 +      if (page_size > dma_max_seg_size) {
 +              /* we should limit number of pages according to the exported size */
 +              cur_npages = DIV_ROUND_UP_SECTOR_T(cur_size_to_export, page_size);
 +              nents = cur_npages * DIV_ROUND_UP_SECTOR_T(page_size, dma_max_seg_size);
 +      } else {
 +              cur_npages = npages;
 +
 +              /* Get number of non-contiguous chunks */
 +              for (i = 1, nents = 1, chunk_size = page_size ; i < cur_npages ; i++) {
 +                      if (pages[i - 1] + page_size != pages[i] ||
 +                                      chunk_size + page_size > dma_max_seg_size) {
 +                              nents++;
 +                              chunk_size = page_size;
 +                              continue;
 +                      }
 +
 +                      chunk_size += page_size;
 +              }
 +      }
 +
 +      rc = sg_alloc_table(sgt, nents, GFP_KERNEL | __GFP_ZERO);
 +      if (rc)
 +              goto error_free;
 +
 +      cur_page = 0;
 +
 +      if (page_size > dma_max_seg_size) {
 +              u64 size_left, cur_device_address = 0;
 +
 +              size_left = page_size;
 +
 +              /* Need to split each page into the number of chunks of
 +               * dma_max_seg_size
 +               */
 +              for_each_sgtable_dma_sg(sgt, sg, i) {
 +                      if (size_left == page_size)
 +                              cur_device_address =
 +                                      pages[cur_page] - prop->dram_base_address;
 +                      else
 +                              cur_device_address += dma_max_seg_size;
 +
 +                      /* make sure not to export over exported size */
 +                      chunk_size = min3(size_left, dma_max_seg_size, cur_size_to_export);
 +
 +                      bar_address = hdev->dram_pci_bar_start + cur_device_address;
 +
 +                      rc = set_dma_sg(sg, bar_address, chunk_size, dev, dir);
 +                      if (rc)
 +                              goto error_unmap;
 +
 +                      cur_size_to_export -= chunk_size;
 +
 +                      if (size_left > dma_max_seg_size) {
 +                              size_left -= dma_max_seg_size;
 +                      } else {
 +                              cur_page++;
 +                              size_left = page_size;
 +                      }
 +              }
 +      } else {
 +              /* Merge pages and put them into the scatterlist */
 +              for_each_sgtable_dma_sg(sgt, sg, i) {
 +                      chunk_size = page_size;
 +                      for (j = cur_page + 1 ; j < cur_npages ; j++) {
 +                              if (pages[j - 1] + page_size != pages[j] ||
 +                                              chunk_size + page_size > dma_max_seg_size)
 +                                      break;
 +
 +                              chunk_size += page_size;
 +                      }
 +
 +                      bar_address = hdev->dram_pci_bar_start +
 +                                      (pages[cur_page] - prop->dram_base_address);
 +
 +                      /* make sure not to export over exported size */
 +                      chunk_size = min(chunk_size, cur_size_to_export);
 +                      rc = set_dma_sg(sg, bar_address, chunk_size, dev, dir);
 +                      if (rc)
 +                              goto error_unmap;
 +
 +                      cur_size_to_export -= chunk_size;
 +                      cur_page = j;
 +              }
 +      }
 +
+      /* Because this table does not include a CPU page list, set orig_nents
+       * to 0 so that other users have a chance to detect this and use only
+       * nents (the length of the DMA list) when walking the sgl.
+       */
 +      sgt->orig_nents = 0;
 +
 +      return sgt;
 +
 +error_unmap:
 +      for_each_sgtable_dma_sg(sgt, sg, i) {
 +              if (!sg_dma_len(sg))
 +                      continue;
 +
 +              dma_unmap_resource(dev, sg_dma_address(sg),
 +                                      sg_dma_len(sg), dir,
 +                                      DMA_ATTR_SKIP_CPU_SYNC);
 +      }
 +
 +      sg_free_table(sgt);
 +
 +error_free:
 +      kfree(sgt);
 +      return ERR_PTR(rc);
 +}
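+
+/*
+ * Worked example for the nents computation above (illustrative numbers only):
+ * with page_size = 4 MiB, dma_max_seg_size = 1 MiB and two exported pages,
+ * pages cannot be combined, so nents = 2 * (4 MiB / 1 MiB) = 8 and each page
+ * is emitted as four 1 MiB chunks. With 4 KiB pages instead, physically
+ * contiguous pages are merged into a single sg entry until a gap is hit or
+ * the chunk would exceed dma_max_seg_size.
+ */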
 +
 +static int hl_dmabuf_attach(struct dma_buf *dmabuf,
 +                              struct dma_buf_attachment *attachment)
 +{
 +      struct hl_dmabuf_priv *hl_dmabuf;
 +      struct hl_device *hdev;
 +      int rc;
 +
 +      hl_dmabuf = dmabuf->priv;
 +      hdev = hl_dmabuf->ctx->hdev;
 +
 +      rc = pci_p2pdma_distance(hdev->pdev, attachment->dev, true);
 +
 +      if (rc < 0)
 +              attachment->peer2peer = false;
 +      return 0;
 +}
 +
 +static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment,
 +                                      enum dma_data_direction dir)
 +{
 +      struct dma_buf *dma_buf = attachment->dmabuf;
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      struct hl_dmabuf_priv *hl_dmabuf;
 +      struct hl_device *hdev;
 +      struct sg_table *sgt;
 +
 +      hl_dmabuf = dma_buf->priv;
 +      hdev = hl_dmabuf->ctx->hdev;
 +      phys_pg_pack = hl_dmabuf->phys_pg_pack;
 +
 +      if (!attachment->peer2peer) {
 +              dev_dbg(hdev->dev, "Failed to map dmabuf because p2p is disabled\n");
 +              return ERR_PTR(-EPERM);
 +      }
 +
 +      if (phys_pg_pack)
 +              sgt = alloc_sgt_from_device_pages(hdev,
 +                                              phys_pg_pack->pages,
 +                                              phys_pg_pack->npages,
 +                                              phys_pg_pack->page_size,
 +                                              phys_pg_pack->exported_size,
 +                                              attachment->dev,
 +                                              dir);
 +      else
 +              sgt = alloc_sgt_from_device_pages(hdev,
 +                                              &hl_dmabuf->device_address,
 +                                              1,
 +                                              hl_dmabuf->dmabuf->size,
 +                                              0,
 +                                              attachment->dev,
 +                                              dir);
 +
 +      if (IS_ERR(sgt))
 +              dev_err(hdev->dev, "failed (%ld) to initialize sgt for dmabuf\n", PTR_ERR(sgt));
 +
 +      return sgt;
 +}
 +
 +static void hl_unmap_dmabuf(struct dma_buf_attachment *attachment,
 +                                struct sg_table *sgt,
 +                                enum dma_data_direction dir)
 +{
 +      struct scatterlist *sg;
 +      int i;
 +
 +      /* The memory behind the dma-buf has *always* resided on the device itself, i.e. it lives
 +       * only in the 'device' domain (after all, it maps a PCI bar address which points to the
 +       * device memory).
 +       *
 +       * Therefore, it was never in the 'CPU' domain and hence, there is no need to perform
 +       * a sync of the memory to the CPU's cache, as it never resided inside that cache.
 +       */
 +      for_each_sgtable_dma_sg(sgt, sg, i)
 +              dma_unmap_resource(attachment->dev, sg_dma_address(sg),
 +                                      sg_dma_len(sg), dir,
 +                                      DMA_ATTR_SKIP_CPU_SYNC);
 +
+      /* Need to restore orig_nents because sg_free_table() uses that field */
 +      sgt->orig_nents = sgt->nents;
 +      sg_free_table(sgt);
 +      kfree(sgt);
 +}
 +
 +static void hl_release_dmabuf(struct dma_buf *dmabuf)
 +{
 +      struct hl_dmabuf_priv *hl_dmabuf = dmabuf->priv;
 +      struct hl_ctx *ctx;
 +
 +      if (!hl_dmabuf)
 +              return;
 +
 +      ctx = hl_dmabuf->ctx;
 +
 +      if (hl_dmabuf->memhash_hnode) {
 +              mutex_lock(&ctx->mem_hash_lock);
 +              hl_dmabuf->memhash_hnode->export_cnt--;
 +              mutex_unlock(&ctx->mem_hash_lock);
 +      }
 +
 +      hl_ctx_put(ctx);
 +      kfree(hl_dmabuf);
 +}
 +
 +static const struct dma_buf_ops habanalabs_dmabuf_ops = {
 +      .attach = hl_dmabuf_attach,
 +      .map_dma_buf = hl_map_dmabuf,
 +      .unmap_dma_buf = hl_unmap_dmabuf,
 +      .release = hl_release_dmabuf,
 +};
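+
+/*
+ * Illustrative importer-side sketch (not part of this driver): a peer driver
+ * that receives the exported FD would typically consume it via the generic
+ * dma-buf API, e.g. dma_buf_get(fd), dma_buf_attach(dmabuf, importer_dev) and
+ * dma_buf_map_attachment(attachment, DMA_BIDIRECTIONAL), which ends up in
+ * hl_map_dmabuf() above, and release it with dma_buf_unmap_attachment(),
+ * dma_buf_detach() and dma_buf_put().
+ */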
 +
 +static int export_dmabuf(struct hl_ctx *ctx,
 +                              struct hl_dmabuf_priv *hl_dmabuf,
 +                              u64 total_size, int flags, int *dmabuf_fd)
 +{
 +      DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 +      struct hl_device *hdev = ctx->hdev;
 +      int rc, fd;
 +
 +      exp_info.ops = &habanalabs_dmabuf_ops;
 +      exp_info.size = total_size;
 +      exp_info.flags = flags;
 +      exp_info.priv = hl_dmabuf;
 +
 +      hl_dmabuf->dmabuf = dma_buf_export(&exp_info);
 +      if (IS_ERR(hl_dmabuf->dmabuf)) {
 +              dev_err(hdev->dev, "failed to export dma-buf\n");
 +              return PTR_ERR(hl_dmabuf->dmabuf);
 +      }
 +
 +      fd = dma_buf_fd(hl_dmabuf->dmabuf, flags);
 +      if (fd < 0) {
 +              dev_err(hdev->dev, "failed to get a file descriptor for a dma-buf, %d\n", fd);
 +              rc = fd;
 +              goto err_dma_buf_put;
 +      }
 +
 +      hl_dmabuf->ctx = ctx;
 +      hl_ctx_get(hl_dmabuf->ctx);
 +
 +      *dmabuf_fd = fd;
 +
 +      return 0;
 +
 +err_dma_buf_put:
 +      hl_dmabuf->dmabuf->priv = NULL;
 +      dma_buf_put(hl_dmabuf->dmabuf);
 +      return rc;
 +}
 +
 +static int validate_export_params_common(struct hl_device *hdev, u64 device_addr, u64 size)
 +{
 +      if (!IS_ALIGNED(device_addr, PAGE_SIZE)) {
 +              dev_dbg(hdev->dev,
 +                      "exported device memory address 0x%llx should be aligned to 0x%lx\n",
 +                      device_addr, PAGE_SIZE);
 +              return -EINVAL;
 +      }
 +
 +      if (size < PAGE_SIZE) {
 +              dev_dbg(hdev->dev,
 +                      "exported device memory size %llu should be equal to or greater than %lu\n",
 +                      size, PAGE_SIZE);
 +              return -EINVAL;
 +      }
 +
 +      return 0;
 +}
 +
 +static int validate_export_params_no_mmu(struct hl_device *hdev, u64 device_addr, u64 size)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 bar_address;
 +      int rc;
 +
 +      rc = validate_export_params_common(hdev, device_addr, size);
 +      if (rc)
 +              return rc;
 +
 +      if (device_addr < prop->dram_user_base_address ||
 +                              (device_addr + size) > prop->dram_end_address ||
 +                              (device_addr + size) < device_addr) {
 +              dev_dbg(hdev->dev,
 +                      "DRAM memory range 0x%llx (+0x%llx) is outside of DRAM boundaries\n",
 +                      device_addr, size);
 +              return -EINVAL;
 +      }
 +
 +      bar_address = hdev->dram_pci_bar_start + (device_addr - prop->dram_base_address);
 +
 +      if ((bar_address + size) > (hdev->dram_pci_bar_start + prop->dram_pci_bar_size) ||
 +                      (bar_address + size) < bar_address) {
 +              dev_dbg(hdev->dev,
 +                      "DRAM memory range 0x%llx (+0x%llx) is outside of PCI BAR boundaries\n",
 +                      device_addr, size);
 +              return -EINVAL;
 +      }
 +
 +      return 0;
 +}
 +
 +static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 size, u64 offset,
 +                                      struct hl_vm_phys_pg_pack *phys_pg_pack)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 bar_address;
 +      int i, rc;
 +
 +      rc = validate_export_params_common(hdev, device_addr, size);
 +      if (rc)
 +              return rc;
 +
 +      if ((offset + size) > phys_pg_pack->total_size) {
 +              dev_dbg(hdev->dev, "offset %#llx and size %#llx exceed total map size %#llx\n",
 +                              offset, size, phys_pg_pack->total_size);
 +              return -EINVAL;
 +      }
 +
 +      for (i = 0 ; i < phys_pg_pack->npages ; i++) {
 +
 +              bar_address = hdev->dram_pci_bar_start +
 +                                      (phys_pg_pack->pages[i] - prop->dram_base_address);
 +
 +              if ((bar_address + phys_pg_pack->page_size) >
 +                              (hdev->dram_pci_bar_start + prop->dram_pci_bar_size) ||
 +                              (bar_address + phys_pg_pack->page_size) < bar_address) {
 +                      dev_dbg(hdev->dev,
 +                              "DRAM memory range 0x%llx (+0x%x) is outside of PCI BAR boundaries\n",
 +                                      phys_pg_pack->pages[i],
 +                                      phys_pg_pack->page_size);
 +
 +                      return -EINVAL;
 +              }
 +      }
 +
 +      return 0;
 +}
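+
+/*
+ * Note: both validation helpers above also check that the exported range lies
+ * entirely inside the window exposed by the DRAM PCI BAR, since the sg table
+ * built in alloc_sgt_from_device_pages() maps BAR addresses directly; device
+ * memory that the BAR cannot reach cannot be exported through this path.
+ */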
 +
 +static struct hl_vm_hash_node *memhash_node_export_get(struct hl_ctx *ctx, u64 addr)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_vm_hash_node *hnode;
 +
 +      /* get the memory handle */
 +      mutex_lock(&ctx->mem_hash_lock);
 +      hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)addr)
 +              if (addr == hnode->vaddr)
 +                      break;
 +
 +      if (!hnode) {
 +              mutex_unlock(&ctx->mem_hash_lock);
 +              dev_dbg(hdev->dev, "map address %#llx not found\n", addr);
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      if (upper_32_bits(hnode->handle)) {
 +              mutex_unlock(&ctx->mem_hash_lock);
 +              dev_dbg(hdev->dev, "invalid handle %#llx for map address %#llx\n",
 +                              hnode->handle, addr);
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      /*
 +       * node found, increase export count so this memory cannot be unmapped
 +       * and the hash node cannot be deleted.
 +       */
 +      hnode->export_cnt++;
 +      mutex_unlock(&ctx->mem_hash_lock);
 +
 +      return hnode;
 +}
 +
 +static void memhash_node_export_put(struct hl_ctx *ctx, struct hl_vm_hash_node *hnode)
 +{
 +      mutex_lock(&ctx->mem_hash_lock);
 +      hnode->export_cnt--;
 +      mutex_unlock(&ctx->mem_hash_lock);
 +}
 +
 +static struct hl_vm_phys_pg_pack *get_phys_pg_pack_from_hash_node(struct hl_device *hdev,
 +                                                      struct hl_vm_hash_node *hnode)
 +{
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      struct hl_vm *vm = &hdev->vm;
 +
 +      spin_lock(&vm->idr_lock);
 +      phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, (u32) hnode->handle);
 +      if (!phys_pg_pack) {
 +              spin_unlock(&vm->idr_lock);
 +              dev_dbg(hdev->dev, "no match for handle 0x%x\n", (u32) hnode->handle);
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      spin_unlock(&vm->idr_lock);
 +
 +      if (phys_pg_pack->vm_type != VM_TYPE_PHYS_PACK) {
 +              dev_dbg(hdev->dev, "handle 0x%llx does not represent DRAM memory\n", hnode->handle);
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      return phys_pg_pack;
 +}
 +
 +/**
 + * export_dmabuf_from_addr() - export a dma-buf object for the given memory
 + *                             address and size.
 + * @ctx: pointer to the context structure.
 + * @addr: device address.
 + * @size: size of device memory to export.
 + * @offset: the offset into the buffer from which to start exporting
 + * @flags: DMA-BUF file/FD flags.
 + * @dmabuf_fd: pointer to result FD that represents the dma-buf object.
 + *
 + * Create and export a dma-buf object for an existing memory allocation inside
 + * the device memory, and return a FD which is associated with the dma-buf
 + * object.
 + *
 + * Return: 0 on success, non-zero for failure.
 + */
 +static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 addr, u64 size, u64 offset,
 +                                      int flags, int *dmabuf_fd)
 +{
 +      struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
 +      struct hl_vm_hash_node *hnode = NULL;
 +      struct asic_fixed_properties *prop;
 +      struct hl_dmabuf_priv *hl_dmabuf;
 +      struct hl_device *hdev;
 +      u64 export_addr;
 +      int rc;
 +
 +      hdev = ctx->hdev;
 +      prop = &hdev->asic_prop;
 +
 +      /* offset must be 0 in devices without virtual memory support */
 +      if (!prop->dram_supports_virtual_memory && offset) {
 +              dev_dbg(hdev->dev, "offset is not allowed in device without virtual memory\n");
 +              return -EINVAL;
 +      }
 +
 +      export_addr = addr + offset;
 +
 +      hl_dmabuf = kzalloc(sizeof(*hl_dmabuf), GFP_KERNEL);
 +      if (!hl_dmabuf)
 +              return -ENOMEM;
 +
 +      if (prop->dram_supports_virtual_memory) {
 +              hnode = memhash_node_export_get(ctx, addr);
 +              if (IS_ERR(hnode)) {
 +                      rc = PTR_ERR(hnode);
 +                      goto err_free_dmabuf_wrapper;
 +              }
 +              phys_pg_pack = get_phys_pg_pack_from_hash_node(hdev, hnode);
 +              if (IS_ERR(phys_pg_pack)) {
 +                      rc = PTR_ERR(phys_pg_pack);
 +                      goto dec_memhash_export_cnt;
 +              }
 +              rc = validate_export_params(hdev, export_addr, size, offset, phys_pg_pack);
 +              if (rc)
 +                      goto dec_memhash_export_cnt;
 +
 +              phys_pg_pack->exported_size = size;
 +              hl_dmabuf->phys_pg_pack = phys_pg_pack;
 +              hl_dmabuf->memhash_hnode = hnode;
 +      } else {
 +              rc = validate_export_params_no_mmu(hdev, export_addr, size);
 +              if (rc)
 +                      goto err_free_dmabuf_wrapper;
 +      }
 +
 +      hl_dmabuf->device_address = export_addr;
 +
 +      rc = export_dmabuf(ctx, hl_dmabuf, size, flags, dmabuf_fd);
 +      if (rc)
 +              goto dec_memhash_export_cnt;
 +
 +      return 0;
 +
 +dec_memhash_export_cnt:
 +      if (prop->dram_supports_virtual_memory)
 +              memhash_node_export_put(ctx, hnode);
 +err_free_dmabuf_wrapper:
 +      kfree(hl_dmabuf);
 +      return rc;
 +}
 +
 +static int mem_ioctl_no_mmu(struct hl_fpriv *hpriv, union hl_mem_args *args)
 +{
 +      struct hl_device *hdev = hpriv->hdev;
 +      u64 block_handle, device_addr = 0;
 +      struct hl_ctx *ctx = hpriv->ctx;
 +      u32 handle = 0, block_size;
 +      int rc;
 +
 +      switch (args->in.op) {
 +      case HL_MEM_OP_ALLOC:
 +              if (args->in.alloc.mem_size == 0) {
 +                      dev_err(hdev->dev, "alloc size must be larger than 0\n");
 +                      rc = -EINVAL;
 +                      goto out;
 +              }
 +
 +              /* Force contiguous as there are no real MMU
 +               * translations to overcome physical memory gaps
 +               */
 +              args->in.flags |= HL_MEM_CONTIGUOUS;
 +              rc = alloc_device_memory(ctx, &args->in, &handle);
 +
 +              memset(args, 0, sizeof(*args));
 +              args->out.handle = (__u64) handle;
 +              break;
 +
 +      case HL_MEM_OP_FREE:
 +              rc = free_device_memory(ctx, &args->in);
 +              break;
 +
 +      case HL_MEM_OP_MAP:
 +              if (args->in.flags & HL_MEM_USERPTR) {
 +                      dev_err(hdev->dev, "Failed to map host memory when MMU is disabled\n");
 +                      rc = -EPERM;
 +              } else {
 +                      rc = get_paddr_from_handle(ctx, &args->in, &device_addr);
 +                      memset(args, 0, sizeof(*args));
 +                      args->out.device_virt_addr = device_addr;
 +              }
 +
 +              break;
 +
 +      case HL_MEM_OP_UNMAP:
 +              rc = 0;
 +              break;
 +
 +      case HL_MEM_OP_MAP_BLOCK:
 +              rc = map_block(hdev, args->in.map_block.block_addr, &block_handle, &block_size);
 +              args->out.block_handle = block_handle;
 +              args->out.block_size = block_size;
 +              break;
 +
 +      case HL_MEM_OP_EXPORT_DMABUF_FD:
 +              dev_err(hdev->dev, "Failed to export dma-buf object when MMU is disabled\n");
 +              rc = -EPERM;
 +              break;
 +
 +      case HL_MEM_OP_TS_ALLOC:
 +              rc = allocate_timestamps_buffers(hpriv, &args->in, &args->out.handle);
 +              break;
 +      default:
 +              dev_err(hdev->dev, "Unknown opcode for memory IOCTL\n");
 +              rc = -EINVAL;
 +              break;
 +      }
 +
 +out:
 +      return rc;
 +}
 +
 +static void ts_buff_release(struct hl_mmap_mem_buf *buf)
 +{
 +      struct hl_ts_buff *ts_buff = buf->private;
 +
 +      vfree(ts_buff->kernel_buff_address);
 +      vfree(ts_buff->user_buff_address);
 +      kfree(ts_buff);
 +}
 +
 +static int hl_ts_mmap(struct hl_mmap_mem_buf *buf, struct vm_area_struct *vma, void *args)
 +{
 +      struct hl_ts_buff *ts_buff = buf->private;
 +
++      vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY | VM_NORESERVE);
 +      return remap_vmalloc_range(vma, ts_buff->user_buff_address, 0);
 +}
 +
 +static int hl_ts_alloc_buf(struct hl_mmap_mem_buf *buf, gfp_t gfp, void *args)
 +{
 +      struct hl_ts_buff *ts_buff = NULL;
 +      u32 num_elements;
 +      size_t size;
 +      void *p;
 +
 +      num_elements = *(u32 *)args;
 +
 +      ts_buff = kzalloc(sizeof(*ts_buff), gfp);
 +      if (!ts_buff)
 +              return -ENOMEM;
 +
 +      /* Allocate the user buffer */
 +      size = num_elements * sizeof(u64);
 +      p = vmalloc_user(size);
 +      if (!p)
 +              goto free_mem;
 +
 +      ts_buff->user_buff_address = p;
 +      buf->mappable_size = size;
 +
 +      /* Allocate the internal kernel buffer */
 +      size = num_elements * sizeof(struct hl_user_pending_interrupt);
 +      p = vzalloc(size);
 +      if (!p)
 +              goto free_user_buff;
 +
 +      ts_buff->kernel_buff_address = p;
 +      ts_buff->kernel_buff_size = size;
 +
 +      buf->private = ts_buff;
 +
 +      return 0;
 +
 +free_user_buff:
 +      vfree(ts_buff->user_buff_address);
 +free_mem:
 +      kfree(ts_buff);
 +      return -ENOMEM;
 +}
 +
 +static struct hl_mmap_mem_buf_behavior hl_ts_behavior = {
 +      .topic = "TS",
 +      .mem_id = HL_MMAP_TYPE_TS_BUFF,
 +      .mmap = hl_ts_mmap,
 +      .alloc = hl_ts_alloc_buf,
 +      .release = ts_buff_release,
 +};
 +
+/**
+ * allocate_timestamps_buffers() - allocate timestamps buffers
+ * @hpriv: pointer to the private data of the fd
+ * @args: ioctl input
+ * @handle: user timestamp buffer handle as an output
+ *
+ * This function allocates a timestamps buffer that will later be mapped to the
+ * user in order to be able to read the timestamp.
+ * In addition, it allocates an extra buffer for registration management: since
+ * registration cannot be allowed to fail due to an out-of-memory situation, a
+ * pool is prepared up front to serve as user interrupt nodes, and instead of
+ * dynamically allocating nodes during registration a node is picked from this
+ * pool. It also adds a node to the mapping hash, which is used to map the user
+ * timestamps buffer to the internal kernel timestamps buffer.
+ */
 +static int allocate_timestamps_buffers(struct hl_fpriv *hpriv, struct hl_mem_in *args, u64 *handle)
 +{
 +      struct hl_mem_mgr *mmg = &hpriv->mem_mgr;
 +      struct hl_mmap_mem_buf *buf;
 +
 +      if (args->num_of_elements > TS_MAX_ELEMENTS_NUM) {
 +              dev_err(mmg->dev, "Num of elements exceeds Max allowed number (0x%x > 0x%x)\n",
 +                              args->num_of_elements, TS_MAX_ELEMENTS_NUM);
 +              return -EINVAL;
 +      }
 +
 +      buf = hl_mmap_mem_buf_alloc(mmg, &hl_ts_behavior, GFP_KERNEL, &args->num_of_elements);
 +      if (!buf)
 +              return -ENOMEM;
 +
 +      *handle = buf->handle;
 +
 +      return 0;
 +}
 +
 +int hl_mem_ioctl(struct hl_fpriv *hpriv, void *data)
 +{
 +      enum hl_device_status status;
 +      union hl_mem_args *args = data;
 +      struct hl_device *hdev = hpriv->hdev;
 +      struct hl_ctx *ctx = hpriv->ctx;
 +      u64 block_handle, device_addr = 0;
 +      u32 handle = 0, block_size;
 +      int rc, dmabuf_fd = -EBADF;
 +
 +      if (!hl_device_operational(hdev, &status)) {
 +              dev_dbg_ratelimited(hdev->dev,
 +                      "Device is %s. Can't execute MEMORY IOCTL\n",
 +                      hdev->status[status]);
 +              return -EBUSY;
 +      }
 +
 +      if (!hdev->mmu_enable)
 +              return mem_ioctl_no_mmu(hpriv, args);
 +
 +      switch (args->in.op) {
 +      case HL_MEM_OP_ALLOC:
 +              if (args->in.alloc.mem_size == 0) {
 +                      dev_err(hdev->dev,
 +                              "alloc size must be larger than 0\n");
 +                      rc = -EINVAL;
 +                      goto out;
 +              }
 +
 +              /* If DRAM does not support virtual memory the driver won't
 +               * handle the allocation/freeing of that memory. However, for
 +               * system administration/monitoring purposes, the driver will
 +               * keep track of the amount of DRAM memory that is allocated
 +               * and freed by the user. Because this code totally relies on
 +               * the user's input, the driver can't ensure the validity
 +               * of this accounting.
 +               */
 +              if (!hdev->asic_prop.dram_supports_virtual_memory) {
 +                      atomic64_add(args->in.alloc.mem_size,
 +                                      &ctx->dram_phys_mem);
 +                      atomic64_add(args->in.alloc.mem_size,
 +                                      &hdev->dram_used_mem);
 +
 +                      dev_dbg(hdev->dev, "DRAM alloc is not supported\n");
 +                      rc = 0;
 +
 +                      memset(args, 0, sizeof(*args));
 +                      args->out.handle = 0;
 +                      goto out;
 +              }
 +
 +              rc = alloc_device_memory(ctx, &args->in, &handle);
 +
 +              memset(args, 0, sizeof(*args));
 +              args->out.handle = (__u64) handle;
 +              break;
 +
 +      case HL_MEM_OP_FREE:
 +              /* If DRAM does not support virtual memory the driver won't
 +               * handle the allocation/freeing of that memory. However, for
 +               * system administration/monitoring purposes, the driver will
 +               * keep track of the amount of DRAM memory that is allocated
 +               * and freed by the user. Because this code totally relies on
 +               * the user's input, the driver can't ensure the validity
 +               * of this accounting.
 +               */
 +              if (!hdev->asic_prop.dram_supports_virtual_memory) {
 +                      atomic64_sub(args->in.alloc.mem_size,
 +                                      &ctx->dram_phys_mem);
 +                      atomic64_sub(args->in.alloc.mem_size,
 +                                      &hdev->dram_used_mem);
 +
 +                      dev_dbg(hdev->dev, "DRAM alloc is not supported\n");
 +                      rc = 0;
 +
 +                      goto out;
 +              }
 +
 +              rc = free_device_memory(ctx, &args->in);
 +              break;
 +
 +      case HL_MEM_OP_MAP:
 +              rc = map_device_va(ctx, &args->in, &device_addr);
 +
 +              memset(args, 0, sizeof(*args));
 +              args->out.device_virt_addr = device_addr;
 +              break;
 +
 +      case HL_MEM_OP_UNMAP:
 +              rc = unmap_device_va(ctx, &args->in, false);
 +              break;
 +
 +      case HL_MEM_OP_MAP_BLOCK:
 +              rc = map_block(hdev, args->in.map_block.block_addr,
 +                              &block_handle, &block_size);
 +              args->out.block_handle = block_handle;
 +              args->out.block_size = block_size;
 +              break;
 +
 +      case HL_MEM_OP_EXPORT_DMABUF_FD:
 +              rc = export_dmabuf_from_addr(ctx,
 +                              args->in.export_dmabuf_fd.addr,
 +                              args->in.export_dmabuf_fd.mem_size,
 +                              args->in.export_dmabuf_fd.offset,
 +                              args->in.flags,
 +                              &dmabuf_fd);
 +              memset(args, 0, sizeof(*args));
 +              args->out.fd = dmabuf_fd;
 +              break;
 +
 +      case HL_MEM_OP_TS_ALLOC:
 +              rc = allocate_timestamps_buffers(hpriv, &args->in, &args->out.handle);
 +              break;
 +      default:
 +              dev_err(hdev->dev, "Unknown opcode for memory IOCTL\n");
 +              rc = -EINVAL;
 +              break;
 +      }
 +
 +out:
 +      return rc;
 +}
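+
+/*
+ * Illustrative user-space flow for the ioctl above (a sketch, not a complete
+ * description of the uAPI): a typical sequence is HL_MEM_OP_ALLOC to obtain a
+ * handle, HL_MEM_OP_MAP with that handle (or with HL_MEM_USERPTR and a host
+ * address) to obtain a device virtual address, work submission using that
+ * address, then HL_MEM_OP_UNMAP followed by HL_MEM_OP_FREE. On devices with
+ * DRAM virtual memory support, HL_MEM_OP_EXPORT_DMABUF_FD can be used on an
+ * already-mapped DRAM address to share it with a peer device via a dma-buf FD.
+ */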
 +
 +static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 +                              u32 npages, u64 start, u32 offset,
 +                              struct hl_userptr *userptr)
 +{
 +      int rc;
 +
 +      if (!access_ok((void __user *) (uintptr_t) addr, size)) {
 +              dev_err(hdev->dev, "user pointer is invalid - 0x%llx\n", addr);
 +              return -EFAULT;
 +      }
 +
 +      userptr->pages = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
 +      if (!userptr->pages)
 +              return -ENOMEM;
 +
 +      rc = pin_user_pages_fast(start, npages, FOLL_WRITE | FOLL_LONGTERM,
 +                               userptr->pages);
 +
 +      if (rc != npages) {
 +              dev_err(hdev->dev,
 +                      "Failed (%d) to pin host memory with user ptr 0x%llx, size 0x%llx, npages %d\n",
 +                      rc, addr, size, npages);
 +              if (rc < 0)
 +                      goto destroy_pages;
 +              npages = rc;
 +              rc = -EFAULT;
 +              goto put_pages;
 +      }
 +      userptr->npages = npages;
 +
 +      rc = sg_alloc_table_from_pages(userptr->sgt,
 +                                     userptr->pages,
 +                                     npages, offset, size, GFP_KERNEL);
 +      if (rc < 0) {
 +              dev_err(hdev->dev, "failed to create SG table from pages\n");
 +              goto put_pages;
 +      }
 +
 +      return 0;
 +
 +put_pages:
 +      unpin_user_pages(userptr->pages, npages);
 +destroy_pages:
 +      kvfree(userptr->pages);
 +      return rc;
 +}
 +
 +/**
 + * hl_pin_host_memory() - pins a chunk of host memory.
 + * @hdev: pointer to the habanalabs device structure.
 + * @addr: the host virtual address of the memory area.
 + * @size: the size of the memory area.
 + * @userptr: pointer to hl_userptr structure.
 + *
 + * This function does the following:
 + * - Pins the physical pages.
+ * - Creates an SG list from those pages.
 + */
 +int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
 +                                      struct hl_userptr *userptr)
 +{
 +      u64 start, end;
 +      u32 npages, offset;
 +      int rc;
 +
 +      if (!size) {
 +              dev_err(hdev->dev, "size to pin is invalid - %llu\n", size);
 +              return -EINVAL;
 +      }
 +
 +      /*
 +       * If the combination of the address and size requested for this memory
 +       * region causes an integer overflow, return error.
 +       */
 +      if (((addr + size) < addr) ||
 +                      PAGE_ALIGN(addr + size) < (addr + size)) {
 +              dev_err(hdev->dev,
 +                      "user pointer 0x%llx + %llu causes integer overflow\n",
 +                      addr, size);
 +              return -EINVAL;
 +      }
 +
 +      userptr->pid = current->pid;
 +      userptr->sgt = kzalloc(sizeof(*userptr->sgt), GFP_KERNEL);
 +      if (!userptr->sgt)
 +              return -ENOMEM;
 +
 +      start = addr & PAGE_MASK;
 +      offset = addr & ~PAGE_MASK;
 +      end = PAGE_ALIGN(addr + size);
 +      npages = (end - start) >> PAGE_SHIFT;
 +
 +      userptr->size = size;
 +      userptr->addr = addr;
 +      userptr->dma_mapped = false;
 +      INIT_LIST_HEAD(&userptr->job_node);
 +
 +      rc = get_user_memory(hdev, addr, size, npages, start, offset,
 +                              userptr);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "failed to get user memory for address 0x%llx\n",
 +                      addr);
 +              goto free_sgt;
 +      }
 +
 +      hl_debugfs_add_userptr(hdev, userptr);
 +
 +      return 0;
 +
 +free_sgt:
 +      kfree(userptr->sgt);
 +      return rc;
 +}
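+
+/*
+ * Worked example for the pinning arithmetic above (illustrative numbers only,
+ * assuming a 4 KiB PAGE_SIZE): for addr = 0x1234 and size = 0x2000,
+ * start = 0x1000, offset = 0x234, end = PAGE_ALIGN(0x3234) = 0x4000 and
+ * npages = (0x4000 - 0x1000) >> PAGE_SHIFT = 3, i.e. three pages are pinned
+ * even though the requested size spans only two pages' worth of bytes.
+ */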
 +
 +/*
 + * hl_unpin_host_memory - unpins a chunk of host memory.
 + * @hdev: pointer to the habanalabs device structure
 + * @userptr: pointer to hl_userptr structure
 + *
 + * This function does the following:
+ * - Unpins the physical pages related to the host memory.
+ * - Frees the SG list.
 + */
 +void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 +{
 +      hl_debugfs_remove_userptr(hdev, userptr);
 +
 +      if (userptr->dma_mapped)
 +              hdev->asic_funcs->hl_dma_unmap_sgtable(hdev, userptr->sgt, userptr->dir);
 +
 +      unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
 +      kvfree(userptr->pages);
 +
 +      list_del(&userptr->job_node);
 +
 +      sg_free_table(userptr->sgt);
 +      kfree(userptr->sgt);
 +}
 +
 +/**
 + * hl_userptr_delete_list() - clear userptr list.
 + * @hdev: pointer to the habanalabs device structure.
 + * @userptr_list: pointer to the list to clear.
 + *
 + * This function does the following:
 + * - Iterates over the list and unpins the host memory and frees the userptr
 + *   structure.
 + */
 +void hl_userptr_delete_list(struct hl_device *hdev,
 +                              struct list_head *userptr_list)
 +{
 +      struct hl_userptr *userptr, *tmp;
 +
 +      list_for_each_entry_safe(userptr, tmp, userptr_list, job_node) {
 +              hl_unpin_host_memory(hdev, userptr);
 +              kfree(userptr);
 +      }
 +
 +      INIT_LIST_HEAD(userptr_list);
 +}
 +
 +/**
 + * hl_userptr_is_pinned() - returns whether the given userptr is pinned.
 + * @hdev: pointer to the habanalabs device structure.
 + * @addr: user address to check.
 + * @size: user block size to check.
 + * @userptr_list: pointer to the list to clear.
 + * @userptr: pointer to userptr to check.
 + *
 + * This function does the following:
+ * - Iterates over the list and checks if the given userptr is in it, which
+ *   means it is pinned. If so, returns true; otherwise returns false.
 + */
 +bool hl_userptr_is_pinned(struct hl_device *hdev, u64 addr,
 +                              u32 size, struct list_head *userptr_list,
 +                              struct hl_userptr **userptr)
 +{
 +      list_for_each_entry((*userptr), userptr_list, job_node) {
 +              if ((addr == (*userptr)->addr) && (size == (*userptr)->size))
 +                      return true;
 +      }
 +
 +      return false;
 +}
 +
 +/**
 + * va_range_init() - initialize virtual addresses range.
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_ranges: pointer to va_ranges array.
 + * @range_type: virtual address range type.
 + * @start: range start address, inclusive.
 + * @end: range end address, inclusive.
 + * @page_size: page size for this va_range.
 + *
 + * This function does the following:
 + * - Initializes the virtual addresses list of the given range with the given
 + *   addresses.
 + */
 +static int va_range_init(struct hl_device *hdev, struct hl_va_range **va_ranges,
 +                              enum hl_va_range_type range_type, u64 start,
 +                              u64 end, u32 page_size)
 +{
 +      struct hl_va_range *va_range = va_ranges[range_type];
 +      int rc;
 +
 +      INIT_LIST_HEAD(&va_range->list);
 +
+      /*
+       * Align the range to its page size. It is the caller's responsibility
+       * to align the addresses if the page size is not a power of 2.
+       */
 +
 +      if (is_power_of_2(page_size)) {
 +              start = round_up(start, page_size);
 +
 +              /*
 +               * The end of the range is inclusive, hence we need to align it
 +               * to the end of the last full page in the range. For example if
 +               * end = 0x3ff5 with page size 0x1000, we need to align it to
 +               * 0x2fff. The remaining 0xff5 bytes do not form a full page.
 +               */
 +              end = round_down(end + 1, page_size) - 1;
 +      }
 +
 +      if (start >= end) {
 +              dev_err(hdev->dev, "too small vm range for va list\n");
 +              return -EFAULT;
 +      }
 +
 +      rc = add_va_block(hdev, va_range, start, end);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to init host va list\n");
 +              return rc;
 +      }
 +
 +      va_range->start_addr = start;
 +      va_range->end_addr = end;
 +      va_range->page_size = page_size;
 +
 +      return 0;
 +}
 +
 +/**
 + * va_range_fini() - clear a virtual addresses range.
 + * @hdev: pointer to the habanalabs structure.
 + * @va_range: pointer to virtual addresses range.
 + *
 + * This function does the following:
 + * - Frees the virtual addresses block list and its lock.
 + */
 +static void va_range_fini(struct hl_device *hdev, struct hl_va_range *va_range)
 +{
 +      mutex_lock(&va_range->lock);
 +      clear_va_list_locked(hdev, &va_range->list);
 +      mutex_unlock(&va_range->lock);
 +
 +      mutex_destroy(&va_range->lock);
 +      kfree(va_range);
 +}
 +
 +/**
 + * vm_ctx_init_with_ranges() - initialize virtual memory for context.
 + * @ctx: pointer to the habanalabs context structure.
 + * @host_range_start: host virtual addresses range start.
 + * @host_range_end: host virtual addresses range end.
 + * @host_page_size: host page size.
 + * @host_huge_range_start: host virtual addresses range start for memory
 + *                         allocated with huge pages.
 + * @host_huge_range_end: host virtual addresses range end for memory allocated
 + *                        with huge pages.
 + * @host_huge_page_size: host huge page size.
 + * @dram_range_start: dram virtual addresses range start.
 + * @dram_range_end: dram virtual addresses range end.
 + * @dram_page_size: dram page size.
 + *
 + * This function initializes the following:
 + * - MMU for context.
 + * - Virtual address to area descriptor hashtable.
 + * - Virtual block list of available virtual memory.
 + */
 +static int vm_ctx_init_with_ranges(struct hl_ctx *ctx,
 +                                      u64 host_range_start,
 +                                      u64 host_range_end,
 +                                      u32 host_page_size,
 +                                      u64 host_huge_range_start,
 +                                      u64 host_huge_range_end,
 +                                      u32 host_huge_page_size,
 +                                      u64 dram_range_start,
 +                                      u64 dram_range_end,
 +                                      u32 dram_page_size)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      int i, rc;
 +
 +      for (i = 0 ; i < HL_VA_RANGE_TYPE_MAX ; i++) {
 +              ctx->va_range[i] =
 +                      kzalloc(sizeof(struct hl_va_range), GFP_KERNEL);
 +              if (!ctx->va_range[i]) {
 +                      rc = -ENOMEM;
 +                      goto free_va_range;
 +              }
 +      }
 +
 +      rc = hl_mmu_ctx_init(ctx);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to init context %d\n", ctx->asid);
 +              goto free_va_range;
 +      }
 +
 +      mutex_init(&ctx->mem_hash_lock);
 +      hash_init(ctx->mem_hash);
 +
 +      mutex_init(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
 +
 +      rc = va_range_init(hdev, ctx->va_range, HL_VA_RANGE_TYPE_HOST,
 +                      host_range_start, host_range_end, host_page_size);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to init host vm range\n");
 +              goto mmu_ctx_fini;
 +      }
 +
 +      if (hdev->pmmu_huge_range) {
 +              mutex_init(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
 +
 +              rc = va_range_init(hdev,
 +                      ctx->va_range, HL_VA_RANGE_TYPE_HOST_HUGE,
 +                      host_huge_range_start, host_huge_range_end,
 +                      host_huge_page_size);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "failed to init host huge vm range\n");
 +                      goto clear_host_va_range;
 +              }
 +      } else {
 +              kfree(ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]);
 +              ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE] =
 +                              ctx->va_range[HL_VA_RANGE_TYPE_HOST];
 +      }
 +
 +      mutex_init(&ctx->va_range[HL_VA_RANGE_TYPE_DRAM]->lock);
 +
 +      rc = va_range_init(hdev, ctx->va_range, HL_VA_RANGE_TYPE_DRAM,
 +                      dram_range_start, dram_range_end, dram_page_size);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to init dram vm range\n");
 +              goto clear_host_huge_va_range;
 +      }
 +
 +      hl_debugfs_add_ctx_mem_hash(hdev, ctx);
 +
 +      return 0;
 +
 +clear_host_huge_va_range:
 +      mutex_destroy(&ctx->va_range[HL_VA_RANGE_TYPE_DRAM]->lock);
 +
 +      if (hdev->pmmu_huge_range) {
 +              mutex_lock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
 +              clear_va_list_locked(hdev,
 +                      &ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->list);
 +              mutex_unlock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
 +      }
 +clear_host_va_range:
 +      if (hdev->pmmu_huge_range)
 +              mutex_destroy(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
 +      mutex_lock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
 +      clear_va_list_locked(hdev, &ctx->va_range[HL_VA_RANGE_TYPE_HOST]->list);
 +      mutex_unlock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
 +mmu_ctx_fini:
 +      mutex_destroy(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
 +      mutex_destroy(&ctx->mem_hash_lock);
 +      hl_mmu_ctx_fini(ctx);
 +free_va_range:
 +      for (i = 0 ; i < HL_VA_RANGE_TYPE_MAX ; i++)
 +              kfree(ctx->va_range[i]);
 +
 +      return rc;
 +}
 +
 +int hl_vm_ctx_init(struct hl_ctx *ctx)
 +{
 +      struct asic_fixed_properties *prop = &ctx->hdev->asic_prop;
 +      u64 host_range_start, host_range_end, host_huge_range_start,
 +              host_huge_range_end, dram_range_start, dram_range_end;
 +      u32 host_page_size, host_huge_page_size, dram_page_size;
 +
 +      atomic64_set(&ctx->dram_phys_mem, 0);
 +
 +      /*
 +       * - If MMU is enabled, init the ranges as usual.
 +       * - If MMU is disabled, in case of host mapping, the returned address
 +       *   is the given one.
 +       *   In case of DRAM mapping, the returned address is the physical
 +       *   address of the memory related to the given handle.
 +       */
 +      if (!ctx->hdev->mmu_enable)
 +              return 0;
 +
 +      dram_range_start = prop->dmmu.start_addr;
 +      dram_range_end = prop->dmmu.end_addr - 1;
 +      dram_page_size = prop->dram_page_size ?
 +                              prop->dram_page_size : prop->dmmu.page_size;
 +      host_range_start = prop->pmmu.start_addr;
 +      host_range_end = prop->pmmu.end_addr - 1;
 +      host_page_size = prop->pmmu.page_size;
 +      host_huge_range_start = prop->pmmu_huge.start_addr;
 +      host_huge_range_end = prop->pmmu_huge.end_addr - 1;
 +      host_huge_page_size = prop->pmmu_huge.page_size;
 +
 +      return vm_ctx_init_with_ranges(ctx, host_range_start, host_range_end,
 +                      host_page_size, host_huge_range_start,
 +                      host_huge_range_end, host_huge_page_size,
 +                      dram_range_start, dram_range_end, dram_page_size);
 +}
 +
 +/**
 + * hl_vm_ctx_fini() - virtual memory teardown of context.
 + * @ctx: pointer to the habanalabs context structure.
 + *
+ * This function performs teardown of the following:
 + * - Virtual block list of available virtual memory.
 + * - Virtual address to area descriptor hashtable.
 + * - MMU for context.
 + *
 + * In addition this function does the following:
 + * - Unmaps the existing hashtable nodes if the hashtable is not empty. The
 + *   hashtable should be empty as no valid mappings should exist at this
 + *   point.
 + * - Frees any existing physical page list from the idr which relates to the
 + *   current context asid.
 + * - This function checks the virtual block list for correctness. At this point
 + *   the list should contain one element which describes the whole virtual
 + *   memory range of the context. Otherwise, a warning is printed.
 + */
 +void hl_vm_ctx_fini(struct hl_ctx *ctx)
 +{
 +      struct hl_vm_phys_pg_pack *phys_pg_list, *tmp_phys_node;
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_vm_hash_node *hnode;
 +      struct hl_vm *vm = &hdev->vm;
 +      struct hlist_node *tmp_node;
 +      struct list_head free_list;
 +      struct hl_mem_in args;
 +      int i;
 +
 +      if (!hdev->mmu_enable)
 +              return;
 +
 +      hl_debugfs_remove_ctx_mem_hash(hdev, ctx);
 +
 +      /*
 +       * If a hard reset is pending, something already went wrong, so there
 +       * is no point in printing another side-effect error
 +       */
 +      if (!hdev->reset_info.hard_reset_pending && !hash_empty(ctx->mem_hash))
 +              dev_dbg(hdev->dev,
 +                      "user released device without removing its memory mappings\n");
 +
 +      hash_for_each_safe(ctx->mem_hash, i, tmp_node, hnode, node) {
 +              dev_dbg(hdev->dev,
 +                      "hl_mem_hash_node of vaddr 0x%llx of asid %d is still alive\n",
 +                      hnode->vaddr, ctx->asid);
 +              args.unmap.device_virt_addr = hnode->vaddr;
 +              unmap_device_va(ctx, &args, true);
 +      }
 +
 +      mutex_lock(&hdev->mmu_lock);
 +
 +      /* invalidate the cache once after the unmapping loop */
 +      hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR);
 +      hl_mmu_invalidate_cache(hdev, true, MMU_OP_PHYS_PACK);
 +
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      INIT_LIST_HEAD(&free_list);
 +
 +      spin_lock(&vm->idr_lock);
 +      idr_for_each_entry(&vm->phys_pg_pack_handles, phys_pg_list, i)
 +              if (phys_pg_list->asid == ctx->asid) {
 +                      dev_dbg(hdev->dev,
 +                              "page list 0x%px of asid %d is still alive\n",
 +                              phys_pg_list, ctx->asid);
 +
 +                      atomic64_sub(phys_pg_list->total_size, &hdev->dram_used_mem);
 +                      idr_remove(&vm->phys_pg_pack_handles, i);
 +                      list_add(&phys_pg_list->node, &free_list);
 +              }
 +      spin_unlock(&vm->idr_lock);
 +
 +      list_for_each_entry_safe(phys_pg_list, tmp_phys_node, &free_list, node)
 +              free_phys_pg_pack(hdev, phys_pg_list);
 +
 +      va_range_fini(hdev, ctx->va_range[HL_VA_RANGE_TYPE_DRAM]);
 +      va_range_fini(hdev, ctx->va_range[HL_VA_RANGE_TYPE_HOST]);
 +
 +      if (hdev->pmmu_huge_range)
 +              va_range_fini(hdev, ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]);
 +
 +      mutex_destroy(&ctx->mem_hash_lock);
 +      hl_mmu_ctx_fini(ctx);
 +
 +      /* In this case we need to clear the global accounting of DRAM usage
 +       * because the user notifies us on allocations. If the user is gone,
 +       * all DRAM is available
 +       */
 +      if (ctx->asid != HL_KERNEL_ASID_ID &&
 +                      !hdev->asic_prop.dram_supports_virtual_memory)
 +              atomic64_set(&hdev->dram_used_mem, 0);
 +}
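The idr walk above unlinks each page pack that belongs to this context while holding vm->idr_lock, parks it on a private free_list, and only calls free_phys_pg_pack() after the spinlock is dropped. A minimal user-space sketch of that "collect under the lock, free outside the lock" pattern, using pthreads; the names below are illustrative, not driver API:

#include <pthread.h>
#include <stdlib.h>

struct pg_pack { int asid; struct pg_pack *next; };

static struct pg_pack *packs;                     /* shared list of allocations */
static pthread_mutex_t packs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Free every entry that belongs to the given asid. */
static void release_packs_of_asid(int asid)
{
        struct pg_pack **pp, *p, *free_list = NULL;

        pthread_mutex_lock(&packs_lock);
        for (pp = &packs; (p = *pp) != NULL; ) {
                if (p->asid == asid) {
                        *pp = p->next;            /* unlink under the lock */
                        p->next = free_list;      /* park on a private list */
                        free_list = p;
                } else {
                        pp = &p->next;
                }
        }
        pthread_mutex_unlock(&packs_lock);

        while (free_list) {                       /* free with the lock dropped */
                p = free_list;
                free_list = p->next;
                free(p);
        }
}

int main(void)
{
        for (int i = 0; i < 4; i++) {
                struct pg_pack *p = malloc(sizeof(*p));

                p->asid = i % 2;
                p->next = packs;
                packs = p;
        }
        release_packs_of_asid(1);                 /* frees the two asid == 1 entries */
        release_packs_of_asid(0);                 /* frees the rest */
        return 0;
}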
 +
 +/**
 + * hl_vm_init() - initialize virtual memory module.
 + * @hdev: pointer to the habanalabs device structure.
 + *
 + * This function initializes the following:
 + * - MMU module.
 + * - DRAM physical pages pool of 2MB.
 + * - Idr for device memory allocation handles.
 + */
 +int hl_vm_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hl_vm *vm = &hdev->vm;
 +      int rc;
 +
 +      if (is_power_of_2(prop->dram_page_size))
 +              vm->dram_pg_pool =
 +                      gen_pool_create(__ffs(prop->dram_page_size), -1);
 +      else
 +              vm->dram_pg_pool =
 +                      gen_pool_create(__ffs(DRAM_POOL_PAGE_SIZE), -1);
 +
 +      if (!vm->dram_pg_pool) {
 +              dev_err(hdev->dev, "Failed to create dram page pool\n");
 +              return -ENOMEM;
 +      }
 +
 +      kref_init(&vm->dram_pg_pool_refcount);
 +
 +      rc = gen_pool_add(vm->dram_pg_pool, prop->dram_user_base_address,
 +                      prop->dram_end_address - prop->dram_user_base_address,
 +                      -1);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to add memory to dram page pool %d\n", rc);
 +              goto pool_add_err;
 +      }
 +
 +      spin_lock_init(&vm->idr_lock);
 +      idr_init(&vm->phys_pg_pack_handles);
 +
 +      atomic64_set(&hdev->dram_used_mem, 0);
 +
 +      vm->init_done = true;
 +
 +      return 0;
 +
 +pool_add_err:
 +      gen_pool_destroy(vm->dram_pg_pool);
 +
 +      return rc;
 +}
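gen_pool_create() takes the minimum allocation order, i.e. log2 of the smallest chunk the pool will hand out, and for a power-of-two page size __ffs() yields exactly that. A minimal user-space sketch of the computation, assuming a 2 MB DRAM page (the DRAM_POOL_PAGE_SIZE fallback for non-power-of-two sizes is not modeled):

#include <stdint.h>
#include <stdio.h>

/* log2 of a power-of-two value: what __ffs() computes at this call site */
static unsigned int pool_min_alloc_order(uint64_t page_size)
{
        return (unsigned int)__builtin_ctzll(page_size);
}

int main(void)
{
        uint64_t dram_page_size = 2ULL << 20;     /* assumed 2 MB page size */

        /* Prints "order = 21": the pool allocates in 2 MB granules. */
        printf("order = %u\n", pool_min_alloc_order(dram_page_size));
        return 0;
}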
 +
 +/**
 + * hl_vm_fini() - virtual memory module teardown.
 + * @hdev: pointer to the habanalabs device structure.
 + *
 + * This function performs teardown of the following:
 + * - Idr for device memory allocation handles.
 + * - DRAM physical pages pool of 2MB.
 + * - MMU module.
 + */
 +void hl_vm_fini(struct hl_device *hdev)
 +{
 +      struct hl_vm *vm = &hdev->vm;
 +
 +      if (!vm->init_done)
 +              return;
 +
 +      /*
 +       * At this point all the contexts should be freed and hence no DRAM
 +       * memory should be in use, so the DRAM pool can be freed here.
 +       */
 +      if (kref_put(&vm->dram_pg_pool_refcount, dram_pg_pool_do_release) != 1)
 +              dev_warn(hdev->dev, "dram_pg_pool was not destroyed on %s\n",
 +                              __func__);
 +
 +      vm->init_done = false;
 +}
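kref_put() returns 1 only when the reference count dropped to zero and the release callback ran, so any other return value here means some context still holds the DRAM page pool, which is what the warning reports. A minimal user-space sketch of that contract with C11 atomics (the names are hypothetical):

#include <stdatomic.h>
#include <stdio.h>

struct pool { atomic_int refcount; };

/* Mirrors the kref_put() contract: returns 1 if this call dropped the last
 * reference and invoked the release callback, 0 otherwise.
 */
static int pool_put(struct pool *p, void (*release)(struct pool *))
{
        if (atomic_fetch_sub(&p->refcount, 1) == 1) {
                release(p);
                return 1;
        }
        return 0;
}

static void pool_release(struct pool *p)
{
        (void)p;
        printf("pool released\n");
}

int main(void)
{
        struct pool p = { .refcount = 2 };

        printf("%d\n", pool_put(&p, pool_release));   /* 0: a reference remains */
        printf("%d\n", pool_put(&p, pool_release));   /* 1: last reference dropped */
        return 0;
}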
 +
 +/**
 + * hl_hw_block_mem_init() - HW block memory initialization.
 + * @ctx: pointer to the habanalabs context structure.
 + *
 + * This function initializes the HW block virtual mapped addresses list and
 + * its lock.
 + */
 +void hl_hw_block_mem_init(struct hl_ctx *ctx)
 +{
 +      mutex_init(&ctx->hw_block_list_lock);
 +      INIT_LIST_HEAD(&ctx->hw_block_mem_list);
 +}
 +
 +/**
 + * hl_hw_block_mem_fini() - HW block memory teardown.
 + * @ctx: pointer to the habanalabs context structure.
 + *
 + * This function clears the HW block virtual mapped addresses list and destroys
 + * its lock.
 + */
 +void hl_hw_block_mem_fini(struct hl_ctx *ctx)
 +{
 +      struct hl_vm_hw_block_list_node *lnode, *tmp;
 +
 +      if (!list_empty(&ctx->hw_block_mem_list))
 +              dev_crit(ctx->hdev->dev, "HW block mem list isn't empty\n");
 +
 +      list_for_each_entry_safe(lnode, tmp, &ctx->hw_block_mem_list, node) {
 +              list_del(&lnode->node);
 +              kfree(lnode);
 +      }
 +
 +      mutex_destroy(&ctx->hw_block_list_lock);
 +}
index 71debe862c865fd1e4440036db32a63933ba8583,0000000000000000000000000000000000000000..bb858b94e1e81348853601019a907a268a4883fa
mode 100644,000000..100644
--- /dev/null
@@@ -1,9282 -1,0 +1,9282 @@@
-       vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
-                       VM_DONTCOPY | VM_NORESERVE;
 +// SPDX-License-Identifier: GPL-2.0
 +
 +/*
 + * Copyright 2016-2022 HabanaLabs, Ltd.
 + * All Rights Reserved.
 + */
 +
 +#include "gaudiP.h"
 +#include "../include/hw_ip/mmu/mmu_general.h"
 +#include "../include/hw_ip/mmu/mmu_v1_1.h"
 +#include "../include/gaudi/gaudi_masks.h"
 +#include "../include/gaudi/gaudi_fw_if.h"
 +#include "../include/gaudi/gaudi_reg_map.h"
 +#include "../include/gaudi/gaudi_async_ids_map_extended.h"
 +
 +#include <linux/module.h>
 +#include <linux/pci.h>
 +#include <linux/firmware.h>
 +#include <linux/hwmon.h>
 +#include <linux/iommu.h>
 +#include <linux/seq_file.h>
 +
 +/*
 + * Gaudi security scheme:
 + *
 + * 1. Host is protected by:
 + *        - Range registers
 + *        - MMU
 + *
 + * 2. DDR is protected by:
 + *        - Range registers (protect the first 512MB)
 + *
 + * 3. Configuration is protected by:
 + *        - Range registers
 + *        - Protection bits
 + *
 + * MMU is always enabled.
 + *
 + * QMAN DMA channels 0,1 (PCI DMA):
 + *     - DMA is not secured.
 + *     - PQ and CQ are secured.
 + *     - CP is secured: The driver needs to parse CB but WREG should be allowed
 + *                      because of TDMA (tensor DMA). Hence, WREG is never
 + *                      secured.
 + *
 + * When the driver needs to use DMA it will check that Gaudi is idle, set DMA
 + * channel 0 to be secured, execute the DMA and change it back to not secured.
 + * Currently, the driver doesn't use the DMA while there are compute jobs
 + * running.
 + *
 + * The current use cases for the driver to use the DMA are:
 + *     - Clear SRAM on context switch (happens on context switch when device is
 + *       idle)
 + *     - MMU page tables area clear (happens on init)
 + *
 + * QMAN DMA 2-7, TPC, MME, NIC:
 + * PQ is secured and is located on the Host (HBM CON TPC3 bug)
 + * CQ, CP and the engine are not secured
 + *
 + */
 +
 +#define GAUDI_BOOT_FIT_FILE   "habanalabs/gaudi/gaudi-boot-fit.itb"
 +#define GAUDI_LINUX_FW_FILE   "habanalabs/gaudi/gaudi-fit.itb"
 +#define GAUDI_TPC_FW_FILE     "habanalabs/gaudi/gaudi_tpc.bin"
 +
 +#define GAUDI_DMA_POOL_BLK_SIZE               0x100 /* 256 bytes */
 +
 +#define GAUDI_RESET_TIMEOUT_MSEC      2000            /* 2000ms */
 +#define GAUDI_RESET_WAIT_MSEC         1               /* 1ms */
 +#define GAUDI_CPU_RESET_WAIT_MSEC     200             /* 200ms */
 +#define GAUDI_TEST_QUEUE_WAIT_USEC    100000          /* 100ms */
 +
 +#define GAUDI_PLDM_RESET_WAIT_MSEC    1000            /* 1s */
 +#define GAUDI_PLDM_HRESET_TIMEOUT_MSEC        20000           /* 20s */
 +#define GAUDI_PLDM_TEST_QUEUE_WAIT_USEC       1000000         /* 1s */
 +#define GAUDI_PLDM_MMU_TIMEOUT_USEC   (MMU_CONFIG_TIMEOUT_USEC * 100)
 +#define GAUDI_PLDM_QMAN0_TIMEOUT_USEC (HL_DEVICE_TIMEOUT_USEC * 30)
 +#define GAUDI_PLDM_TPC_KERNEL_WAIT_USEC       (HL_DEVICE_TIMEOUT_USEC * 30)
 +#define GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC       4000000         /* 4s */
 +#define GAUDI_MSG_TO_CPU_TIMEOUT_USEC 4000000         /* 4s */
 +#define GAUDI_WAIT_FOR_BL_TIMEOUT_USEC        15000000        /* 15s */
 +
 +#define GAUDI_QMAN0_FENCE_VAL         0x72E91AB9
 +
 +#define GAUDI_MAX_STRING_LEN          20
 +
 +#define GAUDI_CB_POOL_CB_CNT          512
 +#define GAUDI_CB_POOL_CB_SIZE         0x20000 /* 128KB */
 +
 +#define GAUDI_ALLOC_CPU_MEM_RETRY_CNT 3
 +
 +#define GAUDI_NUM_OF_TPC_INTR_CAUSE   20
 +
 +#define GAUDI_NUM_OF_QM_ERR_CAUSE     16
 +
 +#define GAUDI_NUM_OF_QM_ARB_ERR_CAUSE 3
 +
 +#define GAUDI_ARB_WDT_TIMEOUT         0xEE6b27FF /* 8 seconds */
 +
 +#define HBM_SCRUBBING_TIMEOUT_US      1000000 /* 1s */
 +
 +#define BIN_REG_STRING_SIZE   sizeof("0b10101010101010101010101010101010")
 +
 +#define MONITOR_SOB_STRING_SIZE               256
 +
 +static u32 gaudi_stream_master[GAUDI_STREAM_MASTER_ARR_SIZE] = {
 +      GAUDI_QUEUE_ID_DMA_0_0,
 +      GAUDI_QUEUE_ID_DMA_0_1,
 +      GAUDI_QUEUE_ID_DMA_0_2,
 +      GAUDI_QUEUE_ID_DMA_0_3,
 +      GAUDI_QUEUE_ID_DMA_1_0,
 +      GAUDI_QUEUE_ID_DMA_1_1,
 +      GAUDI_QUEUE_ID_DMA_1_2,
 +      GAUDI_QUEUE_ID_DMA_1_3
 +};
 +
 +static const char gaudi_irq_name[GAUDI_MSI_ENTRIES][GAUDI_MAX_STRING_LEN] = {
 +              "gaudi cq 0_0", "gaudi cq 0_1", "gaudi cq 0_2", "gaudi cq 0_3",
 +              "gaudi cq 1_0", "gaudi cq 1_1", "gaudi cq 1_2", "gaudi cq 1_3",
 +              "gaudi cq 5_0", "gaudi cq 5_1", "gaudi cq 5_2", "gaudi cq 5_3",
 +              "gaudi cpu eq"
 +};
 +
 +static const u8 gaudi_dma_assignment[GAUDI_DMA_MAX] = {
 +      [GAUDI_PCI_DMA_1] = GAUDI_ENGINE_ID_DMA_0,
 +      [GAUDI_PCI_DMA_2] = GAUDI_ENGINE_ID_DMA_1,
 +      [GAUDI_HBM_DMA_1] = GAUDI_ENGINE_ID_DMA_2,
 +      [GAUDI_HBM_DMA_2] = GAUDI_ENGINE_ID_DMA_3,
 +      [GAUDI_HBM_DMA_3] = GAUDI_ENGINE_ID_DMA_4,
 +      [GAUDI_HBM_DMA_4] = GAUDI_ENGINE_ID_DMA_5,
 +      [GAUDI_HBM_DMA_5] = GAUDI_ENGINE_ID_DMA_6,
 +      [GAUDI_HBM_DMA_6] = GAUDI_ENGINE_ID_DMA_7
 +};
 +
 +static const u8 gaudi_cq_assignment[NUMBER_OF_CMPLT_QUEUES] = {
 +      [0] = GAUDI_QUEUE_ID_DMA_0_0,
 +      [1] = GAUDI_QUEUE_ID_DMA_0_1,
 +      [2] = GAUDI_QUEUE_ID_DMA_0_2,
 +      [3] = GAUDI_QUEUE_ID_DMA_0_3,
 +      [4] = GAUDI_QUEUE_ID_DMA_1_0,
 +      [5] = GAUDI_QUEUE_ID_DMA_1_1,
 +      [6] = GAUDI_QUEUE_ID_DMA_1_2,
 +      [7] = GAUDI_QUEUE_ID_DMA_1_3,
 +};
 +
 +static const u16 gaudi_packet_sizes[MAX_PACKET_ID] = {
 +      [PACKET_WREG_32]        = sizeof(struct packet_wreg32),
 +      [PACKET_WREG_BULK]      = sizeof(struct packet_wreg_bulk),
 +      [PACKET_MSG_LONG]       = sizeof(struct packet_msg_long),
 +      [PACKET_MSG_SHORT]      = sizeof(struct packet_msg_short),
 +      [PACKET_CP_DMA]         = sizeof(struct packet_cp_dma),
 +      [PACKET_REPEAT]         = sizeof(struct packet_repeat),
 +      [PACKET_MSG_PROT]       = sizeof(struct packet_msg_prot),
 +      [PACKET_FENCE]          = sizeof(struct packet_fence),
 +      [PACKET_LIN_DMA]        = sizeof(struct packet_lin_dma),
 +      [PACKET_NOP]            = sizeof(struct packet_nop),
 +      [PACKET_STOP]           = sizeof(struct packet_stop),
 +      [PACKET_ARB_POINT]      = sizeof(struct packet_arb_point),
 +      [PACKET_WAIT]           = sizeof(struct packet_wait),
 +      [PACKET_LOAD_AND_EXE]   = sizeof(struct packet_load_and_exe)
 +};
 +
 +static inline bool validate_packet_id(enum packet_id id)
 +{
 +      switch (id) {
 +      case PACKET_WREG_32:
 +      case PACKET_WREG_BULK:
 +      case PACKET_MSG_LONG:
 +      case PACKET_MSG_SHORT:
 +      case PACKET_CP_DMA:
 +      case PACKET_REPEAT:
 +      case PACKET_MSG_PROT:
 +      case PACKET_FENCE:
 +      case PACKET_LIN_DMA:
 +      case PACKET_NOP:
 +      case PACKET_STOP:
 +      case PACKET_ARB_POINT:
 +      case PACKET_WAIT:
 +      case PACKET_LOAD_AND_EXE:
 +              return true;
 +      default:
 +              return false;
 +      }
 +}
 +
 +static const char * const
 +gaudi_tpc_interrupts_cause[GAUDI_NUM_OF_TPC_INTR_CAUSE] = {
 +      "tpc_address_exceed_slm",
 +      "tpc_div_by_0",
 +      "tpc_spu_mac_overflow",
 +      "tpc_spu_addsub_overflow",
 +      "tpc_spu_abs_overflow",
 +      "tpc_spu_fp_dst_nan_inf",
 +      "tpc_spu_fp_dst_denorm",
 +      "tpc_vpu_mac_overflow",
 +      "tpc_vpu_addsub_overflow",
 +      "tpc_vpu_abs_overflow",
 +      "tpc_vpu_fp_dst_nan_inf",
 +      "tpc_vpu_fp_dst_denorm",
 +      "tpc_assertions",
 +      "tpc_illegal_instruction",
 +      "tpc_pc_wrap_around",
 +      "tpc_qm_sw_err",
 +      "tpc_hbw_rresp_err",
 +      "tpc_hbw_bresp_err",
 +      "tpc_lbw_rresp_err",
 +      "tpc_lbw_bresp_err"
 +};
 +
 +static const char * const
 +gaudi_qman_error_cause[GAUDI_NUM_OF_QM_ERR_CAUSE] = {
 +      "PQ AXI HBW error",
 +      "CQ AXI HBW error",
 +      "CP AXI HBW error",
 +      "CP error due to undefined OPCODE",
 +      "CP encountered STOP OPCODE",
 +      "CP AXI LBW error",
 +      "CP WRREG32 or WRBULK returned error",
 +      "N/A",
 +      "FENCE 0 inc over max value and clipped",
 +      "FENCE 1 inc over max value and clipped",
 +      "FENCE 2 inc over max value and clipped",
 +      "FENCE 3 inc over max value and clipped",
 +      "FENCE 0 dec under min value and clipped",
 +      "FENCE 1 dec under min value and clipped",
 +      "FENCE 2 dec under min value and clipped",
 +      "FENCE 3 dec under min value and clipped"
 +};
 +
 +static const char * const
 +gaudi_qman_arb_error_cause[GAUDI_NUM_OF_QM_ARB_ERR_CAUSE] = {
 +      "Choice push while full error",
 +      "Choice Q watchdog error",
 +      "MSG AXI LBW returned with error"
 +};
 +
 +static enum hl_queue_type gaudi_queue_type[GAUDI_QUEUE_ID_SIZE] = {
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_0 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_1 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_2 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_3 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_0 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_1 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_2 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_3 */
 +      QUEUE_TYPE_CPU, /* GAUDI_QUEUE_ID_CPU_PQ */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_3 */
 +};
 +
 +static struct hl_hw_obj_name_entry gaudi_so_id_to_str[] = {
 +      { .id = 0,  .name = "SYNC_OBJ_DMA_DOWN_FEEDBACK" },
 +      { .id = 1,  .name = "SYNC_OBJ_DMA_UP_FEEDBACK" },
 +      { .id = 2,  .name = "SYNC_OBJ_DMA_STATIC_DRAM_SRAM_FEEDBACK" },
 +      { .id = 3,  .name = "SYNC_OBJ_DMA_SRAM_DRAM_FEEDBACK" },
 +      { .id = 4,  .name = "SYNC_OBJ_FIRST_COMPUTE_FINISH" },
 +      { .id = 5,  .name = "SYNC_OBJ_HOST_DRAM_DONE" },
 +      { .id = 6,  .name = "SYNC_OBJ_DBG_CTR_DEPRECATED" },
 +      { .id = 7,  .name = "SYNC_OBJ_DMA_ACTIVATIONS_DRAM_SRAM_FEEDBACK" },
 +      { .id = 8,  .name = "SYNC_OBJ_ENGINE_SEM_MME_0" },
 +      { .id = 9,  .name = "SYNC_OBJ_ENGINE_SEM_MME_1" },
 +      { .id = 10, .name = "SYNC_OBJ_ENGINE_SEM_TPC_0" },
 +      { .id = 11, .name = "SYNC_OBJ_ENGINE_SEM_TPC_1" },
 +      { .id = 12, .name = "SYNC_OBJ_ENGINE_SEM_TPC_2" },
 +      { .id = 13, .name = "SYNC_OBJ_ENGINE_SEM_TPC_3" },
 +      { .id = 14, .name = "SYNC_OBJ_ENGINE_SEM_TPC_4" },
 +      { .id = 15, .name = "SYNC_OBJ_ENGINE_SEM_TPC_5" },
 +      { .id = 16, .name = "SYNC_OBJ_ENGINE_SEM_TPC_6" },
 +      { .id = 17, .name = "SYNC_OBJ_ENGINE_SEM_TPC_7" },
 +      { .id = 18, .name = "SYNC_OBJ_ENGINE_SEM_DMA_1" },
 +      { .id = 19, .name = "SYNC_OBJ_ENGINE_SEM_DMA_2" },
 +      { .id = 20, .name = "SYNC_OBJ_ENGINE_SEM_DMA_3" },
 +      { .id = 21, .name = "SYNC_OBJ_ENGINE_SEM_DMA_4" },
 +      { .id = 22, .name = "SYNC_OBJ_ENGINE_SEM_DMA_5" },
 +      { .id = 23, .name = "SYNC_OBJ_ENGINE_SEM_DMA_6" },
 +      { .id = 24, .name = "SYNC_OBJ_ENGINE_SEM_DMA_7" },
 +      { .id = 25, .name = "SYNC_OBJ_DBG_CTR_0" },
 +      { .id = 26, .name = "SYNC_OBJ_DBG_CTR_1" },
 +};
 +
 +static struct hl_hw_obj_name_entry gaudi_monitor_id_to_str[] = {
 +      { .id = 200, .name = "MON_OBJ_DMA_DOWN_FEEDBACK_RESET" },
 +      { .id = 201, .name = "MON_OBJ_DMA_UP_FEEDBACK_RESET" },
 +      { .id = 203, .name = "MON_OBJ_DRAM_TO_SRAM_QUEUE_FENCE" },
 +      { .id = 204, .name = "MON_OBJ_TPC_0_CLK_GATE" },
 +      { .id = 205, .name = "MON_OBJ_TPC_1_CLK_GATE" },
 +      { .id = 206, .name = "MON_OBJ_TPC_2_CLK_GATE" },
 +      { .id = 207, .name = "MON_OBJ_TPC_3_CLK_GATE" },
 +      { .id = 208, .name = "MON_OBJ_TPC_4_CLK_GATE" },
 +      { .id = 209, .name = "MON_OBJ_TPC_5_CLK_GATE" },
 +      { .id = 210, .name = "MON_OBJ_TPC_6_CLK_GATE" },
 +      { .id = 211, .name = "MON_OBJ_TPC_7_CLK_GATE" },
 +};
 +
 +static s64 gaudi_state_dump_specs_props[] = {
 +      [SP_SYNC_OBJ_BASE_ADDR] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0,
 +      [SP_NEXT_SYNC_OBJ_ADDR] = NEXT_SYNC_OBJ_ADDR_INTERVAL,
 +      [SP_SYNC_OBJ_AMOUNT] = NUM_OF_SOB_IN_BLOCK,
 +      [SP_MON_OBJ_WR_ADDR_LOW] =
 +              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0,
 +      [SP_MON_OBJ_WR_ADDR_HIGH] =
 +              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0,
 +      [SP_MON_OBJ_WR_DATA] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_DATA_0,
 +      [SP_MON_OBJ_ARM_DATA] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_ARM_0,
 +      [SP_MON_OBJ_STATUS] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_0,
 +      [SP_MONITORS_AMOUNT] = NUM_OF_MONITORS_IN_BLOCK,
 +      [SP_TPC0_CMDQ] = mmTPC0_QM_GLBL_CFG0,
 +      [SP_TPC0_CFG_SO] = mmTPC0_CFG_QM_SYNC_OBJECT_ADDR,
 +      [SP_NEXT_TPC] = mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0,
 +      [SP_MME_CMDQ] = mmMME0_QM_GLBL_CFG0,
 +      [SP_MME_CFG_SO] = mmMME0_CTRL_ARCH_DESC_SYNC_OBJECT_ADDR_LOW_LOCAL,
 +      [SP_NEXT_MME] = mmMME2_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0,
 +      [SP_DMA_CMDQ] = mmDMA0_QM_GLBL_CFG0,
 +      [SP_DMA_CFG_SO] = mmDMA0_CORE_WR_COMP_ADDR_LO,
 +      [SP_DMA_QUEUES_OFFSET] = mmDMA1_QM_GLBL_CFG0 - mmDMA0_QM_GLBL_CFG0,
 +      [SP_NUM_OF_MME_ENGINES] = NUM_OF_MME_ENGINES,
 +      [SP_SUB_MME_ENG_NUM] = NUM_OF_MME_SUB_ENGINES,
 +      [SP_NUM_OF_DMA_ENGINES] = NUM_OF_DMA_ENGINES,
 +      [SP_NUM_OF_TPC_ENGINES] = NUM_OF_TPC_ENGINES,
 +      [SP_ENGINE_NUM_OF_QUEUES] = NUM_OF_QUEUES,
 +      [SP_ENGINE_NUM_OF_STREAMS] = NUM_OF_STREAMS,
 +      [SP_ENGINE_NUM_OF_FENCES] = NUM_OF_FENCES,
 +      [SP_FENCE0_CNT_OFFSET] =
 +              mmDMA0_QM_CP_FENCE0_CNT_0 - mmDMA0_QM_GLBL_CFG0,
 +      [SP_FENCE0_RDATA_OFFSET] =
 +              mmDMA0_QM_CP_FENCE0_RDATA_0 - mmDMA0_QM_GLBL_CFG0,
 +      [SP_CP_STS_OFFSET] = mmDMA0_QM_CP_STS_0 - mmDMA0_QM_GLBL_CFG0,
 +      [SP_NUM_CORES] = 1,
 +};
 +
 +static const int gaudi_queue_id_to_engine_id[] = {
 +      [GAUDI_QUEUE_ID_DMA_0_0...GAUDI_QUEUE_ID_DMA_0_3] = GAUDI_ENGINE_ID_DMA_0,
 +      [GAUDI_QUEUE_ID_DMA_1_0...GAUDI_QUEUE_ID_DMA_1_3] = GAUDI_ENGINE_ID_DMA_1,
 +      [GAUDI_QUEUE_ID_CPU_PQ] = GAUDI_ENGINE_ID_SIZE,
 +      [GAUDI_QUEUE_ID_DMA_2_0...GAUDI_QUEUE_ID_DMA_2_3] = GAUDI_ENGINE_ID_DMA_2,
 +      [GAUDI_QUEUE_ID_DMA_3_0...GAUDI_QUEUE_ID_DMA_3_3] = GAUDI_ENGINE_ID_DMA_3,
 +      [GAUDI_QUEUE_ID_DMA_4_0...GAUDI_QUEUE_ID_DMA_4_3] = GAUDI_ENGINE_ID_DMA_4,
 +      [GAUDI_QUEUE_ID_DMA_5_0...GAUDI_QUEUE_ID_DMA_5_3] = GAUDI_ENGINE_ID_DMA_5,
 +      [GAUDI_QUEUE_ID_DMA_6_0...GAUDI_QUEUE_ID_DMA_6_3] = GAUDI_ENGINE_ID_DMA_6,
 +      [GAUDI_QUEUE_ID_DMA_7_0...GAUDI_QUEUE_ID_DMA_7_3] = GAUDI_ENGINE_ID_DMA_7,
 +      [GAUDI_QUEUE_ID_MME_0_0...GAUDI_QUEUE_ID_MME_0_3] = GAUDI_ENGINE_ID_MME_0,
 +      [GAUDI_QUEUE_ID_MME_1_0...GAUDI_QUEUE_ID_MME_1_3] = GAUDI_ENGINE_ID_MME_2,
 +      [GAUDI_QUEUE_ID_TPC_0_0...GAUDI_QUEUE_ID_TPC_0_3] = GAUDI_ENGINE_ID_TPC_0,
 +      [GAUDI_QUEUE_ID_TPC_1_0...GAUDI_QUEUE_ID_TPC_1_3] = GAUDI_ENGINE_ID_TPC_1,
 +      [GAUDI_QUEUE_ID_TPC_2_0...GAUDI_QUEUE_ID_TPC_2_3] = GAUDI_ENGINE_ID_TPC_2,
 +      [GAUDI_QUEUE_ID_TPC_3_0...GAUDI_QUEUE_ID_TPC_3_3] = GAUDI_ENGINE_ID_TPC_3,
 +      [GAUDI_QUEUE_ID_TPC_4_0...GAUDI_QUEUE_ID_TPC_4_3] = GAUDI_ENGINE_ID_TPC_4,
 +      [GAUDI_QUEUE_ID_TPC_5_0...GAUDI_QUEUE_ID_TPC_5_3] = GAUDI_ENGINE_ID_TPC_5,
 +      [GAUDI_QUEUE_ID_TPC_6_0...GAUDI_QUEUE_ID_TPC_6_3] = GAUDI_ENGINE_ID_TPC_6,
 +      [GAUDI_QUEUE_ID_TPC_7_0...GAUDI_QUEUE_ID_TPC_7_3] = GAUDI_ENGINE_ID_TPC_7,
 +      [GAUDI_QUEUE_ID_NIC_0_0...GAUDI_QUEUE_ID_NIC_0_3] = GAUDI_ENGINE_ID_NIC_0,
 +      [GAUDI_QUEUE_ID_NIC_1_0...GAUDI_QUEUE_ID_NIC_1_3] = GAUDI_ENGINE_ID_NIC_1,
 +      [GAUDI_QUEUE_ID_NIC_2_0...GAUDI_QUEUE_ID_NIC_2_3] = GAUDI_ENGINE_ID_NIC_2,
 +      [GAUDI_QUEUE_ID_NIC_3_0...GAUDI_QUEUE_ID_NIC_3_3] = GAUDI_ENGINE_ID_NIC_3,
 +      [GAUDI_QUEUE_ID_NIC_4_0...GAUDI_QUEUE_ID_NIC_4_3] = GAUDI_ENGINE_ID_NIC_4,
 +      [GAUDI_QUEUE_ID_NIC_5_0...GAUDI_QUEUE_ID_NIC_5_3] = GAUDI_ENGINE_ID_NIC_5,
 +      [GAUDI_QUEUE_ID_NIC_6_0...GAUDI_QUEUE_ID_NIC_6_3] = GAUDI_ENGINE_ID_NIC_6,
 +      [GAUDI_QUEUE_ID_NIC_7_0...GAUDI_QUEUE_ID_NIC_7_3] = GAUDI_ENGINE_ID_NIC_7,
 +      [GAUDI_QUEUE_ID_NIC_8_0...GAUDI_QUEUE_ID_NIC_8_3] = GAUDI_ENGINE_ID_NIC_8,
 +      [GAUDI_QUEUE_ID_NIC_9_0...GAUDI_QUEUE_ID_NIC_9_3] = GAUDI_ENGINE_ID_NIC_9,
 +};
 +
 +/* The order here is opposite to the order of the indexing in the h/w.
 + * i.e. SYNC_MGR_W_S is actually 0, SYNC_MGR_E_S is 1, etc.
 + */
 +static const char * const gaudi_sync_manager_names[] = {
 +      "SYNC_MGR_E_N",
 +      "SYNC_MGR_W_N",
 +      "SYNC_MGR_E_S",
 +      "SYNC_MGR_W_S",
 +      NULL
 +};
 +
 +struct ecc_info_extract_params {
 +      u64 block_address;
 +      u32 num_memories;
 +      bool derr;
 +};
 +
 +static int gaudi_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid,
 +                                                              u64 phys_addr);
 +static int gaudi_send_job_on_qman0(struct hl_device *hdev,
 +                                      struct hl_cs_job *job);
 +static int gaudi_memset_device_memory(struct hl_device *hdev, u64 addr,
 +                                      u32 size, u64 val);
 +static int gaudi_memset_registers(struct hl_device *hdev, u64 reg_base,
 +                                      u32 num_regs, u32 val);
 +static int gaudi_run_tpc_kernel(struct hl_device *hdev, u64 tpc_kernel,
 +                              u32 tpc_id);
 +static int gaudi_mmu_clear_pgt_range(struct hl_device *hdev);
 +static int gaudi_cpucp_info_get(struct hl_device *hdev);
 +static void gaudi_disable_clock_gating(struct hl_device *hdev);
 +static void gaudi_mmu_prepare(struct hl_device *hdev, u32 asid);
 +static u32 gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
 +                              u32 size, bool eb);
 +static u32 gaudi_gen_wait_cb(struct hl_device *hdev,
 +                              struct hl_gen_wait_properties *prop);
 +static inline enum hl_collective_mode
 +get_collective_mode(struct hl_device *hdev, u32 queue_id)
 +{
 +      if (gaudi_queue_type[queue_id] == QUEUE_TYPE_EXT)
 +              return HL_COLLECTIVE_MASTER;
 +
 +      if (queue_id >= GAUDI_QUEUE_ID_DMA_5_0 &&
 +                      queue_id <= GAUDI_QUEUE_ID_DMA_5_3)
 +              return HL_COLLECTIVE_SLAVE;
 +
 +      if (queue_id >= GAUDI_QUEUE_ID_TPC_7_0 &&
 +                      queue_id <= GAUDI_QUEUE_ID_TPC_7_3)
 +              return HL_COLLECTIVE_SLAVE;
 +
 +      if (queue_id >= GAUDI_QUEUE_ID_NIC_0_0 &&
 +                      queue_id <= GAUDI_QUEUE_ID_NIC_9_3)
 +              return HL_COLLECTIVE_SLAVE;
 +
 +      return HL_COLLECTIVE_NOT_SUPPORTED;
 +}
 +
 +static inline void set_default_power_values(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +
 +      if (hdev->card_type == cpucp_card_type_pmc) {
 +              prop->max_power_default = MAX_POWER_DEFAULT_PMC;
 +
 +              if (prop->fw_security_enabled)
 +                      prop->dc_power_default = DC_POWER_DEFAULT_PMC_SEC;
 +              else
 +                      prop->dc_power_default = DC_POWER_DEFAULT_PMC;
 +      } else {
 +              prop->max_power_default = MAX_POWER_DEFAULT_PCI;
 +              prop->dc_power_default = DC_POWER_DEFAULT_PCI;
 +      }
 +}
 +
 +static int gaudi_set_fixed_properties(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 num_sync_stream_queues = 0;
 +      int i;
 +
 +      prop->max_queues = GAUDI_QUEUE_ID_SIZE;
 +      prop->hw_queues_props = kcalloc(prop->max_queues,
 +                      sizeof(struct hw_queue_properties),
 +                      GFP_KERNEL);
 +
 +      if (!prop->hw_queues_props)
 +              return -ENOMEM;
 +
 +      for (i = 0 ; i < prop->max_queues ; i++) {
 +              if (gaudi_queue_type[i] == QUEUE_TYPE_EXT) {
 +                      prop->hw_queues_props[i].type = QUEUE_TYPE_EXT;
 +                      prop->hw_queues_props[i].driver_only = 0;
 +                      prop->hw_queues_props[i].supports_sync_stream = 1;
 +                      prop->hw_queues_props[i].cb_alloc_flags =
 +                              CB_ALLOC_KERNEL;
 +                      num_sync_stream_queues++;
 +              } else if (gaudi_queue_type[i] == QUEUE_TYPE_CPU) {
 +                      prop->hw_queues_props[i].type = QUEUE_TYPE_CPU;
 +                      prop->hw_queues_props[i].driver_only = 1;
 +                      prop->hw_queues_props[i].supports_sync_stream = 0;
 +                      prop->hw_queues_props[i].cb_alloc_flags =
 +                              CB_ALLOC_KERNEL;
 +              } else if (gaudi_queue_type[i] == QUEUE_TYPE_INT) {
 +                      prop->hw_queues_props[i].type = QUEUE_TYPE_INT;
 +                      prop->hw_queues_props[i].driver_only = 0;
 +                      prop->hw_queues_props[i].supports_sync_stream = 0;
 +                      prop->hw_queues_props[i].cb_alloc_flags =
 +                              CB_ALLOC_USER;
 +
 +              }
 +              prop->hw_queues_props[i].collective_mode =
 +                                              get_collective_mode(hdev, i);
 +      }
 +
 +      prop->cache_line_size = DEVICE_CACHE_LINE_SIZE;
 +      prop->cfg_base_address = CFG_BASE;
 +      prop->device_dma_offset_for_host_access = HOST_PHYS_BASE;
 +      prop->host_base_address = HOST_PHYS_BASE;
 +      prop->host_end_address = prop->host_base_address + HOST_PHYS_SIZE;
 +      prop->completion_queues_count = NUMBER_OF_CMPLT_QUEUES;
 +      prop->completion_mode = HL_COMPLETION_MODE_JOB;
 +      prop->collective_first_sob = 0;
 +      prop->collective_first_mon = 0;
 +
 +      /* 2 SOBs per internal queue stream are reserved for collective */
 +      prop->sync_stream_first_sob =
 +                      ALIGN(NUMBER_OF_SOBS_IN_GRP, HL_MAX_SOBS_PER_MONITOR)
 +                      * QMAN_STREAMS * HL_RSVD_SOBS;
 +
 +      /* 1 monitor per internal queue stream is reserved for collective
 +       * 2 monitors per external queue stream are reserved for collective
 +       */
 +      prop->sync_stream_first_mon =
 +                      (NUMBER_OF_COLLECTIVE_QUEUES * QMAN_STREAMS) +
 +                      (NUMBER_OF_EXT_HW_QUEUES * 2);
 +
 +      prop->dram_base_address = DRAM_PHYS_BASE;
 +      prop->dram_size = GAUDI_HBM_SIZE_32GB;
 +      prop->dram_end_address = prop->dram_base_address + prop->dram_size;
 +      prop->dram_user_base_address = DRAM_BASE_ADDR_USER;
 +
 +      prop->sram_base_address = SRAM_BASE_ADDR;
 +      prop->sram_size = SRAM_SIZE;
 +      prop->sram_end_address = prop->sram_base_address + prop->sram_size;
 +      prop->sram_user_base_address =
 +                      prop->sram_base_address + SRAM_USER_BASE_OFFSET;
 +
 +      prop->mmu_cache_mng_addr = MMU_CACHE_MNG_ADDR;
 +      prop->mmu_cache_mng_size = MMU_CACHE_MNG_SIZE;
 +
 +      prop->mmu_pgt_addr = MMU_PAGE_TABLES_ADDR;
 +      if (hdev->pldm)
 +              prop->mmu_pgt_size = 0x800000; /* 8MB */
 +      else
 +              prop->mmu_pgt_size = MMU_PAGE_TABLES_SIZE;
 +      prop->mmu_pte_size = HL_PTE_SIZE;
 +      prop->mmu_hop_table_size = HOP_TABLE_SIZE_512_PTE;
 +      prop->mmu_hop0_tables_total_size = HOP0_512_PTE_TABLES_TOTAL_SIZE;
 +      prop->dram_page_size = PAGE_SIZE_2MB;
 +      prop->device_mem_alloc_default_page_size = prop->dram_page_size;
 +      prop->dram_supports_virtual_memory = false;
 +
 +      prop->pmmu.hop_shifts[MMU_HOP0] = MMU_V1_1_HOP0_SHIFT;
 +      prop->pmmu.hop_shifts[MMU_HOP1] = MMU_V1_1_HOP1_SHIFT;
 +      prop->pmmu.hop_shifts[MMU_HOP2] = MMU_V1_1_HOP2_SHIFT;
 +      prop->pmmu.hop_shifts[MMU_HOP3] = MMU_V1_1_HOP3_SHIFT;
 +      prop->pmmu.hop_shifts[MMU_HOP4] = MMU_V1_1_HOP4_SHIFT;
 +      prop->pmmu.hop_masks[MMU_HOP0] = MMU_V1_1_HOP0_MASK;
 +      prop->pmmu.hop_masks[MMU_HOP1] = MMU_V1_1_HOP1_MASK;
 +      prop->pmmu.hop_masks[MMU_HOP2] = MMU_V1_1_HOP2_MASK;
 +      prop->pmmu.hop_masks[MMU_HOP3] = MMU_V1_1_HOP3_MASK;
 +      prop->pmmu.hop_masks[MMU_HOP4] = MMU_V1_1_HOP4_MASK;
 +      prop->pmmu.start_addr = VA_HOST_SPACE_START;
 +      prop->pmmu.end_addr =
 +                      (VA_HOST_SPACE_START + VA_HOST_SPACE_SIZE / 2) - 1;
 +      prop->pmmu.page_size = PAGE_SIZE_4KB;
 +      prop->pmmu.num_hops = MMU_ARCH_5_HOPS;
 +      prop->pmmu.last_mask = LAST_MASK;
 +      /* TODO: will be duplicated until implementing per-MMU props */
 +      prop->pmmu.hop_table_size = prop->mmu_hop_table_size;
 +      prop->pmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
 +
 +      /* PMMU and HPMMU are the same except for page size */
 +      memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
 +      prop->pmmu_huge.page_size = PAGE_SIZE_2MB;
 +
 +      /* shifts and masks are the same in PMMU and DMMU */
 +      memcpy(&prop->dmmu, &prop->pmmu, sizeof(prop->pmmu));
 +      prop->dmmu.start_addr = (VA_HOST_SPACE_START + VA_HOST_SPACE_SIZE / 2);
 +      prop->dmmu.end_addr = VA_HOST_SPACE_END;
 +      prop->dmmu.page_size = PAGE_SIZE_2MB;
 +
 +      prop->cfg_size = CFG_SIZE;
 +      prop->max_asid = MAX_ASID;
 +      prop->num_of_events = GAUDI_EVENT_SIZE;
 +      prop->tpc_enabled_mask = TPC_ENABLED_MASK;
 +
 +      set_default_power_values(hdev);
 +
 +      prop->cb_pool_cb_cnt = GAUDI_CB_POOL_CB_CNT;
 +      prop->cb_pool_cb_size = GAUDI_CB_POOL_CB_SIZE;
 +
 +      prop->pcie_dbi_base_address = mmPCIE_DBI_BASE;
 +      prop->pcie_aux_dbi_reg_addr = CFG_BASE + mmPCIE_AUX_DBI;
 +
 +      strncpy(prop->cpucp_info.card_name, GAUDI_DEFAULT_CARD_NAME,
 +                                      CARD_NAME_MAX_LEN);
 +
 +      prop->max_pending_cs = GAUDI_MAX_PENDING_CS;
 +
 +      prop->first_available_user_sob[HL_GAUDI_WS_DCORE] =
 +                      prop->sync_stream_first_sob +
 +                      (num_sync_stream_queues * HL_RSVD_SOBS);
 +      prop->first_available_user_mon[HL_GAUDI_WS_DCORE] =
 +                      prop->sync_stream_first_mon +
 +                      (num_sync_stream_queues * HL_RSVD_MONS);
 +
 +      prop->first_available_user_interrupt = USHRT_MAX;
 +
 +      for (i = 0 ; i < HL_MAX_DCORES ; i++)
 +              prop->first_available_cq[i] = USHRT_MAX;
 +
 +      prop->fw_cpu_boot_dev_sts0_valid = false;
 +      prop->fw_cpu_boot_dev_sts1_valid = false;
 +      prop->hard_reset_done_by_fw = false;
 +      prop->gic_interrupts_enable = true;
 +
 +      prop->server_type = HL_SERVER_TYPE_UNKNOWN;
 +
 +      prop->clk_pll_index = HL_GAUDI_MME_PLL;
 +      prop->max_freq_value = GAUDI_MAX_CLK_FREQ;
 +
 +      prop->use_get_power_for_reset_history = true;
 +
 +      prop->configurable_stop_on_err = true;
 +
 +      prop->set_max_power_on_device_init = true;
 +
 +      prop->dma_mask = 48;
 +
 +      prop->hbw_flush_reg = mmPCIE_WRAP_RR_ELBI_RD_SEC_REG_CTRL;
 +
 +      return 0;
 +}
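In gaudi_set_fixed_properties() above, sync_stream_first_sob rounds NUMBER_OF_SOBS_IN_GRP up to a multiple of HL_MAX_SOBS_PER_MONITOR before multiplying by QMAN_STREAMS and HL_RSVD_SOBS. A short sketch of that ALIGN() rounding with assumed example values (the real constants live in the Gaudi headers):

#include <stdio.h>

/* Round x up to the next multiple of a power-of-two a, like the kernel's ALIGN(). */
#define ALIGN_POW2(x, a)        (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
        unsigned int sobs_in_grp = 11;    /* assumed example value */
        unsigned int sobs_per_mon = 8;    /* assumed power of two */

        /* 11 rounded up to a multiple of 8 is 16. */
        printf("%u\n", ALIGN_POW2(sobs_in_grp, sobs_per_mon));
        return 0;
}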
 +
 +static int gaudi_pci_bars_map(struct hl_device *hdev)
 +{
 +      static const char * const name[] = {"SRAM", "CFG", "HBM"};
 +      bool is_wc[3] = {false, false, true};
 +      int rc;
 +
 +      rc = hl_pci_bars_map(hdev, name, is_wc);
 +      if (rc)
 +              return rc;
 +
 +      hdev->rmmio = hdev->pcie_bar[CFG_BAR_ID] +
 +                      (CFG_BASE - SPI_FLASH_BASE_ADDR);
 +
 +      return 0;
 +}
 +
 +static u64 gaudi_set_hbm_bar_base(struct hl_device *hdev, u64 addr)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct hl_inbound_pci_region pci_region;
 +      u64 old_addr = addr;
 +      int rc;
 +
 +      if ((gaudi) && (gaudi->hbm_bar_cur_addr == addr))
 +              return old_addr;
 +
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              return U64_MAX;
 +
 +      /* Inbound Region 2 - Bar 4 - Point to HBM */
 +      pci_region.mode = PCI_BAR_MATCH_MODE;
 +      pci_region.bar = HBM_BAR_ID;
 +      pci_region.addr = addr;
 +      rc = hl_pci_set_inbound_region(hdev, 2, &pci_region);
 +      if (rc)
 +              return U64_MAX;
 +
 +      if (gaudi) {
 +              old_addr = gaudi->hbm_bar_cur_addr;
 +              gaudi->hbm_bar_cur_addr = addr;
 +      }
 +
 +      return old_addr;
 +}
 +
 +static int gaudi_init_iatu(struct hl_device *hdev)
 +{
 +      struct hl_inbound_pci_region inbound_region;
 +      struct hl_outbound_pci_region outbound_region;
 +      int rc;
 +
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              return 0;
 +
 +      /* Inbound Region 0 - Bar 0 - Point to SRAM + CFG */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = SRAM_BAR_ID;
 +      inbound_region.addr = SRAM_BASE_ADDR;
 +      rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
 +      if (rc)
 +              goto done;
 +
 +      /* Inbound Region 1 - Bar 2 - Point to SPI FLASH */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = CFG_BAR_ID;
 +      inbound_region.addr = SPI_FLASH_BASE_ADDR;
 +      rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
 +      if (rc)
 +              goto done;
 +
 +      /* Inbound Region 2 - Bar 4 - Point to HBM */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = HBM_BAR_ID;
 +      inbound_region.addr = DRAM_PHYS_BASE;
 +      rc = hl_pci_set_inbound_region(hdev, 2, &inbound_region);
 +      if (rc)
 +              goto done;
 +
 +      /* Outbound Region 0 - Point to Host */
 +      outbound_region.addr = HOST_PHYS_BASE;
 +      outbound_region.size = HOST_PHYS_SIZE;
 +      rc = hl_pci_set_outbound_region(hdev, &outbound_region);
 +
 +done:
 +      return rc;
 +}
 +
 +static enum hl_device_hw_state gaudi_get_hw_state(struct hl_device *hdev)
 +{
 +      return RREG32(mmHW_STATE);
 +}
 +
 +static int gaudi_early_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_dev *pdev = hdev->pdev;
 +      resource_size_t pci_bar_size;
 +      u32 fw_boot_status;
 +      int rc;
 +
 +      rc = gaudi_set_fixed_properties(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed setting fixed properties\n");
 +              return rc;
 +      }
 +
 +      /* Check BAR sizes */
 +      pci_bar_size = pci_resource_len(pdev, SRAM_BAR_ID);
 +
 +      if (pci_bar_size != SRAM_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      SRAM_BAR_ID, &pci_bar_size, SRAM_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      pci_bar_size = pci_resource_len(pdev, CFG_BAR_ID);
 +
 +      if (pci_bar_size != CFG_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      CFG_BAR_ID, &pci_bar_size, CFG_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      prop->dram_pci_bar_size = pci_resource_len(pdev, HBM_BAR_ID);
 +      hdev->dram_pci_bar_start = pci_resource_start(pdev, HBM_BAR_ID);
 +
 +      /* If FW security is enabled at this point it means no access to ELBI */
 +      if (hdev->asic_prop.fw_security_enabled) {
 +              hdev->asic_prop.iatu_done_by_fw = true;
 +
 +              /*
 +               * GIC-security-bit can ONLY be set by CPUCP, so at this stage
 +               * the decision can only be taken based on PCI ID security.
 +               */
 +              hdev->asic_prop.gic_interrupts_enable = false;
 +              goto pci_init;
 +      }
 +
 +      rc = hl_pci_elbi_read(hdev, CFG_BASE + mmCPU_BOOT_DEV_STS0,
 +                              &fw_boot_status);
 +      if (rc)
 +              goto free_queue_props;
 +
 +      /* Check whether FW is configuring iATU */
 +      if ((fw_boot_status & CPU_BOOT_DEV_STS0_ENABLED) &&
 +                      (fw_boot_status & CPU_BOOT_DEV_STS0_FW_IATU_CONF_EN))
 +              hdev->asic_prop.iatu_done_by_fw = true;
 +
 +pci_init:
 +      rc = hl_pci_init(hdev);
 +      if (rc)
 +              goto free_queue_props;
 +
 +      /* Before continuing in the initialization, we need to read the preboot
 +       * version to determine whether we run with a security-enabled firmware
 +       */
 +      rc = hl_fw_read_preboot_status(hdev);
 +      if (rc) {
 +              if (hdev->reset_on_preboot_fail)
 +                      hdev->asic_funcs->hw_fini(hdev, true, false);
 +              goto pci_fini;
 +      }
 +
 +      if (gaudi_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) {
 +              dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n");
 +              hdev->asic_funcs->hw_fini(hdev, true, false);
 +      }
 +
 +      return 0;
 +
 +pci_fini:
 +      hl_pci_fini(hdev);
 +free_queue_props:
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      return rc;
 +}
 +
 +static int gaudi_early_fini(struct hl_device *hdev)
 +{
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      hl_pci_fini(hdev);
 +
 +      return 0;
 +}
 +
 +/**
 + * gaudi_fetch_psoc_frequency - Fetch PSOC frequency values
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static int gaudi_fetch_psoc_frequency(struct hl_device *hdev)
 +{
 +      u32 nr = 0, nf = 0, od = 0, div_fctr = 0, pll_clk, div_sel;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS], freq;
 +      int rc;
 +
 +      if ((hdev->fw_components & FW_TYPE_LINUX) &&
 +                      (prop->fw_app_cpu_boot_dev_sts0 & CPU_BOOT_DEV_STS0_PLL_INFO_EN)) {
 +              struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +                      return 0;
 +
 +              rc = hl_fw_cpucp_pll_info_get(hdev, HL_GAUDI_CPU_PLL, pll_freq_arr);
 +
 +              if (rc)
 +                      return rc;
 +
 +              freq = pll_freq_arr[2];
 +      } else {
 +              /* Backward compatibility */
 +              div_fctr = RREG32(mmPSOC_CPU_PLL_DIV_FACTOR_2);
 +              div_sel = RREG32(mmPSOC_CPU_PLL_DIV_SEL_2);
 +              nr = RREG32(mmPSOC_CPU_PLL_NR);
 +              nf = RREG32(mmPSOC_CPU_PLL_NF);
 +              od = RREG32(mmPSOC_CPU_PLL_OD);
 +
 +              if (div_sel == DIV_SEL_REF_CLK ||
 +                              div_sel == DIV_SEL_DIVIDED_REF) {
 +                      if (div_sel == DIV_SEL_REF_CLK)
 +                              freq = PLL_REF_CLK;
 +                      else
 +                              freq = PLL_REF_CLK / (div_fctr + 1);
 +              } else if (div_sel == DIV_SEL_PLL_CLK ||
 +                      div_sel == DIV_SEL_DIVIDED_PLL) {
 +                      pll_clk = PLL_REF_CLK * (nf + 1) /
 +                                      ((nr + 1) * (od + 1));
 +                      if (div_sel == DIV_SEL_PLL_CLK)
 +                              freq = pll_clk;
 +                      else
 +                              freq = pll_clk / (div_fctr + 1);
 +              } else {
 +                      dev_warn(hdev->dev, "Received invalid div select value: %#x", div_sel);
 +                      freq = 0;
 +              }
 +      }
 +
 +      prop->psoc_timestamp_frequency = freq;
 +      prop->psoc_pci_pll_nr = nr;
 +      prop->psoc_pci_pll_nf = nf;
 +      prop->psoc_pci_pll_od = od;
 +      prop->psoc_pci_pll_div_factor = div_fctr;
 +
 +      return 0;
 +}
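The legacy branch above derives the PLL output as PLL_REF_CLK * (nf + 1) / ((nr + 1) * (od + 1)), and divides it once more by (div_fctr + 1) for the divided output. A worked example with assumed register values (the 50 MHz reference clock is an assumption for illustration, as are nr/nf/od):

#include <stdio.h>

int main(void)
{
        unsigned int ref_clk = 50;        /* assumed reference clock, MHz */
        unsigned int nr = 0, nf = 39, od = 1, div_fctr = 1;

        unsigned int pll_clk = ref_clk * (nf + 1) / ((nr + 1) * (od + 1));

        /* 50 * 40 / (1 * 2) = 1000 MHz, and 500 MHz on the divided output. */
        printf("pll_clk = %u MHz, divided = %u MHz\n",
               pll_clk, pll_clk / (div_fctr + 1));
        return 0;
}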
 +
 +static int _gaudi_init_tpc_mem(struct hl_device *hdev,
 +              dma_addr_t tpc_kernel_src_addr, u32 tpc_kernel_size)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct packet_lin_dma *init_tpc_mem_pkt;
 +      struct hl_cs_job *job;
 +      struct hl_cb *cb;
 +      u64 dst_addr;
 +      u32 cb_size, ctl;
 +      u8 tpc_id;
 +      int rc;
 +
 +      cb = hl_cb_kernel_create(hdev, PAGE_SIZE, false);
 +      if (!cb)
 +              return -EFAULT;
 +
 +      init_tpc_mem_pkt = cb->kernel_address;
 +      cb_size = sizeof(*init_tpc_mem_pkt);
 +      memset(init_tpc_mem_pkt, 0, cb_size);
 +
 +      init_tpc_mem_pkt->tsize = cpu_to_le32(tpc_kernel_size);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_LIN_DMA);
 +      ctl |= FIELD_PREP(GAUDI_PKT_LIN_DMA_CTL_LIN_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      init_tpc_mem_pkt->ctl = cpu_to_le32(ctl);
 +
 +      init_tpc_mem_pkt->src_addr = cpu_to_le64(tpc_kernel_src_addr);
 +
 +      /* TPC_CMD is configured with I$ prefetch enabled, so address should be aligned to 8KB */
 +      dst_addr = FIELD_PREP(GAUDI_PKT_LIN_DMA_DST_ADDR_MASK,
 +                              round_up(prop->sram_user_base_address, SZ_8K));
 +      init_tpc_mem_pkt->dst_addr |= cpu_to_le64(dst_addr);
 +
 +      job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
 +      if (!job) {
 +              dev_err(hdev->dev, "Failed to allocate a new job\n");
 +              rc = -ENOMEM;
 +              goto release_cb;
 +      }
 +
 +      job->id = 0;
 +      job->user_cb = cb;
 +      atomic_inc(&job->user_cb->cs_cnt);
 +      job->user_cb_size = cb_size;
 +      job->hw_queue_id = GAUDI_QUEUE_ID_DMA_0_0;
 +      job->patched_cb = job->user_cb;
 +      job->job_cb_size = job->user_cb_size + sizeof(struct packet_msg_prot);
 +
 +      hl_debugfs_add_job(hdev, job);
 +
 +      rc = gaudi_send_job_on_qman0(hdev, job);
 +
 +      if (rc)
 +              goto free_job;
 +
 +      for (tpc_id = 0 ; tpc_id < TPC_NUMBER_OF_ENGINES ; tpc_id++) {
 +              rc = gaudi_run_tpc_kernel(hdev, dst_addr, tpc_id);
 +              if (rc)
 +                      break;
 +      }
 +
 +free_job:
 +      hl_userptr_delete_list(hdev, &job->userptr_list);
 +      hl_debugfs_remove_job(hdev, job);
 +      kfree(job);
 +      atomic_dec(&cb->cs_cnt);
 +
 +release_cb:
 +      hl_cb_put(cb);
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
 +
 +      return rc;
 +}
 +
 +/*
 + * gaudi_init_tpc_mem() - Initialize TPC memories.
 + * @hdev: Pointer to hl_device structure.
 + *
 + * Copy TPC kernel fw from firmware file and run it to initialize TPC memories.
 + *
 + * Return: 0 for success, negative value for error.
 + */
 +static int gaudi_init_tpc_mem(struct hl_device *hdev)
 +{
 +      const struct firmware *fw;
 +      size_t fw_size;
 +      void *cpu_addr;
 +      dma_addr_t dma_handle;
 +      int rc, count = 5;
 +
 +again:
 +      rc = request_firmware(&fw, GAUDI_TPC_FW_FILE, hdev->dev);
 +      if (rc == -EINTR && count-- > 0) {
 +              msleep(50);
 +              goto again;
 +      }
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to load firmware file %s\n",
 +                              GAUDI_TPC_FW_FILE);
 +              goto out;
 +      }
 +
 +      fw_size = fw->size;
 +      cpu_addr = hl_asic_dma_alloc_coherent(hdev, fw_size, &dma_handle, GFP_KERNEL | __GFP_ZERO);
 +      if (!cpu_addr) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate %zu of dma memory for TPC kernel\n",
 +                      fw_size);
 +              rc = -ENOMEM;
 +              goto out;
 +      }
 +
 +      memcpy(cpu_addr, fw->data, fw_size);
 +
 +      rc = _gaudi_init_tpc_mem(hdev, dma_handle, fw_size);
 +
 +      hl_asic_dma_free_coherent(hdev, fw->size, cpu_addr, dma_handle);
 +
 +out:
 +      release_firmware(fw);
 +      return rc;
 +}
 +
 +static void gaudi_collective_map_sobs(struct hl_device *hdev, u32 stream)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_collective_properties *prop = &gaudi->collective_props;
 +      struct hl_hw_queue *q;
 +      u32 i, sob_id, sob_group_id, queue_id;
 +
 +      /* Iterate through SOB groups and assign a SOB for each slave queue */
 +      sob_group_id =
 +              stream * HL_RSVD_SOBS + prop->curr_sob_group_idx[stream];
 +      sob_id = prop->hw_sob_group[sob_group_id].base_sob_id;
 +
 +      queue_id = GAUDI_QUEUE_ID_NIC_0_0 + stream;
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
 +              q = &hdev->kernel_queues[queue_id + (4 * i)];
 +              q->sync_stream_prop.collective_sob_id = sob_id + i;
 +      }
 +
 +      /* Both DMA5 and TPC7 use the same resources since only a single
 +       * engine needs to participate in the reduction process
 +       */
 +      queue_id = GAUDI_QUEUE_ID_DMA_5_0 + stream;
 +      q = &hdev->kernel_queues[queue_id];
 +      q->sync_stream_prop.collective_sob_id =
 +                      sob_id + NIC_NUMBER_OF_ENGINES;
 +
 +      queue_id = GAUDI_QUEUE_ID_TPC_7_0 + stream;
 +      q = &hdev->kernel_queues[queue_id];
 +      q->sync_stream_prop.collective_sob_id =
 +                      sob_id + NIC_NUMBER_OF_ENGINES;
 +}
 +
 +static void gaudi_sob_group_hw_reset(struct kref *ref)
 +{
 +      struct gaudi_hw_sob_group *hw_sob_group =
 +              container_of(ref, struct gaudi_hw_sob_group, kref);
 +      struct hl_device *hdev = hw_sob_group->hdev;
 +      int i;
 +
 +      for (i = 0 ; i < NUMBER_OF_SOBS_IN_GRP ; i++)
 +              WREG32((mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
 +                      (hw_sob_group->base_sob_id * 4) + (i * 4)), 0);
 +
 +      kref_init(&hw_sob_group->kref);
 +}
 +
 +static void gaudi_sob_group_reset_error(struct kref *ref)
 +{
 +      struct gaudi_hw_sob_group *hw_sob_group =
 +              container_of(ref, struct gaudi_hw_sob_group, kref);
 +      struct hl_device *hdev = hw_sob_group->hdev;
 +
 +      dev_crit(hdev->dev,
 +              "SOB release shouldn't be called here, base_sob_id: %d\n",
 +              hw_sob_group->base_sob_id);
 +}
 +
 +static void gaudi_collective_mstr_sob_mask_set(struct gaudi_device *gaudi)
 +{
 +      struct gaudi_collective_properties *prop;
 +      int i;
 +
 +      prop = &gaudi->collective_props;
 +
 +      memset(prop->mstr_sob_mask, 0, sizeof(prop->mstr_sob_mask));
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++)
 +              if (gaudi->hw_cap_initialized & BIT(HW_CAP_NIC_SHIFT + i))
 +                      prop->mstr_sob_mask[i / HL_MAX_SOBS_PER_MONITOR] |=
 +                                      BIT(i % HL_MAX_SOBS_PER_MONITOR);
 +      /* Set collective engine bit */
 +      prop->mstr_sob_mask[i / HL_MAX_SOBS_PER_MONITOR] |=
 +                              BIT(i % HL_MAX_SOBS_PER_MONITOR);
 +}
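gaudi_collective_mstr_sob_mask_set() above spreads a flat engine index over an array of monitor masks, using i / HL_MAX_SOBS_PER_MONITOR to pick the mask word and i % HL_MAX_SOBS_PER_MONITOR for the bit within it. A minimal user-space sketch of the same split, assuming 8 SOBs per monitor purely for illustration:

#include <stdint.h>
#include <stdio.h>

#define SOBS_PER_MON    8    /* stands in for HL_MAX_SOBS_PER_MONITOR */

static void set_engine_bit(uint8_t *mask, unsigned int i)
{
        mask[i / SOBS_PER_MON] |= 1u << (i % SOBS_PER_MON);
}

int main(void)
{
        uint8_t mask[4] = { 0 };

        set_engine_bit(mask, 10);                 /* word 1, bit 2 */
        printf("mask[1] = 0x%02x\n", mask[1]);    /* prints 0x04 */
        return 0;
}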
 +
 +static int gaudi_collective_init(struct hl_device *hdev)
 +{
 +      u32 i, sob_id, reserved_sobs_per_group;
 +      struct gaudi_collective_properties *prop;
 +      struct gaudi_device *gaudi;
 +
 +      gaudi = hdev->asic_specific;
 +      prop = &gaudi->collective_props;
 +      sob_id = hdev->asic_prop.collective_first_sob;
 +
 +      /* First sob in group must be aligned to HL_MAX_SOBS_PER_MONITOR */
 +      reserved_sobs_per_group =
 +              ALIGN(NUMBER_OF_SOBS_IN_GRP, HL_MAX_SOBS_PER_MONITOR);
 +
 +      /* Init SOB groups */
 +      for (i = 0 ; i < NUM_SOB_GROUPS; i++) {
 +              prop->hw_sob_group[i].hdev = hdev;
 +              prop->hw_sob_group[i].base_sob_id = sob_id;
 +              sob_id += reserved_sobs_per_group;
 +              gaudi_sob_group_hw_reset(&prop->hw_sob_group[i].kref);
 +      }
 +
 +      for (i = 0 ; i < QMAN_STREAMS; i++) {
 +              prop->next_sob_group_val[i] = 1;
 +              prop->curr_sob_group_idx[i] = 0;
 +              gaudi_collective_map_sobs(hdev, i);
 +      }
 +
 +      gaudi_collective_mstr_sob_mask_set(gaudi);
 +
 +      return 0;
 +}
 +
 +static void gaudi_reset_sob_group(struct hl_device *hdev, u16 sob_group)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_collective_properties *cprop = &gaudi->collective_props;
 +
 +      kref_put(&cprop->hw_sob_group[sob_group].kref,
 +                                      gaudi_sob_group_hw_reset);
 +}
 +
 +static void gaudi_collective_master_init_job(struct hl_device *hdev,
 +              struct hl_cs_job *job, u32 stream, u32 sob_group_offset)
 +{
 +      u32 master_sob_base, master_monitor, queue_id, cb_size = 0;
 +      struct gaudi_collective_properties *cprop;
 +      struct hl_gen_wait_properties wait_prop;
 +      struct hl_sync_stream_properties *prop;
 +      struct gaudi_device *gaudi;
 +
 +      gaudi = hdev->asic_specific;
 +      cprop = &gaudi->collective_props;
 +      queue_id = job->hw_queue_id;
 +      prop = &hdev->kernel_queues[queue_id].sync_stream_prop;
 +
 +      master_sob_base =
 +              cprop->hw_sob_group[sob_group_offset].base_sob_id;
 +      master_monitor = prop->collective_mstr_mon_id[0];
 +
 +      cprop->hw_sob_group[sob_group_offset].queue_id = queue_id;
 +
 +      dev_dbg(hdev->dev,
 +              "Generate master wait CBs, sob %d (mask %#x), val:0x%x, mon %u, q %d\n",
 +              master_sob_base, cprop->mstr_sob_mask[0],
 +              cprop->next_sob_group_val[stream],
 +              master_monitor, queue_id);
 +
 +      wait_prop.data = (void *) job->patched_cb;
 +      wait_prop.sob_base = master_sob_base;
 +      wait_prop.sob_mask = cprop->mstr_sob_mask[0];
 +      wait_prop.sob_val = cprop->next_sob_group_val[stream];
 +      wait_prop.mon_id = master_monitor;
 +      wait_prop.q_idx = queue_id;
 +      wait_prop.size = cb_size;
 +      cb_size += gaudi_gen_wait_cb(hdev, &wait_prop);
 +
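+      /* Second wait: the next SOB range of the group is covered by the
+       * second master monitor (see gaudi_collective_wait_create_jobs())
+       */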
 +      master_sob_base += HL_MAX_SOBS_PER_MONITOR;
 +      master_monitor = prop->collective_mstr_mon_id[1];
 +
 +      dev_dbg(hdev->dev,
 +              "Generate master wait CBs, sob %d (mask %#x), val:0x%x, mon %u, q %d\n",
 +              master_sob_base, cprop->mstr_sob_mask[1],
 +              cprop->next_sob_group_val[stream],
 +              master_monitor, queue_id);
 +
 +      wait_prop.sob_base = master_sob_base;
 +      wait_prop.sob_mask = cprop->mstr_sob_mask[1];
 +      wait_prop.mon_id = master_monitor;
 +      wait_prop.size = cb_size;
 +      cb_size += gaudi_gen_wait_cb(hdev, &wait_prop);
 +}
 +
 +static void gaudi_collective_slave_init_job(struct hl_device *hdev,
 +              struct hl_cs_job *job, struct hl_cs_compl *cs_cmpl)
 +{
 +      struct hl_gen_wait_properties wait_prop;
 +      struct hl_sync_stream_properties *prop;
 +      u32 queue_id, cb_size = 0;
 +
 +      queue_id = job->hw_queue_id;
 +      prop = &hdev->kernel_queues[queue_id].sync_stream_prop;
 +
 +      if (job->cs->encaps_signals) {
+              /* Use the encaps signal handle stored earlier in the flow
+               * and set the SOB information from the encaps
+               * signals handle.
+               */
 +              hl_hw_queue_encaps_sig_set_sob_info(hdev, job->cs, job,
 +                                              cs_cmpl);
 +
 +              dev_dbg(hdev->dev, "collective wait: Sequence %llu found, sob_id: %u,  wait for sob_val: %u\n",
 +                              job->cs->sequence,
 +                              cs_cmpl->hw_sob->sob_id,
 +                              cs_cmpl->sob_val);
 +      }
 +
 +      /* Add to wait CBs using slave monitor */
 +      wait_prop.data = (void *) job->user_cb;
 +      wait_prop.sob_base = cs_cmpl->hw_sob->sob_id;
 +      wait_prop.sob_mask = 0x1;
 +      wait_prop.sob_val = cs_cmpl->sob_val;
 +      wait_prop.mon_id = prop->collective_slave_mon_id;
 +      wait_prop.q_idx = queue_id;
 +      wait_prop.size = cb_size;
 +
 +      dev_dbg(hdev->dev,
 +              "Generate slave wait CB, sob %d, val:%x, mon %d, q %d\n",
 +              cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val,
 +              prop->collective_slave_mon_id, queue_id);
 +
 +      cb_size += gaudi_gen_wait_cb(hdev, &wait_prop);
 +
 +      dev_dbg(hdev->dev,
 +              "generate signal CB, sob_id: %d, sob val: 1, q_idx: %d\n",
 +              prop->collective_sob_id, queue_id);
 +
 +      cb_size += gaudi_gen_signal_cb(hdev, job->user_cb,
 +                      prop->collective_sob_id, cb_size, false);
 +}
 +
 +static int gaudi_collective_wait_init_cs(struct hl_cs *cs)
 +{
 +      struct hl_cs_compl *signal_cs_cmpl =
 +              container_of(cs->signal_fence, struct hl_cs_compl, base_fence);
 +      struct hl_cs_compl *cs_cmpl =
 +              container_of(cs->fence, struct hl_cs_compl, base_fence);
 +      struct hl_cs_encaps_sig_handle *handle = cs->encaps_sig_hdl;
 +      struct gaudi_collective_properties *cprop;
 +      u32 stream, queue_id, sob_group_offset;
 +      struct gaudi_device *gaudi;
 +      struct hl_device *hdev;
 +      struct hl_cs_job *job;
 +      struct hl_ctx *ctx;
 +
 +      ctx = cs->ctx;
 +      hdev = ctx->hdev;
 +      gaudi = hdev->asic_specific;
 +      cprop = &gaudi->collective_props;
 +
 +      if (cs->encaps_signals) {
 +              cs_cmpl->hw_sob = handle->hw_sob;
+              /* At this point we only need the hw_sob pointer
+               * for the completion check before going over the jobs
+               * of the master/slaves; the sob_value will be taken later on
+               * in gaudi_collective_slave_init_job, depending on each
+               * job's wait offset value.
+               */
 +              cs_cmpl->sob_val = 0;
 +      } else {
 +              /* copy the SOB id and value of the signal CS */
 +              cs_cmpl->hw_sob = signal_cs_cmpl->hw_sob;
 +              cs_cmpl->sob_val = signal_cs_cmpl->sob_val;
 +      }
 +
+      /* Check again if the signal CS has already completed.
+       * If so, don't send any wait CS since the hw_sob
+       * could already be in reset. If the signal has not completed,
+       * take a refcount on the hw_sob to prevent the SOB from being
+       * reset while the wait CS is not yet submitted.
+       * Note that this check is protected by two locks,
+       * the hw_queue lock and the completion object lock,
+       * and the same completion object lock also protects
+       * the hw_sob reset handler function.
+       * The hw_queue lock prevents the hw_sob refcount value,
+       * which is changed by the signal/wait flows, from going out of sync.
+       */
 +      spin_lock(&signal_cs_cmpl->lock);
 +
 +      if (completion_done(&cs->signal_fence->completion)) {
 +              spin_unlock(&signal_cs_cmpl->lock);
 +              return -EINVAL;
 +      }
 +      /* Increment kref since all slave queues are now waiting on it */
 +      kref_get(&cs_cmpl->hw_sob->kref);
 +
 +      spin_unlock(&signal_cs_cmpl->lock);
 +
 +      /* Calculate the stream from collective master queue (1st job) */
 +      job = list_first_entry(&cs->job_list, struct hl_cs_job, cs_node);
 +      stream = job->hw_queue_id % 4;
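+      /* Each stream owns HL_RSVD_SOBS SOB groups; select the currently
+       * active group of this stream
+       */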
 +      sob_group_offset =
 +              stream * HL_RSVD_SOBS + cprop->curr_sob_group_idx[stream];
 +
 +      list_for_each_entry(job, &cs->job_list, cs_node) {
 +              queue_id = job->hw_queue_id;
 +
 +              if (hdev->kernel_queues[queue_id].collective_mode ==
 +                              HL_COLLECTIVE_MASTER)
 +                      gaudi_collective_master_init_job(hdev, job, stream,
 +                                              sob_group_offset);
 +              else
 +                      gaudi_collective_slave_init_job(hdev, job, cs_cmpl);
 +      }
 +
 +      cs_cmpl->sob_group = sob_group_offset;
 +
 +      /* Handle sob group kref and wraparound */
 +      kref_get(&cprop->hw_sob_group[sob_group_offset].kref);
 +      cprop->next_sob_group_val[stream]++;
 +
 +      if (cprop->next_sob_group_val[stream] == HL_MAX_SOB_VAL) {
 +              /*
 +               * Decrement as we reached the max value.
 +               * The release function won't be called here as we've
 +               * just incremented the refcount.
 +               */
 +              kref_put(&cprop->hw_sob_group[sob_group_offset].kref,
 +                              gaudi_sob_group_reset_error);
 +              cprop->next_sob_group_val[stream] = 1;
 +              /* only two SOBs are currently in use */
 +              cprop->curr_sob_group_idx[stream] =
 +                      (cprop->curr_sob_group_idx[stream] + 1) &
 +                                                      (HL_RSVD_SOBS - 1);
 +
 +              gaudi_collective_map_sobs(hdev, stream);
 +
 +              dev_dbg(hdev->dev, "switched to SOB group %d, stream: %d\n",
 +                              cprop->curr_sob_group_idx[stream], stream);
 +      }
 +
 +      mb();
 +      hl_fence_put(cs->signal_fence);
 +      cs->signal_fence = NULL;
 +
 +      return 0;
 +}
 +
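+/*
+ * Return the extra CB space needed for the two trailing MSG_PROT packets
+ * (completion + MSI). If appending them would cross the next device
+ * cache-line boundary, pad the CB up to that boundary first so the packets
+ * start cache-line aligned.
+ */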
 +static u32 gaudi_get_patched_cb_extra_size(u32 user_cb_size)
 +{
 +      u32 cacheline_end, additional_commands;
 +
 +      cacheline_end = round_up(user_cb_size, DEVICE_CACHE_LINE_SIZE);
 +      additional_commands = sizeof(struct packet_msg_prot) * 2;
 +
 +      if (user_cb_size + additional_commands > cacheline_end)
 +              return cacheline_end - user_cb_size + additional_commands;
 +      else
 +              return additional_commands;
 +}
 +
 +static int gaudi_collective_wait_create_job(struct hl_device *hdev,
 +              struct hl_ctx *ctx, struct hl_cs *cs,
 +              enum hl_collective_mode mode, u32 queue_id, u32 wait_queue_id,
 +              u32 encaps_signal_offset)
 +{
 +      struct hw_queue_properties *hw_queue_prop;
 +      struct hl_cs_counters_atomic *cntr;
 +      struct hl_cs_job *job;
 +      struct hl_cb *cb;
 +      u32 cb_size;
 +      bool patched_cb;
 +
 +      cntr = &hdev->aggregated_cs_counters;
 +
 +      if (mode == HL_COLLECTIVE_MASTER) {
 +              /* CB size of collective master queue contains
 +               * 4 msg short packets for monitor 1 configuration
 +               * 1 fence packet
 +               * 4 msg short packets for monitor 2 configuration
 +               * 1 fence packet
 +               * 2 msg prot packets for completion and MSI
 +               */
 +              cb_size = sizeof(struct packet_msg_short) * 8 +
 +                              sizeof(struct packet_fence) * 2 +
 +                              sizeof(struct packet_msg_prot) * 2;
 +              patched_cb = true;
 +      } else {
 +              /* CB size of collective slave queues contains
 +               * 4 msg short packets for monitor configuration
 +               * 1 fence packet
 +               * 1 additional msg short packet for sob signal
 +               */
 +              cb_size = sizeof(struct packet_msg_short) * 5 +
 +                              sizeof(struct packet_fence);
 +              patched_cb = false;
 +      }
 +
 +      hw_queue_prop = &hdev->asic_prop.hw_queues_props[queue_id];
 +      job = hl_cs_allocate_job(hdev, hw_queue_prop->type, true);
 +      if (!job) {
 +              atomic64_inc(&ctx->cs_counters.out_of_mem_drop_cnt);
 +              atomic64_inc(&cntr->out_of_mem_drop_cnt);
 +              dev_err(hdev->dev, "Failed to allocate a new job\n");
 +              return -ENOMEM;
 +      }
 +
 +      /* Allocate internal mapped CB for non patched CBs */
 +      cb = hl_cb_kernel_create(hdev, cb_size,
 +                      hdev->mmu_enable && !patched_cb);
 +      if (!cb) {
 +              atomic64_inc(&ctx->cs_counters.out_of_mem_drop_cnt);
 +              atomic64_inc(&cntr->out_of_mem_drop_cnt);
 +              kfree(job);
 +              return -EFAULT;
 +      }
 +
 +      job->id = 0;
 +      job->cs = cs;
 +      job->user_cb = cb;
 +      atomic_inc(&job->user_cb->cs_cnt);
 +      job->user_cb_size = cb_size;
 +      job->hw_queue_id = queue_id;
 +
+      /* Since the collective wait CS is guaranteed to have only one chunk,
+       * we can use this chunk to set the encapsulated signal offset
+       * in the jobs.
+       */
 +      if (cs->encaps_signals)
 +              job->encaps_sig_wait_offset = encaps_signal_offset;
 +
+      /*
+       * No need for parsing, the user CB is the patched CB.
+       * We call hl_cb_destroy() for two reasons: we don't need
+       * the CB in the CB IDR anymore, and we need to decrement its refcount,
+       * which was incremented inside hl_cb_kernel_create().
+       */
 +      if (patched_cb)
 +              job->patched_cb = job->user_cb;
 +      else
 +              job->patched_cb = NULL;
 +
 +      job->job_cb_size = job->user_cb_size;
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
 +
+      /* Increment refcount since for external queues we get a completion */
 +      if (hw_queue_prop->type == QUEUE_TYPE_EXT)
 +              cs_get(cs);
 +
 +      cs->jobs_in_queue_cnt[job->hw_queue_id]++;
 +
 +      list_add_tail(&job->cs_node, &cs->job_list);
 +
 +      hl_debugfs_add_job(hdev, job);
 +
 +      return 0;
 +}
 +
 +static int gaudi_collective_wait_create_jobs(struct hl_device *hdev,
 +              struct hl_ctx *ctx, struct hl_cs *cs,
 +              u32 wait_queue_id, u32 collective_engine_id,
 +              u32 encaps_signal_offset)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct hw_queue_properties *hw_queue_prop;
 +      u32 queue_id, collective_queue, num_jobs;
 +      u32 stream, nic_queue, nic_idx = 0;
 +      bool skip;
 +      int i, rc = 0;
 +
 +      /* Verify wait queue id is configured as master */
 +      hw_queue_prop = &hdev->asic_prop.hw_queues_props[wait_queue_id];
+      if (hw_queue_prop->collective_mode != HL_COLLECTIVE_MASTER) {
 +              dev_err(hdev->dev,
 +                      "Queue %d is not configured as collective master\n",
 +                      wait_queue_id);
 +              return -EINVAL;
 +      }
 +
 +      /* Verify engine id is supported */
 +      if (collective_engine_id != GAUDI_ENGINE_ID_DMA_5 &&
 +                      collective_engine_id != GAUDI_ENGINE_ID_TPC_7) {
 +              dev_err(hdev->dev,
 +                      "Collective wait does not support engine %u\n",
 +                      collective_engine_id);
 +              return -EINVAL;
 +      }
 +
 +      stream = wait_queue_id % 4;
 +
 +      if (collective_engine_id == GAUDI_ENGINE_ID_DMA_5)
 +              collective_queue = GAUDI_QUEUE_ID_DMA_5_0 + stream;
 +      else
 +              collective_queue = GAUDI_QUEUE_ID_TPC_7_0 + stream;
 +
 +      num_jobs = NUMBER_OF_SOBS_IN_GRP + 1;
 +      nic_queue = GAUDI_QUEUE_ID_NIC_0_0 + stream;
 +
+      /* The first job goes to the collective master queue; it waits for
+       * the collective slave queues to finish execution.
+       * The synchronization is done using two monitors:
+       * the first monitor for NICs 0-7, the second monitor for NICs 8-9 and
+       * the reduction engine (DMA5/TPC7).
+       *
+       * The rest of the jobs go to the collective slave queues, which all
+       * wait for the user to signal SOB 'cs_cmpl->sob_val'.
+       */
 +      for (i = 0 ; i < num_jobs ; i++) {
 +              if (i == 0) {
 +                      queue_id = wait_queue_id;
 +                      rc = gaudi_collective_wait_create_job(hdev, ctx, cs,
 +                              HL_COLLECTIVE_MASTER, queue_id,
 +                              wait_queue_id, encaps_signal_offset);
 +              } else {
 +                      if (nic_idx < NIC_NUMBER_OF_ENGINES) {
 +                              if (gaudi->hw_cap_initialized &
 +                                      BIT(HW_CAP_NIC_SHIFT + nic_idx))
 +                                      skip = false;
 +                              else
 +                                      skip = true;
 +
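+                              /* Each NIC QMAN has 4 streams, so +4 selects
+                               * the same stream on the next NIC engine
+                               */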
 +                              queue_id = nic_queue;
 +                              nic_queue += 4;
 +                              nic_idx++;
 +
 +                              if (skip)
 +                                      continue;
 +                      } else {
 +                              queue_id = collective_queue;
 +                      }
 +
 +                      rc = gaudi_collective_wait_create_job(hdev, ctx, cs,
 +                              HL_COLLECTIVE_SLAVE, queue_id,
 +                              wait_queue_id, encaps_signal_offset);
 +              }
 +
 +              if (rc)
 +                      return rc;
 +      }
 +
 +      return rc;
 +}
 +
 +static int gaudi_late_init(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int rc;
 +
 +      rc = gaudi->cpucp_info_get(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to get cpucp info\n");
 +              return rc;
 +      }
 +
 +      if ((hdev->card_type == cpucp_card_type_pci) &&
 +                      (hdev->nic_ports_mask & 0x3)) {
 +              dev_info(hdev->dev,
 +                      "PCI card detected, only 8 ports are enabled\n");
 +              hdev->nic_ports_mask &= ~0x3;
 +
 +              /* Stop and disable unused NIC QMANs */
 +              WREG32(mmNIC0_QM0_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                                      NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                                      NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +              WREG32(mmNIC0_QM1_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                                      NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                                      NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +              WREG32(mmNIC0_QM0_GLBL_CFG0, 0);
 +              WREG32(mmNIC0_QM1_GLBL_CFG0, 0);
 +
 +              gaudi->hw_cap_initialized &= ~(HW_CAP_NIC0 | HW_CAP_NIC1);
 +      }
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS, 0x0);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to enable PCI access from CPU\n");
 +              return rc;
 +      }
 +
 +      /* Scrub both SRAM and DRAM */
 +      rc = hdev->asic_funcs->scrub_device_mem(hdev);
 +      if (rc)
 +              goto disable_pci_access;
 +
 +      rc = gaudi_fetch_psoc_frequency(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to fetch psoc frequency\n");
 +              goto disable_pci_access;
 +      }
 +
 +      rc = gaudi_mmu_clear_pgt_range(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to clear MMU page tables range\n");
 +              goto disable_pci_access;
 +      }
 +
 +      rc = gaudi_init_tpc_mem(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to initialize TPC memories\n");
 +              goto disable_pci_access;
 +      }
 +
 +      rc = gaudi_collective_init(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to init collective\n");
 +              goto disable_pci_access;
 +      }
 +
+      /* We only support a single ASID for the user, so as an optimization, just
+       * initialize the ASID once during device initialization with the fixed value of 1.
+       */
 +      gaudi_mmu_prepare(hdev, 1);
 +
 +      hl_fw_set_pll_profile(hdev);
 +
 +      return 0;
 +
 +disable_pci_access:
 +      hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
 +
 +      return rc;
 +}
 +
 +static void gaudi_late_fini(struct hl_device *hdev)
 +{
 +      hl_hwmon_release_resources(hdev);
 +}
 +
 +static int gaudi_alloc_cpu_accessible_dma_mem(struct hl_device *hdev)
 +{
 +      dma_addr_t dma_addr_arr[GAUDI_ALLOC_CPU_MEM_RETRY_CNT] = {}, end_addr;
 +      void *virt_addr_arr[GAUDI_ALLOC_CPU_MEM_RETRY_CNT] = {};
 +      int i, j, rc = 0;
 +
+      /*
+       * The device CPU works with 40-bit addresses, and bit 39 must be set
+       * to '1' when accessing the host.
+       * Bits 49:39 of the full host address are saved for a later
+       * configuration of the HW that extends the address to 50 bits.
+       * Because there is a single HW register that holds the extension bits,
+       * these bits must be identical across the entire allocated range.
+       */
 +
 +      for (i = 0 ; i < GAUDI_ALLOC_CPU_MEM_RETRY_CNT ; i++) {
 +              virt_addr_arr[i] = hl_asic_dma_alloc_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE,
 +                                                              &dma_addr_arr[i],
 +                                                              GFP_KERNEL | __GFP_ZERO);
 +              if (!virt_addr_arr[i]) {
 +                      rc = -ENOMEM;
 +                      goto free_dma_mem_arr;
 +              }
 +
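+              /* Keep this allocation only if its first and last addresses
+               * share the same MSB extension bits, otherwise retry
+               */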
 +              end_addr = dma_addr_arr[i] + HL_CPU_ACCESSIBLE_MEM_SIZE - 1;
 +              if (GAUDI_CPU_PCI_MSB_ADDR(dma_addr_arr[i]) ==
 +                              GAUDI_CPU_PCI_MSB_ADDR(end_addr))
 +                      break;
 +      }
 +
 +      if (i == GAUDI_ALLOC_CPU_MEM_RETRY_CNT) {
+              dev_err(hdev->dev,
+                      "MSB of CPU accessible DMA memory is not identical across the allocated range\n");
 +              rc = -EFAULT;
 +              goto free_dma_mem_arr;
 +      }
 +
 +      hdev->cpu_accessible_dma_mem = virt_addr_arr[i];
 +      hdev->cpu_accessible_dma_address = dma_addr_arr[i];
 +      hdev->cpu_pci_msb_addr =
 +              GAUDI_CPU_PCI_MSB_ADDR(hdev->cpu_accessible_dma_address);
 +
 +      if (!hdev->asic_prop.fw_security_enabled)
 +              GAUDI_PCI_TO_CPU_ADDR(hdev->cpu_accessible_dma_address);
 +
 +free_dma_mem_arr:
 +      for (j = 0 ; j < i ; j++)
 +              hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, virt_addr_arr[j],
 +                                              dma_addr_arr[j]);
 +
 +      return rc;
 +}
 +
 +static void gaudi_free_internal_qmans_pq_mem(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      u32 i;
 +
 +      for (i = 0 ; i < GAUDI_QUEUE_ID_SIZE ; i++) {
 +              q = &gaudi->internal_qmans[i];
 +              if (!q->pq_kernel_addr)
 +                      continue;
 +              hl_asic_dma_free_coherent(hdev, q->pq_size, q->pq_kernel_addr, q->pq_dma_addr);
 +      }
 +}
 +
 +static int gaudi_alloc_internal_qmans_pq_mem(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      int rc, i;
 +
 +      for (i = 0 ; i < GAUDI_QUEUE_ID_SIZE ; i++) {
 +              if (gaudi_queue_type[i] != QUEUE_TYPE_INT)
 +                      continue;
 +
 +              q = &gaudi->internal_qmans[i];
 +
 +              switch (i) {
 +              case GAUDI_QUEUE_ID_DMA_2_0 ... GAUDI_QUEUE_ID_DMA_7_3:
 +                      q->pq_size = HBM_DMA_QMAN_SIZE_IN_BYTES;
 +                      break;
 +              case GAUDI_QUEUE_ID_MME_0_0 ... GAUDI_QUEUE_ID_MME_1_3:
 +                      q->pq_size = MME_QMAN_SIZE_IN_BYTES;
 +                      break;
 +              case GAUDI_QUEUE_ID_TPC_0_0 ... GAUDI_QUEUE_ID_TPC_7_3:
 +                      q->pq_size = TPC_QMAN_SIZE_IN_BYTES;
 +                      break;
 +              case GAUDI_QUEUE_ID_NIC_0_0 ... GAUDI_QUEUE_ID_NIC_9_3:
 +                      q->pq_size = NIC_QMAN_SIZE_IN_BYTES;
 +                      break;
 +              default:
 +                      dev_err(hdev->dev, "Bad internal queue index %d", i);
 +                      rc = -EINVAL;
 +                      goto free_internal_qmans_pq_mem;
 +              }
 +
 +              q->pq_kernel_addr = hl_asic_dma_alloc_coherent(hdev, q->pq_size, &q->pq_dma_addr,
 +                                                              GFP_KERNEL | __GFP_ZERO);
 +              if (!q->pq_kernel_addr) {
 +                      rc = -ENOMEM;
 +                      goto free_internal_qmans_pq_mem;
 +              }
 +      }
 +
 +      return 0;
 +
 +free_internal_qmans_pq_mem:
 +      gaudi_free_internal_qmans_pq_mem(hdev);
 +      return rc;
 +}
 +
 +static void gaudi_set_pci_memory_regions(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_mem_region *region;
 +
 +      /* CFG */
 +      region = &hdev->pci_mem_region[PCI_REGION_CFG];
 +      region->region_base = CFG_BASE;
 +      region->region_size = CFG_SIZE;
 +      region->offset_in_bar = CFG_BASE - SPI_FLASH_BASE_ADDR;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = CFG_BAR_ID;
 +      region->used = 1;
 +
 +      /* SRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_SRAM];
 +      region->region_base = SRAM_BASE_ADDR;
 +      region->region_size = SRAM_SIZE;
 +      region->offset_in_bar = 0;
 +      region->bar_size = SRAM_BAR_SIZE;
 +      region->bar_id = SRAM_BAR_ID;
 +      region->used = 1;
 +
 +      /* DRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_DRAM];
 +      region->region_base = DRAM_PHYS_BASE;
 +      region->region_size = hdev->asic_prop.dram_size;
 +      region->offset_in_bar = 0;
 +      region->bar_size = prop->dram_pci_bar_size;
 +      region->bar_id = HBM_BAR_ID;
 +      region->used = 1;
 +
 +      /* SP SRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_SP_SRAM];
 +      region->region_base = PSOC_SCRATCHPAD_ADDR;
 +      region->region_size = PSOC_SCRATCHPAD_SIZE;
 +      region->offset_in_bar = PSOC_SCRATCHPAD_ADDR - SPI_FLASH_BASE_ADDR;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = CFG_BAR_ID;
 +      region->used = 1;
 +}
 +
 +static int gaudi_sw_init(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi;
 +      u32 i, event_id = 0;
 +      int rc;
 +
 +      /* Allocate device structure */
 +      gaudi = kzalloc(sizeof(*gaudi), GFP_KERNEL);
 +      if (!gaudi)
 +              return -ENOMEM;
 +
 +      for (i = 0 ; i < ARRAY_SIZE(gaudi_irq_map_table) ; i++) {
 +              if (gaudi_irq_map_table[i].valid) {
 +                      if (event_id == GAUDI_EVENT_SIZE) {
 +                              dev_err(hdev->dev,
 +                                      "Event array exceeds the limit of %u events\n",
 +                                      GAUDI_EVENT_SIZE);
 +                              rc = -EINVAL;
 +                              goto free_gaudi_device;
 +                      }
 +
 +                      gaudi->events[event_id++] =
 +                                      gaudi_irq_map_table[i].fc_id;
 +              }
 +      }
 +
 +      gaudi->cpucp_info_get = gaudi_cpucp_info_get;
 +
 +      hdev->asic_specific = gaudi;
 +
 +      /* Create DMA pool for small allocations */
 +      hdev->dma_pool = dma_pool_create(dev_name(hdev->dev),
 +                      &hdev->pdev->dev, GAUDI_DMA_POOL_BLK_SIZE, 8, 0);
 +      if (!hdev->dma_pool) {
 +              dev_err(hdev->dev, "failed to create DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_gaudi_device;
 +      }
 +
 +      rc = gaudi_alloc_cpu_accessible_dma_mem(hdev);
 +      if (rc)
 +              goto free_dma_pool;
 +
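+      /* Pool for CPU-accessible allocations, 32-byte minimal granularity */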
 +      hdev->cpu_accessible_dma_pool = gen_pool_create(ilog2(32), -1);
 +      if (!hdev->cpu_accessible_dma_pool) {
 +              dev_err(hdev->dev,
 +                      "Failed to create CPU accessible DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_cpu_dma_mem;
 +      }
 +
 +      rc = gen_pool_add(hdev->cpu_accessible_dma_pool,
 +                              (uintptr_t) hdev->cpu_accessible_dma_mem,
 +                              HL_CPU_ACCESSIBLE_MEM_SIZE, -1);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to add memory to CPU accessible DMA pool\n");
 +              rc = -EFAULT;
 +              goto free_cpu_accessible_dma_pool;
 +      }
 +
 +      rc = gaudi_alloc_internal_qmans_pq_mem(hdev);
 +      if (rc)
 +              goto free_cpu_accessible_dma_pool;
 +
 +      spin_lock_init(&gaudi->hw_queues_lock);
 +
 +      hdev->supports_sync_stream = true;
 +      hdev->supports_coresight = true;
 +      hdev->supports_staged_submission = true;
 +      hdev->supports_wait_for_multi_cs = true;
 +
 +      hdev->asic_funcs->set_pci_memory_regions(hdev);
 +      hdev->stream_master_qid_arr =
 +                              hdev->asic_funcs->get_stream_master_qid_arr();
 +      hdev->stream_master_qid_arr_size = GAUDI_STREAM_MASTER_ARR_SIZE;
 +
 +      return 0;
 +
 +free_cpu_accessible_dma_pool:
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +free_cpu_dma_mem:
 +      if (!hdev->asic_prop.fw_security_enabled)
 +              GAUDI_CPU_TO_PCI_ADDR(hdev->cpu_accessible_dma_address,
 +                                      hdev->cpu_pci_msb_addr);
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                      hdev->cpu_accessible_dma_address);
 +free_dma_pool:
 +      dma_pool_destroy(hdev->dma_pool);
 +free_gaudi_device:
 +      kfree(gaudi);
 +      return rc;
 +}
 +
 +static int gaudi_sw_fini(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      gaudi_free_internal_qmans_pq_mem(hdev);
 +
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +
 +      if (!hdev->asic_prop.fw_security_enabled)
 +              GAUDI_CPU_TO_PCI_ADDR(hdev->cpu_accessible_dma_address,
 +                                      hdev->cpu_pci_msb_addr);
 +
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                      hdev->cpu_accessible_dma_address);
 +
 +      dma_pool_destroy(hdev->dma_pool);
 +
 +      kfree(gaudi);
 +
 +      return 0;
 +}
 +
 +static irqreturn_t gaudi_irq_handler_single(int irq, void *arg)
 +{
 +      struct hl_device *hdev = arg;
 +      int i;
 +
 +      if (hdev->disabled)
 +              return IRQ_HANDLED;
 +
 +      for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
 +              hl_irq_handler_cq(irq, &hdev->completion_queue[i]);
 +
 +      hl_irq_handler_eq(irq, &hdev->event_queue);
 +
 +      return IRQ_HANDLED;
 +}
 +
 +/*
 + * For backward compatibility, new MSI interrupts should be set after the
 + * existing CPU and NIC interrupts.
 + */
 +static int gaudi_pci_irq_vector(struct hl_device *hdev, unsigned int nr,
 +                              bool cpu_eq)
 +{
 +      int msi_vec;
 +
 +      if ((nr != GAUDI_EVENT_QUEUE_MSI_IDX) && (cpu_eq))
 +              dev_crit(hdev->dev, "CPU EQ must use IRQ %d\n",
 +                              GAUDI_EVENT_QUEUE_MSI_IDX);
 +
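+      /* CQ vectors and the CPU EQ map 1:1 to MSI vectors; any newer
+       * interrupt is placed after the CPU EQ and the NIC interrupts,
+       * hence the NIC_NUMBER_OF_ENGINES + 1 offset
+       */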
 +      msi_vec = ((nr < GAUDI_EVENT_QUEUE_MSI_IDX) || (cpu_eq)) ? nr :
 +                      (nr + NIC_NUMBER_OF_ENGINES + 1);
 +
 +      return pci_irq_vector(hdev->pdev, msi_vec);
 +}
 +
 +static int gaudi_enable_msi_single(struct hl_device *hdev)
 +{
 +      int rc, irq;
 +
 +      dev_dbg(hdev->dev, "Working in single MSI IRQ mode\n");
 +
 +      irq = gaudi_pci_irq_vector(hdev, 0, false);
 +      rc = request_irq(irq, gaudi_irq_handler_single, 0,
 +                      "gaudi single msi", hdev);
 +      if (rc)
 +              dev_err(hdev->dev,
 +                      "Failed to request single MSI IRQ\n");
 +
 +      return rc;
 +}
 +
 +static int gaudi_enable_msi_multi(struct hl_device *hdev)
 +{
 +      int cq_cnt = hdev->asic_prop.completion_queues_count;
 +      int rc, i, irq_cnt_init, irq;
 +
 +      for (i = 0, irq_cnt_init = 0 ; i < cq_cnt ; i++, irq_cnt_init++) {
 +              irq = gaudi_pci_irq_vector(hdev, i, false);
 +              rc = request_irq(irq, hl_irq_handler_cq, 0, gaudi_irq_name[i],
 +                              &hdev->completion_queue[i]);
 +              if (rc) {
 +                      dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +                      goto free_irqs;
 +              }
 +      }
 +
 +      irq = gaudi_pci_irq_vector(hdev, GAUDI_EVENT_QUEUE_MSI_IDX, true);
 +      rc = request_irq(irq, hl_irq_handler_eq, 0, gaudi_irq_name[cq_cnt],
 +                              &hdev->event_queue);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +              goto free_irqs;
 +      }
 +
 +      return 0;
 +
 +free_irqs:
 +      for (i = 0 ; i < irq_cnt_init ; i++)
 +              free_irq(gaudi_pci_irq_vector(hdev, i, false),
 +                              &hdev->completion_queue[i]);
 +      return rc;
 +}
 +
 +static int gaudi_enable_msi(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int rc;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_MSI)
 +              return 0;
 +
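+      /* A single MSI vector is requested here (min = max = 1); the number
+       * of vectors actually granted determines single vs. multi MSI mode
+       * below
+       */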
 +      rc = pci_alloc_irq_vectors(hdev->pdev, 1, 1, PCI_IRQ_MSI);
 +      if (rc < 0) {
 +              dev_err(hdev->dev, "MSI: Failed to enable support %d\n", rc);
 +              return rc;
 +      }
 +
 +      if (rc < NUMBER_OF_INTERRUPTS) {
 +              gaudi->multi_msi_mode = false;
 +              rc = gaudi_enable_msi_single(hdev);
 +      } else {
 +              gaudi->multi_msi_mode = true;
 +              rc = gaudi_enable_msi_multi(hdev);
 +      }
 +
 +      if (rc)
 +              goto free_pci_irq_vectors;
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_MSI;
 +
 +      return 0;
 +
 +free_pci_irq_vectors:
 +      pci_free_irq_vectors(hdev->pdev);
 +      return rc;
 +}
 +
 +static void gaudi_sync_irqs(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int i, cq_cnt = hdev->asic_prop.completion_queues_count;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MSI))
 +              return;
 +
+      /* Wait for all pending IRQ handlers to finish */
 +      if (gaudi->multi_msi_mode) {
 +              for (i = 0 ; i < cq_cnt ; i++)
 +                      synchronize_irq(gaudi_pci_irq_vector(hdev, i, false));
 +
 +              synchronize_irq(gaudi_pci_irq_vector(hdev,
 +                                              GAUDI_EVENT_QUEUE_MSI_IDX,
 +                                              true));
 +      } else {
 +              synchronize_irq(gaudi_pci_irq_vector(hdev, 0, false));
 +      }
 +}
 +
 +static void gaudi_disable_msi(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int i, irq, cq_cnt = hdev->asic_prop.completion_queues_count;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MSI))
 +              return;
 +
 +      gaudi_sync_irqs(hdev);
 +
 +      if (gaudi->multi_msi_mode) {
 +              irq = gaudi_pci_irq_vector(hdev, GAUDI_EVENT_QUEUE_MSI_IDX,
 +                                              true);
 +              free_irq(irq, &hdev->event_queue);
 +
 +              for (i = 0 ; i < cq_cnt ; i++) {
 +                      irq = gaudi_pci_irq_vector(hdev, i, false);
 +                      free_irq(irq, &hdev->completion_queue[i]);
 +              }
 +      } else {
 +              free_irq(gaudi_pci_irq_vector(hdev, 0, false), hdev);
 +      }
 +
 +      pci_free_irq_vectors(hdev->pdev);
 +
 +      gaudi->hw_cap_initialized &= ~HW_CAP_MSI;
 +}
 +
 +static void gaudi_init_scrambler_sram(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (hdev->asic_prop.fw_security_enabled)
 +              return;
 +
 +      if (hdev->asic_prop.fw_app_cpu_boot_dev_sts0 &
 +                                              CPU_BOOT_DEV_STS0_SRAM_SCR_EN)
 +              return;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_SRAM_SCRAMBLER)
 +              return;
 +
 +      WREG32(mmNIF_RTR_CTRL_0_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_1_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_2_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_3_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_4_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_5_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_6_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_7_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_0_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_1_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_2_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_3_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_4_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_5_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_6_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_7_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_SRAM_SCRAMBLER;
 +}
 +
 +static void gaudi_init_scrambler_hbm(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (hdev->asic_prop.fw_security_enabled)
 +              return;
 +
 +      if (hdev->asic_prop.fw_bootfit_cpu_boot_dev_sts0 &
 +                                      CPU_BOOT_DEV_STS0_DRAM_SCR_EN)
 +              return;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_HBM_SCRAMBLER)
 +              return;
 +
 +      WREG32(mmNIF_RTR_CTRL_0_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_1_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_2_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_3_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_4_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_5_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_6_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_7_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_0_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_1_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_2_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_3_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_4_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_5_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_6_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_7_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_HBM_SCRAMBLER;
 +}
 +
 +static void gaudi_init_e2e(struct hl_device *hdev)
 +{
 +      if (hdev->asic_prop.fw_security_enabled)
 +              return;
 +
 +      if (hdev->asic_prop.fw_bootfit_cpu_boot_dev_sts0 &
 +                                      CPU_BOOT_DEV_STS0_E2E_CRED_EN)
 +              return;
 +
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_HBM_WR_SIZE, 247 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_HBM_RD_SIZE, 785 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_PCI_WR_SIZE, 49);
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_PCI_RD_SIZE, 101);
 +
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_HBM_WR_SIZE, 275 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_HBM_RD_SIZE, 614 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_PCI_RD_SIZE, 39);
 +
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_HBM_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_HBM_RD_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_HBM_WR_SIZE, 176 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_HBM_RD_SIZE, 32 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_PCI_WR_SIZE, 19);
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_HBM_WR_SIZE, 176 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_HBM_RD_SIZE, 32 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_PCI_WR_SIZE, 19);
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_HBM_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_HBM_RD_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_HBM_WR_SIZE, 275 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_HBM_RD_SIZE, 614 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_PCI_RD_SIZE, 39);
 +
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_HBM_WR_SIZE, 297 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_HBM_RD_SIZE, 908 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_PCI_WR_SIZE, 19);
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_PCI_RD_SIZE, 19);
 +
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_HBM_WR_SIZE, 318 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_HBM_RD_SIZE, 956 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_PCI_WR_SIZE, 79);
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_PCI_RD_SIZE, 163);
 +
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_HBM_WR_SIZE, 275 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_HBM_RD_SIZE, 614 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_PCI_RD_SIZE, 39);
 +
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_HBM_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_HBM_RD_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_HBM_WR_SIZE, 176 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_HBM_RD_SIZE, 32 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_PCI_WR_SIZE, 19);
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_HBM_WR_SIZE, 176 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_HBM_RD_SIZE, 32 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_PCI_WR_SIZE, 19);
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_HBM_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_HBM_RD_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_HBM_WR_SIZE, 275 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_HBM_RD_SIZE, 614 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_PCI_RD_SIZE, 39);
 +
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_HBM_WR_SIZE, 318 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_HBM_RD_SIZE, 956 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_PCI_WR_SIZE, 79);
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_PCI_RD_SIZE, 79);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +}
 +
 +static void gaudi_init_hbm_cred(struct hl_device *hdev)
 +{
 +      u32 hbm0_wr, hbm1_wr, hbm0_rd, hbm1_rd;
 +
 +      if (hdev->asic_prop.fw_security_enabled)
 +              return;
 +
 +      if (hdev->asic_prop.fw_bootfit_cpu_boot_dev_sts0 &
 +                                              CPU_BOOT_DEV_STS0_HBM_CRED_EN)
 +              return;
 +
 +      hbm0_wr = 0x33333333;
 +      hbm0_rd = 0x77777777;
 +      hbm1_wr = 0x55555555;
 +      hbm1_rd = 0xDDDDDDDD;
 +
 +      WREG32(mmDMA_IF_E_N_HBM0_WR_CRED_CNT, hbm0_wr);
 +      WREG32(mmDMA_IF_E_N_HBM1_WR_CRED_CNT, hbm1_wr);
 +      WREG32(mmDMA_IF_E_N_HBM0_RD_CRED_CNT, hbm0_rd);
 +      WREG32(mmDMA_IF_E_N_HBM1_RD_CRED_CNT, hbm1_rd);
 +
 +      WREG32(mmDMA_IF_E_S_HBM0_WR_CRED_CNT, hbm0_wr);
 +      WREG32(mmDMA_IF_E_S_HBM1_WR_CRED_CNT, hbm1_wr);
 +      WREG32(mmDMA_IF_E_S_HBM0_RD_CRED_CNT, hbm0_rd);
 +      WREG32(mmDMA_IF_E_S_HBM1_RD_CRED_CNT, hbm1_rd);
 +
 +      WREG32(mmDMA_IF_W_N_HBM0_WR_CRED_CNT, hbm0_wr);
 +      WREG32(mmDMA_IF_W_N_HBM1_WR_CRED_CNT, hbm1_wr);
 +      WREG32(mmDMA_IF_W_N_HBM0_RD_CRED_CNT, hbm0_rd);
 +      WREG32(mmDMA_IF_W_N_HBM1_RD_CRED_CNT, hbm1_rd);
 +
 +      WREG32(mmDMA_IF_W_S_HBM0_WR_CRED_CNT, hbm0_wr);
 +      WREG32(mmDMA_IF_W_S_HBM1_WR_CRED_CNT, hbm1_wr);
 +      WREG32(mmDMA_IF_W_S_HBM0_RD_CRED_CNT, hbm0_rd);
 +      WREG32(mmDMA_IF_W_S_HBM1_RD_CRED_CNT, hbm1_rd);
 +
 +      WREG32(mmDMA_IF_E_N_HBM_CRED_EN_0,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_E_S_HBM_CRED_EN_0,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_W_N_HBM_CRED_EN_0,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_W_S_HBM_CRED_EN_0,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +
 +      WREG32(mmDMA_IF_E_N_HBM_CRED_EN_1,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_E_S_HBM_CRED_EN_1,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_W_N_HBM_CRED_EN_1,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_W_S_HBM_CRED_EN_1,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +}
 +
 +static void gaudi_init_golden_registers(struct hl_device *hdev)
 +{
 +      u32 tpc_offset;
 +      int tpc_id, i;
 +
 +      gaudi_init_e2e(hdev);
 +      gaudi_init_hbm_cred(hdev);
 +
 +      for (tpc_id = 0, tpc_offset = 0;
 +                              tpc_id < TPC_NUMBER_OF_ENGINES;
 +                              tpc_id++, tpc_offset += TPC_CFG_OFFSET) {
 +              /* Mask all arithmetic interrupts from TPC */
 +              WREG32(mmTPC0_CFG_TPC_INTR_MASK + tpc_offset, 0x8FFE);
 +              /* Set 16 cache lines */
 +              WREG32_FIELD(TPC0_CFG_MSS_CONFIG, tpc_offset,
 +                              ICACHE_FETCH_LINE_NUM, 2);
 +      }
 +
+      /* Make sure the first 128 bytes in SRAM are 0 for Tensor DMA */
 +      for (i = 0 ; i < 128 ; i += 8)
 +              writeq(0, hdev->pcie_bar[SRAM_BAR_ID] + i);
 +
 +      WREG32(mmMME0_CTRL_EUS_ROLLUP_CNT_ADD, 3);
 +      WREG32(mmMME1_CTRL_EUS_ROLLUP_CNT_ADD, 3);
 +      WREG32(mmMME2_CTRL_EUS_ROLLUP_CNT_ADD, 3);
 +      WREG32(mmMME3_CTRL_EUS_ROLLUP_CNT_ADD, 3);
 +}
 +
 +static void gaudi_init_pci_dma_qman(struct hl_device *hdev, int dma_id,
 +                                      int qman_id, dma_addr_t qman_pq_addr)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
 +      u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
 +      u32 q_off, dma_qm_offset;
 +      u32 dma_qm_err_cfg, irq_handler_offset;
 +
 +      dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +
 +      mtr_base_en_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_en_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      mtr_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      q_off = dma_qm_offset + qman_id * 4;
 +
 +      WREG32(mmDMA0_QM_PQ_BASE_LO_0 + q_off, lower_32_bits(qman_pq_addr));
 +      WREG32(mmDMA0_QM_PQ_BASE_HI_0 + q_off, upper_32_bits(qman_pq_addr));
 +
 +      WREG32(mmDMA0_QM_PQ_SIZE_0 + q_off, ilog2(HL_QUEUE_LENGTH));
 +      WREG32(mmDMA0_QM_PQ_PI_0 + q_off, 0);
 +      WREG32(mmDMA0_QM_PQ_CI_0 + q_off, 0);
 +
 +      WREG32(mmDMA0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off, QMAN_LDMA_SIZE_OFFSET);
 +      WREG32(mmDMA0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SRC_OFFSET);
 +      WREG32(mmDMA0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_DST_OFFSET);
 +
 +      WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_LO_0 + q_off, mtr_base_ws_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_HI_0 + q_off, mtr_base_ws_hi);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_LO_0 + q_off, so_base_ws_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_HI_0 + q_off, so_base_ws_hi);
 +
 +      WREG32(mmDMA0_QM_CP_BARRIER_CFG_0 + q_off, 0x100);
 +
 +      /* The following configuration is needed only once per QMAN */
 +      if (qman_id == 0) {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_dma_qm_irq_ctrl);
 +
 +              /* Configure RAZWI IRQ */
 +              dma_qm_err_cfg = PCI_DMA_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
 +              if (hdev->stop_on_err)
 +                      dma_qm_err_cfg |=
 +                              PCI_DMA_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_CFG + dma_qm_offset, dma_qm_err_cfg);
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_ADDR_LO + dma_qm_offset,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +              WREG32(mmDMA0_QM_GLBL_ERR_ADDR_HI + dma_qm_offset,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_WDATA + dma_qm_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_DMA0_QM].cpu_id +
 +                                                                      dma_id);
 +
 +              WREG32(mmDMA0_QM_ARB_ERR_MSG_EN + dma_qm_offset,
 +                              QM_ARB_ERR_MSG_EN_MASK);
 +
 +              /* Set timeout to maximum */
 +              WREG32(mmDMA0_QM_ARB_SLV_CHOISE_WDT + dma_qm_offset, GAUDI_ARB_WDT_TIMEOUT);
 +
 +              WREG32(mmDMA0_QM_GLBL_PROT + dma_qm_offset,
 +                              QMAN_EXTERNAL_MAKE_TRUSTED);
 +
 +              WREG32(mmDMA0_QM_GLBL_CFG1 + dma_qm_offset, 0);
 +      }
 +}
 +
 +static void gaudi_init_dma_core(struct hl_device *hdev, int dma_id)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 dma_err_cfg = 1 << DMA0_CORE_ERR_CFG_ERR_MSG_EN_SHIFT;
 +      u32 dma_offset = dma_id * DMA_CORE_OFFSET;
 +      u32 irq_handler_offset;
 +
 +      /* Set to maximum possible according to physical size */
 +      WREG32(mmDMA0_CORE_RD_MAX_OUTSTAND + dma_offset, 0);
 +      WREG32(mmDMA0_CORE_RD_MAX_SIZE + dma_offset, 0);
 +
 +      /* WA for H/W bug H3-2116 */
 +      WREG32(mmDMA0_CORE_LBW_MAX_OUTSTAND + dma_offset, 15);
 +
 +      /* The STOP_ON bit implies no completion for the operation in case of RAZWI */
 +      if (hdev->stop_on_err)
 +              dma_err_cfg |= 1 << DMA0_CORE_ERR_CFG_STOP_ON_ERR_SHIFT;
 +
 +      WREG32(mmDMA0_CORE_ERR_CFG + dma_offset, dma_err_cfg);
 +
 +      irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                      mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                      le32_to_cpu(dyn_regs->gic_dma_core_irq_ctrl);
 +
 +      WREG32(mmDMA0_CORE_ERRMSG_ADDR_LO + dma_offset,
 +              lower_32_bits(CFG_BASE + irq_handler_offset));
 +      WREG32(mmDMA0_CORE_ERRMSG_ADDR_HI + dma_offset,
 +              upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +      WREG32(mmDMA0_CORE_ERRMSG_WDATA + dma_offset,
 +              gaudi_irq_map_table[GAUDI_EVENT_DMA0_CORE].cpu_id + dma_id);
 +      WREG32(mmDMA0_CORE_PROT + dma_offset,
 +                      1 << DMA0_CORE_PROT_ERR_VAL_SHIFT);
 +      /* If the channel is secured, it should be in MMU bypass mode */
 +      WREG32(mmDMA0_CORE_SECURE_PROPS + dma_offset,
 +                      1 << DMA0_CORE_SECURE_PROPS_MMBP_SHIFT);
 +      WREG32(mmDMA0_CORE_CFG_0 + dma_offset, 1 << DMA0_CORE_CFG_0_EN_SHIFT);
 +}
 +
 +static void gaudi_enable_qman(struct hl_device *hdev, int dma_id,
 +                              u32 enable_mask)
 +{
 +      u32 dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +
 +      WREG32(mmDMA0_QM_GLBL_CFG0 + dma_qm_offset, enable_mask);
 +}
 +
 +static void gaudi_init_pci_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct hl_hw_queue *q;
 +      int i, j, dma_id, cpu_skip, nic_skip, cq_id = 0, q_idx, msi_vec = 0;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_PCI_DMA)
 +              return;
 +
 +      for (i = 0 ; i < PCI_DMA_NUMBER_OF_CHNLS ; i++) {
 +              dma_id = gaudi_dma_assignment[i];
 +              /*
 +               * For queues after the CPU Q, we need to add 1 to get the
 +               * correct queue index. In addition, we need to add the CPU EQ
 +               * and the NIC IRQs in order to get the correct MSI vector.
 +               */
 +              if (dma_id > 1) {
 +                      cpu_skip = 1;
 +                      nic_skip = NIC_NUMBER_OF_ENGINES;
 +              } else {
 +                      cpu_skip = 0;
 +                      nic_skip = 0;
 +              }
 +
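 +              /*
 +               * e.g. dma_id 5 (a channel after the CPU Q), stream 2 gives
 +               * q_idx = 4 * 5 + 2 + 1 = 23
 +               */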
 +              for (j = 0 ; j < QMAN_STREAMS ; j++) {
 +                      q_idx = 4 * dma_id + j + cpu_skip;
 +                      q = &hdev->kernel_queues[q_idx];
 +                      q->cq_id = cq_id++;
 +                      q->msi_vec = nic_skip + cpu_skip + msi_vec++;
 +                      gaudi_init_pci_dma_qman(hdev, dma_id, j,
 +                                              q->bus_address);
 +              }
 +
 +              gaudi_init_dma_core(hdev, dma_id);
 +
 +              gaudi_enable_qman(hdev, dma_id, PCI_DMA_QMAN_ENABLE);
 +      }
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_PCI_DMA;
 +}
 +
 +static void gaudi_init_hbm_dma_qman(struct hl_device *hdev, int dma_id,
 +                                      int qman_id, u64 qman_base_addr)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
 +      u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
 +      u32 dma_qm_err_cfg, irq_handler_offset;
 +      u32 q_off, dma_qm_offset;
 +
 +      dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +
 +      mtr_base_en_lo = lower_32_bits(CFG_BASE +
 +                      mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_en_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      mtr_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      q_off = dma_qm_offset + qman_id * 4;
 +
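 +      /*
 +       * qman_id 0-3 are the per-stream upper CPs, each with its own PQ.
 +       * qman_id 4 is the lower CP, which has no PQ and carries the
 +       * QMAN-wide error, arbitration and protection configuration.
 +       */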
 +      if (qman_id < 4) {
 +              WREG32(mmDMA0_QM_PQ_BASE_LO_0 + q_off,
 +                                      lower_32_bits(qman_base_addr));
 +              WREG32(mmDMA0_QM_PQ_BASE_HI_0 + q_off,
 +                                      upper_32_bits(qman_base_addr));
 +
 +              WREG32(mmDMA0_QM_PQ_SIZE_0 + q_off, ilog2(HBM_DMA_QMAN_LENGTH));
 +              WREG32(mmDMA0_QM_PQ_PI_0 + q_off, 0);
 +              WREG32(mmDMA0_QM_PQ_CI_0 + q_off, 0);
 +
 +              WREG32(mmDMA0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SIZE_OFFSET);
 +              WREG32(mmDMA0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SRC_OFFSET);
 +              WREG32(mmDMA0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_DST_OFFSET);
 +      } else {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_dma_qm_irq_ctrl);
 +
 +              WREG32(mmDMA0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SIZE_OFFSET);
 +              WREG32(mmDMA0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SRC_OFFSET);
 +              WREG32(mmDMA0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_DST_OFFSET);
 +
 +              /* Configure RAZWI IRQ */
 +              dma_qm_err_cfg = HBM_DMA_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
 +              if (hdev->stop_on_err)
 +                      dma_qm_err_cfg |=
 +                              HBM_DMA_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_CFG + dma_qm_offset, dma_qm_err_cfg);
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_ADDR_LO + dma_qm_offset,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +              WREG32(mmDMA0_QM_GLBL_ERR_ADDR_HI + dma_qm_offset,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_WDATA + dma_qm_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_DMA0_QM].cpu_id +
 +                                                                      dma_id);
 +
 +              WREG32(mmDMA0_QM_ARB_ERR_MSG_EN + dma_qm_offset,
 +                              QM_ARB_ERR_MSG_EN_MASK);
 +
 +              /* Set timeout to maximum */
 +              WREG32(mmDMA0_QM_ARB_SLV_CHOISE_WDT + dma_qm_offset, GAUDI_ARB_WDT_TIMEOUT);
 +
 +              WREG32(mmDMA0_QM_GLBL_CFG1 + dma_qm_offset, 0);
 +              WREG32(mmDMA0_QM_GLBL_PROT + dma_qm_offset,
 +                              QMAN_INTERNAL_MAKE_TRUSTED);
 +      }
 +
 +      WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
 +
 +      /* Configure DMA5 CP_MSG_BASE 2/3 for sync stream collective */
 +      if (gaudi_dma_assignment[dma_id] == GAUDI_ENGINE_ID_DMA_5) {
 +              WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_LO_0 + q_off,
 +                              mtr_base_ws_lo);
 +              WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_HI_0 + q_off,
 +                              mtr_base_ws_hi);
 +              WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_LO_0 + q_off,
 +                              so_base_ws_lo);
 +              WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_HI_0 + q_off,
 +                              so_base_ws_hi);
 +      }
 +}
 +
 +static void gaudi_init_hbm_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      u64 qman_base_addr;
 +      int i, j, dma_id, internal_q_index;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_HBM_DMA)
 +              return;
 +
 +      for (i = 0 ; i < HBM_DMA_NUMBER_OF_CHNLS ; i++) {
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_1 + i];
 +
 +              for (j = 0 ; j < QMAN_STREAMS ; j++) {
 +                       /*
 +                        * Add the CPU queue in order to get the correct queue
 +                        * number, as all internal queues are placed after it
 +                        */
 +                      internal_q_index = dma_id * QMAN_STREAMS + j + 1;
 +
 +                      q = &gaudi->internal_qmans[internal_q_index];
 +                      qman_base_addr = (u64) q->pq_dma_addr;
 +                      gaudi_init_hbm_dma_qman(hdev, dma_id, j,
 +                                              qman_base_addr);
 +              }
 +
 +              /* Initializing lower CP for HBM DMA QMAN */
 +              gaudi_init_hbm_dma_qman(hdev, dma_id, 4, 0);
 +
 +              gaudi_init_dma_core(hdev, dma_id);
 +
 +              gaudi_enable_qman(hdev, dma_id, HBM_DMA_QMAN_ENABLE);
 +      }
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_HBM_DMA;
 +}
 +
 +static void gaudi_init_mme_qman(struct hl_device *hdev, u32 mme_offset,
 +                                      int qman_id, u64 qman_base_addr)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 irq_handler_offset;
 +      u32 q_off, mme_id;
 +      u32 mme_qm_err_cfg;
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      q_off = mme_offset + qman_id * 4;
 +
 +      if (qman_id < 4) {
 +              WREG32(mmMME0_QM_PQ_BASE_LO_0 + q_off,
 +                                      lower_32_bits(qman_base_addr));
 +              WREG32(mmMME0_QM_PQ_BASE_HI_0 + q_off,
 +                                      upper_32_bits(qman_base_addr));
 +
 +              WREG32(mmMME0_QM_PQ_SIZE_0 + q_off, ilog2(MME_QMAN_LENGTH));
 +              WREG32(mmMME0_QM_PQ_PI_0 + q_off, 0);
 +              WREG32(mmMME0_QM_PQ_CI_0 + q_off, 0);
 +
 +              WREG32(mmMME0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SIZE_OFFSET);
 +              WREG32(mmMME0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SRC_OFFSET);
 +              WREG32(mmMME0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_DST_OFFSET);
 +      } else {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_mme_qm_irq_ctrl);
 +
 +              WREG32(mmMME0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SIZE_OFFSET);
 +              WREG32(mmMME0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SRC_OFFSET);
 +              WREG32(mmMME0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_DST_OFFSET);
 +
 +              /* Configure RAZWI IRQ */
 +              mme_id = mme_offset /
 +                              (mmMME1_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0) / 2;
 +
 +              mme_qm_err_cfg = MME_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
 +              if (hdev->stop_on_err)
 +                      mme_qm_err_cfg |=
 +                              MME_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
 +
 +              WREG32(mmMME0_QM_GLBL_ERR_CFG + mme_offset, mme_qm_err_cfg);
 +
 +              WREG32(mmMME0_QM_GLBL_ERR_ADDR_LO + mme_offset,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +              WREG32(mmMME0_QM_GLBL_ERR_ADDR_HI + mme_offset,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +              WREG32(mmMME0_QM_GLBL_ERR_WDATA + mme_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_MME0_QM].cpu_id +
 +                                                                      mme_id);
 +
 +              WREG32(mmMME0_QM_ARB_ERR_MSG_EN + mme_offset,
 +                              QM_ARB_ERR_MSG_EN_MASK);
 +
 +              /* Set timeout to maximum */
 +              WREG32(mmMME0_QM_ARB_SLV_CHOISE_WDT + mme_offset, GAUDI_ARB_WDT_TIMEOUT);
 +
 +              WREG32(mmMME0_QM_GLBL_CFG1 + mme_offset, 0);
 +              WREG32(mmMME0_QM_GLBL_PROT + mme_offset,
 +                              QMAN_INTERNAL_MAKE_TRUSTED);
 +      }
 +
 +      WREG32(mmMME0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_lo);
 +      WREG32(mmMME0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_hi);
 +      WREG32(mmMME0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_lo);
 +      WREG32(mmMME0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_hi);
 +}
 +
 +static void gaudi_init_mme_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      u64 qman_base_addr;
 +      u32 mme_offset;
 +      int i, internal_q_index;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_MME)
 +              return;
 +
 +      /*
 +       * map GAUDI_QUEUE_ID_MME_0_X to the N_W_MME (mmMME2_QM_BASE)
 +       * and GAUDI_QUEUE_ID_MME_1_X to the S_W_MME (mmMME0_QM_BASE)
 +       */
 +
 +      mme_offset = mmMME2_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0;
 +
 +      for (i = 0 ; i < MME_NUMBER_OF_QMANS ; i++) {
 +              internal_q_index = GAUDI_QUEUE_ID_MME_0_0 + i;
 +              q = &gaudi->internal_qmans[internal_q_index];
 +              qman_base_addr = (u64) q->pq_dma_addr;
 +              gaudi_init_mme_qman(hdev, mme_offset, (i & 0x3),
 +                                      qman_base_addr);
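 +              /* First 4 streams go to MME2 (N_W); switch to MME0 (S_W) next */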
 +              if (i == 3)
 +                      mme_offset = 0;
 +      }
 +
 +      /* Initializing lower CP for MME QMANs */
 +      mme_offset = mmMME2_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0;
 +      gaudi_init_mme_qman(hdev, mme_offset, 4, 0);
 +      gaudi_init_mme_qman(hdev, 0, 4, 0);
 +
 +      WREG32(mmMME2_QM_GLBL_CFG0, QMAN_MME_ENABLE);
 +      WREG32(mmMME0_QM_GLBL_CFG0, QMAN_MME_ENABLE);
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_MME;
 +}
 +
 +static void gaudi_init_tpc_qman(struct hl_device *hdev, u32 tpc_offset,
 +                              int qman_id, u64 qman_base_addr)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
 +      u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
 +      u32 tpc_qm_err_cfg, irq_handler_offset;
 +      u32 q_off, tpc_id;
 +
 +      mtr_base_en_lo = lower_32_bits(CFG_BASE +
 +                      mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_en_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      mtr_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      q_off = tpc_offset + qman_id * 4;
 +
 +      tpc_id = tpc_offset /
 +                      (mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0);
 +
 +      if (qman_id < 4) {
 +              WREG32(mmTPC0_QM_PQ_BASE_LO_0 + q_off,
 +                                      lower_32_bits(qman_base_addr));
 +              WREG32(mmTPC0_QM_PQ_BASE_HI_0 + q_off,
 +                                      upper_32_bits(qman_base_addr));
 +
 +              WREG32(mmTPC0_QM_PQ_SIZE_0 + q_off, ilog2(TPC_QMAN_LENGTH));
 +              WREG32(mmTPC0_QM_PQ_PI_0 + q_off, 0);
 +              WREG32(mmTPC0_QM_PQ_CI_0 + q_off, 0);
 +
 +              WREG32(mmTPC0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SIZE_OFFSET);
 +              WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SRC_OFFSET);
 +              WREG32(mmTPC0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_DST_OFFSET);
 +      } else {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_tpc_qm_irq_ctrl);
 +
 +              WREG32(mmTPC0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SIZE_OFFSET);
 +              WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SRC_OFFSET);
 +              WREG32(mmTPC0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_DST_OFFSET);
 +
 +              /* Configure RAZWI IRQ */
 +              tpc_qm_err_cfg = TPC_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
 +              if (hdev->stop_on_err)
 +                      tpc_qm_err_cfg |=
 +                              TPC_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
 +
 +              WREG32(mmTPC0_QM_GLBL_ERR_CFG + tpc_offset, tpc_qm_err_cfg);
 +
 +              WREG32(mmTPC0_QM_GLBL_ERR_ADDR_LO + tpc_offset,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +              WREG32(mmTPC0_QM_GLBL_ERR_ADDR_HI + tpc_offset,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +              WREG32(mmTPC0_QM_GLBL_ERR_WDATA + tpc_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_TPC0_QM].cpu_id +
 +                                                                      tpc_id);
 +
 +              WREG32(mmTPC0_QM_ARB_ERR_MSG_EN + tpc_offset,
 +                              QM_ARB_ERR_MSG_EN_MASK);
 +
 +              /* Set timeout to maximum */
 +              WREG32(mmTPC0_QM_ARB_SLV_CHOISE_WDT + tpc_offset, GAUDI_ARB_WDT_TIMEOUT);
 +
 +              WREG32(mmTPC0_QM_GLBL_CFG1 + tpc_offset, 0);
 +              WREG32(mmTPC0_QM_GLBL_PROT + tpc_offset,
 +                              QMAN_INTERNAL_MAKE_TRUSTED);
 +      }
 +
 +      WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
 +
 +      /* Configure TPC7 CP_MSG_BASE 2/3 for sync stream collective */
 +      if (tpc_id == 6) {
 +              WREG32(mmTPC0_QM_CP_MSG_BASE2_ADDR_LO_0 + q_off,
 +                              mtr_base_ws_lo);
 +              WREG32(mmTPC0_QM_CP_MSG_BASE2_ADDR_HI_0 + q_off,
 +                              mtr_base_ws_hi);
 +              WREG32(mmTPC0_QM_CP_MSG_BASE3_ADDR_LO_0 + q_off,
 +                              so_base_ws_lo);
 +              WREG32(mmTPC0_QM_CP_MSG_BASE3_ADDR_HI_0 + q_off,
 +                              so_base_ws_hi);
 +      }
 +}
 +
 +static void gaudi_init_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      u64 qman_base_addr;
 +      u32 so_base_hi, tpc_offset = 0;
 +      u32 tpc_delta = mmTPC1_CFG_SM_BASE_ADDRESS_HIGH -
 +                      mmTPC0_CFG_SM_BASE_ADDRESS_HIGH;
 +      int i, tpc_id, internal_q_index;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_TPC_MASK)
 +              return;
 +
 +      so_base_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      for (tpc_id = 0 ; tpc_id < TPC_NUMBER_OF_ENGINES ; tpc_id++) {
 +              for (i = 0 ; i < QMAN_STREAMS ; i++) {
 +                      internal_q_index = GAUDI_QUEUE_ID_TPC_0_0 +
 +                                              tpc_id * QMAN_STREAMS + i;
 +                      q = &gaudi->internal_qmans[internal_q_index];
 +                      qman_base_addr = (u64) q->pq_dma_addr;
 +                      gaudi_init_tpc_qman(hdev, tpc_offset, i,
 +                                              qman_base_addr);
 +
 +                      if (i == 3) {
 +                              /* Initializing lower CP for TPC QMAN */
 +                              gaudi_init_tpc_qman(hdev, tpc_offset, 4, 0);
 +
 +                              /* Enable the QMAN and TPC channel */
 +                              WREG32(mmTPC0_QM_GLBL_CFG0 + tpc_offset,
 +                                              QMAN_TPC_ENABLE);
 +                      }
 +              }
 +
 +              WREG32(mmTPC0_CFG_SM_BASE_ADDRESS_HIGH + tpc_id * tpc_delta,
 +                              so_base_hi);
 +
 +              tpc_offset += mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0;
 +
 +              gaudi->hw_cap_initialized |=
 +                              FIELD_PREP(HW_CAP_TPC_MASK, 1 << tpc_id);
 +      }
 +}
 +
 +static void gaudi_init_nic_qman(struct hl_device *hdev, u32 nic_offset,
 +                              int qman_id, u64 qman_base_addr, int nic_id)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
 +      u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
 +      u32 nic_qm_err_cfg, irq_handler_offset;
 +      u32 q_off;
 +
 +      mtr_base_en_lo = lower_32_bits((CFG_BASE & U32_MAX) +
 +                      mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_en_lo = lower_32_bits((CFG_BASE & U32_MAX) +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      mtr_base_ws_lo = lower_32_bits((CFG_BASE & U32_MAX) +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_ws_lo = lower_32_bits((CFG_BASE & U32_MAX) +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      q_off = nic_offset + qman_id * 4;
 +
 +      WREG32(mmNIC0_QM0_PQ_BASE_LO_0 + q_off, lower_32_bits(qman_base_addr));
 +      WREG32(mmNIC0_QM0_PQ_BASE_HI_0 + q_off, upper_32_bits(qman_base_addr));
 +
 +      WREG32(mmNIC0_QM0_PQ_SIZE_0 + q_off, ilog2(NIC_QMAN_LENGTH));
 +      WREG32(mmNIC0_QM0_PQ_PI_0 + q_off, 0);
 +      WREG32(mmNIC0_QM0_PQ_CI_0 + q_off, 0);
 +
 +      WREG32(mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SIZE_OFFSET);
 +      WREG32(mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SRC_OFFSET);
 +      WREG32(mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_DST_OFFSET);
 +
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
 +
 +      /* Configure NIC CP_MSG_BASE 2/3 for sync stream collective */
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_0 + q_off, mtr_base_ws_lo);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_0 + q_off, mtr_base_ws_hi);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_0 + q_off, so_base_ws_lo);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_0 + q_off, so_base_ws_hi);
 +
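 +      /* The following configuration is needed only once per QMAN */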
 +      if (qman_id == 0) {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_nic_qm_irq_ctrl);
 +
 +              /* Configure RAZWI IRQ */
 +              nic_qm_err_cfg = NIC_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
 +              if (hdev->stop_on_err)
 +                      nic_qm_err_cfg |=
 +                              NIC_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
 +
 +              WREG32(mmNIC0_QM0_GLBL_ERR_CFG + nic_offset, nic_qm_err_cfg);
 +
 +              WREG32(mmNIC0_QM0_GLBL_ERR_ADDR_LO + nic_offset,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +              WREG32(mmNIC0_QM0_GLBL_ERR_ADDR_HI + nic_offset,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +              WREG32(mmNIC0_QM0_GLBL_ERR_WDATA + nic_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_NIC0_QM0].cpu_id +
 +                                                                      nic_id);
 +
 +              WREG32(mmNIC0_QM0_ARB_ERR_MSG_EN + nic_offset,
 +                              QM_ARB_ERR_MSG_EN_MASK);
 +
 +              /* Set timeout to maximum */
 +              WREG32(mmNIC0_QM0_ARB_SLV_CHOISE_WDT + nic_offset, GAUDI_ARB_WDT_TIMEOUT);
 +
 +              WREG32(mmNIC0_QM0_GLBL_CFG1 + nic_offset, 0);
 +              WREG32(mmNIC0_QM0_GLBL_PROT + nic_offset,
 +                              QMAN_INTERNAL_MAKE_TRUSTED);
 +      }
 +}
 +
 +static void gaudi_init_nic_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      u64 qman_base_addr;
 +      u32 nic_offset = 0;
 +      u32 nic_delta_between_qmans =
 +                      mmNIC0_QM1_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
 +      u32 nic_delta_between_nics =
 +                      mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
 +      int i, nic_id, internal_q_index;
 +
 +      if (!hdev->nic_ports_mask)
 +              return;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC_MASK)
 +              return;
 +
 +      dev_dbg(hdev->dev, "Initializing NIC QMANs\n");
 +
 +      for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
 +              if (!(hdev->nic_ports_mask & (1 << nic_id))) {
 +                      nic_offset += nic_delta_between_qmans;
 +                      if (nic_id & 1) {
 +                              nic_offset -= (nic_delta_between_qmans * 2);
 +                              nic_offset += nic_delta_between_nics;
 +                      }
 +                      continue;
 +              }
 +
 +              for (i = 0 ; i < QMAN_STREAMS ; i++) {
 +                      internal_q_index = GAUDI_QUEUE_ID_NIC_0_0 +
 +                                              nic_id * QMAN_STREAMS + i;
 +                      q = &gaudi->internal_qmans[internal_q_index];
 +                      qman_base_addr = (u64) q->pq_dma_addr;
 +                      gaudi_init_nic_qman(hdev, nic_offset, (i & 0x3),
 +                                              qman_base_addr, nic_id);
 +              }
 +
 +              /* Enable the QMAN */
 +              WREG32(mmNIC0_QM0_GLBL_CFG0 + nic_offset, NIC_QMAN_ENABLE);
 +
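 +              /*
 +               * Each NIC macro hosts two QMANs: step to the next QMAN, and
 +               * after an odd port rewind and jump to the next NIC macro.
 +               */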
 +              nic_offset += nic_delta_between_qmans;
 +              if (nic_id & 1) {
 +                      nic_offset -= (nic_delta_between_qmans * 2);
 +                      nic_offset += nic_delta_between_nics;
 +              }
 +
 +              gaudi->hw_cap_initialized |= 1 << (HW_CAP_NIC_SHIFT + nic_id);
 +      }
 +}
 +
 +static void gaudi_disable_pci_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_PCI_DMA))
 +              return;
 +
 +      WREG32(mmDMA0_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA1_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA5_QM_GLBL_CFG0, 0);
 +}
 +
 +static void gaudi_disable_hbm_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_HBM_DMA))
 +              return;
 +
 +      WREG32(mmDMA2_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA3_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA4_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA6_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA7_QM_GLBL_CFG0, 0);
 +}
 +
 +static void gaudi_disable_mme_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MME))
 +              return;
 +
 +      WREG32(mmMME2_QM_GLBL_CFG0, 0);
 +      WREG32(mmMME0_QM_GLBL_CFG0, 0);
 +}
 +
 +static void gaudi_disable_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 tpc_offset = 0;
 +      int tpc_id;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      for (tpc_id = 0 ; tpc_id < TPC_NUMBER_OF_ENGINES ; tpc_id++) {
 +              WREG32(mmTPC0_QM_GLBL_CFG0 + tpc_offset, 0);
 +              tpc_offset += mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0;
 +      }
 +}
 +
 +static void gaudi_disable_nic_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 nic_mask, nic_offset = 0;
 +      u32 nic_delta_between_qmans =
 +                      mmNIC0_QM1_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
 +      u32 nic_delta_between_nics =
 +                      mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
 +      int nic_id;
 +
 +      for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
 +              nic_mask = 1 << (HW_CAP_NIC_SHIFT + nic_id);
 +
 +              if (gaudi->hw_cap_initialized & nic_mask)
 +                      WREG32(mmNIC0_QM0_GLBL_CFG0 + nic_offset, 0);
 +
 +              nic_offset += nic_delta_between_qmans;
 +              if (nic_id & 1) {
 +                      nic_offset -= (nic_delta_between_qmans * 2);
 +                      nic_offset += nic_delta_between_nics;
 +              }
 +      }
 +}
 +
 +static void gaudi_stop_pci_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_PCI_DMA))
 +              return;
 +
 +      /* Stop upper CPs of QMANs 0.0 to 1.3 and 5.0 to 5.3 */
 +      WREG32(mmDMA0_QM_GLBL_CFG1, 0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA1_QM_GLBL_CFG1, 0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA5_QM_GLBL_CFG1, 0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +}
 +
 +static void gaudi_stop_hbm_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_HBM_DMA))
 +              return;
 +
 +      /* Stop CPs of HBM DMA QMANs */
 +
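 +      /* 0x1F stops the four upper CPs and the lower CP */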
 +      WREG32(mmDMA2_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA3_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA4_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA6_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA7_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +}
 +
 +static void gaudi_stop_mme_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MME))
 +              return;
 +
 +      /* Stop CPs of MME QMANs */
 +      WREG32(mmMME2_QM_GLBL_CFG1, 0x1F << MME0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmMME0_QM_GLBL_CFG1, 0x1F << MME0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +}
 +
 +static void gaudi_stop_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      WREG32(mmTPC0_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC1_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC2_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC3_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC4_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC5_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC6_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC7_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +}
 +
 +static void gaudi_stop_nic_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      /* Stop upper CPs of QMANs */
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC0)
 +              WREG32(mmNIC0_QM0_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC1)
 +              WREG32(mmNIC0_QM1_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC2)
 +              WREG32(mmNIC1_QM0_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC3)
 +              WREG32(mmNIC1_QM1_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC4)
 +              WREG32(mmNIC2_QM0_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC5)
 +              WREG32(mmNIC2_QM1_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC6)
 +              WREG32(mmNIC3_QM0_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC7)
 +              WREG32(mmNIC3_QM1_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC8)
 +              WREG32(mmNIC4_QM0_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC9)
 +              WREG32(mmNIC4_QM1_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +}
 +
 +static void gaudi_pci_dma_stall(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_PCI_DMA))
 +              return;
 +
 +      WREG32(mmDMA0_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA1_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA5_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +}
 +
 +static void gaudi_hbm_dma_stall(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_HBM_DMA))
 +              return;
 +
 +      WREG32(mmDMA2_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA3_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA4_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA6_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA7_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +}
 +
 +static void gaudi_mme_stall(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MME))
 +              return;
 +
 +      /* WA for H3-1800 bug: do ACC and SBAB writes twice */
 +      WREG32(mmMME0_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME0_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME0_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME0_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME1_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME1_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME1_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME1_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME2_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME2_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME2_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME2_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME3_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME3_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME3_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME3_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +}
 +
 +static void gaudi_tpc_stall(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      WREG32(mmTPC0_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC1_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC2_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC3_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC4_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC5_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC6_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC7_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +}
 +
 +static void gaudi_disable_clock_gating(struct hl_device *hdev)
 +{
 +      u32 qman_offset;
 +      int i;
 +
 +      if (hdev->asic_prop.fw_security_enabled)
 +              return;
 +
 +      for (i = 0, qman_offset = 0 ; i < DMA_NUMBER_OF_CHANNELS ; i++) {
 +              WREG32(mmDMA0_QM_CGM_CFG + qman_offset, 0);
 +              WREG32(mmDMA0_QM_CGM_CFG1 + qman_offset, 0);
 +
 +              qman_offset += (mmDMA1_QM_CGM_CFG - mmDMA0_QM_CGM_CFG);
 +      }
 +
 +      WREG32(mmMME0_QM_CGM_CFG, 0);
 +      WREG32(mmMME0_QM_CGM_CFG1, 0);
 +      WREG32(mmMME2_QM_CGM_CFG, 0);
 +      WREG32(mmMME2_QM_CGM_CFG1, 0);
 +
 +      for (i = 0, qman_offset = 0 ; i < TPC_NUMBER_OF_ENGINES ; i++) {
 +              WREG32(mmTPC0_QM_CGM_CFG + qman_offset, 0);
 +              WREG32(mmTPC0_QM_CGM_CFG1 + qman_offset, 0);
 +
 +              qman_offset += (mmTPC1_QM_CGM_CFG - mmTPC0_QM_CGM_CFG);
 +      }
 +}
 +
 +static void gaudi_enable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
 +
 +      /* Zero the lower/upper parts of the 64-bit counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0xC, 0);
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0x8, 0);
 +
 +      /* Enable the counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 1);
 +}
 +
 +static void gaudi_disable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
 +}
 +
 +static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      u32 wait_timeout_ms;
 +
 +      if (hdev->pldm)
 +              wait_timeout_ms = GAUDI_PLDM_RESET_WAIT_MSEC;
 +      else
 +              wait_timeout_ms = GAUDI_RESET_WAIT_MSEC;
 +
 +      if (fw_reset)
 +              goto skip_engines;
 +
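 +      /*
 +       * Quiesce in stages: first stop the QMAN CPs, then stall the engine
 +       * cores, and only then disable the QMANs, waiting in between stages.
 +       */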
 +      gaudi_stop_nic_qmans(hdev);
 +      gaudi_stop_mme_qmans(hdev);
 +      gaudi_stop_tpc_qmans(hdev);
 +      gaudi_stop_hbm_dma_qmans(hdev);
 +      gaudi_stop_pci_dma_qmans(hdev);
 +
 +      msleep(wait_timeout_ms);
 +
 +      gaudi_pci_dma_stall(hdev);
 +      gaudi_hbm_dma_stall(hdev);
 +      gaudi_tpc_stall(hdev);
 +      gaudi_mme_stall(hdev);
 +
 +      msleep(wait_timeout_ms);
 +
 +      gaudi_disable_nic_qmans(hdev);
 +      gaudi_disable_mme_qmans(hdev);
 +      gaudi_disable_tpc_qmans(hdev);
 +      gaudi_disable_hbm_dma_qmans(hdev);
 +      gaudi_disable_pci_dma_qmans(hdev);
 +
 +      gaudi_disable_timestamp(hdev);
 +
 +skip_engines:
 +      gaudi_disable_msi(hdev);
 +}
 +
 +static int gaudi_mmu_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u64 hop0_addr;
 +      int rc, i;
 +
 +      if (!hdev->mmu_enable)
 +              return 0;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_MMU)
 +              return 0;
 +
 +      for (i = 0 ; i < prop->max_asid ; i++) {
 +              hop0_addr = prop->mmu_pgt_addr +
 +                              (i * prop->mmu_hop_table_size);
 +
 +              rc = gaudi_mmu_update_asid_hop0_addr(hdev, i, hop0_addr);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "failed to set hop0 addr for asid %d\n", i);
 +                      goto err;
 +              }
 +      }
 +
 +      /* Init the MMU cache management page */
 +      WREG32(mmSTLB_CACHE_INV_BASE_39_8, prop->mmu_cache_mng_addr >> 8);
 +      WREG32(mmSTLB_CACHE_INV_BASE_49_40, prop->mmu_cache_mng_addr >> 40);
 +
 +      /* mem cache invalidation */
 +      WREG32(mmSTLB_MEM_CACHE_INVALIDATION, 1);
 +
 +      hl_mmu_invalidate_cache(hdev, true, 0);
 +
 +      WREG32(mmMMU_UP_MMU_ENABLE, 1);
 +      WREG32(mmMMU_UP_SPI_MASK, 0xF);
 +
 +      WREG32(mmSTLB_HOP_CONFIGURATION, 0x30440);
 +
 +      /*
 +       * The H/W expects the first PI after init to be 1. After wraparound
 +       * we'll write 0.
 +       */
 +      gaudi->mmu_cache_inv_pi = 1;
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_MMU;
 +
 +      return 0;
 +
 +err:
 +      return rc;
 +}
 +
 +static int gaudi_load_firmware_to_device(struct hl_device *hdev)
 +{
 +      void __iomem *dst;
 +
 +      dst = hdev->pcie_bar[HBM_BAR_ID] + LINUX_FW_OFFSET;
 +
 +      return hl_fw_load_fw_to_device(hdev, GAUDI_LINUX_FW_FILE, dst, 0, 0);
 +}
 +
 +static int gaudi_load_boot_fit_to_device(struct hl_device *hdev)
 +{
 +      void __iomem *dst;
 +
 +      dst = hdev->pcie_bar[SRAM_BAR_ID] + BOOT_FIT_SRAM_OFFSET;
 +
 +      return hl_fw_load_fw_to_device(hdev, GAUDI_BOOT_FIT_FILE, dst, 0, 0);
 +}
 +
 +static void gaudi_init_dynamic_firmware_loader(struct hl_device *hdev)
 +{
 +      struct dynamic_fw_load_mgr *dynamic_loader;
 +      struct cpu_dyn_regs *dyn_regs;
 +
 +      dynamic_loader = &hdev->fw_loader.dynamic_loader;
 +
 +      /*
 +       * Here we set initial values for a few specific dynamic registers, as
 +       * they have to be hard-coded before the first descriptor is read from
 +       * the FW. In later stages of the protocol these values are updated
 +       * automatically by reading the FW descriptor, so the data there is
 +       * always up-to-date.
 +       */
 +      dyn_regs = &dynamic_loader->comm_desc.cpu_dyn_regs;
 +      dyn_regs->kmd_msg_to_cpu =
 +                              cpu_to_le32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU);
 +      dyn_regs->cpu_cmd_status_to_host =
 +                              cpu_to_le32(mmCPU_CMD_STATUS_TO_HOST);
 +
 +      dynamic_loader->wait_for_bl_timeout = GAUDI_WAIT_FOR_BL_TIMEOUT_USEC;
 +}
 +
 +static void gaudi_init_static_firmware_loader(struct hl_device *hdev)
 +{
 +      struct static_fw_load_mgr *static_loader;
 +
 +      static_loader = &hdev->fw_loader.static_loader;
 +
 +      static_loader->preboot_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
 +      static_loader->boot_fit_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
 +      static_loader->kmd_msg_to_cpu_reg = mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU;
 +      static_loader->cpu_cmd_status_to_host_reg = mmCPU_CMD_STATUS_TO_HOST;
 +      static_loader->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
 +      static_loader->cpu_boot_dev_status0_reg = mmCPU_BOOT_DEV_STS0;
 +      static_loader->cpu_boot_dev_status1_reg = mmCPU_BOOT_DEV_STS1;
 +      static_loader->boot_err0_reg = mmCPU_BOOT_ERR0;
 +      static_loader->boot_err1_reg = mmCPU_BOOT_ERR1;
 +      static_loader->preboot_version_offset_reg = mmPREBOOT_VER_OFFSET;
 +      static_loader->boot_fit_version_offset_reg = mmUBOOT_VER_OFFSET;
 +      static_loader->sram_offset_mask = ~(lower_32_bits(SRAM_BASE_ADDR));
 +      static_loader->cpu_reset_wait_msec = hdev->pldm ?
 +                      GAUDI_PLDM_RESET_WAIT_MSEC :
 +                      GAUDI_CPU_RESET_WAIT_MSEC;
 +}
 +
 +static void gaudi_init_firmware_preload_params(struct hl_device *hdev)
 +{
 +      struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load;
 +
 +      pre_fw_load->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
 +      pre_fw_load->sts_boot_dev_sts0_reg = mmCPU_BOOT_DEV_STS0;
 +      pre_fw_load->sts_boot_dev_sts1_reg = mmCPU_BOOT_DEV_STS1;
 +      pre_fw_load->boot_err0_reg = mmCPU_BOOT_ERR0;
 +      pre_fw_load->boot_err1_reg = mmCPU_BOOT_ERR1;
 +      pre_fw_load->wait_for_preboot_timeout = GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC;
 +}
 +
 +static void gaudi_init_firmware_loader(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct fw_load_mgr *fw_loader = &hdev->fw_loader;
 +
 +      /* fill common fields */
 +      fw_loader->fw_comp_loaded = FW_TYPE_NONE;
 +      fw_loader->boot_fit_img.image_name = GAUDI_BOOT_FIT_FILE;
 +      fw_loader->linux_img.image_name = GAUDI_LINUX_FW_FILE;
 +      fw_loader->cpu_timeout = GAUDI_CPU_TIMEOUT_USEC;
 +      fw_loader->boot_fit_timeout = GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC;
 +      fw_loader->skip_bmc = !hdev->bmc_enable;
 +      fw_loader->sram_bar_id = SRAM_BAR_ID;
 +      fw_loader->dram_bar_id = HBM_BAR_ID;
 +
 +      if (prop->dynamic_fw_load)
 +              gaudi_init_dynamic_firmware_loader(hdev);
 +      else
 +              gaudi_init_static_firmware_loader(hdev);
 +}
 +
 +static int gaudi_init_cpu(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int rc;
 +
 +      if (!(hdev->fw_components & FW_TYPE_PREBOOT_CPU))
 +              return 0;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_CPU)
 +              return 0;
 +
 +      /*
 +       * The device CPU works with 40-bit addresses.
 +       * This register sets the extension to 50 bits.
 +       */
 +      if (!hdev->asic_prop.fw_security_enabled)
 +              WREG32(mmCPU_IF_CPU_MSB_ADDR, hdev->cpu_pci_msb_addr);
 +
 +      rc = hl_fw_init_cpu(hdev);
 +
 +      if (rc)
 +              return rc;
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_CPU;
 +
 +      return 0;
 +}
 +
 +static int gaudi_init_cpu_queues(struct hl_device *hdev, u32 cpu_timeout)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 status, irq_handler_offset;
 +      struct hl_eq *eq;
 +      struct hl_hw_queue *cpu_pq =
 +                      &hdev->kernel_queues[GAUDI_QUEUE_ID_CPU_PQ];
 +      int err;
 +
 +      if (!hdev->cpu_queues_enable)
 +              return 0;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_CPU_Q)
 +              return 0;
 +
 +      eq = &hdev->event_queue;
 +
 +      WREG32(mmCPU_IF_PQ_BASE_ADDR_LOW, lower_32_bits(cpu_pq->bus_address));
 +      WREG32(mmCPU_IF_PQ_BASE_ADDR_HIGH, upper_32_bits(cpu_pq->bus_address));
 +
 +      WREG32(mmCPU_IF_EQ_BASE_ADDR_LOW, lower_32_bits(eq->bus_address));
 +      WREG32(mmCPU_IF_EQ_BASE_ADDR_HIGH, upper_32_bits(eq->bus_address));
 +
 +      WREG32(mmCPU_IF_CQ_BASE_ADDR_LOW,
 +                      lower_32_bits(hdev->cpu_accessible_dma_address));
 +      WREG32(mmCPU_IF_CQ_BASE_ADDR_HIGH,
 +                      upper_32_bits(hdev->cpu_accessible_dma_address));
 +
 +      WREG32(mmCPU_IF_PQ_LENGTH, HL_QUEUE_SIZE_IN_BYTES);
 +      WREG32(mmCPU_IF_EQ_LENGTH, HL_EQ_SIZE_IN_BYTES);
 +      WREG32(mmCPU_IF_CQ_LENGTH, HL_CPU_ACCESSIBLE_MEM_SIZE);
 +
 +      /* Used for EQ CI */
 +      WREG32(mmCPU_IF_EQ_RD_OFFS, 0);
 +
 +      WREG32(mmCPU_IF_PF_PQ_PI, 0);
 +
 +      if (gaudi->multi_msi_mode)
 +              WREG32(mmCPU_IF_QUEUE_INIT, PQ_INIT_STATUS_READY_FOR_CP);
 +      else
 +              WREG32(mmCPU_IF_QUEUE_INIT,
 +                      PQ_INIT_STATUS_READY_FOR_CP_SINGLE_MSI);
 +
 +      irq_handler_offset = prop->gic_interrupts_enable ?
 +                      mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                      le32_to_cpu(dyn_regs->gic_host_pi_upd_irq);
 +
 +      WREG32(irq_handler_offset,
 +              gaudi_irq_map_table[GAUDI_EVENT_PI_UPDATE].cpu_id);
 +
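 +      /* Poll until the device CPU reports it is ready for the host */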
 +      err = hl_poll_timeout(
 +              hdev,
 +              mmCPU_IF_QUEUE_INIT,
 +              status,
 +              (status == PQ_INIT_STATUS_READY_FOR_HOST),
 +              1000,
 +              cpu_timeout);
 +
 +      if (err) {
 +              dev_err(hdev->dev,
 +                      "Failed to communicate with Device CPU (CPU-CP timeout)\n");
 +              return -EIO;
 +      }
 +
 +      /* update FW application security bits */
 +      if (prop->fw_cpu_boot_dev_sts0_valid)
 +              prop->fw_app_cpu_boot_dev_sts0 = RREG32(mmCPU_BOOT_DEV_STS0);
 +      if (prop->fw_cpu_boot_dev_sts1_valid)
 +              prop->fw_app_cpu_boot_dev_sts1 = RREG32(mmCPU_BOOT_DEV_STS1);
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_CPU_Q;
 +      return 0;
 +}
 +
 +static void gaudi_pre_hw_init(struct hl_device *hdev)
 +{
 +      /* Perform read from the device to make sure device is up */
 +      RREG32(mmHW_STATE);
 +
 +      if (!hdev->asic_prop.fw_security_enabled) {
 +              /* Set the access through PCI bars (Linux driver only) as
 +               * secured
 +               */
 +              WREG32(mmPCIE_WRAP_LBW_PROT_OVR,
 +                              (PCIE_WRAP_LBW_PROT_OVR_RD_EN_MASK |
 +                              PCIE_WRAP_LBW_PROT_OVR_WR_EN_MASK));
 +
 +              /* Perform read to flush the waiting writes to ensure
 +               * configuration was set in the device
 +               */
 +              RREG32(mmPCIE_WRAP_LBW_PROT_OVR);
 +      }
 +
 +      /*
 +       * Let's mark in the H/W that we have reached this point. We check
 +       * this value in the reset_before_init function to understand whether
 +       * we need to reset the chip before doing H/W init. This register is
 +       * cleared by the H/W upon H/W reset
 +       */
 +      WREG32(mmHW_STATE, HL_DEVICE_HW_STATE_DIRTY);
 +}
 +
 +static int gaudi_hw_init(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int rc;
 +
 +      gaudi_pre_hw_init(hdev);
 +
 +      /* If iATU is done by FW, the HBM bar ALWAYS points to DRAM_PHYS_BASE.
 +       * So we set it here and if anyone tries to move it later to
 +       * a different address, there will be an error
 +       */
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              gaudi->hbm_bar_cur_addr = DRAM_PHYS_BASE;
 +
 +      /*
 +       * Before pushing u-boot/linux to device, need to set the hbm bar to
 +       * base address of dram
 +       */
 +      if (gaudi_set_hbm_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
 +              dev_err(hdev->dev,
 +                      "failed to map HBM bar to DRAM base address\n");
 +              return -EIO;
 +      }
 +
 +      rc = gaudi_init_cpu(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to initialize CPU\n");
 +              return rc;
 +      }
 +
 +      /* In case the clock gating was enabled in preboot we need to disable
 +       * it here before touching the MME/TPC registers.
 +       */
 +      gaudi_disable_clock_gating(hdev);
 +
 +      /* SRAM scrambler must be initialized after CPU is running from HBM */
 +      gaudi_init_scrambler_sram(hdev);
 +
 +      /* This is here just in case we are working without CPU */
 +      gaudi_init_scrambler_hbm(hdev);
 +
 +      gaudi_init_golden_registers(hdev);
 +
 +      rc = gaudi_mmu_init(hdev);
 +      if (rc)
 +              return rc;
 +
 +      gaudi_init_security(hdev);
 +
 +      gaudi_init_pci_dma_qmans(hdev);
 +
 +      gaudi_init_hbm_dma_qmans(hdev);
 +
 +      gaudi_init_mme_qmans(hdev);
 +
 +      gaudi_init_tpc_qmans(hdev);
 +
 +      gaudi_init_nic_qmans(hdev);
 +
 +      gaudi_enable_timestamp(hdev);
 +
 +      /* MSI must be enabled before CPU queues and NIC are initialized */
 +      rc = gaudi_enable_msi(hdev);
 +      if (rc)
 +              goto disable_queues;
 +
 +      /* must be called after MSI was enabled */
 +      rc = gaudi_init_cpu_queues(hdev, GAUDI_CPU_TIMEOUT_USEC);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to initialize CPU H/W queues %d\n",
 +                      rc);
 +              goto disable_msi;
 +      }
 +
 +      /* Perform read from the device to flush all configuration */
 +      RREG32(mmHW_STATE);
 +
 +      return 0;
 +
 +disable_msi:
 +      gaudi_disable_msi(hdev);
 +disable_queues:
 +      gaudi_disable_mme_qmans(hdev);
 +      gaudi_disable_pci_dma_qmans(hdev);
 +
 +      return rc;
 +}
 +
 +static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 status, reset_timeout_ms, cpu_timeout_ms, irq_handler_offset;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      bool driver_performs_reset;
 +
 +      if (!hard_reset) {
 +              dev_err(hdev->dev, "GAUDI doesn't support soft-reset\n");
 +              return;
 +      }
 +
 +      if (hdev->pldm) {
 +              reset_timeout_ms = GAUDI_PLDM_HRESET_TIMEOUT_MSEC;
 +              cpu_timeout_ms = GAUDI_PLDM_RESET_WAIT_MSEC;
 +      } else {
 +              reset_timeout_ms = GAUDI_RESET_TIMEOUT_MSEC;
 +              cpu_timeout_ms = GAUDI_CPU_RESET_WAIT_MSEC;
 +      }
 +
 +      if (fw_reset) {
 +              dev_dbg(hdev->dev,
 +                      "Firmware performs HARD reset, going to wait %dms\n",
 +                      reset_timeout_ms);
 +
 +              goto skip_reset;
 +      }
 +
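 +      /* The driver performs the hard reset itself only when F/W security is
 +       * disabled and the F/W was not configured to do the hard reset
 +       */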
 +      driver_performs_reset = !!(!hdev->asic_prop.fw_security_enabled &&
 +                                      !hdev->asic_prop.hard_reset_done_by_fw);
 +
 +      /* Set device to handle FLR by H/W as we will put the device CPU to
 +       * halt mode
 +       */
 +      if (driver_performs_reset)
 +              WREG32(mmPCIE_AUX_FLR_CTRL, (PCIE_AUX_FLR_CTRL_HW_CTRL_MASK |
 +                                      PCIE_AUX_FLR_CTRL_INT_MASK_MASK));
 +
 +      /* If linux is loaded in the device CPU we need to communicate with it
 +       * via the GIC. Otherwise, we need to use COMMS or the MSG_TO_CPU
 +       * registers in case of old F/Ws
 +       */
 +      if (hdev->fw_loader.fw_comp_loaded & FW_TYPE_LINUX) {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_host_halt_irq);
 +
 +              WREG32(irq_handler_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_HALT_MACHINE].cpu_id);
 +
 +              /* This is a hail-mary attempt to revive the card in the small chance that the
 +               * f/w has experienced a watchdog event, which caused it to return to preboot.
 +               * In that case, triggering reset through GIC won't help. We need to trigger the
 +               * reset as if Linux wasn't loaded.
 +               *
 +               * We do it only if the reset cause was HB, because that would be the indication
 +               * of such an event.
 +               *
 +               * In case watchdog hasn't expired but we still got HB, then this won't do any
 +               * damage.
 +               */
 +              if (hdev->reset_info.curr_reset_cause == HL_RESET_CAUSE_HEARTBEAT) {
 +                      if (hdev->asic_prop.hard_reset_done_by_fw)
 +                              hl_fw_ask_hard_reset_without_linux(hdev);
 +                      else
 +                              hl_fw_ask_halt_machine_without_linux(hdev);
 +              }
 +      } else {
 +              if (hdev->asic_prop.hard_reset_done_by_fw)
 +                      hl_fw_ask_hard_reset_without_linux(hdev);
 +              else
 +                      hl_fw_ask_halt_machine_without_linux(hdev);
 +      }
 +
 +      if (driver_performs_reset) {
 +
 +              /* Configure the reset registers. Must be done as early as
 +               * possible in case we fail during H/W initialization
 +               */
 +              WREG32(mmPSOC_GLOBAL_CONF_SOFT_RST_CFG_H,
 +                                              (CFG_RST_H_DMA_MASK |
 +                                              CFG_RST_H_MME_MASK |
 +                                              CFG_RST_H_SM_MASK |
 +                                              CFG_RST_H_TPC_7_MASK));
 +
 +              WREG32(mmPSOC_GLOBAL_CONF_SOFT_RST_CFG_L, CFG_RST_L_TPC_MASK);
 +
 +              WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG_H,
 +                                              (CFG_RST_H_HBM_MASK |
 +                                              CFG_RST_H_TPC_7_MASK |
 +                                              CFG_RST_H_NIC_MASK |
 +                                              CFG_RST_H_SM_MASK |
 +                                              CFG_RST_H_DMA_MASK |
 +                                              CFG_RST_H_MME_MASK |
 +                                              CFG_RST_H_CPU_MASK |
 +                                              CFG_RST_H_MMU_MASK));
 +
 +              WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG_L,
 +                                              (CFG_RST_L_IF_MASK |
 +                                              CFG_RST_L_PSOC_MASK |
 +                                              CFG_RST_L_TPC_MASK));
 +
 +              msleep(cpu_timeout_ms);
 +
 +              /* Tell ASIC not to re-initialize PCIe */
 +              WREG32(mmPREBOOT_PCIE_EN, LKD_HARD_RESET_MAGIC);
 +
 +              /* Restart BTL/BLR upon hard-reset */
 +              WREG32(mmPSOC_GLOBAL_CONF_BOOT_SEQ_RE_START, 1);
 +
 +              WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST,
 +                      1 << PSOC_GLOBAL_CONF_SW_ALL_RST_IND_SHIFT);
 +
 +              dev_dbg(hdev->dev,
 +                      "Issued HARD reset command, going to wait %dms\n",
 +                      reset_timeout_ms);
 +      } else {
 +              dev_dbg(hdev->dev,
 +                      "Firmware performs HARD reset, going to wait %dms\n",
 +                      reset_timeout_ms);
 +      }
 +
 +skip_reset:
 +      /*
 +       * After hard reset, we can't poll the BTM_FSM register because the PSOC
 +       * itself is in reset. Need to wait until the reset is deasserted
 +       */
 +      msleep(reset_timeout_ms);
 +
 +      status = RREG32(mmPSOC_GLOBAL_CONF_BTM_FSM);
 +      if (status & PSOC_GLOBAL_CONF_BTM_FSM_STATE_MASK)
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for device to reset 0x%x\n",
 +                      status);
 +
 +      if (gaudi) {
 +              gaudi->hw_cap_initialized &= ~(HW_CAP_CPU | HW_CAP_CPU_Q | HW_CAP_HBM |
 +                                              HW_CAP_PCI_DMA | HW_CAP_MME | HW_CAP_TPC_MASK |
 +                                              HW_CAP_HBM_DMA | HW_CAP_PLL | HW_CAP_NIC_MASK |
 +                                              HW_CAP_MMU | HW_CAP_SRAM_SCRAMBLER |
 +                                              HW_CAP_HBM_SCRAMBLER);
 +
 +              memset(gaudi->events_stat, 0, sizeof(gaudi->events_stat));
 +
 +              hdev->device_cpu_is_halted = false;
 +      }
 +}
 +
 +static int gaudi_suspend(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to disable PCI access from CPU\n");
 +
 +      return rc;
 +}
 +
 +static int gaudi_resume(struct hl_device *hdev)
 +{
 +      return gaudi_init_iatu(hdev);
 +}
 +
 +static int gaudi_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
 +                      void *cpu_addr, dma_addr_t dma_addr, size_t size)
 +{
 +      int rc;
 +
++      vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++                      VM_DONTCOPY | VM_NORESERVE);
 +
 +      rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr,
 +                              (dma_addr - HOST_PHYS_BASE), size);
 +      if (rc)
 +              dev_err(hdev->dev, "dma_mmap_coherent error %d", rc);
 +
 +      return rc;
 +}
 +
 +static void gaudi_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 db_reg_offset, db_value, dma_qm_offset, q_off, irq_handler_offset;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      bool invalid_queue = false;
 +      int dma_id;
 +
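 +      /* Translate the H/W queue ID to the matching QMAN PQ producer-index
 +       * (doorbell) register
 +       */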
 +      switch (hw_queue_id) {
 +      case GAUDI_QUEUE_ID_DMA_0_0...GAUDI_QUEUE_ID_DMA_0_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_1];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + (hw_queue_id & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_1_0...GAUDI_QUEUE_ID_DMA_1_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_2];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + (hw_queue_id & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_2_0...GAUDI_QUEUE_ID_DMA_2_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_1];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_3_0...GAUDI_QUEUE_ID_DMA_3_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_2];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_4_0...GAUDI_QUEUE_ID_DMA_4_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_3];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_5_0...GAUDI_QUEUE_ID_DMA_5_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_4];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_6_0...GAUDI_QUEUE_ID_DMA_6_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_5];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_7_0...GAUDI_QUEUE_ID_DMA_7_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_6];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_CPU_PQ:
 +              if (gaudi->hw_cap_initialized & HW_CAP_CPU_Q)
 +                      db_reg_offset = mmCPU_IF_PF_PQ_PI;
 +              else
 +                      invalid_queue = true;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_0_0:
 +              db_reg_offset = mmMME2_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_0_1:
 +              db_reg_offset = mmMME2_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_0_2:
 +              db_reg_offset = mmMME2_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_0_3:
 +              db_reg_offset = mmMME2_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_1_0:
 +              db_reg_offset = mmMME0_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_1_1:
 +              db_reg_offset = mmMME0_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_1_2:
 +              db_reg_offset = mmMME0_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_1_3:
 +              db_reg_offset = mmMME0_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_0_0:
 +              db_reg_offset = mmTPC0_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_0_1:
 +              db_reg_offset = mmTPC0_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_0_2:
 +              db_reg_offset = mmTPC0_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_0_3:
 +              db_reg_offset = mmTPC0_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_1_0:
 +              db_reg_offset = mmTPC1_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_1_1:
 +              db_reg_offset = mmTPC1_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_1_2:
 +              db_reg_offset = mmTPC1_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_1_3:
 +              db_reg_offset = mmTPC1_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_2_0:
 +              db_reg_offset = mmTPC2_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_2_1:
 +              db_reg_offset = mmTPC2_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_2_2:
 +              db_reg_offset = mmTPC2_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_2_3:
 +              db_reg_offset = mmTPC2_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_3_0:
 +              db_reg_offset = mmTPC3_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_3_1:
 +              db_reg_offset = mmTPC3_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_3_2:
 +              db_reg_offset = mmTPC3_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_3_3:
 +              db_reg_offset = mmTPC3_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_4_0:
 +              db_reg_offset = mmTPC4_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_4_1:
 +              db_reg_offset = mmTPC4_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_4_2:
 +              db_reg_offset = mmTPC4_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_4_3:
 +              db_reg_offset = mmTPC4_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_5_0:
 +              db_reg_offset = mmTPC5_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_5_1:
 +              db_reg_offset = mmTPC5_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_5_2:
 +              db_reg_offset = mmTPC5_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_5_3:
 +              db_reg_offset = mmTPC5_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_6_0:
 +              db_reg_offset = mmTPC6_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_6_1:
 +              db_reg_offset = mmTPC6_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_6_2:
 +              db_reg_offset = mmTPC6_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_6_3:
 +              db_reg_offset = mmTPC6_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_7_0:
 +              db_reg_offset = mmTPC7_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_7_1:
 +              db_reg_offset = mmTPC7_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_7_2:
 +              db_reg_offset = mmTPC7_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_7_3:
 +              db_reg_offset = mmTPC7_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_0_0...GAUDI_QUEUE_ID_NIC_0_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC0))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC0_QM0_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_1_0...GAUDI_QUEUE_ID_NIC_1_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC1))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC0_QM1_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_2_0...GAUDI_QUEUE_ID_NIC_2_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC2))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC1_QM0_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_3_0...GAUDI_QUEUE_ID_NIC_3_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC3))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC1_QM1_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_4_0...GAUDI_QUEUE_ID_NIC_4_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC4))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC2_QM0_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_5_0...GAUDI_QUEUE_ID_NIC_5_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC5))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC2_QM1_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_6_0...GAUDI_QUEUE_ID_NIC_6_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC6))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC3_QM0_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_7_0...GAUDI_QUEUE_ID_NIC_7_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC7))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC3_QM1_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_8_0...GAUDI_QUEUE_ID_NIC_8_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC8))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC4_QM0_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_9_0...GAUDI_QUEUE_ID_NIC_9_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC9))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC4_QM1_PQ_PI_0 + q_off;
 +              break;
 +
 +      default:
 +              invalid_queue = true;
 +      }
 +
 +      if (invalid_queue) {
 +              /* Should never get here */
 +              dev_err(hdev->dev, "h/w queue %d is invalid. Can't set pi\n",
 +                      hw_queue_id);
 +              return;
 +      }
 +
 +      db_value = pi;
 +
 +      /* ring the doorbell */
 +      WREG32(db_reg_offset, db_value);
 +
 +      if (hw_queue_id == GAUDI_QUEUE_ID_CPU_PQ) {
 +              /* make sure device CPU will read latest data from host */
 +              mb();
 +
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_host_pi_upd_irq);
 +
 +              WREG32(irq_handler_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_PI_UPDATE].cpu_id);
 +      }
 +}
 +
 +static void gaudi_pqe_write(struct hl_device *hdev, __le64 *pqe,
 +                              struct hl_bd *bd)
 +{
 +      __le64 *pbd = (__le64 *) bd;
 +
 +      /* The QMANs are on the host memory so a simple copy suffices */
 +      pqe[0] = pbd[0];
 +      pqe[1] = pbd[1];
 +}
 +
 +static void *gaudi_dma_alloc_coherent(struct hl_device *hdev, size_t size,
 +                                      dma_addr_t *dma_handle, gfp_t flags)
 +{
 +      void *kernel_addr = dma_alloc_coherent(&hdev->pdev->dev, size,
 +                                              dma_handle, flags);
 +
 +      /* Shift to the device's base physical address of host memory */
 +      if (kernel_addr)
 +              *dma_handle += HOST_PHYS_BASE;
 +
 +      return kernel_addr;
 +}
 +
 +static void gaudi_dma_free_coherent(struct hl_device *hdev, size_t size,
 +              void *cpu_addr, dma_addr_t dma_handle)
 +{
 +      /* Cancel the device's base physical address of host memory */
 +      dma_addr_t fixed_dma_handle = dma_handle - HOST_PHYS_BASE;
 +
 +      dma_free_coherent(&hdev->pdev->dev, size, cpu_addr, fixed_dma_handle);
 +}
 +
 +static int gaudi_scrub_device_dram(struct hl_device *hdev, u64 val)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 cur_addr = prop->dram_user_base_address;
 +      u32 chunk_size, busy;
 +      int rc, dma_id;
 +
 +      while (cur_addr < prop->dram_end_address) {
 +              for (dma_id = 0 ; dma_id < DMA_NUMBER_OF_CHANNELS ; dma_id++) {
 +                      u32 dma_offset = dma_id * DMA_CORE_OFFSET;
 +
 +                      chunk_size =
 +                      min((u64)SZ_2G, prop->dram_end_address - cur_addr);
 +
 +                      dev_dbg(hdev->dev,
 +                              "Doing HBM scrubbing for 0x%09llx - 0x%09llx\n",
 +                              cur_addr, cur_addr + chunk_size);
 +
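 +                      /* Program this DMA core to memset the chunk: the scrub
 +                       * value is the source, the current HBM address is the
 +                       * destination
 +                       */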
 +                      WREG32(mmDMA0_CORE_SRC_BASE_LO + dma_offset,
 +                                      lower_32_bits(val));
 +                      WREG32(mmDMA0_CORE_SRC_BASE_HI + dma_offset,
 +                                      upper_32_bits(val));
 +                      WREG32(mmDMA0_CORE_DST_BASE_LO + dma_offset,
 +                                              lower_32_bits(cur_addr));
 +                      WREG32(mmDMA0_CORE_DST_BASE_HI + dma_offset,
 +                                              upper_32_bits(cur_addr));
 +                      WREG32(mmDMA0_CORE_DST_TSIZE_0 + dma_offset,
 +                                      chunk_size);
 +                      WREG32(mmDMA0_CORE_COMMIT + dma_offset,
 +                                      ((1 << DMA0_CORE_COMMIT_LIN_SHIFT) |
 +                                      (1 << DMA0_CORE_COMMIT_MEM_SET_SHIFT)));
 +
 +                      cur_addr += chunk_size;
 +
 +                      if (cur_addr == prop->dram_end_address)
 +                              break;
 +              }
 +
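 +              /* Wait for all DMA cores to finish scrubbing their chunks */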
 +              for (dma_id = 0 ; dma_id < DMA_NUMBER_OF_CHANNELS ; dma_id++) {
 +                      u32 dma_offset = dma_id * DMA_CORE_OFFSET;
 +
 +                      rc = hl_poll_timeout(
 +                              hdev,
 +                              mmDMA0_CORE_STS0 + dma_offset,
 +                              busy,
 +                              ((busy & DMA0_CORE_STS0_BUSY_MASK) == 0),
 +                              1000,
 +                              HBM_SCRUBBING_TIMEOUT_US);
 +
 +                      if (rc) {
 +                              dev_err(hdev->dev,
 +                                      "DMA Timeout during HBM scrubbing of DMA #%d\n",
 +                                      dma_id);
 +                              return -EIO;
 +                      }
 +              }
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi_scrub_device_mem(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 wait_to_idle_time = hdev->pdev ? HBM_SCRUBBING_TIMEOUT_US :
 +                      min_t(u64, HBM_SCRUBBING_TIMEOUT_US * 10, HL_SIM_MAX_TIMEOUT_US);
 +      u64 addr, size, val = hdev->memory_scrub_val;
 +      ktime_t timeout;
 +      int rc = 0;
 +
 +      if (!hdev->memory_scrub)
 +              return 0;
 +
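 +      /* Wait for the device to become idle before scrubbing SRAM and HBM */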
 +      timeout = ktime_add_us(ktime_get(), wait_to_idle_time);
 +      while (!hdev->asic_funcs->is_device_idle(hdev, NULL, 0, NULL)) {
 +              if (ktime_compare(ktime_get(), timeout) > 0) {
 +                      dev_err(hdev->dev, "waiting for idle timeout\n");
 +                      return -ETIMEDOUT;
 +              }
 +              usleep_range((1000 >> 2) + 1, 1000);
 +      }
 +
 +      /* Scrub SRAM */
 +      addr = prop->sram_user_base_address;
 +      size = hdev->pldm ? 0x10000 : prop->sram_size - SRAM_USER_BASE_OFFSET;
 +
 +      dev_dbg(hdev->dev, "Scrubbing SRAM: 0x%09llx - 0x%09llx val: 0x%llx\n",
 +                      addr, addr + size, val);
 +      rc = gaudi_memset_device_memory(hdev, addr, size, val);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to clear SRAM (%d)\n", rc);
 +              return rc;
 +      }
 +
 +      /* Scrub HBM using all DMA channels in parallel */
 +      rc = gaudi_scrub_device_dram(hdev, val);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to clear HBM (%d)\n", rc);
 +              return rc;
 +      }
 +
 +      return 0;
 +}
 +
 +static void *gaudi_get_int_queue_base(struct hl_device *hdev,
 +                              u32 queue_id, dma_addr_t *dma_handle,
 +                              u16 *queue_len)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +
 +      if (queue_id >= GAUDI_QUEUE_ID_SIZE ||
 +                      gaudi_queue_type[queue_id] != QUEUE_TYPE_INT) {
 +              dev_err(hdev->dev, "Got invalid queue id %d\n", queue_id);
 +              return NULL;
 +      }
 +
 +      q = &gaudi->internal_qmans[queue_id];
 +      *dma_handle = q->pq_dma_addr;
 +      *queue_len = q->pq_size / QMAN_PQ_ENTRY_SIZE;
 +
 +      return q->pq_kernel_addr;
 +}
 +
 +static int gaudi_send_cpu_message(struct hl_device *hdev, u32 *msg,
 +                              u16 len, u32 timeout, u64 *result)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
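 +      /* Without an initialized CPU queue there is no F/W to message,
 +       * so return success with a zero result
 +       */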
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q)) {
 +              if (result)
 +                      *result = 0;
 +              return 0;
 +      }
 +
 +      if (!timeout)
 +              timeout = GAUDI_MSG_TO_CPU_TIMEOUT_USEC;
 +
 +      return hl_fw_send_cpu_message(hdev, GAUDI_QUEUE_ID_CPU_PQ, msg, len,
 +                                              timeout, result);
 +}
 +
 +static int gaudi_test_queue(struct hl_device *hdev, u32 hw_queue_id)
 +{
 +      struct packet_msg_prot *fence_pkt;
 +      dma_addr_t pkt_dma_addr;
 +      u32 fence_val, tmp, timeout_usec;
 +      dma_addr_t fence_dma_addr;
 +      u32 *fence_ptr;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI_PLDM_TEST_QUEUE_WAIT_USEC;
 +      else
 +              timeout_usec = GAUDI_TEST_QUEUE_WAIT_USEC;
 +
 +      fence_val = GAUDI_QMAN0_FENCE_VAL;
 +
 +      fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
 +      if (!fence_ptr) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate memory for H/W queue %d testing\n",
 +                      hw_queue_id);
 +              return -ENOMEM;
 +      }
 +
 +      *fence_ptr = 0;
 +
 +      fence_pkt = hl_asic_dma_pool_zalloc(hdev, sizeof(struct packet_msg_prot), GFP_KERNEL,
 +                                              &pkt_dma_addr);
 +      if (!fence_pkt) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate packet for H/W queue %d testing\n",
 +                      hw_queue_id);
 +              rc = -ENOMEM;
 +              goto free_fence_ptr;
 +      }
 +
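 +      /* Build a MSG_PROT packet that writes the fence value to the host
 +       * scratch buffer, then poll that buffer to verify the queue works
 +       */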
 +      tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      fence_pkt->ctl = cpu_to_le32(tmp);
 +      fence_pkt->value = cpu_to_le32(fence_val);
 +      fence_pkt->addr = cpu_to_le64(fence_dma_addr);
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id,
 +                                      sizeof(struct packet_msg_prot),
 +                                      pkt_dma_addr);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to send fence packet to H/W queue %d\n",
 +                      hw_queue_id);
 +              goto free_pkt;
 +      }
 +
 +      rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp, (tmp == fence_val),
 +                                      1000, timeout_usec, true);
 +
 +      hl_hw_queue_inc_ci_kernel(hdev, hw_queue_id);
 +
 +      if (rc == -ETIMEDOUT) {
 +              dev_err(hdev->dev,
 +                      "H/W queue %d test failed (scratch(0x%08llX) == 0x%08X)\n",
 +                      hw_queue_id, (unsigned long long) fence_dma_addr, tmp);
 +              rc = -EIO;
 +      }
 +
 +free_pkt:
 +      hl_asic_dma_pool_free(hdev, (void *) fence_pkt, pkt_dma_addr);
 +free_fence_ptr:
 +      hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
 +      return rc;
 +}
 +
 +static int gaudi_test_cpu_queue(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      /*
 +       * check capability here as send_cpu_message() won't update the result
 +       * value if no capability
 +       */
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_test_cpu_queue(hdev);
 +}
 +
 +static int gaudi_test_queues(struct hl_device *hdev)
 +{
 +      int i, rc, ret_val = 0;
 +
 +      for (i = 0 ; i < hdev->asic_prop.max_queues ; i++) {
 +              if (hdev->asic_prop.hw_queues_props[i].type == QUEUE_TYPE_EXT) {
 +                      rc = gaudi_test_queue(hdev, i);
 +                      if (rc)
 +                              ret_val = -EINVAL;
 +              }
 +      }
 +
 +      rc = gaudi_test_cpu_queue(hdev);
 +      if (rc)
 +              ret_val = -EINVAL;
 +
 +      return ret_val;
 +}
 +
 +static void *gaudi_dma_pool_zalloc(struct hl_device *hdev, size_t size,
 +              gfp_t mem_flags, dma_addr_t *dma_handle)
 +{
 +      void *kernel_addr;
 +
 +      if (size > GAUDI_DMA_POOL_BLK_SIZE)
 +              return NULL;
 +
 +      kernel_addr = dma_pool_zalloc(hdev->dma_pool, mem_flags, dma_handle);
 +
 +      /* Shift to the device's base physical address of host memory */
 +      if (kernel_addr)
 +              *dma_handle += HOST_PHYS_BASE;
 +
 +      return kernel_addr;
 +}
 +
 +static void gaudi_dma_pool_free(struct hl_device *hdev, void *vaddr,
 +                      dma_addr_t dma_addr)
 +{
 +      /* Cancel the device's base physical address of host memory */
 +      dma_addr_t fixed_dma_addr = dma_addr - HOST_PHYS_BASE;
 +
 +      dma_pool_free(hdev->dma_pool, vaddr, fixed_dma_addr);
 +}
 +
 +static void *gaudi_cpu_accessible_dma_pool_alloc(struct hl_device *hdev,
 +                                      size_t size, dma_addr_t *dma_handle)
 +{
 +      return hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
 +}
 +
 +static void gaudi_cpu_accessible_dma_pool_free(struct hl_device *hdev,
 +                                              size_t size, void *vaddr)
 +{
 +      hl_fw_cpu_accessible_dma_pool_free(hdev, size, vaddr);
 +}
 +
 +static u32 gaudi_get_dma_desc_list_size(struct hl_device *hdev, struct sg_table *sgt)
 +{
 +      struct scatterlist *sg, *sg_next_iter;
 +      u32 count, dma_desc_cnt;
 +      u64 len, len_next;
 +      dma_addr_t addr, addr_next;
 +
 +      dma_desc_cnt = 0;
 +
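 +      /* Merge DMA-contiguous SG entries (up to DMA_MAX_TRANSFER_SIZE each) and
 +       * count how many LIN_DMA descriptors the patched CB will need
 +       */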
 +      for_each_sgtable_dma_sg(sgt, sg, count) {
 +              len = sg_dma_len(sg);
 +              addr = sg_dma_address(sg);
 +
 +              if (len == 0)
 +                      break;
 +
 +              while ((count + 1) < sgt->nents) {
 +                      sg_next_iter = sg_next(sg);
 +                      len_next = sg_dma_len(sg_next_iter);
 +                      addr_next = sg_dma_address(sg_next_iter);
 +
 +                      if (len_next == 0)
 +                              break;
 +
 +                      if ((addr + len == addr_next) &&
 +                              (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
 +                              len += len_next;
 +                              count++;
 +                              sg = sg_next_iter;
 +                      } else {
 +                              break;
 +                      }
 +              }
 +
 +              dma_desc_cnt++;
 +      }
 +
 +      return dma_desc_cnt * sizeof(struct packet_lin_dma);
 +}
 +
 +static int gaudi_pin_memory_before_cs(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt,
 +                              u64 addr, enum dma_data_direction dir)
 +{
 +      struct hl_userptr *userptr;
 +      int rc;
 +
 +      if (hl_userptr_is_pinned(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
 +                      parser->job_userptr_list, &userptr))
 +              goto already_pinned;
 +
 +      userptr = kzalloc(sizeof(*userptr), GFP_KERNEL);
 +      if (!userptr)
 +              return -ENOMEM;
 +
 +      rc = hl_pin_host_memory(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
 +                              userptr);
 +      if (rc)
 +              goto free_userptr;
 +
 +      list_add_tail(&userptr->job_node, parser->job_userptr_list);
 +
 +      rc = hdev->asic_funcs->asic_dma_map_sgtable(hdev, userptr->sgt, dir);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to map sgt with DMA region\n");
 +              goto unpin_memory;
 +      }
 +
 +      userptr->dma_mapped = true;
 +      userptr->dir = dir;
 +
 +already_pinned:
 +      parser->patched_cb_size +=
 +                      gaudi_get_dma_desc_list_size(hdev, userptr->sgt);
 +
 +      return 0;
 +
 +unpin_memory:
 +      list_del(&userptr->job_node);
 +      hl_unpin_host_memory(hdev, userptr);
 +free_userptr:
 +      kfree(userptr);
 +      return rc;
 +}
 +
 +static int gaudi_validate_dma_pkt_host(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt,
 +                              bool src_in_host)
 +{
 +      enum dma_data_direction dir;
 +      bool skip_host_mem_pin = false, user_memset;
 +      u64 addr;
 +      int rc = 0;
 +
 +      user_memset = (le32_to_cpu(user_dma_pkt->ctl) &
 +                      GAUDI_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
 +                      GAUDI_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
 +
 +      if (src_in_host) {
 +              if (user_memset)
 +                      skip_host_mem_pin = true;
 +
 +              dev_dbg(hdev->dev, "DMA direction is HOST --> DEVICE\n");
 +              dir = DMA_TO_DEVICE;
 +              addr = le64_to_cpu(user_dma_pkt->src_addr);
 +      } else {
 +              dev_dbg(hdev->dev, "DMA direction is DEVICE --> HOST\n");
 +              dir = DMA_FROM_DEVICE;
 +              addr = (le64_to_cpu(user_dma_pkt->dst_addr) &
 +                              GAUDI_PKT_LIN_DMA_DST_ADDR_MASK) >>
 +                              GAUDI_PKT_LIN_DMA_DST_ADDR_SHIFT;
 +      }
 +
 +      if (skip_host_mem_pin)
 +              parser->patched_cb_size += sizeof(*user_dma_pkt);
 +      else
 +              rc = gaudi_pin_memory_before_cs(hdev, parser, user_dma_pkt,
 +                                              addr, dir);
 +
 +      return rc;
 +}
 +
 +static int gaudi_validate_dma_pkt_no_mmu(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt)
 +{
 +      bool src_in_host = false;
 +      u64 dst_addr = (le64_to_cpu(user_dma_pkt->dst_addr) &
 +                      GAUDI_PKT_LIN_DMA_DST_ADDR_MASK) >>
 +                      GAUDI_PKT_LIN_DMA_DST_ADDR_SHIFT;
 +
 +      dev_dbg(hdev->dev, "DMA packet details:\n");
 +      dev_dbg(hdev->dev, "source == 0x%llx\n",
 +                              le64_to_cpu(user_dma_pkt->src_addr));
 +      dev_dbg(hdev->dev, "destination == 0x%llx\n", dst_addr);
 +      dev_dbg(hdev->dev, "size == %u\n", le32_to_cpu(user_dma_pkt->tsize));
 +
 +      /*
 +       * Special handling for DMA with size 0. Bypass all validations
 +       * because no transactions will be done except for WR_COMP, which
 +       * is not a security issue
 +       */
 +      if (!le32_to_cpu(user_dma_pkt->tsize)) {
 +              parser->patched_cb_size += sizeof(*user_dma_pkt);
 +              return 0;
 +      }
 +
 +      if (parser->hw_queue_id <= GAUDI_QUEUE_ID_DMA_0_3)
 +              src_in_host = true;
 +
 +      return gaudi_validate_dma_pkt_host(hdev, parser, user_dma_pkt,
 +                                              src_in_host);
 +}
 +
 +static int gaudi_validate_load_and_exe_pkt(struct hl_device *hdev,
 +                                      struct hl_cs_parser *parser,
 +                                      struct packet_load_and_exe *user_pkt)
 +{
 +      u32 cfg;
 +
 +      cfg = le32_to_cpu(user_pkt->cfg);
 +
 +      if (cfg & GAUDI_PKT_LOAD_AND_EXE_CFG_DST_MASK) {
 +              dev_err(hdev->dev,
 +                      "User not allowed to use Load and Execute\n");
 +              return -EPERM;
 +      }
 +
 +      parser->patched_cb_size += sizeof(struct packet_load_and_exe);
 +
 +      return 0;
 +}
 +
 +static int gaudi_validate_cb(struct hl_device *hdev,
 +                      struct hl_cs_parser *parser, bool is_mmu)
 +{
 +      u32 cb_parsed_length = 0;
 +      int rc = 0;
 +
 +      parser->patched_cb_size = 0;
 +
 +      /* user_cb_size is more than 0 so the loop will always execute */
 +      while (cb_parsed_length < parser->user_cb_size) {
 +              enum packet_id pkt_id;
 +              u16 pkt_size;
 +              struct gaudi_packet *user_pkt;
 +
 +              user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
 +
 +              pkt_id = (enum packet_id) (
 +                              (le64_to_cpu(user_pkt->header) &
 +                              PACKET_HEADER_PACKET_ID_MASK) >>
 +                                      PACKET_HEADER_PACKET_ID_SHIFT);
 +
 +              if (!validate_packet_id(pkt_id)) {
 +                      dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              pkt_size = gaudi_packet_sizes[pkt_id];
 +              cb_parsed_length += pkt_size;
 +              if (cb_parsed_length > parser->user_cb_size) {
 +                      dev_err(hdev->dev,
 +                              "packet 0x%x is out of CB boundary\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
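 +              /* Reject packets that touch privileged functionality and account
 +               * for the size each allowed packet will occupy in the patched CB
 +               */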
 +              switch (pkt_id) {
 +              case PACKET_MSG_PROT:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use MSG_PROT\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_CP_DMA:
 +                      dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_STOP:
 +                      dev_err(hdev->dev, "User not allowed to use STOP\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_WREG_BULK:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use WREG_BULK\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_LOAD_AND_EXE:
 +                      rc = gaudi_validate_load_and_exe_pkt(hdev, parser,
 +                              (struct packet_load_and_exe *) user_pkt);
 +                      break;
 +
 +              case PACKET_LIN_DMA:
 +                      parser->contains_dma_pkt = true;
 +                      if (is_mmu)
 +                              parser->patched_cb_size += pkt_size;
 +                      else
 +                              rc = gaudi_validate_dma_pkt_no_mmu(hdev, parser,
 +                                      (struct packet_lin_dma *) user_pkt);
 +                      break;
 +
 +              case PACKET_WREG_32:
 +              case PACKET_MSG_LONG:
 +              case PACKET_MSG_SHORT:
 +              case PACKET_REPEAT:
 +              case PACKET_FENCE:
 +              case PACKET_NOP:
 +              case PACKET_ARB_POINT:
 +                      parser->patched_cb_size += pkt_size;
 +                      break;
 +
 +              default:
 +                      dev_err(hdev->dev, "Invalid packet header 0x%x\n",
 +                              pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              if (rc)
 +                      break;
 +      }
 +
 +      /*
 +       * The new CB should have space at the end for:
 +       * 1. Optional NOP padding for cacheline alignment
 +       * 2. A MSG_PROT packet that will act as a completion packet
 +       * 3. A MSG_PROT packet that will generate the MSI interrupt
 +       */
 +      if (parser->completion)
 +              parser->patched_cb_size += gaudi_get_patched_cb_extra_size(
 +                      parser->patched_cb_size);
 +
 +      return rc;
 +}
 +
 +static int gaudi_patch_dma_packet(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt,
 +                              struct packet_lin_dma *new_dma_pkt,
 +                              u32 *new_dma_pkt_size)
 +{
 +      struct hl_userptr *userptr;
 +      struct scatterlist *sg, *sg_next_iter;
 +      u32 count, dma_desc_cnt, user_wrcomp_en_mask, ctl;
 +      u64 len, len_next;
 +      dma_addr_t dma_addr, dma_addr_next;
 +      u64 device_memory_addr, addr;
 +      enum dma_data_direction dir;
 +      struct sg_table *sgt;
 +      bool src_in_host = false;
 +      bool skip_host_mem_pin = false;
 +      bool user_memset;
 +
 +      ctl = le32_to_cpu(user_dma_pkt->ctl);
 +
 +      if (parser->hw_queue_id <= GAUDI_QUEUE_ID_DMA_0_3)
 +              src_in_host = true;
 +
 +      user_memset = (ctl & GAUDI_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
 +                      GAUDI_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
 +
 +      if (src_in_host) {
 +              addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              dir = DMA_TO_DEVICE;
 +              if (user_memset)
 +                      skip_host_mem_pin = true;
 +      } else {
 +              addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              dir = DMA_FROM_DEVICE;
 +      }
 +
 +      if ((!skip_host_mem_pin) &&
 +              (!hl_userptr_is_pinned(hdev, addr,
 +                                      le32_to_cpu(user_dma_pkt->tsize),
 +                                      parser->job_userptr_list, &userptr))) {
 +              dev_err(hdev->dev, "Userptr 0x%llx + 0x%x NOT mapped\n",
 +                              addr, le32_to_cpu(user_dma_pkt->tsize));
 +              return -EFAULT;
 +      }
 +
 +      if ((user_memset) && (dir == DMA_TO_DEVICE)) {
 +              memcpy(new_dma_pkt, user_dma_pkt, sizeof(*user_dma_pkt));
 +              *new_dma_pkt_size = sizeof(*user_dma_pkt);
 +              return 0;
 +      }
 +
 +      user_wrcomp_en_mask = ctl & GAUDI_PKT_LIN_DMA_CTL_WRCOMP_EN_MASK;
 +
 +      sgt = userptr->sgt;
 +      dma_desc_cnt = 0;
 +
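 +      /* Replace the user LIN_DMA packet with one packet per merged SG chunk,
 +       * pointing at the DMA-mapped host addresses
 +       */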
 +      for_each_sgtable_dma_sg(sgt, sg, count) {
 +              len = sg_dma_len(sg);
 +              dma_addr = sg_dma_address(sg);
 +
 +              if (len == 0)
 +                      break;
 +
 +              while ((count + 1) < sgt->nents) {
 +                      sg_next_iter = sg_next(sg);
 +                      len_next = sg_dma_len(sg_next_iter);
 +                      dma_addr_next = sg_dma_address(sg_next_iter);
 +
 +                      if (len_next == 0)
 +                              break;
 +
 +                      if ((dma_addr + len == dma_addr_next) &&
 +                              (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
 +                              len += len_next;
 +                              count++;
 +                              sg = sg_next_iter;
 +                      } else {
 +                              break;
 +                      }
 +              }
 +
 +              ctl = le32_to_cpu(user_dma_pkt->ctl);
 +              if (likely(dma_desc_cnt))
 +                      ctl &= ~GAUDI_PKT_CTL_EB_MASK;
 +              ctl &= ~GAUDI_PKT_LIN_DMA_CTL_WRCOMP_EN_MASK;
 +              new_dma_pkt->ctl = cpu_to_le32(ctl);
 +              new_dma_pkt->tsize = cpu_to_le32(len);
 +
 +              if (dir == DMA_TO_DEVICE) {
 +                      new_dma_pkt->src_addr = cpu_to_le64(dma_addr);
 +                      new_dma_pkt->dst_addr = cpu_to_le64(device_memory_addr);
 +              } else {
 +                      new_dma_pkt->src_addr = cpu_to_le64(device_memory_addr);
 +                      new_dma_pkt->dst_addr = cpu_to_le64(dma_addr);
 +              }
 +
 +              if (!user_memset)
 +                      device_memory_addr += len;
 +              dma_desc_cnt++;
 +              new_dma_pkt++;
 +      }
 +
 +      if (!dma_desc_cnt) {
 +              dev_err(hdev->dev,
 +                      "Error of 0 SG entries when patching DMA packet\n");
 +              return -EFAULT;
 +      }
 +
 +      /* Fix the last dma packet - wrcomp must be as user set it */
 +      new_dma_pkt--;
 +      new_dma_pkt->ctl |= cpu_to_le32(user_wrcomp_en_mask);
 +
 +      *new_dma_pkt_size = dma_desc_cnt * sizeof(struct packet_lin_dma);
 +
 +      return 0;
 +}
 +
 +static int gaudi_patch_cb(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser)
 +{
 +      u32 cb_parsed_length = 0;
 +      u32 cb_patched_cur_length = 0;
 +      int rc = 0;
 +
 +      /* user_cb_size is more than 0 so the loop will always execute */
 +      while (cb_parsed_length < parser->user_cb_size) {
 +              enum packet_id pkt_id;
 +              u16 pkt_size;
 +              u32 new_pkt_size = 0;
 +              struct gaudi_packet *user_pkt, *kernel_pkt;
 +
 +              user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
 +              kernel_pkt = parser->patched_cb->kernel_address +
 +                                      cb_patched_cur_length;
 +
 +              pkt_id = (enum packet_id) (
 +                              (le64_to_cpu(user_pkt->header) &
 +                              PACKET_HEADER_PACKET_ID_MASK) >>
 +                                      PACKET_HEADER_PACKET_ID_SHIFT);
 +
 +              if (!validate_packet_id(pkt_id)) {
 +                      dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              pkt_size = gaudi_packet_sizes[pkt_id];
 +              cb_parsed_length += pkt_size;
 +              if (cb_parsed_length > parser->user_cb_size) {
 +                      dev_err(hdev->dev,
 +                              "packet 0x%x is out of CB boundary\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              switch (pkt_id) {
 +              case PACKET_LIN_DMA:
 +                      rc = gaudi_patch_dma_packet(hdev, parser,
 +                                      (struct packet_lin_dma *) user_pkt,
 +                                      (struct packet_lin_dma *) kernel_pkt,
 +                                      &new_pkt_size);
 +                      cb_patched_cur_length += new_pkt_size;
 +                      break;
 +
 +              case PACKET_MSG_PROT:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use MSG_PROT\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_CP_DMA:
 +                      dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_STOP:
 +                      dev_err(hdev->dev, "User not allowed to use STOP\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_WREG_32:
 +              case PACKET_WREG_BULK:
 +              case PACKET_MSG_LONG:
 +              case PACKET_MSG_SHORT:
 +              case PACKET_REPEAT:
 +              case PACKET_FENCE:
 +              case PACKET_NOP:
 +              case PACKET_ARB_POINT:
 +              case PACKET_LOAD_AND_EXE:
 +                      memcpy(kernel_pkt, user_pkt, pkt_size);
 +                      cb_patched_cur_length += pkt_size;
 +                      break;
 +
 +              default:
 +                      dev_err(hdev->dev, "Invalid packet header 0x%x\n",
 +                              pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              if (rc)
 +                      break;
 +      }
 +
 +      return rc;
 +}
 +
 +static int gaudi_parse_cb_mmu(struct hl_device *hdev,
 +              struct hl_cs_parser *parser)
 +{
 +      u64 handle;
 +      u32 patched_cb_size;
 +      struct hl_cb *user_cb;
 +      int rc;
 +
 +      /*
 +       * The new CB should have space at the end for:
 +       * 1. Optional NOP padding for cacheline alignment
 +       * 2. A MSG_PROT packet that will act as a completion packet
 +       * 3. A MSG_PROT packet that will generate the MSI interrupt
 +       */
 +      if (parser->completion)
 +              parser->patched_cb_size = parser->user_cb_size +
 +                              gaudi_get_patched_cb_extra_size(parser->user_cb_size);
 +      else
 +              parser->patched_cb_size = parser->user_cb_size;
 +
 +      rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
 +                              parser->patched_cb_size, false, false,
 +                              &handle);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate patched CB for DMA CS %d\n",
 +                      rc);
 +              return rc;
 +      }
 +
 +      parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
 +      /* hl_cb_get should never fail */
 +      if (!parser->patched_cb) {
 +              dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
 +              rc = -EFAULT;
 +              goto out;
 +      }
 +
 +      /*
 +       * We are protected from overflow because the check
 +       * "parser->user_cb_size <= parser->user_cb->size" was done in get_cb_from_cs_chunk()
 +       * in the common code. That check is done only if is_kernel_allocated_cb is true.
 +       *
 +       * There is no option to reach here without going through that check because:
 +       * 1. validate_queue_index() assigns true to is_kernel_allocated_cb for any submission to
 +       *    an external queue.
 +       * 2. For Gaudi, we only parse CBs that were submitted to the external queues.
 +       */
 +      memcpy(parser->patched_cb->kernel_address,
 +              parser->user_cb->kernel_address,
 +              parser->user_cb_size);
 +
 +      patched_cb_size = parser->patched_cb_size;
 +
 +      /* Validate patched CB instead of user CB */
 +      user_cb = parser->user_cb;
 +      parser->user_cb = parser->patched_cb;
 +      rc = gaudi_validate_cb(hdev, parser, true);
 +      parser->user_cb = user_cb;
 +
 +      if (rc) {
 +              hl_cb_put(parser->patched_cb);
 +              goto out;
 +      }
 +
 +      if (patched_cb_size != parser->patched_cb_size) {
 +              dev_err(hdev->dev, "user CB size mismatch\n");
 +              hl_cb_put(parser->patched_cb);
 +              rc = -EINVAL;
 +              goto out;
 +      }
 +
 +out:
 +      /*
 +       * Always call cb destroy here because we still hold one reference
 +       * from the earlier cb_get. After the job is completed, cb_put will
 +       * release it, but here we want to remove it from the idr
 +       */
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
 +
 +      return rc;
 +}
 +
 +static int gaudi_parse_cb_no_mmu(struct hl_device *hdev,
 +              struct hl_cs_parser *parser)
 +{
 +      u64 handle;
 +      int rc;
 +
 +      rc = gaudi_validate_cb(hdev, parser, false);
 +
 +      if (rc)
 +              goto free_userptr;
 +
 +      rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
 +                              parser->patched_cb_size, false, false,
 +                              &handle);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate patched CB for DMA CS %d\n", rc);
 +              goto free_userptr;
 +      }
 +
 +      parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
 +      /* hl_cb_get should never fail here */
 +      if (!parser->patched_cb) {
 +              dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
 +              rc = -EFAULT;
 +              goto out;
 +      }
 +
 +      rc = gaudi_patch_cb(hdev, parser);
 +
 +      if (rc)
 +              hl_cb_put(parser->patched_cb);
 +
 +out:
 +      /*
 +       * Always call cb destroy here because we still hold one reference
 +       * from the earlier cb_get. After the job is completed, cb_put will
 +       * release it, but here we want to remove it from the idr
 +       */
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
 +
 +free_userptr:
 +      if (rc)
 +              hl_userptr_delete_list(hdev, parser->job_userptr_list);
 +      return rc;
 +}
 +
 +static int gaudi_parse_cb_no_ext_queue(struct hl_device *hdev,
 +                                      struct hl_cs_parser *parser)
 +{
 +      struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 nic_queue_offset, nic_mask_q_id;
 +
 +      if ((parser->hw_queue_id >= GAUDI_QUEUE_ID_NIC_0_0) &&
 +                      (parser->hw_queue_id <= GAUDI_QUEUE_ID_NIC_9_3)) {
 +              nic_queue_offset = parser->hw_queue_id - GAUDI_QUEUE_ID_NIC_0_0;
 +              nic_mask_q_id = 1 << (HW_CAP_NIC_SHIFT + (nic_queue_offset >> 2));
 +
 +              if (!(gaudi->hw_cap_initialized & nic_mask_q_id)) {
 +                      dev_err(hdev->dev, "h/w queue %d is disabled\n", parser->hw_queue_id);
 +                      return -EINVAL;
 +              }
 +      }
 +
 +      /* For internal queue jobs just check if CB address is valid */
 +      if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->sram_user_base_address,
 +                                      asic_prop->sram_end_address))
 +              return 0;
 +
 +      if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->dram_user_base_address,
 +                                      asic_prop->dram_end_address))
 +              return 0;
 +
 +      /* PMMU and HPMMU addresses are equal, check only one of them */
 +      if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->pmmu.start_addr,
 +                                      asic_prop->pmmu.end_addr))
 +              return 0;
 +
 +      dev_err(hdev->dev,
 +              "CB address 0x%px + 0x%x for internal QMAN is not valid\n",
 +              parser->user_cb, parser->user_cb_size);
 +
 +      return -EFAULT;
 +}
 +
 +static int gaudi_cs_parser(struct hl_device *hdev, struct hl_cs_parser *parser)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (parser->queue_type == QUEUE_TYPE_INT)
 +              return gaudi_parse_cb_no_ext_queue(hdev, parser);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_MMU)
 +              return gaudi_parse_cb_mmu(hdev, parser);
 +      else
 +              return gaudi_parse_cb_no_mmu(hdev, parser);
 +}
 +
 +static void gaudi_add_end_of_cb_packets(struct hl_device *hdev, void *kernel_address,
 +                              u32 len, u32 original_len, u64 cq_addr, u32 cq_val,
 +                              u32 msi_vec, bool eb)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct packet_msg_prot *cq_pkt;
 +      struct packet_nop *cq_padding;
 +      u64 msi_addr;
 +      u32 tmp;
 +
 +      cq_padding = kernel_address + original_len;
 +      cq_pkt = kernel_address + len - (sizeof(struct packet_msg_prot) * 2);
 +
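+      /* fill the gap up to the two completion MSG_PROT packets with NOPs */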
 +      while ((void *)cq_padding < (void *)cq_pkt) {
 +              cq_padding->ctl = cpu_to_le32(FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_NOP));
 +              cq_padding++;
 +      }
 +
 +      tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      if (eb)
 +              tmp |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
 +
 +      cq_pkt->ctl = cpu_to_le32(tmp);
 +      cq_pkt->value = cpu_to_le32(cq_val);
 +      cq_pkt->addr = cpu_to_le64(cq_addr);
 +
 +      cq_pkt++;
 +
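+      /* second MSG_PROT packet triggers the MSI completion interrupt */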
 +      tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +      cq_pkt->ctl = cpu_to_le32(tmp);
 +      cq_pkt->value = cpu_to_le32(1);
 +
 +      if (gaudi->multi_msi_mode)
 +              msi_addr = mmPCIE_MSI_INTR_0 + msi_vec * 4;
 +      else
 +              msi_addr = mmPCIE_CORE_MSI_REQ;
 +
 +      cq_pkt->addr = cpu_to_le64(CFG_BASE + msi_addr);
 +}
 +
 +static void gaudi_update_eq_ci(struct hl_device *hdev, u32 val)
 +{
 +      WREG32(mmCPU_IF_EQ_RD_OFFS, val);
 +}
 +
 +static int gaudi_memset_device_memory(struct hl_device *hdev, u64 addr,
 +                                      u32 size, u64 val)
 +{
 +      struct packet_lin_dma *lin_dma_pkt;
 +      struct hl_cs_job *job;
 +      u32 cb_size, ctl, err_cause;
 +      struct hl_cb *cb;
 +      int rc;
 +
 +      cb = hl_cb_kernel_create(hdev, PAGE_SIZE, false);
 +      if (!cb)
 +              return -EFAULT;
 +
 +      lin_dma_pkt = cb->kernel_address;
 +      memset(lin_dma_pkt, 0, sizeof(*lin_dma_pkt));
 +      cb_size = sizeof(*lin_dma_pkt);
 +
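+      /* LIN_DMA in memset mode: src_addr carries the fill value */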
 +      ctl = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_LIN_DMA);
 +      ctl |= FIELD_PREP(GAUDI_PKT_LIN_DMA_CTL_MEMSET_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_LIN_DMA_CTL_LIN_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +
 +      lin_dma_pkt->ctl = cpu_to_le32(ctl);
 +      lin_dma_pkt->src_addr = cpu_to_le64(val);
 +      lin_dma_pkt->dst_addr |= cpu_to_le64(addr);
 +      lin_dma_pkt->tsize = cpu_to_le32(size);
 +
 +      job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
 +      if (!job) {
 +              dev_err(hdev->dev, "Failed to allocate a new job\n");
 +              rc = -ENOMEM;
 +              goto release_cb;
 +      }
 +
 +      /* Verify DMA is OK */
 +      err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE);
 +      if (err_cause && !hdev->init_done) {
 +              dev_dbg(hdev->dev,
 +                      "Clearing DMA0 engine from errors (cause 0x%x)\n",
 +                      err_cause);
 +              WREG32(mmDMA0_CORE_ERR_CAUSE, err_cause);
 +      }
 +
 +      job->id = 0;
 +      job->user_cb = cb;
 +      atomic_inc(&job->user_cb->cs_cnt);
 +      job->user_cb_size = cb_size;
 +      job->hw_queue_id = GAUDI_QUEUE_ID_DMA_0_0;
 +      job->patched_cb = job->user_cb;
 +      job->job_cb_size = job->user_cb_size + sizeof(struct packet_msg_prot);
 +
 +      hl_debugfs_add_job(hdev, job);
 +
 +      rc = gaudi_send_job_on_qman0(hdev, job);
 +      hl_debugfs_remove_job(hdev, job);
 +      kfree(job);
 +      atomic_dec(&cb->cs_cnt);
 +
 +      /* Verify DMA is OK */
 +      err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE);
 +      if (err_cause) {
 +              dev_err(hdev->dev, "DMA Failed, cause 0x%x\n", err_cause);
 +              rc = -EIO;
 +              if (!hdev->init_done) {
 +                      dev_dbg(hdev->dev,
 +                              "Clearing DMA0 engine from errors (cause 0x%x)\n",
 +                              err_cause);
 +                      WREG32(mmDMA0_CORE_ERR_CAUSE, err_cause);
 +              }
 +      }
 +
 +release_cb:
 +      hl_cb_put(cb);
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
 +
 +      return rc;
 +}
 +
 +static int gaudi_memset_registers(struct hl_device *hdev, u64 reg_base,
 +                                      u32 num_regs, u32 val)
 +{
 +      struct packet_msg_long *pkt;
 +      struct hl_cs_job *job;
 +      u32 cb_size, ctl;
 +      struct hl_cb *cb;
 +      int i, rc;
 +
 +      cb_size = (sizeof(*pkt) * num_regs) + sizeof(struct packet_msg_prot);
 +
 +      if (cb_size > SZ_2M) {
 +              dev_err(hdev->dev, "CB size must be smaller than %uMB", SZ_2M);
 +              return -ENOMEM;
 +      }
 +
 +      cb = hl_cb_kernel_create(hdev, cb_size, false);
 +      if (!cb)
 +              return -EFAULT;
 +
 +      pkt = cb->kernel_address;
 +
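+      /* one MSG_LONG packet per register, all writing the same value */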
 +      ctl = FIELD_PREP(GAUDI_PKT_LONG_CTL_OP_MASK, 0); /* write the value */
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_LONG);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      for (i = 0; i < num_regs ; i++, pkt++) {
 +              pkt->ctl = cpu_to_le32(ctl);
 +              pkt->value = cpu_to_le32(val);
 +              pkt->addr = cpu_to_le64(reg_base + (i * 4));
 +      }
 +
 +      job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
 +      if (!job) {
 +              dev_err(hdev->dev, "Failed to allocate a new job\n");
 +              rc = -ENOMEM;
 +              goto release_cb;
 +      }
 +
 +      job->id = 0;
 +      job->user_cb = cb;
 +      atomic_inc(&job->user_cb->cs_cnt);
 +      job->user_cb_size = cb_size;
 +      job->hw_queue_id = GAUDI_QUEUE_ID_DMA_0_0;
 +      job->patched_cb = job->user_cb;
 +      job->job_cb_size = cb_size;
 +
 +      hl_debugfs_add_job(hdev, job);
 +
 +      rc = gaudi_send_job_on_qman0(hdev, job);
 +      hl_debugfs_remove_job(hdev, job);
 +      kfree(job);
 +      atomic_dec(&cb->cs_cnt);
 +
 +release_cb:
 +      hl_cb_put(cb);
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
 +
 +      return rc;
 +}
 +
 +static int gaudi_restore_sm_registers(struct hl_device *hdev)
 +{
 +      u64 base_addr;
 +      u32 num_regs;
 +      int rc;
 +
 +      base_addr = CFG_BASE + mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
 +      num_regs = NUM_OF_SOB_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_E_S_SYNC_MNGR_OBJS_SOB_OBJ_0;
 +      num_regs = NUM_OF_SOB_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_W_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
 +      num_regs = NUM_OF_SOB_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_0;
 +      num_regs = NUM_OF_MONITORS_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_E_S_SYNC_MNGR_OBJS_MON_STATUS_0;
 +      num_regs = NUM_OF_MONITORS_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_W_N_SYNC_MNGR_OBJS_MON_STATUS_0;
 +      num_regs = NUM_OF_MONITORS_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
 +                      (GAUDI_FIRST_AVAILABLE_W_S_SYNC_OBJECT * 4);
 +      num_regs = NUM_OF_SOB_IN_BLOCK - GAUDI_FIRST_AVAILABLE_W_S_SYNC_OBJECT;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0 +
 +                      (GAUDI_FIRST_AVAILABLE_W_S_MONITOR * 4);
 +      num_regs = NUM_OF_MONITORS_IN_BLOCK - GAUDI_FIRST_AVAILABLE_W_S_MONITOR;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      return 0;
 +}
 +
 +static void gaudi_restore_dma_registers(struct hl_device *hdev)
 +{
 +      u32 sob_delta = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_1 -
 +                      mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
 +      int i;
 +
 +      for (i = 0 ; i < DMA_NUMBER_OF_CHANNELS ; i++) {
 +              u64 sob_addr = CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0 +
 +                              (i * sob_delta);
 +              u32 dma_offset = i * DMA_CORE_OFFSET;
 +
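+              /* point the write-completion address at the channel's sync object */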
 +              WREG32(mmDMA0_CORE_WR_COMP_ADDR_LO + dma_offset,
 +                              lower_32_bits(sob_addr));
 +              WREG32(mmDMA0_CORE_WR_COMP_ADDR_HI + dma_offset,
 +                              upper_32_bits(sob_addr));
 +              WREG32(mmDMA0_CORE_WR_COMP_WDATA + dma_offset, 0x80000001);
 +
 +              /* For DMAs 2-7, need to restore WR_AWUSER_31_11 as it can be
 +               * modified by the user for SRAM reduction
 +               */
 +              if (i > 1)
 +                      WREG32(mmDMA0_CORE_WR_AWUSER_31_11 + dma_offset,
 +                                                              0x00000001);
 +      }
 +}
 +
 +static void gaudi_restore_qm_registers(struct hl_device *hdev)
 +{
 +      u32 qman_offset;
 +      int i;
 +
 +      for (i = 0 ; i < DMA_NUMBER_OF_CHANNELS ; i++) {
 +              qman_offset = i * DMA_QMAN_OFFSET;
 +              WREG32(mmDMA0_QM_ARB_CFG_0 + qman_offset, 0);
 +      }
 +
 +      for (i = 0 ; i < MME_NUMBER_OF_MASTER_ENGINES ; i++) {
 +              qman_offset = i * (mmMME2_QM_BASE - mmMME0_QM_BASE);
 +              WREG32(mmMME0_QM_ARB_CFG_0 + qman_offset, 0);
 +      }
 +
 +      for (i = 0 ; i < TPC_NUMBER_OF_ENGINES ; i++) {
 +              qman_offset = i * TPC_QMAN_OFFSET;
 +              WREG32(mmTPC0_QM_ARB_CFG_0 + qman_offset, 0);
 +      }
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
 +              qman_offset = (i >> 1) * NIC_MACRO_QMAN_OFFSET +
 +                              (i & 0x1) * NIC_ENGINE_QMAN_OFFSET;
 +              WREG32(mmNIC0_QM0_ARB_CFG_0 + qman_offset, 0);
 +      }
 +}
 +
 +static int gaudi_restore_user_registers(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = gaudi_restore_sm_registers(hdev);
 +      if (rc)
 +              return rc;
 +
 +      gaudi_restore_dma_registers(hdev);
 +      gaudi_restore_qm_registers(hdev);
 +
 +      return 0;
 +}
 +
 +static int gaudi_context_switch(struct hl_device *hdev, u32 asid)
 +{
 +      return 0;
 +}
 +
 +static int gaudi_mmu_clear_pgt_range(struct hl_device *hdev)
 +{
 +      u32 size = hdev->asic_prop.mmu_pgt_size +
 +                      hdev->asic_prop.mmu_cache_mng_size;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u64 addr = hdev->asic_prop.mmu_pgt_addr;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
 +              return 0;
 +
 +      return gaudi_memset_device_memory(hdev, addr, size, 0);
 +}
 +
 +static void gaudi_restore_phase_topology(struct hl_device *hdev)
 +{
 +
 +}
 +
 +static int gaudi_dma_core_transfer(struct hl_device *hdev, int dma_id, u64 addr,
 +                                      u32 size_to_dma, dma_addr_t dma_addr)
 +{
 +      u32 err_cause, val;
 +      u64 dma_offset;
 +      int rc;
 +
 +      dma_offset = dma_id * DMA_CORE_OFFSET;
 +
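+      /* program src, dst and size, then commit a linear DMA transfer */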
 +      WREG32(mmDMA0_CORE_SRC_BASE_LO + dma_offset, lower_32_bits(addr));
 +      WREG32(mmDMA0_CORE_SRC_BASE_HI + dma_offset, upper_32_bits(addr));
 +      WREG32(mmDMA0_CORE_DST_BASE_LO + dma_offset, lower_32_bits(dma_addr));
 +      WREG32(mmDMA0_CORE_DST_BASE_HI + dma_offset, upper_32_bits(dma_addr));
 +      WREG32(mmDMA0_CORE_DST_TSIZE_0 + dma_offset, size_to_dma);
 +      WREG32(mmDMA0_CORE_COMMIT + dma_offset,
 +                      (1 << DMA0_CORE_COMMIT_LIN_SHIFT));
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmDMA0_CORE_STS0 + dma_offset,
 +              val,
 +              ((val & DMA0_CORE_STS0_BUSY_MASK) == 0),
 +              0,
 +              1000000);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "DMA %d timed-out during reading of 0x%llx\n",
 +                      dma_id, addr);
 +              return -EIO;
 +      }
 +
 +      /* Verify DMA is OK */
 +      err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset);
 +      if (err_cause) {
 +              dev_err(hdev->dev, "DMA Failed, cause 0x%x\n", err_cause);
 +              dev_dbg(hdev->dev,
 +                      "Clearing DMA0 engine from errors (cause 0x%x)\n",
 +                      err_cause);
 +              WREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset, err_cause);
 +
 +              return -EIO;
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size,
 +                              void *blob_addr)
 +{
 +      u32 dma_core_sts0, err_cause, cfg1, size_left, pos, size_to_dma;
 +      u32 qm_glbl_sts0, qm_cgm_sts;
 +      u64 dma_offset, qm_offset;
 +      dma_addr_t dma_addr;
 +      void *kernel_addr;
 +      bool is_eng_idle;
 +      int rc = 0, dma_id;
 +
 +      kernel_addr = hl_asic_dma_alloc_coherent(hdev, SZ_2M, &dma_addr, GFP_KERNEL | __GFP_ZERO);
 +
 +      if (!kernel_addr)
 +              return -ENOMEM;
 +
 +      hdev->asic_funcs->hw_queues_lock(hdev);
 +
 +      dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_1];
 +      dma_offset = dma_id * DMA_CORE_OFFSET;
 +      qm_offset = dma_id * DMA_QMAN_OFFSET;
 +      dma_core_sts0 = RREG32(mmDMA0_CORE_STS0 + dma_offset);
 +      qm_glbl_sts0 = RREG32(mmDMA0_QM_GLBL_STS0 + qm_offset);
 +      qm_cgm_sts = RREG32(mmDMA0_QM_CGM_STS + qm_offset);
 +      is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
 +                    IS_DMA_IDLE(dma_core_sts0);
 +
 +      if (!is_eng_idle) {
 +              dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_2];
 +              dma_offset = dma_id * DMA_CORE_OFFSET;
 +              qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              dma_core_sts0 = RREG32(mmDMA0_CORE_STS0 + dma_offset);
 +              qm_glbl_sts0 = RREG32(mmDMA0_QM_GLBL_STS0 + qm_offset);
 +              qm_cgm_sts = RREG32(mmDMA0_QM_CGM_STS + qm_offset);
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
 +                            IS_DMA_IDLE(dma_core_sts0);
 +
 +              if (!is_eng_idle) {
 +                      dev_err_ratelimited(hdev->dev,
 +                              "Can't read via DMA because it is BUSY\n");
 +                      rc = -EAGAIN;
 +                      goto out;
 +              }
 +      }
 +
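+      /* stop the QMAN stream CPs while the DMA core is driven directly */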
 +      cfg1 = RREG32(mmDMA0_QM_GLBL_CFG1 + qm_offset);
 +      WREG32(mmDMA0_QM_GLBL_CFG1 + qm_offset,
 +                      0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +
 +      /* TODO: remove this by mapping the DMA temporary buffer to the MMU
+       * using the compute ctx ASID, if it exists. If not, use the kernel
+       * ctx ASID
 +       */
 +      WREG32_OR(mmDMA0_CORE_PROT + dma_offset, BIT(DMA0_CORE_PROT_VAL_SHIFT));
 +
 +      /* Verify DMA is OK */
 +      err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset);
 +      if (err_cause) {
 +              dev_dbg(hdev->dev,
 +                      "Clearing DMA0 engine from errors (cause 0x%x)\n",
 +                      err_cause);
 +              WREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset, err_cause);
 +      }
 +
 +      pos = 0;
 +      size_left = size;
 +      size_to_dma = SZ_2M;
 +
 +      while (size_left > 0) {
 +
 +              if (size_left < SZ_2M)
 +                      size_to_dma = size_left;
 +
 +              rc = gaudi_dma_core_transfer(hdev, dma_id, addr, size_to_dma,
 +                                              dma_addr);
 +              if (rc)
 +                      break;
 +
 +              memcpy(blob_addr + pos, kernel_addr, size_to_dma);
 +
 +              if (size_left <= SZ_2M)
 +                      break;
 +
 +              pos += SZ_2M;
 +              addr += SZ_2M;
 +              size_left -= SZ_2M;
 +      }
 +
 +      /* TODO: remove this by mapping the DMA temporary buffer to the MMU
+       * using the compute ctx ASID, if it exists. If not, use the kernel
+       * ctx ASID
 +       */
 +      WREG32_AND(mmDMA0_CORE_PROT + dma_offset,
 +                      ~BIT(DMA0_CORE_PROT_VAL_SHIFT));
 +
 +      WREG32(mmDMA0_QM_GLBL_CFG1 + qm_offset, cfg1);
 +
 +out:
 +      hdev->asic_funcs->hw_queues_unlock(hdev);
 +
 +      hl_asic_dma_free_coherent(hdev, SZ_2M, kernel_addr, dma_addr);
 +
 +      return rc;
 +}
 +
 +static u64 gaudi_read_pte(struct hl_device *hdev, u64 addr)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return U64_MAX;
 +
 +      return readq(hdev->pcie_bar[HBM_BAR_ID] +
 +                      (addr - gaudi->hbm_bar_cur_addr));
 +}
 +
 +static void gaudi_write_pte(struct hl_device *hdev, u64 addr, u64 val)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return;
 +
 +      writeq(val, hdev->pcie_bar[HBM_BAR_ID] +
 +                      (addr - gaudi->hbm_bar_cur_addr));
 +}
 +
 +void gaudi_mmu_prepare_reg(struct hl_device *hdev, u64 reg, u32 asid)
 +{
 +      /* mask to zero the MMBP and ASID bits */
 +      WREG32_AND(reg, ~0x7FF);
 +      WREG32_OR(reg, asid);
 +}
 +
 +static void gaudi_mmu_prepare(struct hl_device *hdev, u32 asid)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      if (asid & ~DMA0_QM_GLBL_NON_SECURE_PROPS_0_ASID_MASK) {
 +              dev_crit(hdev->dev, "asid %u is too big\n", asid);
 +              return;
 +      }
 +
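+      /* program the ASID into every engine's non-secure AxUSER properties */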
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_CORE_NON_SECURE_PROPS, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_SBAB_ARUSER0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_SBAB_ARUSER1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME1_SBAB_ARUSER0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME1_SBAB_ARUSER1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_SBAB_ARUSER0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_SBAB_ARUSER1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME3_SBAB_ARUSER0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME3_SBAB_ARUSER1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_ACC_WBC, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME1_ACC_WBC, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_ACC_WBC, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME3_ACC_WBC, asid);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC0) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC1) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC2) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC3) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC4) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC5) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC6) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC7) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC8) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC9) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      gaudi_mmu_prepare_reg(hdev, mmPSOC_GLOBAL_CONF_TRACE_ARUSER, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmPSOC_GLOBAL_CONF_TRACE_AWUSER, asid);
 +}
 +
 +static int gaudi_send_job_on_qman0(struct hl_device *hdev,
 +              struct hl_cs_job *job)
 +{
 +      struct packet_msg_prot *fence_pkt;
 +      u32 *fence_ptr;
 +      dma_addr_t fence_dma_addr;
 +      struct hl_cb *cb;
 +      u32 tmp, timeout, dma_offset;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout = GAUDI_PLDM_QMAN0_TIMEOUT_USEC;
 +      else
 +              timeout = HL_DEVICE_TIMEOUT_USEC;
 +
 +      fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
 +      if (!fence_ptr) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate fence memory for QMAN0\n");
 +              return -ENOMEM;
 +      }
 +
 +      cb = job->patched_cb;
 +
 +      fence_pkt = cb->kernel_address +
 +                      job->job_cb_size - sizeof(struct packet_msg_prot);
 +
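+      /* the trailing MSG_PROT writes GAUDI_QMAN0_FENCE_VAL to fence_ptr,
+       * which is polled below to detect completion
+       */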
 +      tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      fence_pkt->ctl = cpu_to_le32(tmp);
 +      fence_pkt->value = cpu_to_le32(GAUDI_QMAN0_FENCE_VAL);
 +      fence_pkt->addr = cpu_to_le64(fence_dma_addr);
 +
 +      dma_offset = gaudi_dma_assignment[GAUDI_PCI_DMA_1] * DMA_CORE_OFFSET;
 +
 +      WREG32(mmDMA0_CORE_PROT + dma_offset,
 +                      BIT(DMA0_CORE_PROT_ERR_VAL_SHIFT) | BIT(DMA0_CORE_PROT_VAL_SHIFT));
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, GAUDI_QUEUE_ID_DMA_0_0,
 +                                      job->job_cb_size, cb->bus_address);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to send CB on QMAN0, %d\n", rc);
 +              goto free_fence_ptr;
 +      }
 +
 +      rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp,
 +                              (tmp == GAUDI_QMAN0_FENCE_VAL), 1000,
 +                              timeout, true);
 +
 +      hl_hw_queue_inc_ci_kernel(hdev, GAUDI_QUEUE_ID_DMA_0_0);
 +
 +      if (rc == -ETIMEDOUT) {
 +              dev_err(hdev->dev, "QMAN0 Job timeout (0x%x)\n", tmp);
 +              goto free_fence_ptr;
 +      }
 +
 +free_fence_ptr:
 +      WREG32(mmDMA0_CORE_PROT + dma_offset, BIT(DMA0_CORE_PROT_ERR_VAL_SHIFT));
 +
 +      hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
 +      return rc;
 +}
 +
 +static void gaudi_get_event_desc(u16 event_type, char *desc, size_t size)
 +{
 +      if (event_type >= GAUDI_EVENT_SIZE)
 +              goto event_not_supported;
 +
 +      if (!gaudi_irq_map_table[event_type].valid)
 +              goto event_not_supported;
 +
+      snprintf(desc, size, "%s", gaudi_irq_map_table[event_type].name);
 +
 +      return;
 +
 +event_not_supported:
 +      snprintf(desc, size, "N/A");
 +}
 +
 +static const char *gaudi_get_razwi_initiator_dma_name(struct hl_device *hdev, u32 x_y,
 +                                                      bool is_write, u16 *engine_id_1,
 +                                                      u16 *engine_id_2)
 +{
 +      u32 dma_id[2], dma_offset, err_cause[2], mask, i;
 +
 +      mask = is_write ? DMA0_CORE_ERR_CAUSE_HBW_WR_ERR_MASK :
 +                              DMA0_CORE_ERR_CAUSE_HBW_RD_ERR_MASK;
 +
 +      switch (x_y) {
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_1:
 +              dma_id[0] = 0;
 +              dma_id[1] = 2;
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_1:
 +              dma_id[0] = 1;
 +              dma_id[1] = 3;
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_1:
 +              dma_id[0] = 4;
 +              dma_id[1] = 6;
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_1:
 +              dma_id[0] = 5;
 +              dma_id[1] = 7;
 +              break;
 +      default:
 +              goto unknown_initiator;
 +      }
 +
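+      /* each DMA_IF location serves two DMA engines, so check the
+       * ERR_CAUSE register of both to identify the actual initiator
+       */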
 +      for (i = 0 ; i < 2 ; i++) {
 +              dma_offset = dma_id[i] * DMA_CORE_OFFSET;
 +              err_cause[i] = RREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset);
 +      }
 +
 +      switch (x_y) {
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_1:
 +              if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_0;
 +                      return "DMA0";
 +              } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_2;
 +                      return "DMA2";
 +              } else {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_0;
 +                      *engine_id_2 = GAUDI_ENGINE_ID_DMA_2;
 +                      return "DMA0 or DMA2";
 +              }
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_1:
 +              if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_1;
 +                      return "DMA1";
 +              } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_3;
 +                      return "DMA3";
 +              } else {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_1;
 +                      *engine_id_2 = GAUDI_ENGINE_ID_DMA_3;
 +                      return "DMA1 or DMA3";
 +              }
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_1:
 +              if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_4;
 +                      return "DMA4";
 +              } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_6;
 +                      return "DMA6";
 +              } else {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_4;
 +                      *engine_id_2 = GAUDI_ENGINE_ID_DMA_6;
 +                      return "DMA4 or DMA6";
 +              }
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_1:
 +              if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_5;
 +                      return "DMA5";
 +              } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_7;
 +                      return "DMA7";
 +              } else {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_5;
 +                      *engine_id_2 = GAUDI_ENGINE_ID_DMA_7;
 +                      return "DMA5 or DMA7";
 +              }
 +      }
 +
 +unknown_initiator:
 +      return "unknown initiator";
 +}
 +
 +static const char *gaudi_get_razwi_initiator_name(struct hl_device *hdev, bool is_write,
 +                                                      u16 *engine_id_1, u16 *engine_id_2)
 +{
 +      u32 val, x_y, axi_id;
 +
 +      val = is_write ? RREG32(mmMMU_UP_RAZWI_WRITE_ID) :
 +                              RREG32(mmMMU_UP_RAZWI_READ_ID);
 +      x_y = val & ((RAZWI_INITIATOR_Y_MASK << RAZWI_INITIATOR_Y_SHIFT) |
 +                      (RAZWI_INITIATOR_X_MASK << RAZWI_INITIATOR_X_SHIFT));
 +      axi_id = val & (RAZWI_INITIATOR_AXI_ID_MASK <<
 +                      RAZWI_INITIATOR_AXI_ID_SHIFT);
 +
 +      switch (x_y) {
 +      case RAZWI_INITIATOR_ID_X_Y_TPC0_NIC0:
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_TPC_0;
 +                      return "TPC0";
 +              }
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_NIC_0;
 +                      return "NIC0";
 +              }
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_TPC1:
 +              *engine_id_1 = GAUDI_ENGINE_ID_TPC_1;
 +              return "TPC1";
 +      case RAZWI_INITIATOR_ID_X_Y_MME0_0:
 +      case RAZWI_INITIATOR_ID_X_Y_MME0_1:
 +              *engine_id_1 = GAUDI_ENGINE_ID_MME_0;
 +              return "MME0";
 +      case RAZWI_INITIATOR_ID_X_Y_MME1_0:
 +      case RAZWI_INITIATOR_ID_X_Y_MME1_1:
 +              *engine_id_1 = GAUDI_ENGINE_ID_MME_1;
 +              return "MME1";
 +      case RAZWI_INITIATOR_ID_X_Y_TPC2:
 +              *engine_id_1 = GAUDI_ENGINE_ID_TPC_2;
 +              return "TPC2";
 +      case RAZWI_INITIATOR_ID_X_Y_TPC3_PCI_CPU_PSOC:
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_TPC_3;
 +                      return "TPC3";
 +              }
+              /* PCI, CPU and PSOC do not have an engine id */
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_PCI))
 +                      return "PCI";
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_CPU))
 +                      return "CPU";
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_PSOC))
 +                      return "PSOC";
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_1:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_1:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_1:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_1:
 +              return gaudi_get_razwi_initiator_dma_name(hdev, x_y, is_write,
 +                              engine_id_1, engine_id_2);
 +      case RAZWI_INITIATOR_ID_X_Y_TPC4_NIC1_NIC2:
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_TPC_4;
 +                      return "TPC4";
 +              }
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_NIC_1;
 +                      return "NIC1";
 +              }
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC_FT)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_NIC_2;
 +                      return "NIC2";
 +              }
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_TPC5:
 +              *engine_id_1 = GAUDI_ENGINE_ID_TPC_5;
 +              return "TPC5";
 +      case RAZWI_INITIATOR_ID_X_Y_MME2_0:
 +      case RAZWI_INITIATOR_ID_X_Y_MME2_1:
 +              *engine_id_1 = GAUDI_ENGINE_ID_MME_2;
 +              return "MME2";
 +      case RAZWI_INITIATOR_ID_X_Y_MME3_0:
 +      case RAZWI_INITIATOR_ID_X_Y_MME3_1:
 +              *engine_id_1 = GAUDI_ENGINE_ID_MME_3;
 +              return "MME3";
 +      case RAZWI_INITIATOR_ID_X_Y_TPC6:
 +              *engine_id_1 = GAUDI_ENGINE_ID_TPC_6;
 +              return "TPC6";
 +      case RAZWI_INITIATOR_ID_X_Y_TPC7_NIC4_NIC5:
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_TPC_7;
 +                      return "TPC7";
 +              }
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_NIC_4;
 +                      return "NIC4";
 +              }
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC_FT)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_NIC_5;
 +                      return "NIC5";
 +              }
 +              break;
 +      default:
 +              break;
 +      }
 +
 +      dev_err(hdev->dev,
 +              "Unknown RAZWI initiator ID 0x%x [Y=%d, X=%d, AXI_ID=%d]\n",
 +              val,
 +              (val >> RAZWI_INITIATOR_Y_SHIFT) & RAZWI_INITIATOR_Y_MASK,
 +              (val >> RAZWI_INITIATOR_X_SHIFT) & RAZWI_INITIATOR_X_MASK,
 +              (val >> RAZWI_INITIATOR_AXI_ID_SHIFT) &
 +                      RAZWI_INITIATOR_AXI_ID_MASK);
 +
 +      return "unknown initiator";
 +}
 +
 +static void gaudi_print_and_get_razwi_info(struct hl_device *hdev, u16 *engine_id_1,
 +                                              u16 *engine_id_2, bool *is_read, bool *is_write)
 +{
 +      if (RREG32(mmMMU_UP_RAZWI_WRITE_VLD)) {
 +              dev_err_ratelimited(hdev->dev,
 +                      "RAZWI event caused by illegal write of %s\n",
 +                      gaudi_get_razwi_initiator_name(hdev, true, engine_id_1, engine_id_2));
 +              WREG32(mmMMU_UP_RAZWI_WRITE_VLD, 0);
 +              *is_write = true;
 +      }
 +
 +      if (RREG32(mmMMU_UP_RAZWI_READ_VLD)) {
 +              dev_err_ratelimited(hdev->dev,
 +                      "RAZWI event caused by illegal read of %s\n",
 +                      gaudi_get_razwi_initiator_name(hdev, false, engine_id_1, engine_id_2));
 +              WREG32(mmMMU_UP_RAZWI_READ_VLD, 0);
 +              *is_read = true;
 +      }
 +}
 +
 +static void gaudi_print_and_get_mmu_error_info(struct hl_device *hdev, u64 *addr, u64 *event_mask)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 val;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
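+      /* the faulting VA is split: bits 49:32 in the capture register,
+       * bits 31:0 in the VA register
+       */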
 +      val = RREG32(mmMMU_UP_PAGE_ERROR_CAPTURE);
 +      if (val & MMU_UP_PAGE_ERROR_CAPTURE_ENTRY_VALID_MASK) {
 +              *addr = val & MMU_UP_PAGE_ERROR_CAPTURE_VA_49_32_MASK;
 +              *addr <<= 32;
 +              *addr |= RREG32(mmMMU_UP_PAGE_ERROR_CAPTURE_VA);
 +
 +              dev_err_ratelimited(hdev->dev, "MMU page fault on va 0x%llx\n", *addr);
 +              hl_handle_page_fault(hdev, *addr, 0, true, event_mask);
 +
 +              WREG32(mmMMU_UP_PAGE_ERROR_CAPTURE, 0);
 +      }
 +
 +      val = RREG32(mmMMU_UP_ACCESS_ERROR_CAPTURE);
 +      if (val & MMU_UP_ACCESS_ERROR_CAPTURE_ENTRY_VALID_MASK) {
 +              *addr = val & MMU_UP_ACCESS_ERROR_CAPTURE_VA_49_32_MASK;
 +              *addr <<= 32;
 +              *addr |= RREG32(mmMMU_UP_ACCESS_ERROR_CAPTURE_VA);
 +
 +              dev_err_ratelimited(hdev->dev, "MMU access error on va 0x%llx\n", *addr);
 +
 +              WREG32(mmMMU_UP_ACCESS_ERROR_CAPTURE, 0);
 +      }
 +}
 +
 +/*
 + *  +-------------------+------------------------------------------------------+
 + *  | Configuration Reg |                     Description                      |
 + *  |      Address      |                                                      |
 + *  +-------------------+------------------------------------------------------+
 + *  |  0xF30 - 0xF3F    |ECC single error indication (1 bit per memory wrapper)|
 + *  |                   |0xF30 memory wrappers 31:0 (MSB to LSB)               |
 + *  |                   |0xF34 memory wrappers 63:32                           |
 + *  |                   |0xF38 memory wrappers 95:64                           |
 + *  |                   |0xF3C memory wrappers 127:96                          |
 + *  +-------------------+------------------------------------------------------+
 + *  |  0xF40 - 0xF4F    |ECC double error indication (1 bit per memory wrapper)|
 + *  |                   |0xF40 memory wrappers 31:0 (MSB to LSB)               |
 + *  |                   |0xF44 memory wrappers 63:32                           |
 + *  |                   |0xF48 memory wrappers 95:64                           |
 + *  |                   |0xF4C memory wrappers 127:96                          |
 + *  +-------------------+------------------------------------------------------+
 + */
 +static int gaudi_extract_ecc_info(struct hl_device *hdev,
 +              struct ecc_info_extract_params *params, u64 *ecc_address,
 +              u64 *ecc_syndrom, u8 *memory_wrapper_idx)
 +{
 +      u32 i, num_mem_regs, reg, err_bit;
 +      u64 err_addr, err_word = 0;
 +
 +      num_mem_regs = params->num_memories / 32 +
 +                      ((params->num_memories % 32) ? 1 : 0);
 +
 +      if (params->block_address >= CFG_BASE)
 +              params->block_address -= CFG_BASE;
 +
 +      if (params->derr)
 +              err_addr = params->block_address + GAUDI_ECC_DERR0_OFFSET;
 +      else
 +              err_addr = params->block_address + GAUDI_ECC_SERR0_OFFSET;
 +
 +      /* Set invalid wrapper index */
 +      *memory_wrapper_idx = 0xFF;
 +
 +      /* Iterate through memory wrappers, a single bit must be set */
 +      for (i = 0 ; i < num_mem_regs ; i++) {
+              err_word = RREG32(err_addr + (i * 4));
 +              if (err_word) {
 +                      err_bit = __ffs(err_word);
 +                      *memory_wrapper_idx = err_bit + (32 * i);
 +                      break;
 +              }
 +      }
 +
 +      if (*memory_wrapper_idx == 0xFF) {
 +              dev_err(hdev->dev, "ECC error information cannot be found\n");
 +              return -EINVAL;
 +      }
 +
 +      WREG32(params->block_address + GAUDI_ECC_MEM_SEL_OFFSET,
 +                      *memory_wrapper_idx);
 +
 +      *ecc_address =
 +              RREG32(params->block_address + GAUDI_ECC_ADDRESS_OFFSET);
 +      *ecc_syndrom =
 +              RREG32(params->block_address + GAUDI_ECC_SYNDROME_OFFSET);
 +
 +      /* Clear error indication */
 +      reg = RREG32(params->block_address + GAUDI_ECC_MEM_INFO_CLR_OFFSET);
 +      if (params->derr)
 +              reg |= FIELD_PREP(GAUDI_ECC_MEM_INFO_CLR_DERR_MASK, 1);
 +      else
 +              reg |= FIELD_PREP(GAUDI_ECC_MEM_INFO_CLR_SERR_MASK, 1);
 +
 +      WREG32(params->block_address + GAUDI_ECC_MEM_INFO_CLR_OFFSET, reg);
 +
 +      return 0;
 +}
 +
 +/*
 + * gaudi_queue_idx_dec - decrement queue index (pi/ci) and handle wrap
 + *
 + * @idx: the current pi/ci value
 + * @q_len: the queue length (power of 2)
 + *
 + * @return the cyclically decremented index
 + */
 +static inline u32 gaudi_queue_idx_dec(u32 idx, u32 q_len)
 +{
 +      u32 mask = q_len - 1;
 +
 +      /*
+       * a modular decrement is equivalent to adding (q_len - 1);
+       * we then take the LSBs via the mask to keep the value in the
+       * range [0, q_len - 1]
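+       * e.g. idx = 0, q_len = 8: (0 + 8 - 1) & 0x7 = 7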
 +       */
 +      return (idx + q_len - 1) & mask;
 +}
 +
 +/**
 + * gaudi_handle_sw_config_stream_data - print SW config stream data
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
+ * @event_mask: mask of the last events that occurred
 + */
 +static void gaudi_handle_sw_config_stream_data(struct hl_device *hdev, u32 stream,
 +                                              u64 qman_base, u64 event_mask)
 +{
 +      u64 cq_ptr_lo, cq_ptr_hi, cq_tsize, cq_ptr;
 +      u32 cq_ptr_lo_off, size;
 +
 +      cq_ptr_lo_off = mmTPC0_QM_CQ_PTR_LO_1 - mmTPC0_QM_CQ_PTR_LO_0;
 +
 +      cq_ptr_lo = qman_base + (mmTPC0_QM_CQ_PTR_LO_0 - mmTPC0_QM_BASE) +
 +                                              stream * cq_ptr_lo_off;
 +      cq_ptr_hi = cq_ptr_lo +
 +                              (mmTPC0_QM_CQ_PTR_HI_0 - mmTPC0_QM_CQ_PTR_LO_0);
 +      cq_tsize = cq_ptr_lo +
 +                              (mmTPC0_QM_CQ_TSIZE_0 - mmTPC0_QM_CQ_PTR_LO_0);
 +
 +      cq_ptr = (((u64) RREG32(cq_ptr_hi)) << 32) | RREG32(cq_ptr_lo);
 +      size = RREG32(cq_tsize);
 +      dev_info(hdev->dev, "stop on err: stream: %u, addr: %#llx, size: %u\n",
 +                                                      stream, cq_ptr, size);
 +
 +      if (event_mask & HL_NOTIFIER_EVENT_UNDEFINED_OPCODE) {
 +              hdev->captured_err_info.undef_opcode.cq_addr = cq_ptr;
 +              hdev->captured_err_info.undef_opcode.cq_size = size;
 +              hdev->captured_err_info.undef_opcode.stream_id = stream;
 +      }
 +}
 +
 +/**
 + * gaudi_handle_last_pqes_on_err - print last PQEs on error
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @qid_base: first QID of the QMAN (out of 4 streams)
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + * @event_mask: mask of the last events that occurred
 + * @pr_sw_conf: if true, print the SW config stream data (CQ PTR and SIZE)
 + */
 +static void gaudi_handle_last_pqes_on_err(struct hl_device *hdev, u32 qid_base,
 +                                              u32 stream, u64 qman_base,
 +                                              u64 event_mask,
 +                                              bool pr_sw_conf)
 +{
 +      u32 ci, qm_ci_stream_off, queue_len;
 +      struct hl_hw_queue *q;
 +      u64 pq_ci, addr[PQ_FETCHER_CACHE_SIZE];
 +      int i;
 +
 +      q = &hdev->kernel_queues[qid_base + stream];
 +
 +      qm_ci_stream_off = mmTPC0_QM_PQ_CI_1 - mmTPC0_QM_PQ_CI_0;
 +      pq_ci = qman_base + (mmTPC0_QM_PQ_CI_0 - mmTPC0_QM_BASE) +
 +                                              stream * qm_ci_stream_off;
 +
 +      queue_len = (q->queue_type == QUEUE_TYPE_INT) ?
 +                                      q->int_queue_len : HL_QUEUE_LENGTH;
 +
 +      hdev->asic_funcs->hw_queues_lock(hdev);
 +
 +      if (pr_sw_conf)
 +              gaudi_handle_sw_config_stream_data(hdev, stream, qman_base, event_mask);
 +
 +      ci = RREG32(pq_ci);
 +
 +      /* we should start printing from ci - 1 */
 +      ci = gaudi_queue_idx_dec(ci, queue_len);
 +      memset(addr, 0, sizeof(addr));
 +
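 +      /* Walk backwards over up to PQ_FETCHER_CACHE_SIZE BDs, newest first */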
 +      for (i = 0; i < PQ_FETCHER_CACHE_SIZE; i++) {
 +              struct hl_bd *bd;
 +              u32 len;
 +
 +              bd = q->kernel_address;
 +              bd += ci;
 +
 +              len = le32_to_cpu(bd->len);
 +              /* len 0 means an uninitialized entry - break */
 +              if (!len)
 +                      break;
 +
 +              addr[i] = le64_to_cpu(bd->ptr);
 +
 +              dev_info(hdev->dev, "stop on err PQE(stream %u): ci: %u, addr: %#llx, size: %u\n",
 +                                                      stream, ci, addr[i], len);
 +
 +              /* get previous ci, wrap if needed */
 +              ci = gaudi_queue_idx_dec(ci, queue_len);
 +      }
 +
 +      if (event_mask & HL_NOTIFIER_EVENT_UNDEFINED_OPCODE) {
 +              struct undefined_opcode_info *undef_opcode = &hdev->captured_err_info.undef_opcode;
 +              u32 arr_idx = undef_opcode->cb_addr_streams_len;
 +
 +              if (arr_idx == 0) {
 +                      undef_opcode->timestamp = ktime_get();
 +                      undef_opcode->engine_id = gaudi_queue_id_to_engine_id[qid_base];
 +              }
 +
 +              memcpy(undef_opcode->cb_addr_streams[arr_idx], addr, sizeof(addr));
 +              undef_opcode->cb_addr_streams_len++;
 +      }
 +
 +      hdev->asic_funcs->hw_queues_unlock(hdev);
 +}
 +
 +/**
 + * handle_qman_data_on_err - extract QMAN data on error
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @qid_base: first QID of the QMAN (out of 4 streams)
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + * @event_mask: mask of the last events that occurred
 + *
 + * This function attempts to extract as much data as possible on a QMAN error.
 + * On an upper CP, print the SW config stream data and the last 8 PQEs.
 + * On the lower CP, print the SW config data and the last PQEs of ALL 4 upper CPs.
 + */
 +static void handle_qman_data_on_err(struct hl_device *hdev, u32 qid_base,
 +                                 u32 stream, u64 qman_base, u64 event_mask)
 +{
 +      u32 i;
 +
 +      if (stream != QMAN_STREAMS) {
 +              gaudi_handle_last_pqes_on_err(hdev, qid_base, stream,
 +                      qman_base, event_mask, true);
 +              return;
 +      }
 +
 +      /* handle Lower-CP */
 +      gaudi_handle_sw_config_stream_data(hdev, stream, qman_base, event_mask);
 +
 +      for (i = 0; i < QMAN_STREAMS; i++)
 +              gaudi_handle_last_pqes_on_err(hdev, qid_base, i,
 +                      qman_base, event_mask, false);
 +}
 +
 +static void gaudi_handle_qman_err_generic(struct hl_device *hdev,
 +                                        const char *qm_name,
 +                                        u64 qman_base,
 +                                        u32 qid_base,
 +                                        u64 *event_mask)
 +{
 +      u32 i, j, glbl_sts_val, arb_err_val, glbl_sts_clr_val;
 +      u64 glbl_sts_addr, arb_err_addr;
 +      char reg_desc[32];
 +
 +      glbl_sts_addr = qman_base + (mmTPC0_QM_GLBL_STS1_0 - mmTPC0_QM_BASE);
 +      arb_err_addr = qman_base + (mmTPC0_QM_ARB_ERR_CAUSE - mmTPC0_QM_BASE);
 +
 +      /* Iterate through all stream GLBL_STS1 registers + Lower CP */
 +      for (i = 0 ; i < QMAN_STREAMS + 1 ; i++) {
 +              glbl_sts_clr_val = 0;
 +              glbl_sts_val = RREG32(glbl_sts_addr + 4 * i);
 +
 +              if (!glbl_sts_val)
 +                      continue;
 +
 +              if (i == QMAN_STREAMS)
 +                      snprintf(reg_desc, ARRAY_SIZE(reg_desc), "LowerCP");
 +              else
 +                      snprintf(reg_desc, ARRAY_SIZE(reg_desc), "stream%u", i);
 +
 +              for (j = 0 ; j < GAUDI_NUM_OF_QM_ERR_CAUSE ; j++) {
 +                      if (glbl_sts_val & BIT(j)) {
 +                              dev_err_ratelimited(hdev->dev,
 +                                              "%s %s. err cause: %s\n",
 +                                              qm_name, reg_desc,
 +                                              gaudi_qman_error_cause[j]);
 +                              glbl_sts_clr_val |= BIT(j);
 +                      }
 +              }
 +              /* check for undefined opcode */
 +              if (glbl_sts_val & TPC0_QM_GLBL_STS1_CP_UNDEF_CMD_ERR_MASK &&
 +                              hdev->captured_err_info.undef_opcode.write_enable) {
 +                      memset(&hdev->captured_err_info.undef_opcode, 0,
 +                                              sizeof(hdev->captured_err_info.undef_opcode));
 +
 +                      hdev->captured_err_info.undef_opcode.write_enable = false;
 +                      *event_mask |= HL_NOTIFIER_EVENT_UNDEFINED_OPCODE;
 +              }
 +
 +              /* Write 1 to clear errors */
 +              if (!hdev->stop_on_err)
 +                      WREG32(glbl_sts_addr + 4 * i, glbl_sts_clr_val);
 +              else
 +                      handle_qman_data_on_err(hdev, qid_base, i, qman_base, *event_mask);
 +      }
 +
 +      arb_err_val = RREG32(arb_err_addr);
 +
 +      if (!arb_err_val)
 +              return;
 +
 +      for (j = 0 ; j < GAUDI_NUM_OF_QM_ARB_ERR_CAUSE ; j++) {
 +              if (arb_err_val & BIT(j)) {
 +                      dev_err_ratelimited(hdev->dev,
 +                                      "%s ARB_ERR. err cause: %s\n",
 +                                      qm_name,
 +                                      gaudi_qman_arb_error_cause[j]);
 +              }
 +      }
 +}
 +
 +static void gaudi_print_sm_sei_info(struct hl_device *hdev, u16 event_type,
 +              struct hl_eq_sm_sei_data *sei_data)
 +{
 +      u32 index = event_type - GAUDI_EVENT_DMA_IF_SEI_0;
 +
 +      /* Flip the bits as the enum is ordered in the opposite way */
 +      index = (index ^ 0x3) & 0x3;
 +
 +      switch (sei_data->sei_cause) {
 +      case SM_SEI_SO_OVERFLOW:
 +              dev_err_ratelimited(hdev->dev,
 +                      "%s SEI Error: SOB Group %u overflow/underflow",
 +                      gaudi_sync_manager_names[index],
 +                      le32_to_cpu(sei_data->sei_log));
 +              break;
 +      case SM_SEI_LBW_4B_UNALIGNED:
 +              dev_err_ratelimited(hdev->dev,
 +                      "%s SEI Error: Unaligned 4B LBW access, monitor agent address low - %#x",
 +                      gaudi_sync_manager_names[index],
 +                      le32_to_cpu(sei_data->sei_log));
 +              break;
 +      case SM_SEI_AXI_RESPONSE_ERR:
 +              dev_err_ratelimited(hdev->dev,
 +                      "%s SEI Error: AXI ID %u response error",
 +                      gaudi_sync_manager_names[index],
 +                      le32_to_cpu(sei_data->sei_log));
 +              break;
 +      default:
 +              dev_err_ratelimited(hdev->dev, "Unknown SM SEI cause %u",
 +                              le32_to_cpu(sei_data->sei_log));
 +              break;
 +      }
 +}
 +
 +static void gaudi_handle_ecc_event(struct hl_device *hdev, u16 event_type,
 +              struct hl_eq_ecc_data *ecc_data)
 +{
 +      struct ecc_info_extract_params params;
 +      u64 ecc_address = 0, ecc_syndrom = 0;
 +      u8 index, memory_wrapper_idx = 0;
 +      bool extract_info_from_fw;
 +      int rc;
 +
 +      if (hdev->asic_prop.fw_security_enabled) {
 +              extract_info_from_fw = true;
 +              goto extract_ecc_info;
 +      }
 +
 +      switch (event_type) {
 +      case GAUDI_EVENT_PCIE_CORE_SERR ... GAUDI_EVENT_PCIE_PHY_DERR:
 +      case GAUDI_EVENT_DMA0_SERR_ECC ... GAUDI_EVENT_MMU_DERR:
 +              extract_info_from_fw = true;
 +              break;
 +      case GAUDI_EVENT_TPC0_SERR ... GAUDI_EVENT_TPC7_SERR:
 +              index = event_type - GAUDI_EVENT_TPC0_SERR;
 +              params.block_address = mmTPC0_CFG_BASE + index * TPC_CFG_OFFSET;
 +              params.num_memories = 90;
 +              params.derr = false;
 +              extract_info_from_fw = false;
 +              break;
 +      case GAUDI_EVENT_TPC0_DERR ... GAUDI_EVENT_TPC7_DERR:
 +              index = event_type - GAUDI_EVENT_TPC0_DERR;
 +              params.block_address =
 +                      mmTPC0_CFG_BASE + index * TPC_CFG_OFFSET;
 +              params.num_memories = 90;
 +              params.derr = true;
 +              extract_info_from_fw = false;
 +              break;
 +      case GAUDI_EVENT_MME0_ACC_SERR:
 +      case GAUDI_EVENT_MME1_ACC_SERR:
 +      case GAUDI_EVENT_MME2_ACC_SERR:
 +      case GAUDI_EVENT_MME3_ACC_SERR:
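 +              /* MME ACC events are four event IDs apart per MME engine */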
 +              index = (event_type - GAUDI_EVENT_MME0_ACC_SERR) / 4;
 +              params.block_address = mmMME0_ACC_BASE + index * MME_ACC_OFFSET;
 +              params.num_memories = 128;
 +              params.derr = false;
 +              extract_info_from_fw = false;
 +              break;
 +      case GAUDI_EVENT_MME0_ACC_DERR:
 +      case GAUDI_EVENT_MME1_ACC_DERR:
 +      case GAUDI_EVENT_MME2_ACC_DERR:
 +      case GAUDI_EVENT_MME3_ACC_DERR:
 +              index = (event_type - GAUDI_EVENT_MME0_ACC_DERR) / 4;
 +              params.block_address = mmMME0_ACC_BASE + index * MME_ACC_OFFSET;
 +              params.num_memories = 128;
 +              params.derr = true;
 +              extract_info_from_fw = false;
 +              break;
 +      case GAUDI_EVENT_MME0_SBAB_SERR:
 +      case GAUDI_EVENT_MME1_SBAB_SERR:
 +      case GAUDI_EVENT_MME2_SBAB_SERR:
 +      case GAUDI_EVENT_MME3_SBAB_SERR:
 +              index = (event_type - GAUDI_EVENT_MME0_SBAB_SERR) / 4;
 +              params.block_address =
 +                      mmMME0_SBAB_BASE + index * MME_ACC_OFFSET;
 +              params.num_memories = 33;
 +              params.derr = false;
 +              extract_info_from_fw = false;
 +              break;
 +      case GAUDI_EVENT_MME0_SBAB_DERR:
 +      case GAUDI_EVENT_MME1_SBAB_DERR:
 +      case GAUDI_EVENT_MME2_SBAB_DERR:
 +      case GAUDI_EVENT_MME3_SBAB_DERR:
 +              index = (event_type - GAUDI_EVENT_MME0_SBAB_DERR) / 4;
 +              params.block_address =
 +                      mmMME0_SBAB_BASE + index * MME_ACC_OFFSET;
 +              params.num_memories = 33;
 +              params.derr = true;
 +              extract_info_from_fw = false;
 +              break;
 +      default:
 +              return;
 +      }
 +
 +extract_ecc_info:
 +      if (extract_info_from_fw) {
 +              ecc_address = le64_to_cpu(ecc_data->ecc_address);
 +              ecc_syndrom = le64_to_cpu(ecc_data->ecc_syndrom);
 +              memory_wrapper_idx = ecc_data->memory_wrapper_idx;
 +      } else {
 +              rc = gaudi_extract_ecc_info(hdev, &params, &ecc_address,
 +                              &ecc_syndrom, &memory_wrapper_idx);
 +              if (rc)
 +                      return;
 +      }
 +
 +      dev_err(hdev->dev,
 +              "ECC error detected. address: %#llx. Syndrome: %#llx. block id %u\n",
 +              ecc_address, ecc_syndrom, memory_wrapper_idx);
 +}
 +
 +static void gaudi_handle_qman_err(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 +{
 +      u64 qman_base;
 +      char desc[32];
 +      u32 qid_base;
 +      u8 index;
 +
 +      switch (event_type) {
 +      case GAUDI_EVENT_TPC0_QM ... GAUDI_EVENT_TPC7_QM:
 +              index = event_type - GAUDI_EVENT_TPC0_QM;
 +              qid_base = GAUDI_QUEUE_ID_TPC_0_0 + index * QMAN_STREAMS;
 +              qman_base = mmTPC0_QM_BASE + index * TPC_QMAN_OFFSET;
 +              snprintf(desc, ARRAY_SIZE(desc), "%s%d", "TPC_QM", index);
 +              break;
 +      case GAUDI_EVENT_MME0_QM ... GAUDI_EVENT_MME2_QM:
 +              if (event_type == GAUDI_EVENT_MME0_QM) {
 +                      index = 0;
 +                      qid_base = GAUDI_QUEUE_ID_MME_0_0;
 +              } else { /* event_type == GAUDI_EVENT_MME2_QM */
 +                      index = 2;
 +                      qid_base = GAUDI_QUEUE_ID_MME_1_0;
 +              }
 +              qman_base = mmMME0_QM_BASE + index * MME_QMAN_OFFSET;
 +              snprintf(desc, ARRAY_SIZE(desc), "%s%d", "MME_QM", index);
 +              break;
 +      case GAUDI_EVENT_DMA0_QM ... GAUDI_EVENT_DMA7_QM:
 +              index = event_type - GAUDI_EVENT_DMA0_QM;
 +              qid_base = GAUDI_QUEUE_ID_DMA_0_0 + index * QMAN_STREAMS;
 +              /* skip GAUDI_QUEUE_ID_CPU_PQ if necessary */
 +              if (index > 1)
 +                      qid_base++;
 +              qman_base = mmDMA0_QM_BASE + index * DMA_QMAN_OFFSET;
 +              snprintf(desc, ARRAY_SIZE(desc), "%s%d", "DMA_QM", index);
 +              break;
 +      case GAUDI_EVENT_NIC0_QM0:
 +              qid_base = GAUDI_QUEUE_ID_NIC_0_0;
 +              qman_base = mmNIC0_QM0_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC0_QM0");
 +              break;
 +      case GAUDI_EVENT_NIC0_QM1:
 +              qid_base = GAUDI_QUEUE_ID_NIC_1_0;
 +              qman_base = mmNIC0_QM1_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC0_QM1");
 +              break;
 +      case GAUDI_EVENT_NIC1_QM0:
 +              qid_base = GAUDI_QUEUE_ID_NIC_2_0;
 +              qman_base = mmNIC1_QM0_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC1_QM0");
 +              break;
 +      case GAUDI_EVENT_NIC1_QM1:
 +              qid_base = GAUDI_QUEUE_ID_NIC_3_0;
 +              qman_base = mmNIC1_QM1_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC1_QM1");
 +              break;
 +      case GAUDI_EVENT_NIC2_QM0:
 +              qid_base = GAUDI_QUEUE_ID_NIC_4_0;
 +              qman_base = mmNIC2_QM0_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC2_QM0");
 +              break;
 +      case GAUDI_EVENT_NIC2_QM1:
 +              qid_base = GAUDI_QUEUE_ID_NIC_5_0;
 +              qman_base = mmNIC2_QM1_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC2_QM1");
 +              break;
 +      case GAUDI_EVENT_NIC3_QM0:
 +              qid_base = GAUDI_QUEUE_ID_NIC_6_0;
 +              qman_base = mmNIC3_QM0_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC3_QM0");
 +              break;
 +      case GAUDI_EVENT_NIC3_QM1:
 +              qid_base = GAUDI_QUEUE_ID_NIC_7_0;
 +              qman_base = mmNIC3_QM1_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC3_QM1");
 +              break;
 +      case GAUDI_EVENT_NIC4_QM0:
 +              qid_base = GAUDI_QUEUE_ID_NIC_8_0;
 +              qman_base = mmNIC4_QM0_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC4_QM0");
 +              break;
 +      case GAUDI_EVENT_NIC4_QM1:
 +              qid_base = GAUDI_QUEUE_ID_NIC_9_0;
 +              qman_base = mmNIC4_QM1_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC4_QM1");
 +              break;
 +      default:
 +              return;
 +      }
 +
 +      gaudi_handle_qman_err_generic(hdev, desc, qman_base, qid_base, event_mask);
 +}
 +
 +static void gaudi_print_irq_info(struct hl_device *hdev, u16 event_type,
 +                                      bool razwi, u64 *event_mask)
 +{
 +      bool is_read = false, is_write = false;
 +      u16 engine_id[2], num_of_razwi_eng = 0;
 +      char desc[64] = "";
 +      u64 razwi_addr = 0;
 +      u8 razwi_flags = 0;
 +
 +      /*
 +       * Init the engine IDs as invalid by default; they get a valid value
 +       * only if the RAZWI was initiated by an engine that has an engine ID.
 +       */
 +      engine_id[0] = HL_RAZWI_NA_ENG_ID;
 +      engine_id[1] = HL_RAZWI_NA_ENG_ID;
 +
 +      gaudi_get_event_desc(event_type, desc, sizeof(desc));
 +      dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
 +              event_type, desc);
 +
 +      if (razwi) {
 +              gaudi_print_and_get_razwi_info(hdev, &engine_id[0], &engine_id[1], &is_read,
 +                                              &is_write);
 +              gaudi_print_and_get_mmu_error_info(hdev, &razwi_addr, event_mask);
 +
 +              if (is_read)
 +                      razwi_flags |= HL_RAZWI_READ;
 +              if (is_write)
 +                      razwi_flags |= HL_RAZWI_WRITE;
 +
 +              if (engine_id[0] != HL_RAZWI_NA_ENG_ID) {
 +                      if (engine_id[1] != HL_RAZWI_NA_ENG_ID)
 +                              num_of_razwi_eng = 2;
 +                      else
 +                              num_of_razwi_eng = 1;
 +              }
 +
 +              hl_handle_razwi(hdev, razwi_addr, engine_id, num_of_razwi_eng, razwi_flags,
 +                              event_mask);
 +      }
 +}
 +
 +static void gaudi_print_out_of_sync_info(struct hl_device *hdev,
 +                                      struct cpucp_pkt_sync_err *sync_err)
 +{
 +      struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI_QUEUE_ID_CPU_PQ];
 +
 +      dev_err(hdev->dev, "Out of sync with FW, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
 +              le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci));
 +}
 +
 +static void gaudi_print_fw_alive_info(struct hl_device *hdev,
 +                                      struct hl_eq_fw_alive *fw_alive)
 +{
 +      dev_err(hdev->dev,
 +              "FW alive report: severity=%s, process_id=%u, thread_id=%u, uptime=%llu seconds\n",
 +              (fw_alive->severity == FW_ALIVE_SEVERITY_MINOR) ? "Minor" : "Critical",
 +              le32_to_cpu(fw_alive->process_id),
 +              le32_to_cpu(fw_alive->thread_id),
 +              le64_to_cpu(fw_alive->uptime_seconds));
 +}
 +
 +static void gaudi_print_nic_axi_irq_info(struct hl_device *hdev, u16 event_type,
 +                                              void *data)
 +{
 +      char desc[64] = "", *type;
 +      struct eq_nic_sei_event *eq_nic_sei = data;
 +      u16 nic_id = event_type - GAUDI_EVENT_NIC_SEI_0;
 +
 +      switch (eq_nic_sei->axi_error_cause) {
 +      case RXB:
 +              type = "RXB";
 +              break;
 +      case RXE:
 +              type = "RXE";
 +              break;
 +      case TXS:
 +              type = "TXS";
 +              break;
 +      case TXE:
 +              type = "TXE";
 +              break;
 +      case QPC_RESP:
 +              type = "QPC_RESP";
 +              break;
 +      case NON_AXI_ERR:
 +              type = "NON_AXI_ERR";
 +              break;
 +      case TMR:
 +              type = "TMR";
 +              break;
 +      default:
 +              dev_err(hdev->dev, "unknown NIC AXI cause %d\n",
 +                      eq_nic_sei->axi_error_cause);
 +              type = "N/A";
 +              break;
 +      }
 +
 +      snprintf(desc, sizeof(desc), "NIC%d_%s%d", nic_id, type,
 +                      eq_nic_sei->id);
 +      dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
 +              event_type, desc);
 +}
 +
 +static int gaudi_compute_reset_late_init(struct hl_device *hdev)
 +{
 +      /* GAUDI doesn't support any reset except hard-reset */
 +      return -EPERM;
 +}
 +
 +static int gaudi_hbm_read_interrupts(struct hl_device *hdev, int device,
 +                      struct hl_eq_hbm_ecc_data *hbm_ecc_data)
 +{
 +      u32 base, val, val2, wr_par, rd_par, ca_par, derr, serr, type, ch;
 +      int rc = 0;
 +
 +      if (hdev->asic_prop.fw_app_cpu_boot_dev_sts0 &
 +                                      CPU_BOOT_DEV_STS0_HBM_ECC_EN) {
 +              if (!hbm_ecc_data) {
 +                      dev_err(hdev->dev, "No FW ECC data");
 +                      return 0;
 +              }
 +
 +              wr_par = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_WR_PAR_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              rd_par = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_RD_PAR_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              ca_par = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_CA_PAR_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              derr = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_DERR_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              serr = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_SERR_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              type = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_TYPE_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              ch = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_HBM_CH_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +
 +              dev_err(hdev->dev,
 +                      "HBM%d pc%d interrupts info: WR_PAR=%d, RD_PAR=%d, CA_PAR=%d, SERR=%d, DERR=%d\n",
 +                      device, ch, wr_par, rd_par, ca_par, serr, derr);
 +              dev_err(hdev->dev,
 +                      "HBM%d pc%d ECC info: 1ST_ERR_ADDR=0x%x, 1ST_ERR_TYPE=%d, SEC_CONT_CNT=%u, SEC_CNT=%d, DEC_CNT=%d\n",
 +                      device, ch, hbm_ecc_data->first_addr, type,
 +                      hbm_ecc_data->sec_cont_cnt, hbm_ecc_data->sec_cnt,
 +                      hbm_ecc_data->dec_cnt);
 +              return 0;
 +      }
 +
 +      if (hdev->asic_prop.fw_security_enabled) {
 +              dev_info(hdev->dev, "Cannot access MC regs for ECC data while security is enabled\n");
 +              return 0;
 +      }
 +
 +      base = GAUDI_HBM_CFG_BASE + device * GAUDI_HBM_CFG_OFFSET;
 +      for (ch = 0 ; ch < GAUDI_HBM_CHANNELS ; ch++) {
 +              val = RREG32_MASK(base + ch * 0x1000 + 0x06C, 0x0000FFFF);
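 +              /* Fold the high status byte onto the low byte so a bit set in either half is caught */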
 +              val = (val & 0xFF) | ((val >> 8) & 0xFF);
 +              if (val) {
 +                      rc = -EIO;
 +                      dev_err(hdev->dev,
 +                              "HBM%d pc%d interrupts info: WR_PAR=%d, RD_PAR=%d, CA_PAR=%d, SERR=%d, DERR=%d\n",
 +                              device, ch * 2, val & 0x1, (val >> 1) & 0x1,
 +                              (val >> 2) & 0x1, (val >> 3) & 0x1,
 +                              (val >> 4) & 0x1);
 +
 +                      val2 = RREG32(base + ch * 0x1000 + 0x060);
 +                      dev_err(hdev->dev,
 +                              "HBM%d pc%d ECC info: 1ST_ERR_ADDR=0x%x, 1ST_ERR_TYPE=%d, SEC_CONT_CNT=%d, SEC_CNT=%d, DEC_CNT=%d\n",
 +                              device, ch * 2,
 +                              RREG32(base + ch * 0x1000 + 0x064),
 +                              (val2 & 0x200) >> 9, (val2 & 0xFC00) >> 10,
 +                              (val2 & 0xFF0000) >> 16,
 +                              (val2 & 0xFF000000) >> 24);
 +              }
 +
 +              val = RREG32_MASK(base + ch * 0x1000 + 0x07C, 0x0000FFFF);
 +              val = (val & 0xFF) | ((val >> 8) & 0xFF);
 +              if (val) {
 +                      rc = -EIO;
 +                      dev_err(hdev->dev,
 +                              "HBM%d pc%d interrupts info: WR_PAR=%d, RD_PAR=%d, CA_PAR=%d, SERR=%d, DERR=%d\n",
 +                              device, ch * 2 + 1, val & 0x1, (val >> 1) & 0x1,
 +                              (val >> 2) & 0x1, (val >> 3) & 0x1,
 +                              (val >> 4) & 0x1);
 +
 +                      val2 = RREG32(base + ch * 0x1000 + 0x070);
 +                      dev_err(hdev->dev,
 +                              "HBM%d pc%d ECC info: 1ST_ERR_ADDR=0x%x, 1ST_ERR_TYPE=%d, SEC_CONT_CNT=%d, SEC_CNT=%d, DEC_CNT=%d\n",
 +                              device, ch * 2 + 1,
 +                              RREG32(base + ch * 0x1000 + 0x074),
 +                              (val2 & 0x200) >> 9, (val2 & 0xFC00) >> 10,
 +                              (val2 & 0xFF0000) >> 16,
 +                              (val2 & 0xFF000000) >> 24);
 +              }
 +
 +              /* Clear interrupts */
 +              RMWREG32(base + (ch * 0x1000) + 0x060, 0x1C8, 0x1FF);
 +              RMWREG32(base + (ch * 0x1000) + 0x070, 0x1C8, 0x1FF);
 +              WREG32(base + (ch * 0x1000) + 0x06C, 0x1F1F);
 +              WREG32(base + (ch * 0x1000) + 0x07C, 0x1F1F);
 +              RMWREG32(base + (ch * 0x1000) + 0x060, 0x0, 0xF);
 +              RMWREG32(base + (ch * 0x1000) + 0x070, 0x0, 0xF);
 +      }
 +
 +      val  = RREG32(base + 0x8F30);
 +      val2 = RREG32(base + 0x8F34);
 +      if (val | val2) {
 +              rc = -EIO;
 +              dev_err(hdev->dev,
 +                      "HBM %d MC SRAM SERR info: Reg 0x8F30=0x%x, Reg 0x8F34=0x%x\n",
 +                      device, val, val2);
 +      }
 +      val  = RREG32(base + 0x8F40);
 +      val2 = RREG32(base + 0x8F44);
 +      if (val | val2) {
 +              rc = -EIO;
 +              dev_err(hdev->dev,
 +                      "HBM %d MC SRAM DERR info: Reg 0x8F40=0x%x, Reg 0x8F44=0x%x\n",
 +                      device, val, val2);
 +      }
 +
 +      return rc;
 +}
 +
 +static int gaudi_hbm_event_to_dev(u16 hbm_event_type)
 +{
 +      switch (hbm_event_type) {
 +      case GAUDI_EVENT_HBM0_SPI_0:
 +      case GAUDI_EVENT_HBM0_SPI_1:
 +              return 0;
 +      case GAUDI_EVENT_HBM1_SPI_0:
 +      case GAUDI_EVENT_HBM1_SPI_1:
 +              return 1;
 +      case GAUDI_EVENT_HBM2_SPI_0:
 +      case GAUDI_EVENT_HBM2_SPI_1:
 +              return 2;
 +      case GAUDI_EVENT_HBM3_SPI_0:
 +      case GAUDI_EVENT_HBM3_SPI_1:
 +              return 3;
 +      default:
 +              break;
 +      }
 +
 +      /* Should never happen */
 +      return 0;
 +}
 +
 +static bool gaudi_tpc_read_interrupts(struct hl_device *hdev, u8 tpc_id,
 +                                      char *interrupt_name)
 +{
 +      u32 tpc_offset = tpc_id * TPC_CFG_OFFSET, tpc_interrupts_cause, i;
 +      bool soft_reset_required = false;
 +
 +      tpc_interrupts_cause = RREG32(mmTPC0_CFG_TPC_INTR_CAUSE + tpc_offset) &
 +                              TPC0_CFG_TPC_INTR_CAUSE_CAUSE_MASK;
 +
 +      for (i = 0 ; i < GAUDI_NUM_OF_TPC_INTR_CAUSE ; i++)
 +              if (tpc_interrupts_cause & BIT(i)) {
 +                      dev_err_ratelimited(hdev->dev,
 +                                      "TPC%d_%s interrupt cause: %s\n",
 +                                      tpc_id, interrupt_name,
 +                                      gaudi_tpc_interrupts_cause[i]);
 +                      /* If this is a QM error, we need to soft-reset */
 +                      if (i == 15)
 +                              soft_reset_required = true;
 +              }
 +
 +      /* Clear interrupts */
 +      WREG32(mmTPC0_CFG_TPC_INTR_CAUSE + tpc_offset, 0);
 +
 +      return soft_reset_required;
 +}
 +
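 +/* DEC event IDs are two apart per TPC, hence the right shift by one */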
 +static int tpc_dec_event_to_tpc_id(u16 tpc_dec_event_type)
 +{
 +      return (tpc_dec_event_type - GAUDI_EVENT_TPC0_DEC) >> 1;
 +}
 +
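 +/* KRN_ERR event IDs are six apart per TPC, hence the division by six */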
 +static int tpc_krn_event_to_tpc_id(u16 tpc_dec_event_type)
 +{
 +      return (tpc_dec_event_type - GAUDI_EVENT_TPC0_KRN_ERR) / 6;
 +}
 +
 +static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 +{
 +      ktime_t zero_time = ktime_set(0, 0);
 +
 +      mutex_lock(&hdev->clk_throttling.lock);
 +
 +      switch (event_type) {
 +      case GAUDI_EVENT_FIX_POWER_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = zero_time;
 +              dev_info_ratelimited(hdev->dev,
 +                      "Clock throttling due to power consumption\n");
 +              break;
 +
 +      case GAUDI_EVENT_FIX_POWER_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = ktime_get();
 +              dev_info_ratelimited(hdev->dev,
 +                      "Power envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      case GAUDI_EVENT_FIX_THERMAL_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
 +              *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              dev_info_ratelimited(hdev->dev,
 +                      "Clock throttling due to overheating\n");
 +              break;
 +
 +      case GAUDI_EVENT_FIX_THERMAL_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
 +              *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              dev_info_ratelimited(hdev->dev,
 +                      "Thermal envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      default:
 +              dev_err(hdev->dev, "Received invalid clock change event %d\n",
 +                      event_type);
 +              break;
 +      }
 +
 +      mutex_unlock(&hdev->clk_throttling.lock);
 +}
 +
 +static void gaudi_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u64 data = le64_to_cpu(eq_entry->data[0]), event_mask = 0;
 +      u32 ctl = le32_to_cpu(eq_entry->hdr.ctl);
 +      u32 fw_fatal_err_flag = 0, flags = 0;
 +      u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK)
 +                      >> EQ_CTL_EVENT_TYPE_SHIFT);
 +      bool reset_required, reset_direct = false;
 +      u8 cause;
 +      int rc;
 +
 +      if (event_type >= GAUDI_EVENT_SIZE) {
 +              dev_err(hdev->dev, "Event type %u exceeds maximum of %u",
 +                              event_type, GAUDI_EVENT_SIZE - 1);
 +              return;
 +      }
 +
 +      gaudi->events_stat[event_type]++;
 +      gaudi->events_stat_aggregate[event_type]++;
 +
 +      switch (event_type) {
 +      case GAUDI_EVENT_PCIE_CORE_DERR:
 +      case GAUDI_EVENT_PCIE_IF_DERR:
 +      case GAUDI_EVENT_PCIE_PHY_DERR:
 +      case GAUDI_EVENT_TPC0_DERR ... GAUDI_EVENT_TPC7_DERR:
 +      case GAUDI_EVENT_MME0_ACC_DERR:
 +      case GAUDI_EVENT_MME0_SBAB_DERR:
 +      case GAUDI_EVENT_MME1_ACC_DERR:
 +      case GAUDI_EVENT_MME1_SBAB_DERR:
 +      case GAUDI_EVENT_MME2_ACC_DERR:
 +      case GAUDI_EVENT_MME2_SBAB_DERR:
 +      case GAUDI_EVENT_MME3_ACC_DERR:
 +      case GAUDI_EVENT_MME3_SBAB_DERR:
 +      case GAUDI_EVENT_DMA0_DERR_ECC ... GAUDI_EVENT_DMA7_DERR_ECC:
 +              fallthrough;
 +      case GAUDI_EVENT_CPU_IF_ECC_DERR:
 +      case GAUDI_EVENT_PSOC_MEM_DERR:
 +      case GAUDI_EVENT_PSOC_CORESIGHT_DERR:
 +      case GAUDI_EVENT_SRAM0_DERR ... GAUDI_EVENT_SRAM28_DERR:
 +      case GAUDI_EVENT_NIC0_DERR ... GAUDI_EVENT_NIC4_DERR:
 +      case GAUDI_EVENT_DMA_IF0_DERR ... GAUDI_EVENT_DMA_IF3_DERR:
 +      case GAUDI_EVENT_HBM_0_DERR ... GAUDI_EVENT_HBM_3_DERR:
 +      case GAUDI_EVENT_MMU_DERR:
 +      case GAUDI_EVENT_NIC0_CS_DBG_DERR ... GAUDI_EVENT_NIC4_CS_DBG_DERR:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              gaudi_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              fw_fatal_err_flag = HL_DRV_RESET_FW_FATAL_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_GIC500:
 +      case GAUDI_EVENT_AXI_ECC:
 +      case GAUDI_EVENT_L2_RAM_ECC:
 +      case GAUDI_EVENT_PLL0 ... GAUDI_EVENT_PLL17:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              fw_fatal_err_flag = HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_HBM0_SPI_0:
 +      case GAUDI_EVENT_HBM1_SPI_0:
 +      case GAUDI_EVENT_HBM2_SPI_0:
 +      case GAUDI_EVENT_HBM3_SPI_0:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              gaudi_hbm_read_interrupts(hdev,
 +                              gaudi_hbm_event_to_dev(event_type),
 +                              &eq_entry->hbm_ecc_data);
 +              fw_fatal_err_flag = HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_HBM0_SPI_1:
 +      case GAUDI_EVENT_HBM1_SPI_1:
 +      case GAUDI_EVENT_HBM2_SPI_1:
 +      case GAUDI_EVENT_HBM3_SPI_1:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              gaudi_hbm_read_interrupts(hdev,
 +                              gaudi_hbm_event_to_dev(event_type),
 +                              &eq_entry->hbm_ecc_data);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_TPC0_DEC:
 +      case GAUDI_EVENT_TPC1_DEC:
 +      case GAUDI_EVENT_TPC2_DEC:
 +      case GAUDI_EVENT_TPC3_DEC:
 +      case GAUDI_EVENT_TPC4_DEC:
 +      case GAUDI_EVENT_TPC5_DEC:
 +      case GAUDI_EVENT_TPC6_DEC:
 +      case GAUDI_EVENT_TPC7_DEC:
 +              /* On a TPC DEC event, notify on TPC assertion. While there isn't
 +               * a specific event for an assertion yet, the FW generates a TPC DEC
 +               * event. The SW upper layer will inspect an internal mapped area to
 +               * determine whether the event is a TPC assertion or a "real" TPC DEC.
 +               */
 +              event_mask |= HL_NOTIFIER_EVENT_TPC_ASSERT;
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              reset_required = gaudi_tpc_read_interrupts(hdev,
 +                                      tpc_dec_event_to_tpc_id(event_type),
 +                                      "AXI_SLV_DEC_Error");
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              if (reset_required) {
 +                      dev_err(hdev->dev, "reset required due to %s\n",
 +                              gaudi_irq_map_table[event_type].name);
 +
 +                      reset_direct = true;
 +                      goto reset_device;
 +              } else {
 +                      hl_fw_unmask_irq(hdev, event_type);
 +                      event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
 +              }
 +              break;
 +
 +      case GAUDI_EVENT_TPC0_KRN_ERR:
 +      case GAUDI_EVENT_TPC1_KRN_ERR:
 +      case GAUDI_EVENT_TPC2_KRN_ERR:
 +      case GAUDI_EVENT_TPC3_KRN_ERR:
 +      case GAUDI_EVENT_TPC4_KRN_ERR:
 +      case GAUDI_EVENT_TPC5_KRN_ERR:
 +      case GAUDI_EVENT_TPC6_KRN_ERR:
 +      case GAUDI_EVENT_TPC7_KRN_ERR:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              reset_required = gaudi_tpc_read_interrupts(hdev,
 +                                      tpc_krn_event_to_tpc_id(event_type),
 +                                      "KRN_ERR");
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              if (reset_required) {
 +                      dev_err(hdev->dev, "reset required due to %s\n",
 +                              gaudi_irq_map_table[event_type].name);
 +
 +                      reset_direct = true;
 +                      goto reset_device;
 +              } else {
 +                      hl_fw_unmask_irq(hdev, event_type);
 +                      event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
 +              }
 +              break;
 +
 +      case GAUDI_EVENT_PCIE_CORE_SERR:
 +      case GAUDI_EVENT_PCIE_IF_SERR:
 +      case GAUDI_EVENT_PCIE_PHY_SERR:
 +      case GAUDI_EVENT_TPC0_SERR ... GAUDI_EVENT_TPC7_SERR:
 +      case GAUDI_EVENT_MME0_ACC_SERR:
 +      case GAUDI_EVENT_MME0_SBAB_SERR:
 +      case GAUDI_EVENT_MME1_ACC_SERR:
 +      case GAUDI_EVENT_MME1_SBAB_SERR:
 +      case GAUDI_EVENT_MME2_ACC_SERR:
 +      case GAUDI_EVENT_MME2_SBAB_SERR:
 +      case GAUDI_EVENT_MME3_ACC_SERR:
 +      case GAUDI_EVENT_MME3_SBAB_SERR:
 +      case GAUDI_EVENT_DMA0_SERR_ECC ... GAUDI_EVENT_DMA7_SERR_ECC:
 +      case GAUDI_EVENT_CPU_IF_ECC_SERR:
 +      case GAUDI_EVENT_PSOC_MEM_SERR:
 +      case GAUDI_EVENT_PSOC_CORESIGHT_SERR:
 +      case GAUDI_EVENT_SRAM0_SERR ... GAUDI_EVENT_SRAM28_SERR:
 +      case GAUDI_EVENT_NIC0_SERR ... GAUDI_EVENT_NIC4_SERR:
 +      case GAUDI_EVENT_DMA_IF0_SERR ... GAUDI_EVENT_DMA_IF3_SERR:
 +      case GAUDI_EVENT_HBM_0_SERR ... GAUDI_EVENT_HBM_3_SERR:
 +              fallthrough;
 +      case GAUDI_EVENT_MMU_SERR:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              gaudi_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_PCIE_DEC:
 +      case GAUDI_EVENT_CPU_AXI_SPLITTER:
 +      case GAUDI_EVENT_PSOC_AXI_DEC:
 +      case GAUDI_EVENT_PSOC_PRSTN_FALL:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_MMU_PAGE_FAULT:
 +      case GAUDI_EVENT_MMU_WR_PERM:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_MME0_WBC_RSP:
 +      case GAUDI_EVENT_MME0_SBAB0_RSP:
 +      case GAUDI_EVENT_MME1_WBC_RSP:
 +      case GAUDI_EVENT_MME1_SBAB0_RSP:
 +      case GAUDI_EVENT_MME2_WBC_RSP:
 +      case GAUDI_EVENT_MME2_SBAB0_RSP:
 +      case GAUDI_EVENT_MME3_WBC_RSP:
 +      case GAUDI_EVENT_MME3_SBAB0_RSP:
 +      case GAUDI_EVENT_RAZWI_OR_ADC:
 +      case GAUDI_EVENT_MME0_QM ... GAUDI_EVENT_MME2_QM:
 +      case GAUDI_EVENT_DMA0_QM ... GAUDI_EVENT_DMA7_QM:
 +              fallthrough;
 +      case GAUDI_EVENT_NIC0_QM0:
 +      case GAUDI_EVENT_NIC0_QM1:
 +      case GAUDI_EVENT_NIC1_QM0:
 +      case GAUDI_EVENT_NIC1_QM1:
 +      case GAUDI_EVENT_NIC2_QM0:
 +      case GAUDI_EVENT_NIC2_QM1:
 +      case GAUDI_EVENT_NIC3_QM0:
 +      case GAUDI_EVENT_NIC3_QM1:
 +      case GAUDI_EVENT_NIC4_QM0:
 +      case GAUDI_EVENT_NIC4_QM1:
 +      case GAUDI_EVENT_DMA0_CORE ... GAUDI_EVENT_DMA7_CORE:
 +      case GAUDI_EVENT_TPC0_QM ... GAUDI_EVENT_TPC7_QM:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              gaudi_handle_qman_err(hdev, event_type, &event_mask);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= (HL_NOTIFIER_EVENT_USER_ENGINE_ERR | HL_NOTIFIER_EVENT_DEVICE_RESET);
 +              break;
 +
 +      case GAUDI_EVENT_RAZWI_OR_ADC_SW:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_TPC0_BMON_SPMU:
 +      case GAUDI_EVENT_TPC1_BMON_SPMU:
 +      case GAUDI_EVENT_TPC2_BMON_SPMU:
 +      case GAUDI_EVENT_TPC3_BMON_SPMU:
 +      case GAUDI_EVENT_TPC4_BMON_SPMU:
 +      case GAUDI_EVENT_TPC5_BMON_SPMU:
 +      case GAUDI_EVENT_TPC6_BMON_SPMU:
 +      case GAUDI_EVENT_TPC7_BMON_SPMU:
 +      case GAUDI_EVENT_DMA_BM_CH0 ... GAUDI_EVENT_DMA_BM_CH7:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_NIC_SEI_0 ... GAUDI_EVENT_NIC_SEI_4:
 +              gaudi_print_nic_axi_irq_info(hdev, event_type, &data);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_DMA_IF_SEI_0 ... GAUDI_EVENT_DMA_IF_SEI_3:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              gaudi_print_sm_sei_info(hdev, event_type,
 +                                      &eq_entry->sm_sei_data);
 +              rc = hl_state_dump(hdev);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              if (rc)
 +                      dev_err(hdev->dev,
 +                              "Error during system state dump %d\n", rc);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              break;
 +
 +      case GAUDI_EVENT_STATUS_NIC0_ENG0 ... GAUDI_EVENT_STATUS_NIC4_ENG1:
 +              break;
 +
 +      case GAUDI_EVENT_FIX_POWER_ENV_S ... GAUDI_EVENT_FIX_THERMAL_ENV_E:
 +              gaudi_print_clk_change_info(hdev, event_type, &event_mask);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              break;
 +
 +      case GAUDI_EVENT_PSOC_GPIO_U16_0:
 +              cause = le64_to_cpu(eq_entry->data[0]) & 0xFF;
 +              dev_err(hdev->dev,
 +                      "Received high temp H/W interrupt %d (cause %d)\n",
 +                      event_type, cause);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_DEV_RESET_REQ:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_PKT_QUEUE_OUT_SYNC:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              gaudi_print_out_of_sync_info(hdev, &eq_entry->pkt_sync_err);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_FW_ALIVE_S:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              gaudi_print_fw_alive_info(hdev, &eq_entry->fw_alive);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              goto reset_device;
 +
 +      default:
 +              dev_err(hdev->dev, "Received invalid H/W interrupt %d\n",
 +                              event_type);
 +              break;
 +      }
 +
 +      if (event_mask)
 +              hl_notifier_event_send_all(hdev, event_mask);
 +
 +      return;
 +
 +reset_device:
 +      reset_required = true;
 +
 +      if (hdev->asic_prop.fw_security_enabled && !reset_direct) {
 +              flags = HL_DRV_RESET_HARD | HL_DRV_RESET_BYPASS_REQ_TO_FW | fw_fatal_err_flag;
 +
 +              /* notify on device unavailable while the reset is triggered by FW */
 +              event_mask |= (HL_NOTIFIER_EVENT_DEVICE_RESET |
 +                                      HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE);
 +      } else if (hdev->hard_reset_on_fw_events) {
 +              flags = HL_DRV_RESET_HARD | HL_DRV_RESET_DELAY | fw_fatal_err_flag;
 +              event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
 +      } else {
 +              reset_required = false;
 +      }
 +
 +      if (reset_required) {
 +              hl_device_cond_reset(hdev, flags, event_mask);
 +      } else {
 +              hl_fw_unmask_irq(hdev, event_type);
 +              /* A notification on the event needs to be sent even though the reset is not executed */
 +              if (event_mask)
 +                      hl_notifier_event_send_all(hdev, event_mask);
 +      }
 +}
 +
 +static void *gaudi_get_events_stat(struct hl_device *hdev, bool aggregate, u32 *size)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (aggregate) {
 +              *size = (u32) sizeof(gaudi->events_stat_aggregate);
 +              return gaudi->events_stat_aggregate;
 +      }
 +
 +      *size = (u32) sizeof(gaudi->events_stat);
 +      return gaudi->events_stat;
 +}
 +
 +static int gaudi_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard, u32 flags)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU) ||
 +              hdev->reset_info.hard_reset_pending)
 +              return 0;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
 +
 +      /* L0 & L1 invalidation */
 +      WREG32(mmSTLB_INV_PS, 3);
 +      WREG32(mmSTLB_CACHE_INV, gaudi->mmu_cache_inv_pi++);
 +      WREG32(mmSTLB_INV_PS, 2);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmSTLB_INV_PS,
 +              status,
 +              !status,
 +              1000,
 +              timeout_usec);
 +
 +      WREG32(mmSTLB_INV_SET, 0);
 +
 +      return rc;
 +}
 +
 +static int gaudi_mmu_invalidate_cache_range(struct hl_device *hdev,
 +                                              bool is_hard, u32 flags,
 +                                              u32 asid, u64 va, u64 size)
 +{
 +      /* Treat as invalidate all because there is no range invalidation
 +       * in Gaudi
 +       */
 +      return hdev->asic_funcs->mmu_invalidate_cache(hdev, is_hard, flags);
 +}
 +
 +static int gaudi_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid, u64 phys_addr)
 +{
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
 +
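 +      /* Program the hop0 address for the ASID, kick MMU_BUSY and wait for it to clear */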
 +      WREG32(MMU_ASID, asid);
 +      WREG32(MMU_HOP0_PA43_12, phys_addr >> MMU_HOP0_PA43_12_SHIFT);
 +      WREG32(MMU_HOP0_PA49_44, phys_addr >> MMU_HOP0_PA49_44_SHIFT);
 +      WREG32(MMU_BUSY, 0x80000000);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              MMU_BUSY,
 +              status,
 +              !(status & 0x80000000),
 +              1000,
 +              timeout_usec);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout during MMU hop0 config of asid %d\n", asid);
 +              return rc;
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi_send_heartbeat(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_send_heartbeat(hdev);
 +}
 +
 +static int gaudi_cpucp_info_get(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int rc;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      rc = hl_fw_cpucp_handshake(hdev, mmCPU_BOOT_DEV_STS0,
 +                                      mmCPU_BOOT_DEV_STS1, mmCPU_BOOT_ERR0,
 +                                      mmCPU_BOOT_ERR1);
 +      if (rc)
 +              return rc;
 +
 +      if (!strlen(prop->cpucp_info.card_name))
 +              strncpy(prop->cpucp_info.card_name, GAUDI_DEFAULT_CARD_NAME,
 +                              CARD_NAME_MAX_LEN);
 +
 +      hdev->card_type = le32_to_cpu(hdev->asic_prop.cpucp_info.card_type);
 +
 +      set_default_power_values(hdev);
 +
 +      return 0;
 +}
 +
 +static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
 +              struct engines_data *e)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      const char *fmt = "%-5d%-9s%#-14x%#-12x%#x\n";
 +      const char *mme_slave_fmt = "%-5d%-9s%-14s%-12s%#x\n";
 +      const char *nic_fmt = "%-5d%-9s%#-14x%#x\n";
 +      unsigned long *mask = (unsigned long *)mask_arr;
 +      u32 qm_glbl_sts0, qm_cgm_sts, dma_core_sts0, tpc_cfg_sts, mme_arch_sts;
 +      bool is_idle = true, is_eng_idle, is_slave;
 +      u64 offset;
 +      int i, dma_id, port;
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nDMA  is_idle  QM_GLBL_STS0  QM_CGM_STS  DMA_CORE_STS0\n"
 +                      "---  -------  ------------  ----------  -------------\n");
 +
 +      for (i = 0 ; i < DMA_NUMBER_OF_CHNLS ; i++) {
 +              dma_id = gaudi_dma_assignment[i];
 +              offset = dma_id * DMA_QMAN_OFFSET;
 +
 +              qm_glbl_sts0 = RREG32(mmDMA0_QM_GLBL_STS0 + offset);
 +              qm_cgm_sts = RREG32(mmDMA0_QM_CGM_STS + offset);
 +              dma_core_sts0 = RREG32(mmDMA0_CORE_STS0 + offset);
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
 +                              IS_DMA_IDLE(dma_core_sts0);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(GAUDI_ENGINE_ID_DMA_0 + dma_id, mask);
 +              if (e)
 +                      hl_engine_data_sprintf(e, fmt, dma_id,
 +                              is_eng_idle ? "Y" : "N", qm_glbl_sts0,
 +                              qm_cgm_sts, dma_core_sts0);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nTPC  is_idle  QM_GLBL_STS0  QM_CGM_STS  CFG_STATUS\n"
 +                      "---  -------  ------------  ----------  ----------\n");
 +
 +      for (i = 0 ; i < TPC_NUMBER_OF_ENGINES ; i++) {
 +              offset = i * TPC_QMAN_OFFSET;
 +              qm_glbl_sts0 = RREG32(mmTPC0_QM_GLBL_STS0 + offset);
 +              qm_cgm_sts = RREG32(mmTPC0_QM_CGM_STS + offset);
 +              tpc_cfg_sts = RREG32(mmTPC0_CFG_STATUS + offset);
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
 +                              IS_TPC_IDLE(tpc_cfg_sts);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(GAUDI_ENGINE_ID_TPC_0 + i, mask);
 +              if (e)
 +                      hl_engine_data_sprintf(e, fmt, i,
 +                              is_eng_idle ? "Y" : "N",
 +                              qm_glbl_sts0, qm_cgm_sts, tpc_cfg_sts);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nMME  is_idle  QM_GLBL_STS0  QM_CGM_STS  ARCH_STATUS\n"
 +                      "---  -------  ------------  ----------  -----------\n");
 +
 +      for (i = 0 ; i < MME_NUMBER_OF_ENGINES ; i++) {
 +              offset = i * MME_QMAN_OFFSET;
 +              mme_arch_sts = RREG32(mmMME0_CTRL_ARCH_STATUS + offset);
 +              is_eng_idle = IS_MME_IDLE(mme_arch_sts);
 +
 +              /* MME 1 & 3 are slaves, no need to check their QMANs */
 +              is_slave = i % 2;
 +              if (!is_slave) {
 +                      qm_glbl_sts0 = RREG32(mmMME0_QM_GLBL_STS0 + offset);
 +                      qm_cgm_sts = RREG32(mmMME0_QM_CGM_STS + offset);
 +                      is_eng_idle &= IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
 +              }
 +
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(GAUDI_ENGINE_ID_MME_0 + i, mask);
 +              if (e) {
 +                      if (!is_slave)
 +                              hl_engine_data_sprintf(e, fmt, i,
 +                                      is_eng_idle ? "Y" : "N",
 +                                      qm_glbl_sts0, qm_cgm_sts, mme_arch_sts);
 +                      else
 +                              hl_engine_data_sprintf(e, mme_slave_fmt, i,
 +                                      is_eng_idle ? "Y" : "N", "-",
 +                                      "-", mme_arch_sts);
 +              }
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                              "\nNIC  is_idle  QM_GLBL_STS0  QM_CGM_STS\n"
 +                              "---  -------  ------------  ----------\n");
 +
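 +      /* Each NIC macro hosts two ports (QM0/QM1), so iterate per macro and check both ports */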
 +      for (i = 0 ; i < (NIC_NUMBER_OF_ENGINES / 2) ; i++) {
 +              offset = i * NIC_MACRO_QMAN_OFFSET;
 +              port = 2 * i;
 +              if (gaudi->hw_cap_initialized & BIT(HW_CAP_NIC_SHIFT + port)) {
 +                      qm_glbl_sts0 = RREG32(mmNIC0_QM0_GLBL_STS0 + offset);
 +                      qm_cgm_sts = RREG32(mmNIC0_QM0_CGM_STS + offset);
 +                      is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
 +                      is_idle &= is_eng_idle;
 +
 +                      if (mask && !is_eng_idle)
 +                              set_bit(GAUDI_ENGINE_ID_NIC_0 + port, mask);
 +                      if (e)
 +                              hl_engine_data_sprintf(e, nic_fmt, port,
 +                                              is_eng_idle ? "Y" : "N",
 +                                              qm_glbl_sts0, qm_cgm_sts);
 +              }
 +
 +              port = 2 * i + 1;
 +              if (gaudi->hw_cap_initialized & BIT(HW_CAP_NIC_SHIFT + port)) {
 +                      qm_glbl_sts0 = RREG32(mmNIC0_QM1_GLBL_STS0 + offset);
 +                      qm_cgm_sts = RREG32(mmNIC0_QM1_CGM_STS + offset);
 +                      is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
 +                      is_idle &= is_eng_idle;
 +
 +                      if (mask && !is_eng_idle)
 +                              set_bit(GAUDI_ENGINE_ID_NIC_0 + port, mask);
 +                      if (e)
 +                              hl_engine_data_sprintf(e, nic_fmt, port,
 +                                              is_eng_idle ? "Y" : "N",
 +                                              qm_glbl_sts0, qm_cgm_sts);
 +              }
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e, "\n");
 +
 +      return is_idle;
 +}
 +
 +static void gaudi_hw_queues_lock(struct hl_device *hdev)
 +      __acquires(&gaudi->hw_queues_lock)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      spin_lock(&gaudi->hw_queues_lock);
 +}
 +
 +static void gaudi_hw_queues_unlock(struct hl_device *hdev)
 +      __releases(&gaudi->hw_queues_lock)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      spin_unlock(&gaudi->hw_queues_lock);
 +}
 +
 +static u32 gaudi_get_pci_id(struct hl_device *hdev)
 +{
 +      return hdev->pdev->device;
 +}
 +
 +static int gaudi_get_eeprom_data(struct hl_device *hdev, void *data,
 +                              size_t max_size)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_get_eeprom_data(hdev, data, max_size);
 +}
 +
 +static int gaudi_get_monitor_dump(struct hl_device *hdev, void *data)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_get_monitor_dump(hdev, data);
 +}
 +
 +/*
 + * this function should be used only during initialization and/or after reset,
 + * when there are no active users.
 + */
 +static int gaudi_run_tpc_kernel(struct hl_device *hdev, u64 tpc_kernel, u32 tpc_id)
 +{
 +      u64 kernel_timeout;
 +      u32 status, offset;
 +      int rc;
 +
 +      offset = tpc_id * (mmTPC1_CFG_STATUS - mmTPC0_CFG_STATUS);
 +
 +      if (hdev->pldm)
 +              kernel_timeout = GAUDI_PLDM_TPC_KERNEL_WAIT_USEC;
 +      else
 +              kernel_timeout = HL_DEVICE_TIMEOUT_USEC;
 +
 +      WREG32(mmTPC0_CFG_QM_KERNEL_BASE_ADDRESS_LOW + offset,
 +                      lower_32_bits(tpc_kernel));
 +      WREG32(mmTPC0_CFG_QM_KERNEL_BASE_ADDRESS_HIGH + offset,
 +                      upper_32_bits(tpc_kernel));
 +
 +      WREG32(mmTPC0_CFG_ICACHE_BASE_ADDERESS_LOW + offset,
 +                      lower_32_bits(tpc_kernel));
 +      WREG32(mmTPC0_CFG_ICACHE_BASE_ADDERESS_HIGH + offset,
 +                      upper_32_bits(tpc_kernel));
 +      /* set a valid LUT pointer, content is of no significance */
 +      WREG32(mmTPC0_CFG_LUT_FUNC256_BASE_ADDR_LO + offset,
 +                      lower_32_bits(tpc_kernel));
 +      WREG32(mmTPC0_CFG_LUT_FUNC256_BASE_ADDR_HI + offset,
 +                      upper_32_bits(tpc_kernel));
 +
 +      WREG32(mmTPC0_CFG_QM_SYNC_OBJECT_ADDR + offset,
 +                      lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0));
 +
 +      WREG32(mmTPC0_CFG_TPC_CMD + offset,
 +                      (1 << TPC0_CFG_TPC_CMD_ICACHE_INVALIDATE_SHIFT |
 +                      1 << TPC0_CFG_TPC_CMD_ICACHE_PREFETCH_64KB_SHIFT));
 +      /* wait a bit for the icache invalidate/prefetch command to take effect */
 +      usleep_range(1000, 1500);
 +
 +      /* wait until the icache prefetch is done */
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmTPC0_CFG_STATUS + offset,
 +              status,
 +              (status & TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK) ==
 +                              TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK,
 +              1000,
 +              kernel_timeout);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for TPC%d icache prefetch\n",
 +                      tpc_id);
 +              return -EIO;
 +      }
 +
 +      WREG32(mmTPC0_CFG_TPC_EXECUTE + offset,
 +                      1 << TPC0_CFG_TPC_EXECUTE_V_SHIFT);
 +
 +      /* wait a bit for the engine to start executing */
 +      usleep_range(1000, 1500);
 +
 +      /* wait until engine has finished executing */
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmTPC0_CFG_STATUS + offset,
 +              status,
 +              (status & TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK) ==
 +                              TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK,
 +              1000,
 +              kernel_timeout);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for TPC%d vector pipe\n",
 +                      tpc_id);
 +              return -EIO;
 +      }
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmTPC0_CFG_WQ_INFLIGHT_CNTR + offset,
 +              status,
 +              (status == 0),
 +              1000,
 +              kernel_timeout);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for TPC%d kernel to execute\n",
 +                      tpc_id);
 +              return -EIO;
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi_internal_cb_pool_init(struct hl_device *hdev,
 +              struct hl_ctx *ctx)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int min_alloc_order, rc, collective_cb_size;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
 +              return 0;
 +
 +      hdev->internal_cb_pool_virt_addr = hl_asic_dma_alloc_coherent(hdev,
 +                                                      HOST_SPACE_INTERNAL_CB_SZ,
 +                                                      &hdev->internal_cb_pool_dma_addr,
 +                                                      GFP_KERNEL | __GFP_ZERO);
 +
 +      if (!hdev->internal_cb_pool_virt_addr)
 +              return -ENOMEM;
 +
 +      collective_cb_size = sizeof(struct packet_msg_short) * 5 +
 +                      sizeof(struct packet_fence);
 +      min_alloc_order = ilog2(collective_cb_size);
 +
 +      hdev->internal_cb_pool = gen_pool_create(min_alloc_order, -1);
 +      if (!hdev->internal_cb_pool) {
 +              dev_err(hdev->dev,
 +                      "Failed to create internal CB pool\n");
 +              rc = -ENOMEM;
 +              goto free_internal_cb_pool;
 +      }
 +
 +      rc = gen_pool_add(hdev->internal_cb_pool,
 +                              (uintptr_t) hdev->internal_cb_pool_virt_addr,
 +                              HOST_SPACE_INTERNAL_CB_SZ, -1);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to add memory to internal CB pool\n");
 +              rc = -EFAULT;
 +              goto destroy_internal_cb_pool;
 +      }
 +
 +      hdev->internal_cb_va_base = hl_reserve_va_block(hdev, ctx,
 +                      HL_VA_RANGE_TYPE_HOST, HOST_SPACE_INTERNAL_CB_SZ,
 +                      HL_MMU_VA_ALIGNMENT_NOT_NEEDED);
 +
 +      if (!hdev->internal_cb_va_base) {
 +              rc = -ENOMEM;
 +              goto destroy_internal_cb_pool;
 +      }
 +
 +      mutex_lock(&hdev->mmu_lock);
 +      rc = hl_mmu_map_contiguous(ctx, hdev->internal_cb_va_base,
 +                      hdev->internal_cb_pool_dma_addr,
 +                      HOST_SPACE_INTERNAL_CB_SZ);
 +
 +      hl_mmu_invalidate_cache(hdev, false, MMU_OP_USERPTR);
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      if (rc)
 +              goto unreserve_internal_cb_pool;
 +
 +      return 0;
 +
 +unreserve_internal_cb_pool:
 +      hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base,
 +                      HOST_SPACE_INTERNAL_CB_SZ);
 +destroy_internal_cb_pool:
 +      gen_pool_destroy(hdev->internal_cb_pool);
 +free_internal_cb_pool:
 +      hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
 +                                      hdev->internal_cb_pool_dma_addr);
 +
 +      return rc;
 +}
 +
 +static void gaudi_internal_cb_pool_fini(struct hl_device *hdev,
 +              struct hl_ctx *ctx)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      mutex_lock(&hdev->mmu_lock);
 +      hl_mmu_unmap_contiguous(ctx, hdev->internal_cb_va_base,
 +                      HOST_SPACE_INTERNAL_CB_SZ);
 +      hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base,
 +                      HOST_SPACE_INTERNAL_CB_SZ);
 +      hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR);
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      gen_pool_destroy(hdev->internal_cb_pool);
 +
 +      hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
 +                                      hdev->internal_cb_pool_dma_addr);
 +}
 +
 +static int gaudi_ctx_init(struct hl_ctx *ctx)
 +{
 +      int rc;
 +
 +      if (ctx->asid == HL_KERNEL_ASID_ID)
 +              return 0;
 +
 +      rc = gaudi_internal_cb_pool_init(ctx->hdev, ctx);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi_restore_user_registers(ctx->hdev);
 +      if (rc)
 +              gaudi_internal_cb_pool_fini(ctx->hdev, ctx);
 +
 +      return rc;
 +}
 +
 +static void gaudi_ctx_fini(struct hl_ctx *ctx)
 +{
 +      if (ctx->asid == HL_KERNEL_ASID_ID)
 +              return;
 +
 +      gaudi_internal_cb_pool_fini(ctx->hdev, ctx);
 +}
 +
 +static int gaudi_pre_schedule_cs(struct hl_cs *cs)
 +{
 +      return 0;
 +}
 +
 +static u32 gaudi_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
 +{
 +      return gaudi_cq_assignment[cq_idx];
 +}
 +
 +static u32 gaudi_get_signal_cb_size(struct hl_device *hdev)
 +{
 +      return sizeof(struct packet_msg_short) +
 +                      sizeof(struct packet_msg_prot) * 2;
 +}
 +
 +static u32 gaudi_get_wait_cb_size(struct hl_device *hdev)
 +{
 +      return sizeof(struct packet_msg_short) * 4 +
 +                      sizeof(struct packet_fence) +
 +                      sizeof(struct packet_msg_prot) * 2;
 +}
 +
 +static u32 gaudi_get_sob_addr(struct hl_device *hdev, u32 sob_id)
 +{
 +      return mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 + (sob_id * 4);
 +}
 +
 +static u32 gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
 +                              u32 size, bool eb)
 +{
 +      struct hl_cb *cb = (struct hl_cb *) data;
 +      struct packet_msg_short *pkt;
 +      u32 value, ctl, pkt_size = sizeof(*pkt);
 +
 +      pkt = cb->kernel_address + size;
 +      memset(pkt, 0, pkt_size);
 +
 +      /* Inc by 1, Mode ADD */
 +      value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_SOB_SYNC_VAL_MASK, 1);
 +      value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_SOB_MOD_MASK, 1);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, sob_id * 4);
 +      ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OP_MASK, 0); /* write the value */
 +      ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 3); /* W_S SOB base */
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, eb);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return size + pkt_size;
 +}
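 +
 +/*
 + * Note on the packet built above: the MSG_SHORT targets SOB <sob_id> in the
 + * W_S sync manager (base index 3) and, because SOB_MOD is set, atomically
 + * adds 1 to the sync object instead of overwriting it - which is what lets a
 + * signal CS advance the SOB that waiters later arm their monitors against.
 + */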
 +
 +static u32 gaudi_add_mon_msg_short(struct packet_msg_short *pkt, u32 value,
 +                                      u16 addr)
 +{
 +      u32 ctl, pkt_size = sizeof(*pkt);
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, addr);
 +      ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 2);  /* W_S MON base */
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 0); /* last pkt MB */
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
 +static u32 gaudi_add_arm_monitor_pkt(struct hl_device *hdev,
 +              struct packet_msg_short *pkt, u16 sob_base, u8 sob_mask,
 +              u16 sob_val, u16 mon_id)
 +{
 +      u64 monitor_base;
 +      u32 ctl, value, pkt_size = sizeof(*pkt);
 +      u16 msg_addr_offset;
 +      u8 mask;
 +
 +      if (hl_gen_sob_mask(sob_base, sob_mask, &mask)) {
 +              dev_err(hdev->dev,
 +                      "sob_base %u (mask %#x) is not valid\n",
 +                      sob_base, sob_mask);
 +              return 0;
 +      }
 +
 +      /*
 +       * monitor_base should be the content of the base0 address registers,
 +       * so it will be added to the msg short offsets
 +       */
 +      monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
 +
 +      msg_addr_offset =
 +              (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + mon_id * 4) -
 +                              monitor_base;
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      /* Monitor config packet: bind the monitor to a sync object */
 +      value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_GID_MASK, sob_base / 8);
 +      value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_VAL_MASK, sob_val);
 +      value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_MODE_MASK,
 +                      0); /* GREATER OR EQUAL */
 +      value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_MASK_MASK, mask);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, msg_addr_offset);
 +      ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OP_MASK, 0); /* write the value */
 +      ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 2); /* W_S MON base */
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
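 +
 +/*
 + * A worked example of the offset arithmetic above: for mon_id 3 the write
 + * lands on MON_ARM_3, since the ARM registers are 4 bytes apart (the
 + * "mon_id * 4" term) and the QMAN adds monitor_base (MON_PAY_ADDRL_0, the
 + * content of the base0 register) back onto every MSG_SHORT address offset.
 + */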
 +
 +static u32 gaudi_add_fence_pkt(struct packet_fence *pkt)
 +{
 +      u32 ctl, cfg, pkt_size = sizeof(*pkt);
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      cfg = FIELD_PREP(GAUDI_PKT_FENCE_CFG_DEC_VAL_MASK, 1);
 +      cfg |= FIELD_PREP(GAUDI_PKT_FENCE_CFG_TARGET_VAL_MASK, 1);
 +      cfg |= FIELD_PREP(GAUDI_PKT_FENCE_CFG_ID_MASK, 2);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_FENCE);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->cfg = cpu_to_le32(cfg);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
 +static int gaudi_get_fence_addr(struct hl_device *hdev, u32 queue_id, u64 *addr)
 +{
 +      u32 offset, nic_index;
 +
 +      switch (queue_id) {
 +      case GAUDI_QUEUE_ID_DMA_0_0:
 +              offset = mmDMA0_QM_CP_FENCE2_RDATA_0;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_0_1:
 +              offset = mmDMA0_QM_CP_FENCE2_RDATA_1;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_0_2:
 +              offset = mmDMA0_QM_CP_FENCE2_RDATA_2;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_0_3:
 +              offset = mmDMA0_QM_CP_FENCE2_RDATA_3;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_1_0:
 +              offset = mmDMA1_QM_CP_FENCE2_RDATA_0;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_1_1:
 +              offset = mmDMA1_QM_CP_FENCE2_RDATA_1;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_1_2:
 +              offset = mmDMA1_QM_CP_FENCE2_RDATA_2;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_1_3:
 +              offset = mmDMA1_QM_CP_FENCE2_RDATA_3;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_5_0:
 +              offset = mmDMA5_QM_CP_FENCE2_RDATA_0;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_5_1:
 +              offset = mmDMA5_QM_CP_FENCE2_RDATA_1;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_5_2:
 +              offset = mmDMA5_QM_CP_FENCE2_RDATA_2;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_5_3:
 +              offset = mmDMA5_QM_CP_FENCE2_RDATA_3;
 +              break;
 +      case GAUDI_QUEUE_ID_TPC_7_0:
 +              offset = mmTPC7_QM_CP_FENCE2_RDATA_0;
 +              break;
 +      case GAUDI_QUEUE_ID_TPC_7_1:
 +              offset = mmTPC7_QM_CP_FENCE2_RDATA_1;
 +              break;
 +      case GAUDI_QUEUE_ID_TPC_7_2:
 +              offset = mmTPC7_QM_CP_FENCE2_RDATA_2;
 +              break;
 +      case GAUDI_QUEUE_ID_TPC_7_3:
 +              offset = mmTPC7_QM_CP_FENCE2_RDATA_3;
 +              break;
 +      case GAUDI_QUEUE_ID_NIC_0_0:
 +      case GAUDI_QUEUE_ID_NIC_1_0:
 +      case GAUDI_QUEUE_ID_NIC_2_0:
 +      case GAUDI_QUEUE_ID_NIC_3_0:
 +      case GAUDI_QUEUE_ID_NIC_4_0:
 +      case GAUDI_QUEUE_ID_NIC_5_0:
 +      case GAUDI_QUEUE_ID_NIC_6_0:
 +      case GAUDI_QUEUE_ID_NIC_7_0:
 +      case GAUDI_QUEUE_ID_NIC_8_0:
 +      case GAUDI_QUEUE_ID_NIC_9_0:
 +              nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_0) >> 2;
 +              offset = mmNIC0_QM0_CP_FENCE2_RDATA_0 +
 +                              (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
 +                              (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
 +              break;
 +      case GAUDI_QUEUE_ID_NIC_0_1:
 +      case GAUDI_QUEUE_ID_NIC_1_1:
 +      case GAUDI_QUEUE_ID_NIC_2_1:
 +      case GAUDI_QUEUE_ID_NIC_3_1:
 +      case GAUDI_QUEUE_ID_NIC_4_1:
 +      case GAUDI_QUEUE_ID_NIC_5_1:
 +      case GAUDI_QUEUE_ID_NIC_6_1:
 +      case GAUDI_QUEUE_ID_NIC_7_1:
 +      case GAUDI_QUEUE_ID_NIC_8_1:
 +      case GAUDI_QUEUE_ID_NIC_9_1:
 +              nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_1) >> 2;
 +              offset = mmNIC0_QM0_CP_FENCE2_RDATA_1 +
 +                              (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
 +                              (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
 +              break;
 +      case GAUDI_QUEUE_ID_NIC_0_2:
 +      case GAUDI_QUEUE_ID_NIC_1_2:
 +      case GAUDI_QUEUE_ID_NIC_2_2:
 +      case GAUDI_QUEUE_ID_NIC_3_2:
 +      case GAUDI_QUEUE_ID_NIC_4_2:
 +      case GAUDI_QUEUE_ID_NIC_5_2:
 +      case GAUDI_QUEUE_ID_NIC_6_2:
 +      case GAUDI_QUEUE_ID_NIC_7_2:
 +      case GAUDI_QUEUE_ID_NIC_8_2:
 +      case GAUDI_QUEUE_ID_NIC_9_2:
 +              nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_2) >> 2;
 +              offset = mmNIC0_QM0_CP_FENCE2_RDATA_2 +
 +                              (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
 +                              (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
 +              break;
 +      case GAUDI_QUEUE_ID_NIC_0_3:
 +      case GAUDI_QUEUE_ID_NIC_1_3:
 +      case GAUDI_QUEUE_ID_NIC_2_3:
 +      case GAUDI_QUEUE_ID_NIC_3_3:
 +      case GAUDI_QUEUE_ID_NIC_4_3:
 +      case GAUDI_QUEUE_ID_NIC_5_3:
 +      case GAUDI_QUEUE_ID_NIC_6_3:
 +      case GAUDI_QUEUE_ID_NIC_7_3:
 +      case GAUDI_QUEUE_ID_NIC_8_3:
 +      case GAUDI_QUEUE_ID_NIC_9_3:
 +              nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_3) >> 2;
 +              offset = mmNIC0_QM0_CP_FENCE2_RDATA_3 +
 +                              (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
 +                              (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
 +              break;
 +      default:
 +              return -EINVAL;
 +      }
 +
 +      *addr = CFG_BASE + offset;
 +
 +      return 0;
 +}
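 +
 +/*
 + * Worked example for the NIC branch above: for GAUDI_QUEUE_ID_NIC_5_0,
 + * nic_index = (NIC_5_0 - NIC_0_0) >> 2 = 5, so the fence address becomes
 + * CFG_BASE + mmNIC0_QM0_CP_FENCE2_RDATA_0 + 2 * NIC_MACRO_QMAN_OFFSET +
 + * 1 * NIC_ENGINE_QMAN_OFFSET, i.e. QM1 of the third NIC macro. This assumes
 + * the stream-0 NIC queue IDs are spaced four apart, which is what the
 + * ">> 2" relies on.
 + */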
 +
 +static u32 gaudi_add_mon_pkts(void *buf, u16 mon_id, u64 fence_addr)
 +{
 +      u64 monitor_base;
 +      u32 size = 0;
 +      u16 msg_addr_offset;
 +
 +      /*
 +       * monitor_base should be the content of the base0 address registers,
 +       * so it will be added to the msg short offsets
 +       */
 +      monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
 +
 +      /* First monitor config packet: low address of the sync */
 +      msg_addr_offset =
 +              (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi_add_mon_msg_short(buf + size, (u32) fence_addr,
 +                                      msg_addr_offset);
 +
 +      /* Second monitor config packet: high address of the sync */
 +      msg_addr_offset =
 +              (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi_add_mon_msg_short(buf + size, (u32) (fence_addr >> 32),
 +                                      msg_addr_offset);
 +
 +      /*
 +       * Third monitor config packet: the payload, i.e. what to write when the
 +       * sync triggers
 +       */
 +      msg_addr_offset =
 +              (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi_add_mon_msg_short(buf + size, 1, msg_addr_offset);
 +
 +      return size;
 +}
 +
 +static u32 gaudi_gen_wait_cb(struct hl_device *hdev,
 +                              struct hl_gen_wait_properties *prop)
 +{
 +      struct hl_cb *cb = (struct hl_cb *) prop->data;
 +      void *buf = cb->kernel_address;
 +      u64 fence_addr = 0;
 +      u32 size = prop->size;
 +
 +      if (gaudi_get_fence_addr(hdev, prop->q_idx, &fence_addr)) {
 +              dev_crit(hdev->dev, "wrong queue id %d for wait packet\n",
 +                              prop->q_idx);
 +              return 0;
 +      }
 +
 +      size += gaudi_add_mon_pkts(buf + size, prop->mon_id, fence_addr);
 +      size += gaudi_add_arm_monitor_pkt(hdev, buf + size, prop->sob_base,
 +                      prop->sob_mask, prop->sob_val, prop->mon_id);
 +      size += gaudi_add_fence_pkt(buf + size);
 +
 +      return size;
 +}
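 +
 +/*
 + * The wait CB built above is thus three MSG_SHORT packets configuring the
 + * monitor payload (address low, address high, data), one MSG_SHORT arming
 + * the monitor, and one FENCE packet - matching the four MSG_SHORT plus one
 + * FENCE counted in gaudi_get_wait_cb_size(). The two MSG_PROT packets in
 + * that size are presumably appended by the end-of-CB packets (an
 + * assumption - not shown in this hunk).
 + */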
 +
 +static void gaudi_reset_sob(struct hl_device *hdev, void *data)
 +{
 +      struct hl_hw_sob *hw_sob = (struct hl_hw_sob *) data;
 +
 +      dev_dbg(hdev->dev, "reset SOB, q_idx: %d, sob_id: %d\n", hw_sob->q_idx,
 +              hw_sob->sob_id);
 +
 +      WREG32(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
 +                      hw_sob->sob_id * 4, 0);
 +
 +      kref_init(&hw_sob->kref);
 +}
 +
 +static u64 gaudi_get_device_time(struct hl_device *hdev)
 +{
 +      u64 device_time = ((u64) RREG32(mmPSOC_TIMESTAMP_CNTCVU)) << 32;
 +
 +      return device_time | RREG32(mmPSOC_TIMESTAMP_CNTCVL);
 +}
 +
 +static int gaudi_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
 +                              u32 *block_size, u32 *block_id)
 +{
 +      return -EPERM;
 +}
 +
 +static int gaudi_block_mmap(struct hl_device *hdev,
 +                              struct vm_area_struct *vma,
 +                              u32 block_id, u32 block_size)
 +{
 +      return -EPERM;
 +}
 +
 +static void gaudi_enable_events_from_fw(struct hl_device *hdev)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                      mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                      le32_to_cpu(dyn_regs->gic_host_ints_irq);
 +
 +      WREG32(irq_handler_offset,
 +              gaudi_irq_map_table[GAUDI_EVENT_INTS_REGISTER].cpu_id);
 +}
 +
 +static int gaudi_ack_mmu_page_fault_or_access_error(struct hl_device *hdev, u64 mmu_cap_mask)
 +{
 +      return -EINVAL;
 +}
 +
 +static int gaudi_map_pll_idx_to_fw_idx(u32 pll_idx)
 +{
 +      switch (pll_idx) {
 +      case HL_GAUDI_CPU_PLL: return CPU_PLL;
 +      case HL_GAUDI_PCI_PLL: return PCI_PLL;
 +      case HL_GAUDI_NIC_PLL: return NIC_PLL;
 +      case HL_GAUDI_DMA_PLL: return DMA_PLL;
 +      case HL_GAUDI_MESH_PLL: return MESH_PLL;
 +      case HL_GAUDI_MME_PLL: return MME_PLL;
 +      case HL_GAUDI_TPC_PLL: return TPC_PLL;
 +      case HL_GAUDI_IF_PLL: return IF_PLL;
 +      case HL_GAUDI_SRAM_PLL: return SRAM_PLL;
 +      case HL_GAUDI_HBM_PLL: return HBM_PLL;
 +      default: return -EINVAL;
 +      }
 +}
 +
 +static int gaudi_add_sync_to_engine_map_entry(
 +      struct hl_sync_to_engine_map *map, u32 reg_value,
 +      enum hl_sync_engine_type engine_type, u32 engine_id)
 +{
 +      struct hl_sync_to_engine_map_entry *entry;
 +
 +      /* The register value holds a partial address of the sync object and
 +       * is used as its unique identifier, so the CFG base bits are
 +       * stripped from the value first.
 +       */
 +      if (reg_value == 0 || reg_value == 0xffffffff)
 +              return 0;
 +      reg_value -= lower_32_bits(CFG_BASE);
 +
 +      /* create a new hash entry */
 +      entry = kzalloc(sizeof(*entry), GFP_KERNEL);
 +      if (!entry)
 +              return -ENOMEM;
 +      entry->engine_type = engine_type;
 +      entry->engine_id = engine_id;
 +      entry->sync_id = reg_value;
 +      hash_add(map->tb, &entry->node, reg_value);
 +
 +      return 0;
 +}
 +
 +static int gaudi_gen_sync_to_engine_map(struct hl_device *hdev,
 +                              struct hl_sync_to_engine_map *map)
 +{
 +      struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
 +      int i, j, rc;
 +      u32 reg_value;
 +
 +      /* Iterate over TPC engines */
 +      for (i = 0; i < sds->props[SP_NUM_OF_TPC_ENGINES]; ++i) {
 +
 +              reg_value = RREG32(sds->props[SP_TPC0_CFG_SO] +
 +                                      sds->props[SP_NEXT_TPC] * i);
 +
 +              rc = gaudi_add_sync_to_engine_map_entry(map, reg_value,
 +                                                      ENGINE_TPC, i);
 +              if (rc)
 +                      goto free_sync_to_engine_map;
 +      }
 +
 +      /* Iterate over MME engines */
 +      for (i = 0; i < sds->props[SP_NUM_OF_MME_ENGINES]; ++i) {
 +              for (j = 0; j < sds->props[SP_SUB_MME_ENG_NUM]; ++j) {
 +
 +                      reg_value = RREG32(sds->props[SP_MME_CFG_SO] +
 +                                              sds->props[SP_NEXT_MME] * i +
 +                                              j * sizeof(u32));
 +
 +                      rc = gaudi_add_sync_to_engine_map_entry(
 +                              map, reg_value, ENGINE_MME,
 +                              i * sds->props[SP_SUB_MME_ENG_NUM] + j);
 +                      if (rc)
 +                              goto free_sync_to_engine_map;
 +              }
 +      }
 +
 +      /* Iterate over DMA engines */
 +      for (i = 0; i < sds->props[SP_NUM_OF_DMA_ENGINES]; ++i) {
 +              reg_value = RREG32(sds->props[SP_DMA_CFG_SO] +
 +                                      sds->props[SP_DMA_QUEUES_OFFSET] * i);
 +              rc = gaudi_add_sync_to_engine_map_entry(map, reg_value,
 +                                                      ENGINE_DMA, i);
 +              if (rc)
 +                      goto free_sync_to_engine_map;
 +      }
 +
 +      return 0;
 +
 +free_sync_to_engine_map:
 +      hl_state_dump_free_sync_to_engine_map(map);
 +
 +      return rc;
 +}
 +
 +static int gaudi_monitor_valid(struct hl_mon_state_dump *mon)
 +{
 +      return FIELD_GET(
 +              SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0_VALID_MASK,
 +              mon->status);
 +}
 +
 +static void gaudi_fill_sobs_from_mon(char *sobs, struct hl_mon_state_dump *mon)
 +{
 +      const size_t max_write = 10;
 +      u32 gid, mask, sob;
 +      int i, offset;
 +
 +      /* The sync object ID is calculated as
 +       * (group_id * MONITOR_MAX_SOBS + position of each cleared bit in the
 +       * mask), with 8 SOBs per monitored group.
 +       */
 +      gid = FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_SID_MASK,
 +                      mon->arm_data);
 +      mask = FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_MASK_MASK,
 +                      mon->arm_data);
 +
 +      for (i = 0, offset = 0; mask && offset < MONITOR_SOB_STRING_SIZE -
 +              max_write; mask >>= 1, i++) {
 +              if (!(mask & 1)) {
 +                      sob = gid * MONITOR_MAX_SOBS + i;
 +
 +                      if (offset > 0)
 +                              offset += snprintf(sobs + offset, max_write,
 +                                                      ", ");
 +
 +                      offset += snprintf(sobs + offset, max_write, "%u", sob);
 +              }
 +      }
 +}
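 +
 +/*
 + * Worked example: for gid = 2 and an arm mask of 0xF5 (bits 1 and 3
 + * cleared), the monitored sync objects are 2 * MONITOR_MAX_SOBS + 1 = 17
 + * and 2 * MONITOR_MAX_SOBS + 3 = 19 (assuming MONITOR_MAX_SOBS is 8, per
 + * the comment above), so the buffer ends up holding "17, 19".
 + */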
 +
 +static int gaudi_print_single_monitor(char **buf, size_t *size, size_t *offset,
 +                              struct hl_device *hdev,
 +                              struct hl_mon_state_dump *mon)
 +{
 +      const char *name;
 +      char scratch_buf1[BIN_REG_STRING_SIZE],
 +              scratch_buf2[BIN_REG_STRING_SIZE];
 +      char monitored_sobs[MONITOR_SOB_STRING_SIZE] = {0};
 +
 +      name = hl_state_dump_get_monitor_name(hdev, mon);
 +      if (!name)
 +              name = "";
 +
 +      gaudi_fill_sobs_from_mon(monitored_sobs, mon);
 +
 +      return hl_snprintf_resize(
 +              buf, size, offset,
 +              "Mon id: %u%s, wait for group id: %u mask %s to reach val: %u and write %u to address 0x%llx. Pending: %s. Means sync objects [%s] are being monitored.",
 +              mon->id, name,
 +              FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_SID_MASK,
 +                              mon->arm_data),
 +              hl_format_as_binary(
 +                      scratch_buf1, sizeof(scratch_buf1),
 +                      FIELD_GET(
 +                              SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_MASK_MASK,
 +                              mon->arm_data)),
 +              FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_SOD_MASK,
 +                              mon->arm_data),
 +              mon->wr_data,
 +              (((u64)mon->wr_addr_high) << 32) | mon->wr_addr_low,
 +              hl_format_as_binary(
 +                      scratch_buf2, sizeof(scratch_buf2),
 +                      FIELD_GET(
 +                              SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0_PENDING_MASK,
 +                              mon->status)),
 +              monitored_sobs);
 +}
 +
 +
 +static int gaudi_print_fences_single_engine(
 +      struct hl_device *hdev, u64 base_offset, u64 status_base_offset,
 +      enum hl_sync_engine_type engine_type, u32 engine_id, char **buf,
 +      size_t *size, size_t *offset)
 +{
 +      struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
 +      int rc = -ENOMEM, i;
 +      u32 *statuses, *fences;
 +
 +      statuses = kcalloc(sds->props[SP_ENGINE_NUM_OF_QUEUES],
 +                      sizeof(*statuses), GFP_KERNEL);
 +      if (!statuses)
 +              goto out;
 +
 +      fences = kcalloc(sds->props[SP_ENGINE_NUM_OF_FENCES] *
 +                              sds->props[SP_ENGINE_NUM_OF_QUEUES],
 +                       sizeof(*fences), GFP_KERNEL);
 +      if (!fences)
 +              goto free_status;
 +
 +      for (i = 0; i < sds->props[SP_ENGINE_NUM_OF_FENCES]; ++i)
 +              statuses[i] = RREG32(status_base_offset + i * sizeof(u32));
 +
 +      for (i = 0; i < sds->props[SP_ENGINE_NUM_OF_FENCES] *
 +                              sds->props[SP_ENGINE_NUM_OF_QUEUES]; ++i)
 +              fences[i] = RREG32(base_offset + i * sizeof(u32));
 +
 +      /* The actual print */
 +      for (i = 0; i < sds->props[SP_ENGINE_NUM_OF_QUEUES]; ++i) {
 +              u32 fence_id;
 +              u64 fence_cnt, fence_rdata;
 +              const char *engine_name;
 +
 +              if (!FIELD_GET(TPC0_QM_CP_STS_0_FENCE_IN_PROGRESS_MASK,
 +                      statuses[i]))
 +                      continue;
 +
 +              fence_id =
 +                      FIELD_GET(TPC0_QM_CP_STS_0_FENCE_ID_MASK, statuses[i]);
 +              fence_cnt = base_offset + CFG_BASE +
 +                      sizeof(u32) *
 +                      (i + fence_id * sds->props[SP_ENGINE_NUM_OF_QUEUES]);
 +              fence_rdata = fence_cnt - sds->props[SP_FENCE0_CNT_OFFSET] +
 +                              sds->props[SP_FENCE0_RDATA_OFFSET];
 +              engine_name = hl_sync_engine_to_string(engine_type);
 +
 +              rc = hl_snprintf_resize(
 +                      buf, size, offset,
 +                      "%s%u, stream %u: fence id %u cnt = 0x%llx (%s%u_QM.CP_FENCE%u_CNT_%u) rdata = 0x%llx (%s%u_QM.CP_FENCE%u_RDATA_%u) value = %u, cp_status = %u\n",
 +                      engine_name, engine_id,
 +                      i, fence_id,
 +                      fence_cnt, engine_name, engine_id, fence_id, i,
 +                      fence_rdata, engine_name, engine_id, fence_id, i,
 +                      fences[fence_id],
 +                      statuses[i]);
 +              if (rc)
 +                      goto free_fences;
 +      }
 +
 +      rc = 0;
 +
 +free_fences:
 +      kfree(fences);
 +free_status:
 +      kfree(statuses);
 +out:
 +      return rc;
 +}
 +
 +
 +static struct hl_state_dump_specs_funcs gaudi_state_dump_funcs = {
 +      .monitor_valid = gaudi_monitor_valid,
 +      .print_single_monitor = gaudi_print_single_monitor,
 +      .gen_sync_to_engine_map = gaudi_gen_sync_to_engine_map,
 +      .print_fences_single_engine = gaudi_print_fences_single_engine,
 +};
 +
 +static void gaudi_state_dump_init(struct hl_device *hdev)
 +{
 +      struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
 +      int i;
 +
 +      for (i = 0; i < ARRAY_SIZE(gaudi_so_id_to_str); ++i)
 +              hash_add(sds->so_id_to_str_tb,
 +                      &gaudi_so_id_to_str[i].node,
 +                      gaudi_so_id_to_str[i].id);
 +
 +      for (i = 0; i < ARRAY_SIZE(gaudi_monitor_id_to_str); ++i)
 +              hash_add(sds->monitor_id_to_str_tb,
 +                      &gaudi_monitor_id_to_str[i].node,
 +                      gaudi_monitor_id_to_str[i].id);
 +
 +      sds->props = gaudi_state_dump_specs_props;
 +
 +      sds->sync_namager_names = gaudi_sync_manager_names;
 +
 +      sds->funcs = gaudi_state_dump_funcs;
 +}
 +
 +static u32 *gaudi_get_stream_master_qid_arr(void)
 +{
 +      return gaudi_stream_master;
 +}
 +
 +static int gaudi_set_dram_properties(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static int gaudi_set_binning_masks(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static void gaudi_check_if_razwi_happened(struct hl_device *hdev)
 +{
 +}
 +
 +static ssize_t infineon_ver_show(struct device *dev, struct device_attribute *attr, char *buf)
 +{
 +      struct hl_device *hdev = dev_get_drvdata(dev);
 +      struct cpucp_info *cpucp_info;
 +
 +      cpucp_info = &hdev->asic_prop.cpucp_info;
 +
 +      return sprintf(buf, "%#04x\n", le32_to_cpu(cpucp_info->infineon_version));
 +}
 +
 +static DEVICE_ATTR_RO(infineon_ver);
 +
 +static struct attribute *gaudi_vrm_dev_attrs[] = {
 +      &dev_attr_infineon_ver.attr,
 +      NULL,
 +};
 +
 +static void gaudi_add_device_attr(struct hl_device *hdev, struct attribute_group *dev_clk_attr_grp,
 +                                      struct attribute_group *dev_vrm_attr_grp)
 +{
 +      hl_sysfs_add_dev_clk_attr(hdev, dev_clk_attr_grp);
 +      dev_vrm_attr_grp->attrs = gaudi_vrm_dev_attrs;
 +}
 +
 +static int gaudi_send_device_activity(struct hl_device *hdev, bool open)
 +{
 +      return 0;
 +}
 +
 +static const struct hl_asic_funcs gaudi_funcs = {
 +      .early_init = gaudi_early_init,
 +      .early_fini = gaudi_early_fini,
 +      .late_init = gaudi_late_init,
 +      .late_fini = gaudi_late_fini,
 +      .sw_init = gaudi_sw_init,
 +      .sw_fini = gaudi_sw_fini,
 +      .hw_init = gaudi_hw_init,
 +      .hw_fini = gaudi_hw_fini,
 +      .halt_engines = gaudi_halt_engines,
 +      .suspend = gaudi_suspend,
 +      .resume = gaudi_resume,
 +      .mmap = gaudi_mmap,
 +      .ring_doorbell = gaudi_ring_doorbell,
 +      .pqe_write = gaudi_pqe_write,
 +      .asic_dma_alloc_coherent = gaudi_dma_alloc_coherent,
 +      .asic_dma_free_coherent = gaudi_dma_free_coherent,
 +      .scrub_device_mem = gaudi_scrub_device_mem,
 +      .scrub_device_dram = gaudi_scrub_device_dram,
 +      .get_int_queue_base = gaudi_get_int_queue_base,
 +      .test_queues = gaudi_test_queues,
 +      .asic_dma_pool_zalloc = gaudi_dma_pool_zalloc,
 +      .asic_dma_pool_free = gaudi_dma_pool_free,
 +      .cpu_accessible_dma_pool_alloc = gaudi_cpu_accessible_dma_pool_alloc,
 +      .cpu_accessible_dma_pool_free = gaudi_cpu_accessible_dma_pool_free,
 +      .hl_dma_unmap_sgtable = hl_dma_unmap_sgtable,
 +      .cs_parser = gaudi_cs_parser,
 +      .asic_dma_map_sgtable = hl_dma_map_sgtable,
 +      .add_end_of_cb_packets = gaudi_add_end_of_cb_packets,
 +      .update_eq_ci = gaudi_update_eq_ci,
 +      .context_switch = gaudi_context_switch,
 +      .restore_phase_topology = gaudi_restore_phase_topology,
 +      .debugfs_read_dma = gaudi_debugfs_read_dma,
 +      .add_device_attr = gaudi_add_device_attr,
 +      .handle_eqe = gaudi_handle_eqe,
 +      .get_events_stat = gaudi_get_events_stat,
 +      .read_pte = gaudi_read_pte,
 +      .write_pte = gaudi_write_pte,
 +      .mmu_invalidate_cache = gaudi_mmu_invalidate_cache,
 +      .mmu_invalidate_cache_range = gaudi_mmu_invalidate_cache_range,
 +      .mmu_prefetch_cache_range = NULL,
 +      .send_heartbeat = gaudi_send_heartbeat,
 +      .debug_coresight = gaudi_debug_coresight,
 +      .is_device_idle = gaudi_is_device_idle,
 +      .compute_reset_late_init = gaudi_compute_reset_late_init,
 +      .hw_queues_lock = gaudi_hw_queues_lock,
 +      .hw_queues_unlock = gaudi_hw_queues_unlock,
 +      .get_pci_id = gaudi_get_pci_id,
 +      .get_eeprom_data = gaudi_get_eeprom_data,
 +      .get_monitor_dump = gaudi_get_monitor_dump,
 +      .send_cpu_message = gaudi_send_cpu_message,
 +      .pci_bars_map = gaudi_pci_bars_map,
 +      .init_iatu = gaudi_init_iatu,
 +      .rreg = hl_rreg,
 +      .wreg = hl_wreg,
 +      .halt_coresight = gaudi_halt_coresight,
 +      .ctx_init = gaudi_ctx_init,
 +      .ctx_fini = gaudi_ctx_fini,
 +      .pre_schedule_cs = gaudi_pre_schedule_cs,
 +      .get_queue_id_for_cq = gaudi_get_queue_id_for_cq,
 +      .load_firmware_to_device = gaudi_load_firmware_to_device,
 +      .load_boot_fit_to_device = gaudi_load_boot_fit_to_device,
 +      .get_signal_cb_size = gaudi_get_signal_cb_size,
 +      .get_wait_cb_size = gaudi_get_wait_cb_size,
 +      .gen_signal_cb = gaudi_gen_signal_cb,
 +      .gen_wait_cb = gaudi_gen_wait_cb,
 +      .reset_sob = gaudi_reset_sob,
 +      .reset_sob_group = gaudi_reset_sob_group,
 +      .get_device_time = gaudi_get_device_time,
 +      .pb_print_security_errors = NULL,
 +      .collective_wait_init_cs = gaudi_collective_wait_init_cs,
 +      .collective_wait_create_jobs = gaudi_collective_wait_create_jobs,
 +      .get_dec_base_addr = NULL,
 +      .scramble_addr = hl_mmu_scramble_addr,
 +      .descramble_addr = hl_mmu_descramble_addr,
 +      .ack_protection_bits_errors = gaudi_ack_protection_bits_errors,
 +      .get_hw_block_id = gaudi_get_hw_block_id,
 +      .hw_block_mmap = gaudi_block_mmap,
 +      .enable_events_from_fw = gaudi_enable_events_from_fw,
 +      .ack_mmu_errors = gaudi_ack_mmu_page_fault_or_access_error,
 +      .map_pll_idx_to_fw_idx = gaudi_map_pll_idx_to_fw_idx,
 +      .init_firmware_preload_params = gaudi_init_firmware_preload_params,
 +      .init_firmware_loader = gaudi_init_firmware_loader,
 +      .init_cpu_scrambler_dram = gaudi_init_scrambler_hbm,
 +      .state_dump_init = gaudi_state_dump_init,
 +      .get_sob_addr = gaudi_get_sob_addr,
 +      .set_pci_memory_regions = gaudi_set_pci_memory_regions,
 +      .get_stream_master_qid_arr = gaudi_get_stream_master_qid_arr,
 +      .check_if_razwi_happened = gaudi_check_if_razwi_happened,
 +      .mmu_get_real_page_size = hl_mmu_get_real_page_size,
 +      .access_dev_mem = hl_access_dev_mem,
 +      .set_dram_bar_base = gaudi_set_hbm_bar_base,
 +      .send_device_activity = gaudi_send_device_activity,
 +      .set_dram_properties = gaudi_set_dram_properties,
 +      .set_binning_masks = gaudi_set_binning_masks,
 +};
 +
 +/**
 + * gaudi_set_asic_funcs - set GAUDI function pointers
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +void gaudi_set_asic_funcs(struct hl_device *hdev)
 +{
 +      hdev->asic_funcs = &gaudi_funcs;
 +}
index f1f2a58ee68c2aaae77b77fc956172fbef501e93,0000000000000000000000000000000000000000..6f415fa94eee9d314aefa406cf11875da4bec3b7
mode 100644,000000..100644
--- /dev/null
@@@ -1,10735 -1,0 +1,10735 @@@
-       vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
-                       VM_DONTCOPY | VM_NORESERVE;
 +// SPDX-License-Identifier: GPL-2.0
 +
 +/*
 + * Copyright 2020-2022 HabanaLabs, Ltd.
 + * All Rights Reserved.
 + */
 +
 +#include "gaudi2P.h"
 +#include "gaudi2_masks.h"
 +#include "../include/gaudi2/gaudi2_special_blocks.h"
 +#include "../include/hw_ip/mmu/mmu_general.h"
 +#include "../include/hw_ip/mmu/mmu_v2_0.h"
 +#include "../include/gaudi2/gaudi2_packets.h"
 +#include "../include/gaudi2/gaudi2_reg_map.h"
 +#include "../include/gaudi2/gaudi2_async_ids_map_extended.h"
 +#include "../include/gaudi2/arc/gaudi2_arc_common_packets.h"
 +
 +#include <linux/module.h>
 +#include <linux/pci.h>
 +#include <linux/hwmon.h>
 +#include <linux/iommu.h>
 +
 +#define GAUDI2_DMA_POOL_BLK_SIZE              SZ_256          /* 256 bytes */
 +
 +#define GAUDI2_RESET_TIMEOUT_MSEC             2000            /* 2000ms */
 +#define GAUDI2_RESET_POLL_TIMEOUT_USEC                50000           /* 50ms */
 +#define GAUDI2_PLDM_HRESET_TIMEOUT_MSEC               25000           /* 25s */
 +#define GAUDI2_PLDM_SRESET_TIMEOUT_MSEC               25000           /* 25s */
 +#define GAUDI2_PLDM_RESET_POLL_TIMEOUT_USEC   3000000         /* 3s */
 +#define GAUDI2_RESET_POLL_CNT                 3
 +#define GAUDI2_RESET_WAIT_MSEC                        1               /* 1ms */
 +#define GAUDI2_CPU_RESET_WAIT_MSEC            100             /* 100ms */
 +#define GAUDI2_PLDM_RESET_WAIT_MSEC           1000            /* 1s */
 +#define GAUDI2_CB_POOL_CB_CNT                 512
 +#define GAUDI2_CB_POOL_CB_SIZE                        SZ_128K         /* 128KB */
 +#define GAUDI2_MSG_TO_CPU_TIMEOUT_USEC                4000000         /* 4s */
 +#define GAUDI2_WAIT_FOR_BL_TIMEOUT_USEC               25000000        /* 25s */
 +#define GAUDI2_TEST_QUEUE_WAIT_USEC           100000          /* 100ms */
 +#define GAUDI2_PLDM_TEST_QUEUE_WAIT_USEC      1000000         /* 1s */
 +
 +#define GAUDI2_ALLOC_CPU_MEM_RETRY_CNT                3
 +
 +/*
 + * Since the code already has built-in support for binning up to MAX_FAULTY_TPCS TPCs,
 + * and relies on that value (for array sizes etc.), we define a separate value for the
 + * maximum number of faulty TPCs which reflects the cluster binning requirements.
 + */
 +#define MAX_CLUSTER_BINNING_FAULTY_TPCS               1
 +#define MAX_FAULTY_XBARS                      1
 +#define MAX_FAULTY_EDMAS                      1
 +#define MAX_FAULTY_DECODERS                   1
 +
 +#define GAUDI2_TPC_FULL_MASK                  0x1FFFFFF
 +#define GAUDI2_HIF_HMMU_FULL_MASK             0xFFFF
 +#define GAUDI2_DECODER_FULL_MASK              0x3FF
 +
 +#define GAUDI2_NA_EVENT_CAUSE                 0xFF
 +#define GAUDI2_NUM_OF_QM_ERR_CAUSE            18
 +#define GAUDI2_NUM_OF_QM_LCP_ERR_CAUSE                25
 +#define GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE                3
 +#define GAUDI2_NUM_OF_ARC_SEI_ERR_CAUSE               14
 +#define GAUDI2_NUM_OF_CPU_SEI_ERR_CAUSE               3
 +#define GAUDI2_NUM_OF_QM_SEI_ERR_CAUSE                2
 +#define GAUDI2_NUM_OF_ROT_ERR_CAUSE           22
 +#define GAUDI2_NUM_OF_TPC_INTR_CAUSE          30
 +#define GAUDI2_NUM_OF_DEC_ERR_CAUSE           25
 +#define GAUDI2_NUM_OF_MME_ERR_CAUSE           16
 +#define GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE      5
 +#define GAUDI2_NUM_OF_MME_WAP_ERR_CAUSE               7
 +#define GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE     8
 +#define GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE               19
 +#define GAUDI2_NUM_OF_HBM_SEI_CAUSE           9
 +#define GAUDI2_NUM_OF_SM_SEI_ERR_CAUSE                3
 +#define GAUDI2_NUM_OF_PCIE_ADDR_DEC_ERR_CAUSE 3
 +#define GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE    2
 +#define GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE     2
 +#define GAUDI2_NUM_OF_AXI_DRAIN_ERR_CAUSE     2
 +#define GAUDI2_NUM_OF_HBM_MC_SPI_CAUSE                5
 +
 +#define GAUDI2_MMU_CACHE_INV_TIMEOUT_USEC     (MMU_CONFIG_TIMEOUT_USEC * 10)
 +#define GAUDI2_PLDM_MMU_TIMEOUT_USEC          (MMU_CONFIG_TIMEOUT_USEC * 200)
 +#define GAUDI2_ARB_WDT_TIMEOUT                        (0x1000000)
 +
 +#define GAUDI2_VDEC_TIMEOUT_USEC              10000           /* 10ms */
 +#define GAUDI2_PLDM_VDEC_TIMEOUT_USEC         (GAUDI2_VDEC_TIMEOUT_USEC * 100)
 +
 +#define KDMA_TIMEOUT_USEC                     USEC_PER_SEC
 +
 +#define IS_DMA_IDLE(dma_core_idle_ind_mask)   \
 +      (!((dma_core_idle_ind_mask) &           \
 +      ((DCORE0_EDMA0_CORE_IDLE_IND_MASK_DESC_CNT_STS_MASK) | \
 +      (DCORE0_EDMA0_CORE_IDLE_IND_MASK_COMP_MASK))))
 +
 +#define IS_MME_IDLE(mme_arch_sts) (((mme_arch_sts) & MME_ARCH_IDLE_MASK) == MME_ARCH_IDLE_MASK)
 +
 +#define IS_TPC_IDLE(tpc_cfg_sts) (((tpc_cfg_sts) & (TPC_IDLE_MASK)) == (TPC_IDLE_MASK))
 +
 +#define IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) \
 +      ((((qm_glbl_sts0) & (QM_IDLE_MASK)) == (QM_IDLE_MASK)) && \
 +      (((qm_glbl_sts1) & (QM_ARC_IDLE_MASK)) == (QM_ARC_IDLE_MASK)) && \
 +      (((qm_cgm_sts) & (CGM_IDLE_MASK)) == (CGM_IDLE_MASK)))
 +
 +#define PCIE_DEC_EN_MASK                      0x300
 +#define DEC_WORK_STATE_IDLE                   0
 +#define DEC_WORK_STATE_PEND                   3
 +#define IS_DEC_IDLE(dec_swreg15) \
 +      (((dec_swreg15) & DCORE0_DEC0_CMD_SWREG15_SW_WORK_STATE_MASK) == DEC_WORK_STATE_IDLE || \
 +      ((dec_swreg15) & DCORE0_DEC0_CMD_SWREG15_SW_WORK_STATE_MASK) ==  DEC_WORK_STATE_PEND)
 +
 +/* HBM MMU address scrambling parameters */
 +#define GAUDI2_HBM_MMU_SCRM_MEM_SIZE          SZ_8M
 +#define GAUDI2_HBM_MMU_SCRM_DIV_SHIFT         26
 +#define GAUDI2_HBM_MMU_SCRM_MOD_SHIFT         0
 +#define GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK      DRAM_VA_HINT_MASK
 +#define GAUDI2_COMPENSATE_TLB_PAGE_SIZE_FACTOR        16
 +#define MMU_RANGE_INV_VA_LSB_SHIFT            12
 +#define MMU_RANGE_INV_VA_MSB_SHIFT            44
 +#define MMU_RANGE_INV_EN_SHIFT                        0
 +#define MMU_RANGE_INV_ASID_EN_SHIFT           1
 +#define MMU_RANGE_INV_ASID_SHIFT              2
 +
 +/* The last SPI_SEI cause bit, "burst_fifo_full", is expected to be triggered in the PMMU because it
 + * has only a 2-entry FIFO, and hence it is not enabled for it.
 + */
 +#define GAUDI2_PMMU_SPI_SEI_ENABLE_MASK               GENMASK(GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE - 2, 0)
 +#define GAUDI2_HMMU_SPI_SEI_ENABLE_MASK               GENMASK(GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE - 1, 0)
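 +
 +/*
 + * With GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE defined as 19 above, the HMMU mask
 + * enables all 19 cause bits, GENMASK(18, 0) == 0x7FFFF, while the PMMU mask
 + * drops the top "burst_fifo_full" bit, GENMASK(17, 0) == 0x3FFFF.
 + */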
 +
 +#define GAUDI2_MAX_STRING_LEN                 64
 +
 +#define GAUDI2_VDEC_MSIX_ENTRIES              (GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM - \
 +                                                      GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM + 1)
 +
 +#define ENGINE_ID_DCORE_OFFSET (GAUDI2_DCORE1_ENGINE_ID_EDMA_0 - GAUDI2_DCORE0_ENGINE_ID_EDMA_0)
 +
 +enum hl_pmmu_fatal_cause {
 +      LATENCY_RD_OUT_FIFO_OVERRUN,
 +      LATENCY_WR_OUT_FIFO_OVERRUN,
 +};
 +
 +enum hl_pcie_drain_ind_cause {
 +      LBW_AXI_DRAIN_IND,
 +      HBW_AXI_DRAIN_IND
 +};
 +
 +static const u32 cluster_hmmu_hif_enabled_mask[GAUDI2_HBM_NUM] = {
 +      [HBM_ID0] = 0xFFFC,
 +      [HBM_ID1] = 0xFFCF,
 +      [HBM_ID2] = 0xF7F7,
 +      [HBM_ID3] = 0x7F7F,
 +      [HBM_ID4] = 0xFCFF,
 +      [HBM_ID5] = 0xCFFF,
 +};
 +
 +static const u8 xbar_edge_to_hbm_cluster[EDMA_ID_SIZE] = {
 +      [0] = HBM_ID0,
 +      [1] = HBM_ID1,
 +      [2] = HBM_ID4,
 +      [3] = HBM_ID5,
 +};
 +
 +static const u8 edma_to_hbm_cluster[EDMA_ID_SIZE] = {
 +      [EDMA_ID_DCORE0_INSTANCE0] = HBM_ID0,
 +      [EDMA_ID_DCORE0_INSTANCE1] = HBM_ID2,
 +      [EDMA_ID_DCORE1_INSTANCE0] = HBM_ID1,
 +      [EDMA_ID_DCORE1_INSTANCE1] = HBM_ID3,
 +      [EDMA_ID_DCORE2_INSTANCE0] = HBM_ID2,
 +      [EDMA_ID_DCORE2_INSTANCE1] = HBM_ID4,
 +      [EDMA_ID_DCORE3_INSTANCE0] = HBM_ID3,
 +      [EDMA_ID_DCORE3_INSTANCE1] = HBM_ID5,
 +};
 +
 +static const int gaudi2_qman_async_event_id[] = {
 +      [GAUDI2_QUEUE_ID_PDMA_0_0] = GAUDI2_EVENT_PDMA0_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_0_1] = GAUDI2_EVENT_PDMA0_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_0_2] = GAUDI2_EVENT_PDMA0_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_0_3] = GAUDI2_EVENT_PDMA0_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_1_0] = GAUDI2_EVENT_PDMA1_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_1_1] = GAUDI2_EVENT_PDMA1_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_1_2] = GAUDI2_EVENT_PDMA1_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_1_3] = GAUDI2_EVENT_PDMA1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0] = GAUDI2_EVENT_HDMA0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_1] = GAUDI2_EVENT_HDMA0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_2] = GAUDI2_EVENT_HDMA0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_3] = GAUDI2_EVENT_HDMA0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0] = GAUDI2_EVENT_HDMA1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_1] = GAUDI2_EVENT_HDMA1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_2] = GAUDI2_EVENT_HDMA1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3] = GAUDI2_EVENT_HDMA1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_0] = GAUDI2_EVENT_MME0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_1] = GAUDI2_EVENT_MME0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_2] = GAUDI2_EVENT_MME0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_3] = GAUDI2_EVENT_MME0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_0] = GAUDI2_EVENT_TPC0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_1] = GAUDI2_EVENT_TPC0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_2] = GAUDI2_EVENT_TPC0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_3] = GAUDI2_EVENT_TPC0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_0] = GAUDI2_EVENT_TPC1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_1] = GAUDI2_EVENT_TPC1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_2] = GAUDI2_EVENT_TPC1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_3] = GAUDI2_EVENT_TPC1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_0] = GAUDI2_EVENT_TPC2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_1] = GAUDI2_EVENT_TPC2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_2] = GAUDI2_EVENT_TPC2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_3] = GAUDI2_EVENT_TPC2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_0] = GAUDI2_EVENT_TPC3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_1] = GAUDI2_EVENT_TPC3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_2] = GAUDI2_EVENT_TPC3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_3] = GAUDI2_EVENT_TPC3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_0] = GAUDI2_EVENT_TPC4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_1] = GAUDI2_EVENT_TPC4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_2] = GAUDI2_EVENT_TPC4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_3] = GAUDI2_EVENT_TPC4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_0] = GAUDI2_EVENT_TPC5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_1] = GAUDI2_EVENT_TPC5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_2] = GAUDI2_EVENT_TPC5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_3] = GAUDI2_EVENT_TPC5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_0] = GAUDI2_EVENT_TPC24_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_1] = GAUDI2_EVENT_TPC24_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_2] = GAUDI2_EVENT_TPC24_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_3] = GAUDI2_EVENT_TPC24_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0] = GAUDI2_EVENT_HDMA2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_1] = GAUDI2_EVENT_HDMA2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_2] = GAUDI2_EVENT_HDMA2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_3] = GAUDI2_EVENT_HDMA2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0] = GAUDI2_EVENT_HDMA3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_1] = GAUDI2_EVENT_HDMA3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_2] = GAUDI2_EVENT_HDMA3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3] = GAUDI2_EVENT_HDMA3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_0] = GAUDI2_EVENT_MME1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_1] = GAUDI2_EVENT_MME1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_2] = GAUDI2_EVENT_MME1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_3] = GAUDI2_EVENT_MME1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_0] = GAUDI2_EVENT_TPC6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_1] = GAUDI2_EVENT_TPC6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_2] = GAUDI2_EVENT_TPC6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_3] = GAUDI2_EVENT_TPC6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_0] = GAUDI2_EVENT_TPC7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_1] = GAUDI2_EVENT_TPC7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_2] = GAUDI2_EVENT_TPC7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_3] = GAUDI2_EVENT_TPC7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_0] = GAUDI2_EVENT_TPC8_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_1] = GAUDI2_EVENT_TPC8_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_2] = GAUDI2_EVENT_TPC8_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_3] = GAUDI2_EVENT_TPC8_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_0] = GAUDI2_EVENT_TPC9_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_1] = GAUDI2_EVENT_TPC9_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_2] = GAUDI2_EVENT_TPC9_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_3] = GAUDI2_EVENT_TPC9_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_0] = GAUDI2_EVENT_TPC10_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_1] = GAUDI2_EVENT_TPC10_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_2] = GAUDI2_EVENT_TPC10_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_3] = GAUDI2_EVENT_TPC10_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_0] = GAUDI2_EVENT_TPC11_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_1] = GAUDI2_EVENT_TPC11_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_2] = GAUDI2_EVENT_TPC11_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_3] = GAUDI2_EVENT_TPC11_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0] = GAUDI2_EVENT_HDMA4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_1] = GAUDI2_EVENT_HDMA4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_2] = GAUDI2_EVENT_HDMA4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_3] = GAUDI2_EVENT_HDMA4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0] = GAUDI2_EVENT_HDMA5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_1] = GAUDI2_EVENT_HDMA5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_2] = GAUDI2_EVENT_HDMA5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3] = GAUDI2_EVENT_HDMA5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_0] = GAUDI2_EVENT_MME2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_1] = GAUDI2_EVENT_MME2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_2] = GAUDI2_EVENT_MME2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_3] = GAUDI2_EVENT_MME2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_0] = GAUDI2_EVENT_TPC12_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_1] = GAUDI2_EVENT_TPC12_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_2] = GAUDI2_EVENT_TPC12_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_3] = GAUDI2_EVENT_TPC12_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_0] = GAUDI2_EVENT_TPC13_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_1] = GAUDI2_EVENT_TPC13_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_2] = GAUDI2_EVENT_TPC13_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_3] = GAUDI2_EVENT_TPC13_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_0] = GAUDI2_EVENT_TPC14_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_1] = GAUDI2_EVENT_TPC14_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_2] = GAUDI2_EVENT_TPC14_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_3] = GAUDI2_EVENT_TPC14_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_0] = GAUDI2_EVENT_TPC15_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_1] = GAUDI2_EVENT_TPC15_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_2] = GAUDI2_EVENT_TPC15_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_3] = GAUDI2_EVENT_TPC15_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_0] = GAUDI2_EVENT_TPC16_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_1] = GAUDI2_EVENT_TPC16_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_2] = GAUDI2_EVENT_TPC16_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_3] = GAUDI2_EVENT_TPC16_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_0] = GAUDI2_EVENT_TPC17_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_1] = GAUDI2_EVENT_TPC17_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_2] = GAUDI2_EVENT_TPC17_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_3] = GAUDI2_EVENT_TPC17_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0] = GAUDI2_EVENT_HDMA6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_1] = GAUDI2_EVENT_HDMA6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_2] = GAUDI2_EVENT_HDMA6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_3] = GAUDI2_EVENT_HDMA6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0] = GAUDI2_EVENT_HDMA7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1] = GAUDI2_EVENT_HDMA7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2] = GAUDI2_EVENT_HDMA7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3] = GAUDI2_EVENT_HDMA7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_0] = GAUDI2_EVENT_MME3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_1] = GAUDI2_EVENT_MME3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_2] = GAUDI2_EVENT_MME3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_3] = GAUDI2_EVENT_MME3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_0] = GAUDI2_EVENT_TPC18_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_1] = GAUDI2_EVENT_TPC18_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_2] = GAUDI2_EVENT_TPC18_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_3] = GAUDI2_EVENT_TPC18_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_0] = GAUDI2_EVENT_TPC19_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_1] = GAUDI2_EVENT_TPC19_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_2] = GAUDI2_EVENT_TPC19_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_3] = GAUDI2_EVENT_TPC19_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_0] = GAUDI2_EVENT_TPC20_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_1] = GAUDI2_EVENT_TPC20_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_2] = GAUDI2_EVENT_TPC20_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_3] = GAUDI2_EVENT_TPC20_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_0] = GAUDI2_EVENT_TPC21_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_1] = GAUDI2_EVENT_TPC21_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_2] = GAUDI2_EVENT_TPC21_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_3] = GAUDI2_EVENT_TPC21_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_0] = GAUDI2_EVENT_TPC22_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_1] = GAUDI2_EVENT_TPC22_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_2] = GAUDI2_EVENT_TPC22_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_3] = GAUDI2_EVENT_TPC22_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_0] = GAUDI2_EVENT_TPC23_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_1] = GAUDI2_EVENT_TPC23_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_2] = GAUDI2_EVENT_TPC23_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_3] = GAUDI2_EVENT_TPC23_QM,
 +      [GAUDI2_QUEUE_ID_NIC_0_0] = GAUDI2_EVENT_NIC0_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_0_1] = GAUDI2_EVENT_NIC0_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_0_2] = GAUDI2_EVENT_NIC0_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_0_3] = GAUDI2_EVENT_NIC0_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_1_0] = GAUDI2_EVENT_NIC0_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_1_1] = GAUDI2_EVENT_NIC0_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_1_2] = GAUDI2_EVENT_NIC0_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_1_3] = GAUDI2_EVENT_NIC0_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_2_0] = GAUDI2_EVENT_NIC1_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_2_1] = GAUDI2_EVENT_NIC1_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_2_2] = GAUDI2_EVENT_NIC1_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_2_3] = GAUDI2_EVENT_NIC1_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_3_0] = GAUDI2_EVENT_NIC1_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_3_1] = GAUDI2_EVENT_NIC1_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_3_2] = GAUDI2_EVENT_NIC1_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_3_3] = GAUDI2_EVENT_NIC1_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_4_0] = GAUDI2_EVENT_NIC2_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_4_1] = GAUDI2_EVENT_NIC2_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_4_2] = GAUDI2_EVENT_NIC2_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_4_3] = GAUDI2_EVENT_NIC2_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_5_0] = GAUDI2_EVENT_NIC2_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_5_1] = GAUDI2_EVENT_NIC2_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_5_2] = GAUDI2_EVENT_NIC2_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_5_3] = GAUDI2_EVENT_NIC2_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_6_0] = GAUDI2_EVENT_NIC3_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_6_1] = GAUDI2_EVENT_NIC3_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_6_2] = GAUDI2_EVENT_NIC3_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_6_3] = GAUDI2_EVENT_NIC3_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_7_0] = GAUDI2_EVENT_NIC3_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_7_1] = GAUDI2_EVENT_NIC3_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_7_2] = GAUDI2_EVENT_NIC3_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_7_3] = GAUDI2_EVENT_NIC3_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_8_0] = GAUDI2_EVENT_NIC4_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_8_1] = GAUDI2_EVENT_NIC4_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_8_2] = GAUDI2_EVENT_NIC4_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_8_3] = GAUDI2_EVENT_NIC4_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_9_0] = GAUDI2_EVENT_NIC4_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_9_1] = GAUDI2_EVENT_NIC4_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_9_2] = GAUDI2_EVENT_NIC4_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_9_3] = GAUDI2_EVENT_NIC4_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_10_0] = GAUDI2_EVENT_NIC5_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_10_1] = GAUDI2_EVENT_NIC5_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_10_2] = GAUDI2_EVENT_NIC5_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_10_3] = GAUDI2_EVENT_NIC5_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_11_0] = GAUDI2_EVENT_NIC5_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_11_1] = GAUDI2_EVENT_NIC5_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_11_2] = GAUDI2_EVENT_NIC5_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_11_3] = GAUDI2_EVENT_NIC5_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_12_0] = GAUDI2_EVENT_NIC6_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_12_1] = GAUDI2_EVENT_NIC6_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_12_2] = GAUDI2_EVENT_NIC6_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_12_3] = GAUDI2_EVENT_NIC6_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_13_0] = GAUDI2_EVENT_NIC6_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_13_1] = GAUDI2_EVENT_NIC6_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_13_2] = GAUDI2_EVENT_NIC6_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_13_3] = GAUDI2_EVENT_NIC6_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_14_0] = GAUDI2_EVENT_NIC7_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_14_1] = GAUDI2_EVENT_NIC7_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_14_2] = GAUDI2_EVENT_NIC7_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_14_3] = GAUDI2_EVENT_NIC7_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_15_0] = GAUDI2_EVENT_NIC7_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_15_1] = GAUDI2_EVENT_NIC7_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_15_2] = GAUDI2_EVENT_NIC7_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_15_3] = GAUDI2_EVENT_NIC7_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_16_0] = GAUDI2_EVENT_NIC8_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_16_1] = GAUDI2_EVENT_NIC8_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_16_2] = GAUDI2_EVENT_NIC8_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_16_3] = GAUDI2_EVENT_NIC8_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_17_0] = GAUDI2_EVENT_NIC8_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_17_1] = GAUDI2_EVENT_NIC8_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_17_2] = GAUDI2_EVENT_NIC8_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_17_3] = GAUDI2_EVENT_NIC8_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_18_0] = GAUDI2_EVENT_NIC9_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_18_1] = GAUDI2_EVENT_NIC9_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_18_2] = GAUDI2_EVENT_NIC9_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_18_3] = GAUDI2_EVENT_NIC9_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_19_0] = GAUDI2_EVENT_NIC9_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_19_1] = GAUDI2_EVENT_NIC9_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_19_2] = GAUDI2_EVENT_NIC9_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_19_3] = GAUDI2_EVENT_NIC9_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_20_0] = GAUDI2_EVENT_NIC10_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_20_1] = GAUDI2_EVENT_NIC10_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_20_2] = GAUDI2_EVENT_NIC10_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_20_3] = GAUDI2_EVENT_NIC10_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_21_0] = GAUDI2_EVENT_NIC10_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_21_1] = GAUDI2_EVENT_NIC10_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_21_2] = GAUDI2_EVENT_NIC10_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_21_3] = GAUDI2_EVENT_NIC10_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_22_0] = GAUDI2_EVENT_NIC11_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_22_1] = GAUDI2_EVENT_NIC11_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_22_2] = GAUDI2_EVENT_NIC11_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_22_3] = GAUDI2_EVENT_NIC11_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_23_0] = GAUDI2_EVENT_NIC11_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_23_1] = GAUDI2_EVENT_NIC11_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_23_2] = GAUDI2_EVENT_NIC11_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_23_3] = GAUDI2_EVENT_NIC11_QM1,
 +      [GAUDI2_QUEUE_ID_ROT_0_0] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
 +      [GAUDI2_QUEUE_ID_ROT_0_1] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
 +      [GAUDI2_QUEUE_ID_ROT_0_2] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
 +      [GAUDI2_QUEUE_ID_ROT_0_3] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
 +      [GAUDI2_QUEUE_ID_ROT_1_0] = GAUDI2_EVENT_ROTATOR1_ROT1_QM,
 +      [GAUDI2_QUEUE_ID_ROT_1_1] = GAUDI2_EVENT_ROTATOR1_ROT1_QM,
 +      [GAUDI2_QUEUE_ID_ROT_1_2] = GAUDI2_EVENT_ROTATOR1_ROT1_QM,
 +      [GAUDI2_QUEUE_ID_ROT_1_3] = GAUDI2_EVENT_ROTATOR1_ROT1_QM
 +};
 +
 +static const int gaudi2_dma_core_async_event_id[] = {
 +      [DMA_CORE_ID_EDMA0] = GAUDI2_EVENT_HDMA0_CORE,
 +      [DMA_CORE_ID_EDMA1] = GAUDI2_EVENT_HDMA1_CORE,
 +      [DMA_CORE_ID_EDMA2] = GAUDI2_EVENT_HDMA2_CORE,
 +      [DMA_CORE_ID_EDMA3] = GAUDI2_EVENT_HDMA3_CORE,
 +      [DMA_CORE_ID_EDMA4] = GAUDI2_EVENT_HDMA4_CORE,
 +      [DMA_CORE_ID_EDMA5] = GAUDI2_EVENT_HDMA5_CORE,
 +      [DMA_CORE_ID_EDMA6] = GAUDI2_EVENT_HDMA6_CORE,
 +      [DMA_CORE_ID_EDMA7] = GAUDI2_EVENT_HDMA7_CORE,
 +      [DMA_CORE_ID_PDMA0] = GAUDI2_EVENT_PDMA0_CORE,
 +      [DMA_CORE_ID_PDMA1] = GAUDI2_EVENT_PDMA1_CORE,
 +      [DMA_CORE_ID_KDMA] = GAUDI2_EVENT_KDMA0_CORE,
 +};
 +
 +static const char * const gaudi2_qm_sei_error_cause[GAUDI2_NUM_OF_QM_SEI_ERR_CAUSE] = {
 +      "qman sei intr",
 +      "arc sei intr"
 +};
 +
 +static const char * const gaudi2_cpu_sei_error_cause[GAUDI2_NUM_OF_CPU_SEI_ERR_CAUSE] = {
 +      "AXI_TERMINATOR WR",
 +      "AXI_TERMINATOR RD",
 +      "AXI SPLIT SEI Status"
 +};
 +
 +static const char * const gaudi2_arc_sei_error_cause[GAUDI2_NUM_OF_ARC_SEI_ERR_CAUSE] = {
 +      "cbu_bresp_sei_intr_cause",
 +      "cbu_rresp_sei_intr_cause",
 +      "lbu_bresp_sei_intr_cause",
 +      "lbu_rresp_sei_intr_cause",
 +      "cbu_axi_split_intr_cause",
 +      "lbu_axi_split_intr_cause",
 +      "arc_ip_excptn_sei_intr_cause",
 +      "dmi_bresp_sei_intr_cause",
 +      "aux2apb_err_sei_intr_cause",
 +      "cfg_lbw_wr_terminated_intr_cause",
 +      "cfg_lbw_rd_terminated_intr_cause",
 +      "cfg_dccm_wr_terminated_intr_cause",
 +      "cfg_dccm_rd_terminated_intr_cause",
 +      "cfg_hbw_rd_terminated_intr_cause"
 +};
 +
 +static const char * const gaudi2_dec_error_cause[GAUDI2_NUM_OF_DEC_ERR_CAUSE] = {
 +      "msix_vcd_hbw_sei",
 +      "msix_l2c_hbw_sei",
 +      "msix_nrm_hbw_sei",
 +      "msix_abnrm_hbw_sei",
 +      "msix_vcd_lbw_sei",
 +      "msix_l2c_lbw_sei",
 +      "msix_nrm_lbw_sei",
 +      "msix_abnrm_lbw_sei",
 +      "apb_vcd_lbw_sei",
 +      "apb_l2c_lbw_sei",
 +      "apb_nrm_lbw_sei",
 +      "apb_abnrm_lbw_sei",
 +      "dec_sei",
 +      "dec_apb_sei",
 +      "trc_apb_sei",
 +      "lbw_mstr_if_sei",
 +      "axi_split_bresp_err_sei",
 +      "hbw_axi_wr_viol_sei",
 +      "hbw_axi_rd_viol_sei",
 +      "lbw_axi_wr_viol_sei",
 +      "lbw_axi_rd_viol_sei",
 +      "vcd_spi",
 +      "l2c_spi",
 +      "nrm_spi",
 +      "abnrm_spi",
 +};
 +
 +static const char * const gaudi2_qman_error_cause[GAUDI2_NUM_OF_QM_ERR_CAUSE] = {
 +      "PQ AXI HBW error",
 +      "CQ AXI HBW error",
 +      "CP AXI HBW error",
 +      "CP error due to undefined OPCODE",
 +      "CP encountered STOP OPCODE",
 +      "CP AXI LBW error",
 +      "CP WRREG32 or WRBULK returned error",
 +      "N/A",
 +      "FENCE 0 inc over max value and clipped",
 +      "FENCE 1 inc over max value and clipped",
 +      "FENCE 2 inc over max value and clipped",
 +      "FENCE 3 inc over max value and clipped",
 +      "FENCE 0 dec under min value and clipped",
 +      "FENCE 1 dec under min value and clipped",
 +      "FENCE 2 dec under min value and clipped",
 +      "FENCE 3 dec under min value and clipped",
 +      "CPDMA Up overflow",
 +      "PQC L2H error"
 +};
 +
 +static const char * const gaudi2_qman_lower_cp_error_cause[GAUDI2_NUM_OF_QM_LCP_ERR_CAUSE] = {
 +      "RSVD0",
 +      "CQ AXI HBW error",
 +      "CP AXI HBW error",
 +      "CP error due to undefined OPCODE",
 +      "CP encountered STOP OPCODE",
 +      "CP AXI LBW error",
 +      "CP WRREG32 or WRBULK returned error",
 +      "N/A",
 +      "FENCE 0 inc over max value and clipped",
 +      "FENCE 1 inc over max value and clipped",
 +      "FENCE 2 inc over max value and clipped",
 +      "FENCE 3 inc over max value and clipped",
 +      "FENCE 0 dec under min value and clipped",
 +      "FENCE 1 dec under min value and clipped",
 +      "FENCE 2 dec under min value and clipped",
 +      "FENCE 3 dec under min value and clipped",
 +      "CPDMA Up overflow",
 +      "RSVD17",
 +      "CQ_WR_IFIFO_CI_ERR",
 +      "CQ_WR_CTL_CI_ERR",
 +      "ARC_CQF_RD_ERR",
 +      "ARC_CQ_WR_IFIFO_CI_ERR",
 +      "ARC_CQ_WR_CTL_CI_ERR",
 +      "ARC_AXI_ERR",
 +      "CP_SWITCH_WDT_ERR"
 +};
 +
 +static const char * const gaudi2_qman_arb_error_cause[GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE] = {
 +      "Choice push while full error",
 +      "Choice Q watchdog error",
 +      "MSG AXI LBW returned with error"
 +};
 +
 +static const char * const guadi2_rot_error_cause[GAUDI2_NUM_OF_ROT_ERR_CAUSE] = {
 +      "qm_axi_err",
 +      "qm_trace_fence_events",
 +      "qm_sw_err",
 +      "qm_cp_sw_stop",
 +      "lbw_mstr_rresp_err",
 +      "lbw_mstr_bresp_err",
 +      "lbw_msg_slverr",
 +      "hbw_msg_slverr",
 +      "wbc_slverr",
 +      "hbw_mstr_rresp_err",
 +      "hbw_mstr_bresp_err",
 +      "sb_resp_intr",
 +      "mrsb_resp_intr",
 +      "core_dw_status_0",
 +      "core_dw_status_1",
 +      "core_dw_status_2",
 +      "core_dw_status_3",
 +      "core_dw_status_4",
 +      "core_dw_status_5",
 +      "core_dw_status_6",
 +      "core_dw_status_7",
 +      "async_arc2cpu_sei_intr",
 +};
 +
 +static const char * const gaudi2_tpc_interrupts_cause[GAUDI2_NUM_OF_TPC_INTR_CAUSE] = {
 +      "tpc_address_exceed_slm",
 +      "tpc_div_by_0",
 +      "tpc_spu_mac_overflow",
 +      "tpc_spu_addsub_overflow",
 +      "tpc_spu_abs_overflow",
 +      "tpc_spu_fma_fp_dst_nan",
 +      "tpc_spu_fma_fp_dst_inf",
 +      "tpc_spu_convert_fp_dst_nan",
 +      "tpc_spu_convert_fp_dst_inf",
 +      "tpc_spu_fp_dst_denorm",
 +      "tpc_vpu_mac_overflow",
 +      "tpc_vpu_addsub_overflow",
 +      "tpc_vpu_abs_overflow",
 +      "tpc_vpu_convert_fp_dst_nan",
 +      "tpc_vpu_convert_fp_dst_inf",
 +      "tpc_vpu_fma_fp_dst_nan",
 +      "tpc_vpu_fma_fp_dst_inf",
 +      "tpc_vpu_fp_dst_denorm",
 +      "tpc_assertions",
 +      "tpc_illegal_instruction",
 +      "tpc_pc_wrap_around",
 +      "tpc_qm_sw_err",
 +      "tpc_hbw_rresp_err",
 +      "tpc_hbw_bresp_err",
 +      "tpc_lbw_rresp_err",
 +      "tpc_lbw_bresp_err",
 +      "st_unlock_already_locked",
 +      "invalid_lock_access",
 +      "LD_L protection violation",
 +      "ST_L protection violation",
 +};
 +
 +static const char * const guadi2_mme_error_cause[GAUDI2_NUM_OF_MME_ERR_CAUSE] = {
 +      "agu_resp_intr",
 +      "qman_axi_err",
 +      "wap sei (wbc axi err)",
 +      "arc sei",
 +      "cfg access error",
 +      "qm_sw_err",
 +      "sbte_dbg_intr_0",
 +      "sbte_dbg_intr_1",
 +      "sbte_dbg_intr_2",
 +      "sbte_dbg_intr_3",
 +      "sbte_dbg_intr_4",
 +      "sbte_prtn_intr_0",
 +      "sbte_prtn_intr_1",
 +      "sbte_prtn_intr_2",
 +      "sbte_prtn_intr_3",
 +      "sbte_prtn_intr_4",
 +};
 +
 +static const char * const guadi2_mme_sbte_error_cause[GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE] = {
 +      "i0",
 +      "i1",
 +      "i2",
 +      "i3",
 +      "i4",
 +};
 +
 +static const char * const guadi2_mme_wap_error_cause[GAUDI2_NUM_OF_MME_WAP_ERR_CAUSE] = {
 +      "WBC ERR RESP_0",
 +      "WBC ERR RESP_1",
 +      "AP SOURCE POS INF",
 +      "AP SOURCE NEG INF",
 +      "AP SOURCE NAN",
 +      "AP RESULT POS INF",
 +      "AP RESULT NEG INF",
 +};
 +
 +static const char * const gaudi2_dma_core_interrupts_cause[GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE] = {
 +      "HBW Read returned with error RRESP",
 +      "HBW write returned with error BRESP",
 +      "LBW write returned with error BRESP",
 +      "descriptor_fifo_overflow",
 +      "KDMA SB LBW Read returned with error",
 +      "KDMA WBC LBW Write returned with error",
 +      "TRANSPOSE ENGINE DESC FIFO OVERFLOW",
 +      "WRONG CFG FOR COMMIT IN LIN DMA"
 +};
 +
 +static const char * const gaudi2_kdma_core_interrupts_cause[GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE] = {
 +      "HBW/LBW Read returned with error RRESP",
 +      "HBW/LBW write returned with error BRESP",
 +      "LBW write returned with error BRESP",
 +      "descriptor_fifo_overflow",
 +      "KDMA SB LBW Read returned with error",
 +      "KDMA WBC LBW Write returned with error",
 +      "TRANSPOSE ENGINE DESC FIFO OVERFLOW",
 +      "WRONG CFG FOR COMMIT IN LIN DMA"
 +};
 +
 +struct gaudi2_sm_sei_cause_data {
 +      const char *cause_name;
 +      const char *log_name;
 +};
 +
 +static const struct gaudi2_sm_sei_cause_data
 +gaudi2_sm_sei_cause[GAUDI2_NUM_OF_SM_SEI_ERR_CAUSE] = {
 +      {"calculated SO value overflow/underflow", "SOB ID"},
 +      {"payload address of monitor is not aligned to 4B", "monitor addr"},
 +      {"armed monitor write got BRESP (SLVERR or DECERR)", "AXI id"},
 +};
 +
 +static const char * const
 +gaudi2_pmmu_fatal_interrupts_cause[GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE] = {
 +      "LATENCY_RD_OUT_FIFO_OVERRUN",
 +      "LATENCY_WR_OUT_FIFO_OVERRUN",
 +};
 +
 +static const char * const
 +gaudi2_hif_fatal_interrupts_cause[GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE] = {
 +      "LATENCY_RD_OUT_FIFO_OVERRUN",
 +      "LATENCY_WR_OUT_FIFO_OVERRUN",
 +};
 +
 +static const char * const
 +gaudi2_psoc_axi_drain_interrupts_cause[GAUDI2_NUM_OF_AXI_DRAIN_ERR_CAUSE] = {
 +      "AXI drain HBW",
 +      "AXI drain LBW",
 +};
 +
 +static const char * const
 +gaudi2_pcie_addr_dec_error_cause[GAUDI2_NUM_OF_PCIE_ADDR_DEC_ERR_CAUSE] = {
 +      "HBW error response",
 +      "LBW error response",
 +      "TLP is blocked by RR"
 +};
 +
 +const u32 gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_SIZE] = {
 +      [GAUDI2_QUEUE_ID_PDMA_0_0] = mmPDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_0_1] = mmPDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_0_2] = mmPDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_0_3] = mmPDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_1_0] = mmPDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_1_1] = mmPDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_1_2] = mmPDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_1_3] = mmPDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0] = mmDCORE0_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_1] = mmDCORE0_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_2] = mmDCORE0_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_3] = mmDCORE0_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0] = mmDCORE0_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_1] = mmDCORE0_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_2] = mmDCORE0_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3] = mmDCORE0_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_0] = mmDCORE0_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_1] = mmDCORE0_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_2] = mmDCORE0_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_3] = mmDCORE0_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_0] = mmDCORE0_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_1] = mmDCORE0_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_2] = mmDCORE0_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_3] = mmDCORE0_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_0] = mmDCORE0_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_1] = mmDCORE0_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_2] = mmDCORE0_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_3] = mmDCORE0_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_0] = mmDCORE0_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_1] = mmDCORE0_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_2] = mmDCORE0_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_3] = mmDCORE0_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_0] = mmDCORE0_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_1] = mmDCORE0_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_2] = mmDCORE0_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_3] = mmDCORE0_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_0] = mmDCORE0_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_1] = mmDCORE0_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_2] = mmDCORE0_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_3] = mmDCORE0_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_0] = mmDCORE0_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_1] = mmDCORE0_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_2] = mmDCORE0_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_3] = mmDCORE0_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_0] = mmDCORE0_TPC6_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_1] = mmDCORE0_TPC6_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_2] = mmDCORE0_TPC6_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_3] = mmDCORE0_TPC6_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0] = mmDCORE1_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_1] = mmDCORE1_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_2] = mmDCORE1_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_3] = mmDCORE1_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0] = mmDCORE1_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_1] = mmDCORE1_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_2] = mmDCORE1_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3] = mmDCORE1_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_0] = mmDCORE1_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_1] = mmDCORE1_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_2] = mmDCORE1_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_3] = mmDCORE1_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_0] = mmDCORE1_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_1] = mmDCORE1_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_2] = mmDCORE1_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_3] = mmDCORE1_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_0] = mmDCORE1_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_1] = mmDCORE1_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_2] = mmDCORE1_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_3] = mmDCORE1_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_0] = mmDCORE1_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_1] = mmDCORE1_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_2] = mmDCORE1_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_3] = mmDCORE1_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_0] = mmDCORE1_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_1] = mmDCORE1_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_2] = mmDCORE1_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_3] = mmDCORE1_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_0] = mmDCORE1_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_1] = mmDCORE1_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_2] = mmDCORE1_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_3] = mmDCORE1_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_0] = mmDCORE1_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_1] = mmDCORE1_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_2] = mmDCORE1_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_3] = mmDCORE1_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0] = mmDCORE2_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_1] = mmDCORE2_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_2] = mmDCORE2_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_3] = mmDCORE2_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0] = mmDCORE2_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_1] = mmDCORE2_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_2] = mmDCORE2_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3] = mmDCORE2_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_0] = mmDCORE2_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_1] = mmDCORE2_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_2] = mmDCORE2_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_3] = mmDCORE2_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_0] = mmDCORE2_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_1] = mmDCORE2_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_2] = mmDCORE2_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_3] = mmDCORE2_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_0] = mmDCORE2_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_1] = mmDCORE2_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_2] = mmDCORE2_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_3] = mmDCORE2_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_0] = mmDCORE2_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_1] = mmDCORE2_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_2] = mmDCORE2_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_3] = mmDCORE2_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_0] = mmDCORE2_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_1] = mmDCORE2_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_2] = mmDCORE2_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_3] = mmDCORE2_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_0] = mmDCORE2_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_1] = mmDCORE2_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_2] = mmDCORE2_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_3] = mmDCORE2_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_0] = mmDCORE2_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_1] = mmDCORE2_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_2] = mmDCORE2_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_3] = mmDCORE2_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0] = mmDCORE3_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_1] = mmDCORE3_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_2] = mmDCORE3_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_3] = mmDCORE3_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0] = mmDCORE3_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1] = mmDCORE3_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2] = mmDCORE3_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3] = mmDCORE3_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_0] = mmDCORE3_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_1] = mmDCORE3_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_2] = mmDCORE3_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_3] = mmDCORE3_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_0] = mmDCORE3_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_1] = mmDCORE3_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_2] = mmDCORE3_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_3] = mmDCORE3_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_0] = mmDCORE3_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_1] = mmDCORE3_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_2] = mmDCORE3_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_3] = mmDCORE3_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_0] = mmDCORE3_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_1] = mmDCORE3_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_2] = mmDCORE3_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_3] = mmDCORE3_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_0] = mmDCORE3_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_1] = mmDCORE3_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_2] = mmDCORE3_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_3] = mmDCORE3_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_0] = mmDCORE3_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_1] = mmDCORE3_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_2] = mmDCORE3_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_3] = mmDCORE3_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_0] = mmDCORE3_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_1] = mmDCORE3_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_2] = mmDCORE3_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_3] = mmDCORE3_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_0_0] = mmNIC0_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_0_1] = mmNIC0_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_0_2] = mmNIC0_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_0_3] = mmNIC0_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_1_0] = mmNIC0_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_1_1] = mmNIC0_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_1_2] = mmNIC0_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_1_3] = mmNIC0_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_2_0] = mmNIC1_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_2_1] = mmNIC1_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_2_2] = mmNIC1_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_2_3] = mmNIC1_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_3_0] = mmNIC1_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_3_1] = mmNIC1_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_3_2] = mmNIC1_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_3_3] = mmNIC1_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_4_0] = mmNIC2_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_4_1] = mmNIC2_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_4_2] = mmNIC2_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_4_3] = mmNIC2_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_5_0] = mmNIC2_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_5_1] = mmNIC2_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_5_2] = mmNIC2_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_5_3] = mmNIC2_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_6_0] = mmNIC3_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_6_1] = mmNIC3_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_6_2] = mmNIC3_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_6_3] = mmNIC3_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_7_0] = mmNIC3_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_7_1] = mmNIC3_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_7_2] = mmNIC3_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_7_3] = mmNIC3_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_8_0] = mmNIC4_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_8_1] = mmNIC4_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_8_2] = mmNIC4_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_8_3] = mmNIC4_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_9_0] = mmNIC4_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_9_1] = mmNIC4_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_9_2] = mmNIC4_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_9_3] = mmNIC4_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_10_0] = mmNIC5_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_10_1] = mmNIC5_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_10_2] = mmNIC5_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_10_3] = mmNIC5_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_11_0] = mmNIC5_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_11_1] = mmNIC5_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_11_2] = mmNIC5_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_11_3] = mmNIC5_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_12_0] = mmNIC6_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_12_1] = mmNIC6_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_12_2] = mmNIC6_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_12_3] = mmNIC6_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_13_0] = mmNIC6_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_13_1] = mmNIC6_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_13_2] = mmNIC6_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_13_3] = mmNIC6_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_14_0] = mmNIC7_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_14_1] = mmNIC7_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_14_2] = mmNIC7_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_14_3] = mmNIC7_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_15_0] = mmNIC7_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_15_1] = mmNIC7_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_15_2] = mmNIC7_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_15_3] = mmNIC7_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_16_0] = mmNIC8_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_16_1] = mmNIC8_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_16_2] = mmNIC8_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_16_3] = mmNIC8_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_17_0] = mmNIC8_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_17_1] = mmNIC8_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_17_2] = mmNIC8_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_17_3] = mmNIC8_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_18_0] = mmNIC9_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_18_1] = mmNIC9_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_18_2] = mmNIC9_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_18_3] = mmNIC9_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_19_0] = mmNIC9_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_19_1] = mmNIC9_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_19_2] = mmNIC9_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_19_3] = mmNIC9_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_20_0] = mmNIC10_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_20_1] = mmNIC10_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_20_2] = mmNIC10_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_20_3] = mmNIC10_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_21_0] = mmNIC10_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_21_1] = mmNIC10_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_21_2] = mmNIC10_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_21_3] = mmNIC10_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_22_0] = mmNIC11_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_22_1] = mmNIC11_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_22_2] = mmNIC11_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_22_3] = mmNIC11_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_23_0] = mmNIC11_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_23_1] = mmNIC11_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_23_2] = mmNIC11_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_23_3] = mmNIC11_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_0_0] = mmROT0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_0_1] = mmROT0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_0_2] = mmROT0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_0_3] = mmROT0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_1_0] = mmROT1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_1_1] = mmROT1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_1_2] = mmROT1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_1_3] = mmROT1_QM_BASE
 +};
 +
 +static const u32 gaudi2_arc_blocks_bases[NUM_ARC_CPUS] = {
 +      [CPU_ID_SCHED_ARC0] = mmARC_FARM_ARC0_AUX_BASE,
 +      [CPU_ID_SCHED_ARC1] = mmARC_FARM_ARC1_AUX_BASE,
 +      [CPU_ID_SCHED_ARC2] = mmARC_FARM_ARC2_AUX_BASE,
 +      [CPU_ID_SCHED_ARC3] = mmARC_FARM_ARC3_AUX_BASE,
 +      [CPU_ID_SCHED_ARC4] = mmDCORE1_MME_QM_ARC_AUX_BASE,
 +      [CPU_ID_SCHED_ARC5] = mmDCORE3_MME_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC0] = mmDCORE0_TPC0_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC1] = mmDCORE0_TPC1_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC2] = mmDCORE0_TPC2_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC3] = mmDCORE0_TPC3_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC4] = mmDCORE0_TPC4_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC5] = mmDCORE0_TPC5_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC6] = mmDCORE1_TPC0_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC7] = mmDCORE1_TPC1_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC8] = mmDCORE1_TPC2_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC9] = mmDCORE1_TPC3_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC10] = mmDCORE1_TPC4_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC11] = mmDCORE1_TPC5_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC12] = mmDCORE2_TPC0_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC13] = mmDCORE2_TPC1_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC14] = mmDCORE2_TPC2_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC15] = mmDCORE2_TPC3_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC16] = mmDCORE2_TPC4_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC17] = mmDCORE2_TPC5_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC18] = mmDCORE3_TPC0_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC19] = mmDCORE3_TPC1_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC20] = mmDCORE3_TPC2_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC21] = mmDCORE3_TPC3_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC22] = mmDCORE3_TPC4_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC23] = mmDCORE3_TPC5_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC24] = mmDCORE0_TPC6_QM_ARC_AUX_BASE,
 +      [CPU_ID_MME_QMAN_ARC0] = mmDCORE0_MME_QM_ARC_AUX_BASE,
 +      [CPU_ID_MME_QMAN_ARC1] = mmDCORE2_MME_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC0] = mmDCORE0_EDMA0_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC1] = mmDCORE0_EDMA1_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC2] = mmDCORE1_EDMA0_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC3] = mmDCORE1_EDMA1_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC4] = mmDCORE2_EDMA0_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC5] = mmDCORE2_EDMA1_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC6] = mmDCORE3_EDMA0_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC7] = mmDCORE3_EDMA1_QM_ARC_AUX_BASE,
 +      [CPU_ID_PDMA_QMAN_ARC0] = mmPDMA0_QM_ARC_AUX_BASE,
 +      [CPU_ID_PDMA_QMAN_ARC1] = mmPDMA1_QM_ARC_AUX_BASE,
 +      [CPU_ID_ROT_QMAN_ARC0] = mmROT0_QM_ARC_AUX_BASE,
 +      [CPU_ID_ROT_QMAN_ARC1] = mmROT1_QM_ARC_AUX_BASE,
 +      [CPU_ID_NIC_QMAN_ARC0] = mmNIC0_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC1] = mmNIC0_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC2] = mmNIC1_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC3] = mmNIC1_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC4] = mmNIC2_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC5] = mmNIC2_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC6] = mmNIC3_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC7] = mmNIC3_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC8] = mmNIC4_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC9] = mmNIC4_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC10] = mmNIC5_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC11] = mmNIC5_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC12] = mmNIC6_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC13] = mmNIC6_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC14] = mmNIC7_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC15] = mmNIC7_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC16] = mmNIC8_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC17] = mmNIC8_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC18] = mmNIC9_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC19] = mmNIC9_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC20] = mmNIC10_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC21] = mmNIC10_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC22] = mmNIC11_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC23] = mmNIC11_QM_ARC_AUX1_BASE,
 +};
 +
 +static const u32 gaudi2_arc_dccm_bases[NUM_ARC_CPUS] = {
 +      [CPU_ID_SCHED_ARC0] = mmARC_FARM_ARC0_DCCM0_BASE,
 +      [CPU_ID_SCHED_ARC1] = mmARC_FARM_ARC1_DCCM0_BASE,
 +      [CPU_ID_SCHED_ARC2] = mmARC_FARM_ARC2_DCCM0_BASE,
 +      [CPU_ID_SCHED_ARC3] = mmARC_FARM_ARC3_DCCM0_BASE,
 +      [CPU_ID_SCHED_ARC4] = mmDCORE1_MME_QM_ARC_DCCM_BASE,
 +      [CPU_ID_SCHED_ARC5] = mmDCORE3_MME_QM_ARC_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC0] = mmDCORE0_TPC0_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC1] = mmDCORE0_TPC1_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC2] = mmDCORE0_TPC2_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC3] = mmDCORE0_TPC3_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC4] = mmDCORE0_TPC4_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC5] = mmDCORE0_TPC5_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC6] = mmDCORE1_TPC0_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC7] = mmDCORE1_TPC1_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC8] = mmDCORE1_TPC2_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC9] = mmDCORE1_TPC3_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC10] = mmDCORE1_TPC4_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC11] = mmDCORE1_TPC5_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC12] = mmDCORE2_TPC0_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC13] = mmDCORE2_TPC1_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC14] = mmDCORE2_TPC2_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC15] = mmDCORE2_TPC3_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC16] = mmDCORE2_TPC4_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC17] = mmDCORE2_TPC5_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC18] = mmDCORE3_TPC0_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC19] = mmDCORE3_TPC1_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC20] = mmDCORE3_TPC2_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC21] = mmDCORE3_TPC3_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC22] = mmDCORE3_TPC4_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC23] = mmDCORE3_TPC5_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC24] = mmDCORE0_TPC6_QM_DCCM_BASE,
 +      [CPU_ID_MME_QMAN_ARC0] = mmDCORE0_MME_QM_ARC_DCCM_BASE,
 +      [CPU_ID_MME_QMAN_ARC1] = mmDCORE2_MME_QM_ARC_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC0] = mmDCORE0_EDMA0_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC1] = mmDCORE0_EDMA1_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC2] = mmDCORE1_EDMA0_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC3] = mmDCORE1_EDMA1_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC4] = mmDCORE2_EDMA0_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC5] = mmDCORE2_EDMA1_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC6] = mmDCORE3_EDMA0_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC7] = mmDCORE3_EDMA1_QM_DCCM_BASE,
 +      [CPU_ID_PDMA_QMAN_ARC0] = mmPDMA0_QM_ARC_DCCM_BASE,
 +      [CPU_ID_PDMA_QMAN_ARC1] = mmPDMA1_QM_ARC_DCCM_BASE,
 +      [CPU_ID_ROT_QMAN_ARC0] = mmROT0_QM_ARC_DCCM_BASE,
 +      [CPU_ID_ROT_QMAN_ARC1] = mmROT1_QM_ARC_DCCM_BASE,
 +      [CPU_ID_NIC_QMAN_ARC0] = mmNIC0_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC1] = mmNIC0_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC2] = mmNIC1_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC3] = mmNIC1_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC4] = mmNIC2_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC5] = mmNIC2_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC6] = mmNIC3_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC7] = mmNIC3_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC8] = mmNIC4_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC9] = mmNIC4_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC10] = mmNIC5_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC11] = mmNIC5_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC12] = mmNIC6_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC13] = mmNIC6_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC14] = mmNIC7_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC15] = mmNIC7_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC16] = mmNIC8_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC17] = mmNIC8_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC18] = mmNIC9_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC19] = mmNIC9_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC20] = mmNIC10_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC21] = mmNIC10_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC22] = mmNIC11_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC23] = mmNIC11_QM_DCCM1_BASE,
 +};
 +
 +const u32 gaudi2_mme_ctrl_lo_blocks_bases[MME_ID_SIZE] = {
 +      [MME_ID_DCORE0] = mmDCORE0_MME_CTRL_LO_BASE,
 +      [MME_ID_DCORE1] = mmDCORE1_MME_CTRL_LO_BASE,
 +      [MME_ID_DCORE2] = mmDCORE2_MME_CTRL_LO_BASE,
 +      [MME_ID_DCORE3] = mmDCORE3_MME_CTRL_LO_BASE,
 +};
 +
 +static const u32 gaudi2_queue_id_to_arc_id[GAUDI2_QUEUE_ID_SIZE] = {
 +      [GAUDI2_QUEUE_ID_PDMA_0_0] = CPU_ID_PDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_PDMA_0_1] = CPU_ID_PDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_PDMA_0_2] = CPU_ID_PDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_PDMA_0_3] = CPU_ID_PDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_PDMA_1_0] = CPU_ID_PDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_PDMA_1_1] = CPU_ID_PDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_PDMA_1_2] = CPU_ID_PDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_PDMA_1_3] = CPU_ID_PDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_0] = CPU_ID_MME_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_1] = CPU_ID_MME_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_2] = CPU_ID_MME_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_3] = CPU_ID_MME_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_0] = CPU_ID_TPC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_1] = CPU_ID_TPC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_2] = CPU_ID_TPC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_3] = CPU_ID_TPC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_0] = CPU_ID_TPC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_1] = CPU_ID_TPC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_2] = CPU_ID_TPC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_3] = CPU_ID_TPC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_0] = CPU_ID_TPC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_1] = CPU_ID_TPC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_2] = CPU_ID_TPC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_3] = CPU_ID_TPC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_0] = CPU_ID_TPC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_1] = CPU_ID_TPC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_2] = CPU_ID_TPC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_3] = CPU_ID_TPC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_0] = CPU_ID_TPC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_1] = CPU_ID_TPC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_2] = CPU_ID_TPC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_3] = CPU_ID_TPC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_0] = CPU_ID_TPC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_1] = CPU_ID_TPC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_2] = CPU_ID_TPC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_3] = CPU_ID_TPC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_0] = CPU_ID_TPC_QMAN_ARC24,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_1] = CPU_ID_TPC_QMAN_ARC24,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_2] = CPU_ID_TPC_QMAN_ARC24,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_3] = CPU_ID_TPC_QMAN_ARC24,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_0] = CPU_ID_SCHED_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_1] = CPU_ID_SCHED_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_2] = CPU_ID_SCHED_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_3] = CPU_ID_SCHED_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_0] = CPU_ID_TPC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_1] = CPU_ID_TPC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_2] = CPU_ID_TPC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_3] = CPU_ID_TPC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_0] = CPU_ID_TPC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_1] = CPU_ID_TPC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_2] = CPU_ID_TPC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_3] = CPU_ID_TPC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_0] = CPU_ID_TPC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_1] = CPU_ID_TPC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_2] = CPU_ID_TPC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_3] = CPU_ID_TPC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_0] = CPU_ID_TPC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_1] = CPU_ID_TPC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_2] = CPU_ID_TPC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_3] = CPU_ID_TPC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_0] = CPU_ID_TPC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_1] = CPU_ID_TPC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_2] = CPU_ID_TPC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_3] = CPU_ID_TPC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_0] = CPU_ID_TPC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_1] = CPU_ID_TPC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_2] = CPU_ID_TPC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_3] = CPU_ID_TPC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_0] = CPU_ID_MME_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_1] = CPU_ID_MME_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_2] = CPU_ID_MME_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_3] = CPU_ID_MME_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_0] = CPU_ID_TPC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_1] = CPU_ID_TPC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_2] = CPU_ID_TPC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_3] = CPU_ID_TPC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_0] = CPU_ID_TPC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_1] = CPU_ID_TPC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_2] = CPU_ID_TPC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_3] = CPU_ID_TPC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_0] = CPU_ID_TPC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_1] = CPU_ID_TPC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_2] = CPU_ID_TPC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_3] = CPU_ID_TPC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_0] = CPU_ID_TPC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_1] = CPU_ID_TPC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_2] = CPU_ID_TPC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_3] = CPU_ID_TPC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_0] = CPU_ID_TPC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_1] = CPU_ID_TPC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_2] = CPU_ID_TPC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_3] = CPU_ID_TPC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_0] = CPU_ID_TPC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_1] = CPU_ID_TPC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_2] = CPU_ID_TPC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_3] = CPU_ID_TPC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_0] = CPU_ID_SCHED_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_1] = CPU_ID_SCHED_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_2] = CPU_ID_SCHED_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_3] = CPU_ID_SCHED_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_0] = CPU_ID_TPC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_1] = CPU_ID_TPC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_2] = CPU_ID_TPC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_3] = CPU_ID_TPC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_0] = CPU_ID_TPC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_1] = CPU_ID_TPC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_2] = CPU_ID_TPC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_3] = CPU_ID_TPC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_0] = CPU_ID_TPC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_1] = CPU_ID_TPC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_2] = CPU_ID_TPC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_3] = CPU_ID_TPC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_0] = CPU_ID_TPC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_1] = CPU_ID_TPC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_2] = CPU_ID_TPC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_3] = CPU_ID_TPC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_0] = CPU_ID_TPC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_1] = CPU_ID_TPC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_2] = CPU_ID_TPC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_3] = CPU_ID_TPC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_0] = CPU_ID_TPC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_1] = CPU_ID_TPC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_2] = CPU_ID_TPC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_3] = CPU_ID_TPC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_NIC_0_0] = CPU_ID_NIC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_NIC_0_1] = CPU_ID_NIC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_NIC_0_2] = CPU_ID_NIC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_NIC_0_3] = CPU_ID_NIC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_NIC_1_0] = CPU_ID_NIC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_NIC_1_1] = CPU_ID_NIC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_NIC_1_2] = CPU_ID_NIC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_NIC_1_3] = CPU_ID_NIC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_NIC_2_0] = CPU_ID_NIC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_NIC_2_1] = CPU_ID_NIC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_NIC_2_2] = CPU_ID_NIC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_NIC_2_3] = CPU_ID_NIC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_NIC_3_0] = CPU_ID_NIC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_NIC_3_1] = CPU_ID_NIC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_NIC_3_2] = CPU_ID_NIC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_NIC_3_3] = CPU_ID_NIC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_NIC_4_0] = CPU_ID_NIC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_NIC_4_1] = CPU_ID_NIC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_NIC_4_2] = CPU_ID_NIC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_NIC_4_3] = CPU_ID_NIC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_NIC_5_0] = CPU_ID_NIC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_NIC_5_1] = CPU_ID_NIC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_NIC_5_2] = CPU_ID_NIC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_NIC_5_3] = CPU_ID_NIC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_NIC_6_0] = CPU_ID_NIC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_NIC_6_1] = CPU_ID_NIC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_NIC_6_2] = CPU_ID_NIC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_NIC_6_3] = CPU_ID_NIC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_NIC_7_0] = CPU_ID_NIC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_NIC_7_1] = CPU_ID_NIC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_NIC_7_2] = CPU_ID_NIC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_NIC_7_3] = CPU_ID_NIC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_NIC_8_0] = CPU_ID_NIC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_NIC_8_1] = CPU_ID_NIC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_NIC_8_2] = CPU_ID_NIC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_NIC_8_3] = CPU_ID_NIC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_NIC_9_0] = CPU_ID_NIC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_NIC_9_1] = CPU_ID_NIC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_NIC_9_2] = CPU_ID_NIC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_NIC_9_3] = CPU_ID_NIC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_NIC_10_0] = CPU_ID_NIC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_NIC_10_1] = CPU_ID_NIC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_NIC_10_2] = CPU_ID_NIC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_NIC_10_3] = CPU_ID_NIC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_NIC_11_0] = CPU_ID_NIC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_NIC_11_1] = CPU_ID_NIC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_NIC_11_2] = CPU_ID_NIC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_NIC_11_3] = CPU_ID_NIC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_NIC_12_0] = CPU_ID_NIC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_NIC_12_1] = CPU_ID_NIC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_NIC_12_2] = CPU_ID_NIC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_NIC_12_3] = CPU_ID_NIC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_NIC_13_0] = CPU_ID_NIC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_NIC_13_1] = CPU_ID_NIC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_NIC_13_2] = CPU_ID_NIC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_NIC_13_3] = CPU_ID_NIC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_NIC_14_0] = CPU_ID_NIC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_NIC_14_1] = CPU_ID_NIC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_NIC_14_2] = CPU_ID_NIC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_NIC_14_3] = CPU_ID_NIC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_NIC_15_0] = CPU_ID_NIC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_NIC_15_1] = CPU_ID_NIC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_NIC_15_2] = CPU_ID_NIC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_NIC_15_3] = CPU_ID_NIC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_NIC_16_0] = CPU_ID_NIC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_NIC_16_1] = CPU_ID_NIC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_NIC_16_2] = CPU_ID_NIC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_NIC_16_3] = CPU_ID_NIC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_NIC_17_0] = CPU_ID_NIC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_NIC_17_1] = CPU_ID_NIC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_NIC_17_2] = CPU_ID_NIC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_NIC_17_3] = CPU_ID_NIC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_NIC_18_0] = CPU_ID_NIC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_NIC_18_1] = CPU_ID_NIC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_NIC_18_2] = CPU_ID_NIC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_NIC_18_3] = CPU_ID_NIC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_NIC_19_0] = CPU_ID_NIC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_NIC_19_1] = CPU_ID_NIC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_NIC_19_2] = CPU_ID_NIC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_NIC_19_3] = CPU_ID_NIC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_NIC_20_0] = CPU_ID_NIC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_NIC_20_1] = CPU_ID_NIC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_NIC_20_2] = CPU_ID_NIC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_NIC_20_3] = CPU_ID_NIC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_NIC_21_0] = CPU_ID_NIC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_NIC_21_1] = CPU_ID_NIC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_NIC_21_2] = CPU_ID_NIC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_NIC_21_3] = CPU_ID_NIC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_NIC_22_0] = CPU_ID_NIC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_NIC_22_1] = CPU_ID_NIC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_NIC_22_2] = CPU_ID_NIC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_NIC_22_3] = CPU_ID_NIC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_NIC_23_0] = CPU_ID_NIC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_NIC_23_1] = CPU_ID_NIC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_NIC_23_2] = CPU_ID_NIC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_NIC_23_3] = CPU_ID_NIC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_ROT_0_0] = CPU_ID_ROT_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_ROT_0_1] = CPU_ID_ROT_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_ROT_0_2] = CPU_ID_ROT_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_ROT_0_3] = CPU_ID_ROT_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_ROT_1_0] = CPU_ID_ROT_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_ROT_1_1] = CPU_ID_ROT_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_ROT_1_2] = CPU_ID_ROT_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_ROT_1_3] = CPU_ID_ROT_QMAN_ARC1
 +};
 +
 +const u32 gaudi2_dma_core_blocks_bases[DMA_CORE_ID_SIZE] = {
 +      [DMA_CORE_ID_PDMA0] = mmPDMA0_CORE_BASE,
 +      [DMA_CORE_ID_PDMA1] = mmPDMA1_CORE_BASE,
 +      [DMA_CORE_ID_EDMA0] = mmDCORE0_EDMA0_CORE_BASE,
 +      [DMA_CORE_ID_EDMA1] = mmDCORE0_EDMA1_CORE_BASE,
 +      [DMA_CORE_ID_EDMA2] = mmDCORE1_EDMA0_CORE_BASE,
 +      [DMA_CORE_ID_EDMA3] = mmDCORE1_EDMA1_CORE_BASE,
 +      [DMA_CORE_ID_EDMA4] = mmDCORE2_EDMA0_CORE_BASE,
 +      [DMA_CORE_ID_EDMA5] = mmDCORE2_EDMA1_CORE_BASE,
 +      [DMA_CORE_ID_EDMA6] = mmDCORE3_EDMA0_CORE_BASE,
 +      [DMA_CORE_ID_EDMA7] = mmDCORE3_EDMA1_CORE_BASE,
 +      [DMA_CORE_ID_KDMA] = mmARC_FARM_KDMA_BASE
 +};
 +
 +const u32 gaudi2_mme_acc_blocks_bases[MME_ID_SIZE] = {
 +      [MME_ID_DCORE0] = mmDCORE0_MME_ACC_BASE,
 +      [MME_ID_DCORE1] = mmDCORE1_MME_ACC_BASE,
 +      [MME_ID_DCORE2] = mmDCORE2_MME_ACC_BASE,
 +      [MME_ID_DCORE3] = mmDCORE3_MME_ACC_BASE
 +};
 +
 +static const u32 gaudi2_tpc_cfg_blocks_bases[TPC_ID_SIZE] = {
 +      [TPC_ID_DCORE0_TPC0] = mmDCORE0_TPC0_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC1] = mmDCORE0_TPC1_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC2] = mmDCORE0_TPC2_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC3] = mmDCORE0_TPC3_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC4] = mmDCORE0_TPC4_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC5] = mmDCORE0_TPC5_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC0] = mmDCORE1_TPC0_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC1] = mmDCORE1_TPC1_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC2] = mmDCORE1_TPC2_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC3] = mmDCORE1_TPC3_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC4] = mmDCORE1_TPC4_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC5] = mmDCORE1_TPC5_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC0] = mmDCORE2_TPC0_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC1] = mmDCORE2_TPC1_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC2] = mmDCORE2_TPC2_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC3] = mmDCORE2_TPC3_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC4] = mmDCORE2_TPC4_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC5] = mmDCORE2_TPC5_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC0] = mmDCORE3_TPC0_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC1] = mmDCORE3_TPC1_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC2] = mmDCORE3_TPC2_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC3] = mmDCORE3_TPC3_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC4] = mmDCORE3_TPC4_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC5] = mmDCORE3_TPC5_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC6] = mmDCORE0_TPC6_CFG_BASE,
 +};
 +
 +const u32 gaudi2_rot_blocks_bases[ROTATOR_ID_SIZE] = {
 +      [ROTATOR_ID_0] = mmROT0_BASE,
 +      [ROTATOR_ID_1] = mmROT1_BASE
 +};
 +
 +static const u32 gaudi2_tpc_id_to_queue_id[TPC_ID_SIZE] = {
 +      [TPC_ID_DCORE0_TPC0] = GAUDI2_QUEUE_ID_DCORE0_TPC_0_0,
 +      [TPC_ID_DCORE0_TPC1] = GAUDI2_QUEUE_ID_DCORE0_TPC_1_0,
 +      [TPC_ID_DCORE0_TPC2] = GAUDI2_QUEUE_ID_DCORE0_TPC_2_0,
 +      [TPC_ID_DCORE0_TPC3] = GAUDI2_QUEUE_ID_DCORE0_TPC_3_0,
 +      [TPC_ID_DCORE0_TPC4] = GAUDI2_QUEUE_ID_DCORE0_TPC_4_0,
 +      [TPC_ID_DCORE0_TPC5] = GAUDI2_QUEUE_ID_DCORE0_TPC_5_0,
 +      [TPC_ID_DCORE1_TPC0] = GAUDI2_QUEUE_ID_DCORE1_TPC_0_0,
 +      [TPC_ID_DCORE1_TPC1] = GAUDI2_QUEUE_ID_DCORE1_TPC_1_0,
 +      [TPC_ID_DCORE1_TPC2] = GAUDI2_QUEUE_ID_DCORE1_TPC_2_0,
 +      [TPC_ID_DCORE1_TPC3] = GAUDI2_QUEUE_ID_DCORE1_TPC_3_0,
 +      [TPC_ID_DCORE1_TPC4] = GAUDI2_QUEUE_ID_DCORE1_TPC_4_0,
 +      [TPC_ID_DCORE1_TPC5] = GAUDI2_QUEUE_ID_DCORE1_TPC_5_0,
 +      [TPC_ID_DCORE2_TPC0] = GAUDI2_QUEUE_ID_DCORE2_TPC_0_0,
 +      [TPC_ID_DCORE2_TPC1] = GAUDI2_QUEUE_ID_DCORE2_TPC_1_0,
 +      [TPC_ID_DCORE2_TPC2] = GAUDI2_QUEUE_ID_DCORE2_TPC_2_0,
 +      [TPC_ID_DCORE2_TPC3] = GAUDI2_QUEUE_ID_DCORE2_TPC_3_0,
 +      [TPC_ID_DCORE2_TPC4] = GAUDI2_QUEUE_ID_DCORE2_TPC_4_0,
 +      [TPC_ID_DCORE2_TPC5] = GAUDI2_QUEUE_ID_DCORE2_TPC_5_0,
 +      [TPC_ID_DCORE3_TPC0] = GAUDI2_QUEUE_ID_DCORE3_TPC_0_0,
 +      [TPC_ID_DCORE3_TPC1] = GAUDI2_QUEUE_ID_DCORE3_TPC_1_0,
 +      [TPC_ID_DCORE3_TPC2] = GAUDI2_QUEUE_ID_DCORE3_TPC_2_0,
 +      [TPC_ID_DCORE3_TPC3] = GAUDI2_QUEUE_ID_DCORE3_TPC_3_0,
 +      [TPC_ID_DCORE3_TPC4] = GAUDI2_QUEUE_ID_DCORE3_TPC_4_0,
 +      [TPC_ID_DCORE3_TPC5] = GAUDI2_QUEUE_ID_DCORE3_TPC_5_0,
 +      [TPC_ID_DCORE0_TPC6] = GAUDI2_QUEUE_ID_DCORE0_TPC_6_0,
 +};
 +
 +static const u32 gaudi2_rot_id_to_queue_id[ROTATOR_ID_SIZE] = {
 +      [ROTATOR_ID_0] = GAUDI2_QUEUE_ID_ROT_0_0,
 +      [ROTATOR_ID_1] = GAUDI2_QUEUE_ID_ROT_1_0,
 +};
 +
 +const u32 edma_stream_base[NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES] = {
 +      GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0,
 +      GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0,
 +      GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0,
 +      GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0,
 +      GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0,
 +      GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0,
 +      GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0,
 +      GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0,
 +};
 +
 +static const char gaudi2_vdec_irq_name[GAUDI2_VDEC_MSIX_ENTRIES][GAUDI2_MAX_STRING_LEN] = {
 +      "gaudi2 vdec 0_0", "gaudi2 vdec 0_0 abnormal",
 +      "gaudi2 vdec 0_1", "gaudi2 vdec 0_1 abnormal",
 +      "gaudi2 vdec 1_0", "gaudi2 vdec 1_0 abnormal",
 +      "gaudi2 vdec 1_1", "gaudi2 vdec 1_1 abnormal",
 +      "gaudi2 vdec 2_0", "gaudi2 vdec 2_0 abnormal",
 +      "gaudi2 vdec 2_1", "gaudi2 vdec 2_1 abnormal",
 +      "gaudi2 vdec 3_0", "gaudi2 vdec 3_0 abnormal",
 +      "gaudi2 vdec 3_1", "gaudi2 vdec 3_1 abnormal",
 +      "gaudi2 vdec s_0", "gaudi2 vdec s_0 abnormal",
 +      "gaudi2 vdec s_1", "gaudi2 vdec s_1 abnormal"
 +};
 +
 +static const u32 rtr_coordinates_to_rtr_id[NUM_OF_RTR_PER_DCORE * NUM_OF_DCORES] = {
 +      RTR_ID_X_Y(2, 4),
 +      RTR_ID_X_Y(3, 4),
 +      RTR_ID_X_Y(4, 4),
 +      RTR_ID_X_Y(5, 4),
 +      RTR_ID_X_Y(6, 4),
 +      RTR_ID_X_Y(7, 4),
 +      RTR_ID_X_Y(8, 4),
 +      RTR_ID_X_Y(9, 4),
 +      RTR_ID_X_Y(10, 4),
 +      RTR_ID_X_Y(11, 4),
 +      RTR_ID_X_Y(12, 4),
 +      RTR_ID_X_Y(13, 4),
 +      RTR_ID_X_Y(14, 4),
 +      RTR_ID_X_Y(15, 4),
 +      RTR_ID_X_Y(16, 4),
 +      RTR_ID_X_Y(17, 4),
 +      RTR_ID_X_Y(2, 11),
 +      RTR_ID_X_Y(3, 11),
 +      RTR_ID_X_Y(4, 11),
 +      RTR_ID_X_Y(5, 11),
 +      RTR_ID_X_Y(6, 11),
 +      RTR_ID_X_Y(7, 11),
 +      RTR_ID_X_Y(8, 11),
 +      RTR_ID_X_Y(9, 11),
 +      RTR_ID_X_Y(0, 0),/* 24 no id */
 +      RTR_ID_X_Y(0, 0),/* 25 no id */
 +      RTR_ID_X_Y(0, 0),/* 26 no id */
 +      RTR_ID_X_Y(0, 0),/* 27 no id */
 +      RTR_ID_X_Y(14, 11),
 +      RTR_ID_X_Y(15, 11),
 +      RTR_ID_X_Y(16, 11),
 +      RTR_ID_X_Y(17, 11)
 +};
 +
 +enum rtr_id {
 +      DCORE0_RTR0,
 +      DCORE0_RTR1,
 +      DCORE0_RTR2,
 +      DCORE0_RTR3,
 +      DCORE0_RTR4,
 +      DCORE0_RTR5,
 +      DCORE0_RTR6,
 +      DCORE0_RTR7,
 +      DCORE1_RTR0,
 +      DCORE1_RTR1,
 +      DCORE1_RTR2,
 +      DCORE1_RTR3,
 +      DCORE1_RTR4,
 +      DCORE1_RTR5,
 +      DCORE1_RTR6,
 +      DCORE1_RTR7,
 +      DCORE2_RTR0,
 +      DCORE2_RTR1,
 +      DCORE2_RTR2,
 +      DCORE2_RTR3,
 +      DCORE2_RTR4,
 +      DCORE2_RTR5,
 +      DCORE2_RTR6,
 +      DCORE2_RTR7,
 +      DCORE3_RTR0,
 +      DCORE3_RTR1,
 +      DCORE3_RTR2,
 +      DCORE3_RTR3,
 +      DCORE3_RTR4,
 +      DCORE3_RTR5,
 +      DCORE3_RTR6,
 +      DCORE3_RTR7,
 +};
 +
 +static const u32 gaudi2_tpc_initiator_hbw_rtr_id[NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES + 1] = {
 +      DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR2, DCORE0_RTR2, DCORE0_RTR3, DCORE0_RTR3,
 +      DCORE1_RTR6, DCORE1_RTR6, DCORE1_RTR5, DCORE1_RTR5, DCORE1_RTR4, DCORE1_RTR4,
 +      DCORE2_RTR3, DCORE2_RTR3, DCORE2_RTR2, DCORE2_RTR2, DCORE2_RTR1, DCORE2_RTR1,
 +      DCORE3_RTR4, DCORE3_RTR4, DCORE3_RTR5, DCORE3_RTR5, DCORE3_RTR6, DCORE3_RTR6,
 +      DCORE0_RTR0
 +};
 +
 +static const u32 gaudi2_tpc_initiator_lbw_rtr_id[NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES + 1] = {
 +      DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR2, DCORE0_RTR2,
 +      DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR6, DCORE1_RTR6, DCORE1_RTR5, DCORE1_RTR5,
 +      DCORE2_RTR2, DCORE2_RTR2, DCORE2_RTR1, DCORE2_RTR1, DCORE2_RTR0, DCORE2_RTR0,
 +      DCORE3_RTR5, DCORE3_RTR5, DCORE3_RTR6, DCORE3_RTR6, DCORE3_RTR7, DCORE3_RTR7,
 +      DCORE0_RTR0
 +};
 +
 +static const u32 gaudi2_dec_initiator_hbw_rtr_id[NUMBER_OF_DEC] = {
 +      DCORE0_RTR0, DCORE0_RTR0, DCORE1_RTR7, DCORE1_RTR7, DCORE2_RTR0, DCORE2_RTR0,
 +      DCORE3_RTR7, DCORE3_RTR7, DCORE0_RTR0, DCORE0_RTR0
 +};
 +
 +static const u32 gaudi2_dec_initiator_lbw_rtr_id[NUMBER_OF_DEC] = {
 +      DCORE0_RTR1, DCORE0_RTR1, DCORE1_RTR6, DCORE1_RTR6, DCORE2_RTR1, DCORE2_RTR1,
 +      DCORE3_RTR6, DCORE3_RTR6, DCORE0_RTR0, DCORE0_RTR0
 +};
 +
 +static const u32 gaudi2_nic_initiator_hbw_rtr_id[NIC_NUMBER_OF_MACROS] = {
 +      DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE2_RTR0,
 +      DCORE2_RTR0, DCORE2_RTR0, DCORE2_RTR0, DCORE3_RTR7, DCORE3_RTR7, DCORE3_RTR7
 +};
 +
 +static const u32 gaudi2_nic_initiator_lbw_rtr_id[NIC_NUMBER_OF_MACROS] = {
 +      DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE2_RTR0,
 +      DCORE2_RTR0, DCORE2_RTR0, DCORE2_RTR0, DCORE3_RTR7, DCORE3_RTR7, DCORE3_RTR7
 +};
 +
 +static const u32 gaudi2_edma_initiator_hbw_sft[NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES] = {
 +      mmSFT0_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT0_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT1_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT1_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT2_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT2_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT3_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT3_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE
 +};
 +
 +static const u32 gaudi2_pdma_initiator_hbw_rtr_id[NUM_OF_PDMA] = {
 +      DCORE0_RTR0, DCORE0_RTR0
 +};
 +
 +static const u32 gaudi2_pdma_initiator_lbw_rtr_id[NUM_OF_PDMA] = {
 +      DCORE0_RTR2, DCORE0_RTR2
 +};
 +
 +static const u32 gaudi2_rot_initiator_hbw_rtr_id[NUM_OF_ROT] = {
 +      DCORE2_RTR0, DCORE3_RTR7
 +};
 +
 +static const u32 gaudi2_rot_initiator_lbw_rtr_id[NUM_OF_ROT] = {
 +      DCORE2_RTR2, DCORE3_RTR5
 +};
 +
 +struct mme_initiators_rtr_id {
 +      u32 wap0;
 +      u32 wap1;
 +      u32 write;
 +      u32 read;
 +      u32 sbte0;
 +      u32 sbte1;
 +      u32 sbte2;
 +      u32 sbte3;
 +      u32 sbte4;
 +};
 +
 +enum mme_initiators {
 +      MME_WAP0 = 0,
 +      MME_WAP1,
 +      MME_WRITE,
 +      MME_READ,
 +      MME_SBTE0,
 +      MME_SBTE1,
 +      MME_SBTE2,
 +      MME_SBTE3,
 +      MME_SBTE4,
 +      MME_INITIATORS_MAX
 +};
 +
 +static const struct mme_initiators_rtr_id
 +gaudi2_mme_initiator_rtr_id[NUM_OF_MME_PER_DCORE * NUM_OF_DCORES] = {
 +      { .wap0 = 5, .wap1 = 7, .write = 6, .read = 7,
 +      .sbte0 = 7, .sbte1 = 4, .sbte2 = 4, .sbte3 = 5, .sbte4 = 6},
 +      { .wap0 = 10, .wap1 = 8, .write = 9, .read = 8,
 +      .sbte0 = 11, .sbte1 = 11, .sbte2 = 10, .sbte3 = 9, .sbte4 = 8},
 +      { .wap0 = 21, .wap1 = 23, .write = 22, .read = 23,
 +      .sbte0 = 20, .sbte1 = 20, .sbte2 = 21, .sbte3 = 22, .sbte4 = 23},
 +      { .wap0 = 30, .wap1 = 28, .write = 29, .read = 30,
 +      .sbte0 = 31, .sbte1 = 31, .sbte2 = 30, .sbte3 = 29, .sbte4 = 28},
 +};
 +
 +enum razwi_event_sources {
 +      RAZWI_TPC,
 +      RAZWI_MME,
 +      RAZWI_EDMA,
 +      RAZWI_PDMA,
 +      RAZWI_NIC,
 +      RAZWI_DEC,
 +      RAZWI_ROT
 +};
 +
 +struct hbm_mc_error_causes {
 +      u32 mask;
 +      char cause[50];
 +};
 +
 +static struct hl_special_block_info gaudi2_special_blocks[] = GAUDI2_SPECIAL_BLOCKS;
 +
 +/* The special blocks iterator is currently used to configure the security protection
 + * bits and to read global errors. Most HW blocks are addressable; those that aren't
 + * (N/A) must be skipped. The following configurations are shared between PB config
 + * and global error reading, since both currently use the same settings.
 + * Once that changes, separate configurations must be used for each.
 + */
 +static int gaudi2_iterator_skip_block_types[] = {
 +              GAUDI2_BLOCK_TYPE_PLL,
 +              GAUDI2_BLOCK_TYPE_EU_BIST,
 +              GAUDI2_BLOCK_TYPE_HBM,
 +              GAUDI2_BLOCK_TYPE_XFT
 +};
 +
 +static struct range gaudi2_iterator_skip_block_ranges[] = {
 +              /* Skip all PSOC blocks except for PSOC_GLOBAL_CONF */
 +              {mmPSOC_I2C_M0_BASE, mmPSOC_EFUSE_BASE},
 +              {mmPSOC_BTL_BASE, mmPSOC_MSTR_IF_RR_SHRD_HBW_BASE},
 +              /* Skip all CPU blocks except for CPU_IF */
 +              {mmCPU_CA53_CFG_BASE, mmCPU_CA53_CFG_BASE},
 +              {mmCPU_TIMESTAMP_BASE, mmCPU_MSTR_IF_RR_SHRD_HBW_BASE}
 +};
 +
 +static struct hbm_mc_error_causes hbm_mc_spi[GAUDI2_NUM_OF_HBM_MC_SPI_CAUSE] = {
 +      {HBM_MC_SPI_TEMP_PIN_CHG_MASK, "temperature pins changed"},
 +      {HBM_MC_SPI_THR_ENG_MASK, "temperature-based throttling engaged"},
 +      {HBM_MC_SPI_THR_DIS_ENG_MASK, "temperature-based throttling disengaged"},
 +      {HBM_MC_SPI_IEEE1500_COMP_MASK, "IEEE1500 op comp"},
 +      {HBM_MC_SPI_IEEE1500_PAUSED_MASK, "IEEE1500 op paused"},
 +};
 +
 +static const char * const hbm_mc_sei_cause[GAUDI2_NUM_OF_HBM_SEI_CAUSE] = {
 +      [HBM_SEI_CMD_PARITY_EVEN] = "SEI C/A parity even",
 +      [HBM_SEI_CMD_PARITY_ODD] = "SEI C/A parity odd",
 +      [HBM_SEI_READ_ERR] = "SEI read data error",
 +      [HBM_SEI_WRITE_DATA_PARITY_ERR] = "SEI write data parity error",
 +      [HBM_SEI_CATTRIP] = "SEI CATTRIP asserted",
 +      [HBM_SEI_MEM_BIST_FAIL] = "SEI memory BIST fail",
 +      [HBM_SEI_DFI] = "SEI DFI error",
 +      [HBM_SEI_INV_TEMP_READ_OUT] = "SEI invalid temp read",
 +      [HBM_SEI_BIST_FAIL] = "SEI BIST fail"
 +};
 +
 +struct mmu_spi_sei_cause {
 +      char cause[50];
 +      int clear_bit;
 +};
 +
 +static const struct mmu_spi_sei_cause gaudi2_mmu_spi_sei[GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE] = {
 +      {"page fault", 1},              /* INTERRUPT_CLR[1] */
 +      {"page access", 1},             /* INTERRUPT_CLR[1] */
 +      {"bypass ddr", 2},              /* INTERRUPT_CLR[2] */
 +      {"multi hit", 2},               /* INTERRUPT_CLR[2] */
 +      {"mmu rei0", -1},               /* no clear register bit */
 +      {"mmu rei1", -1},               /* no clear register bit */
 +      {"stlb rei0", -1},              /* no clear register bit */
 +      {"stlb rei1", -1},              /* no clear register bit */
 +      {"rr privileged write hit", 2}, /* INTERRUPT_CLR[2] */
 +      {"rr privileged read hit", 2},  /* INTERRUPT_CLR[2] */
 +      {"rr secure write hit", 2},     /* INTERRUPT_CLR[2] */
 +      {"rr secure read hit", 2},      /* INTERRUPT_CLR[2] */
 +      {"bist_fail no use", 2},        /* INTERRUPT_CLR[2] */
 +      {"bist_fail no use", 2},        /* INTERRUPT_CLR[2] */
 +      {"bist_fail no use", 2},        /* INTERRUPT_CLR[2] */
 +      {"bist_fail no use", 2},        /* INTERRUPT_CLR[2] */
 +      {"slave error", 16},            /* INTERRUPT_CLR[16] */
 +      {"dec error", 17},              /* INTERRUPT_CLR[17] */
 +      {"burst fifo full", 2}          /* INTERRUPT_CLR[2] */
 +};
 +
 +struct gaudi2_cache_invld_params {
 +      u64 start_va;
 +      u64 end_va;
 +      u32 inv_start_val;
 +      u32 flags;
 +      bool range_invalidation;
 +};
 +
 +struct gaudi2_tpc_idle_data {
 +      struct engines_data *e;
 +      unsigned long *mask;
 +      bool *is_idle;
 +      const char *tpc_fmt;
 +};
 +
 +struct gaudi2_tpc_mmu_data {
 +      u32 rw_asid;
 +};
 +
 +static s64 gaudi2_state_dump_specs_props[SP_MAX] = {0};
 +
 +static int gaudi2_memset_device_memory(struct hl_device *hdev, u64 addr, u64 size, u64 val);
 +static bool gaudi2_is_queue_enabled(struct hl_device *hdev, u32 hw_queue_id);
 +static bool gaudi2_is_arc_enabled(struct hl_device *hdev, u64 arc_id);
 +static void gaudi2_clr_arc_id_cap(struct hl_device *hdev, u64 arc_id);
 +static void gaudi2_set_arc_id_cap(struct hl_device *hdev, u64 arc_id);
 +static void gaudi2_memset_device_lbw(struct hl_device *hdev, u32 addr, u32 size, u32 val);
 +static int gaudi2_send_job_to_kdma(struct hl_device *hdev, u64 src_addr, u64 dst_addr, u32 size,
 +                                                                              bool is_memset);
 +static u64 gaudi2_mmu_scramble_addr(struct hl_device *hdev, u64 raw_addr);
 +
 +static void gaudi2_init_scrambler_hbm(struct hl_device *hdev)
 +{
 +
 +}
 +
 +static u32 gaudi2_get_signal_cb_size(struct hl_device *hdev)
 +{
 +      return sizeof(struct packet_msg_short);
 +}
 +
 +static u32 gaudi2_get_wait_cb_size(struct hl_device *hdev)
 +{
 +      return sizeof(struct packet_msg_short) * 4 + sizeof(struct packet_fence);
 +}
 +
 +void gaudi2_iterate_tpcs(struct hl_device *hdev, struct iterate_module_ctx *ctx)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int dcore, inst, tpc_seq;
 +      u32 offset;
 +
 +      /* init the return code */
 +      ctx->rc = 0;
 +
 +      for (dcore = 0; dcore < NUM_OF_DCORES; dcore++) {
 +              for (inst = 0; inst < NUM_OF_TPC_PER_DCORE; inst++) {
 +                      tpc_seq = dcore * NUM_OF_TPC_PER_DCORE + inst;
 +
 +                      if (!(prop->tpc_enabled_mask & BIT(tpc_seq)))
 +                              continue;
 +
 +                      offset = (DCORE_OFFSET * dcore) + (DCORE_TPC_OFFSET * inst);
 +
 +                      ctx->fn(hdev, dcore, inst, offset, ctx);
 +                      if (ctx->rc) {
 +                              dev_err(hdev->dev, "TPC iterator failed for DCORE%d TPC%d\n",
 +                                                      dcore, inst);
 +                              return;
 +                      }
 +              }
 +      }
 +
 +      if (!(prop->tpc_enabled_mask & BIT(TPC_ID_DCORE0_TPC6)))
 +              return;
 +
 +      /* special check for PCI TPC (DCORE0_TPC6) */
 +      offset = DCORE_TPC_OFFSET * (NUM_DCORE0_TPC - 1);
 +      ctx->fn(hdev, 0, NUM_DCORE0_TPC - 1, offset, ctx);
 +      if (ctx->rc)
 +              dev_err(hdev->dev, "TPC iterator failed for DCORE0 TPC6\n");
 +}
 +
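 +/*
 + * A host physical address is treated as valid as long as it does not fall in
 + * the hole between the two host physical ranges: anything below the end of
 + * range 0 or at/above the base of range 1 is accepted.
 + */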
 +static bool gaudi2_host_phys_addr_valid(u64 addr)
 +{
 +      if ((addr < HOST_PHYS_BASE_0 + HOST_PHYS_SIZE_0) || (addr >= HOST_PHYS_BASE_1))
 +              return true;
 +
 +      return false;
 +}
 +
 +static int set_number_of_functional_hbms(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u8 faulty_hbms = hweight64(hdev->dram_binning);
 +
 +      /* check if all HBMs should be used */
 +      if (!faulty_hbms) {
 +              dev_dbg(hdev->dev, "All HBM are in use (no binning)\n");
 +              prop->num_functional_hbms = GAUDI2_HBM_NUM;
 +              return 0;
 +      }
 +
 +      /*
 +       * check for the error condition in which the number of binning
 +       * candidates is higher than the maximum supported by the driver,
 +       * in which case the supplied binning mask is rejected
 +       */
 +      if (faulty_hbms > MAX_FAULTY_HBMS) {
 +              dev_err(hdev->dev,
 +                      "HBM binning supports max of %d faulty HBMs, supplied mask 0x%llx.\n",
 +                      MAX_FAULTY_HBMS, hdev->dram_binning);
 +              return -EINVAL;
 +      }
 +
 +      /*
 +       * with binning in effect, the number of functional HBMs is
 +       * GAUDI2_HBM_NUM minus the number of faulty ones, i.e. with the
 +       * current binning limit it is always GAUDI2_HBM_NUM - 1.
 +       */
 +      prop->num_functional_hbms = GAUDI2_HBM_NUM - faulty_hbms;
 +      return 0;
 +}
 +
 +static int gaudi2_set_dram_properties(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 basic_hbm_page_size;
 +      int rc;
 +
 +      rc = set_number_of_functional_hbms(hdev);
 +      if (rc)
 +              return -EINVAL;
 +
 +      /*
 +       * Due to a HW bug the TLB is x16 smaller than expected. As a workaround
 +       * we use a x16 bigger page size, so the entire HBM can still be mapped
 +       * by the TLB.
 +       */
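 +      /*
 +       * For example, with all 6 HBMs functional the basic page computed below
 +       * is 6 * 8MB = 48MB, giving a x16-compensated DRAM page size of 768MB.
 +       */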
 +      basic_hbm_page_size = prop->num_functional_hbms * SZ_8M;
 +      prop->dram_page_size = GAUDI2_COMPENSATE_TLB_PAGE_SIZE_FACTOR * basic_hbm_page_size;
 +      prop->device_mem_alloc_default_page_size = prop->dram_page_size;
 +      prop->dram_size = prop->num_functional_hbms * SZ_16G;
 +      prop->dram_base_address = DRAM_PHYS_BASE;
 +      prop->dram_end_address = prop->dram_base_address + prop->dram_size;
 +      prop->dram_supports_virtual_memory = true;
 +
 +      prop->dram_user_base_address = DRAM_PHYS_BASE + prop->dram_page_size;
 +      prop->dram_hints_align_mask = ~GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK;
 +      prop->hints_dram_reserved_va_range.start_addr = RESERVED_VA_RANGE_FOR_ARC_ON_HBM_START;
 +      prop->hints_dram_reserved_va_range.end_addr = RESERVED_VA_RANGE_FOR_ARC_ON_HBM_END;
 +
 +      /* since the DRAM page size differs from the DMMU page size we need to
 +       * allocate DRAM memory in units of dram_page_size and map this memory
 +       * in units of the DMMU page size. We overcome this size mismatch using
 +       * a scrambling routine which takes a DRAM page and converts it to a
 +       * DMMU page.
 +       * We therefore:
 +       * 1. partition the virtual address space into whole DRAM-page pages
 +       *    (suppose we get n such pages).
 +       * 2. limit the amount of virtual address space we got from 1 above to
 +       *    a multiple of 64M, as we don't want the scrambled address to cross
 +       *    the DRAM virtual address space
 +       *    (m = (n * DRAM_page_size) / DMMU_page_size).
 +       * 3. determine the end address accordingly:
 +       *    end_addr = start_addr + m * 48M
 +       *
 +       *    The DRAM address MSBs (63:48) are not part of the roundup calculation.
 +       */
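 +      /*
 +       * start_addr below is therefore the first whole dram_page_size boundary
 +       * (relative to the DRAM base) at or above the end of the physical DRAM
 +       * range.
 +       */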
 +      prop->dmmu.start_addr = prop->dram_base_address +
 +                      (prop->dram_page_size *
 +                              DIV_ROUND_UP_SECTOR_T(prop->dram_size, prop->dram_page_size));
 +
 +      prop->dmmu.end_addr = prop->dmmu.start_addr + prop->dram_page_size *
 +                      div_u64((VA_HBM_SPACE_END - prop->dmmu.start_addr), prop->dmmu.page_size);
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_fixed_properties(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hw_queue_properties *q_props;
 +      u32 num_sync_stream_queues = 0;
 +      int i;
 +
 +      prop->max_queues = GAUDI2_QUEUE_ID_SIZE;
 +      prop->hw_queues_props = kcalloc(prop->max_queues, sizeof(struct hw_queue_properties),
 +                                      GFP_KERNEL);
 +
 +      if (!prop->hw_queues_props)
 +              return -ENOMEM;
 +
 +      q_props = prop->hw_queues_props;
 +
 +      for (i = 0 ; i < GAUDI2_QUEUE_ID_CPU_PQ ; i++) {
 +              q_props[i].type = QUEUE_TYPE_HW;
 +              q_props[i].driver_only = 0;
 +
 +              if (i >= GAUDI2_QUEUE_ID_NIC_0_0 && i <= GAUDI2_QUEUE_ID_NIC_23_3) {
 +                      q_props[i].supports_sync_stream = 0;
 +              } else {
 +                      q_props[i].supports_sync_stream = 1;
 +                      num_sync_stream_queues++;
 +              }
 +
 +              q_props[i].cb_alloc_flags = CB_ALLOC_USER;
 +      }
 +
 +      q_props[GAUDI2_QUEUE_ID_CPU_PQ].type = QUEUE_TYPE_CPU;
 +      q_props[GAUDI2_QUEUE_ID_CPU_PQ].driver_only = 1;
 +      q_props[GAUDI2_QUEUE_ID_CPU_PQ].cb_alloc_flags = CB_ALLOC_KERNEL;
 +
 +      prop->cache_line_size = DEVICE_CACHE_LINE_SIZE;
 +      prop->cfg_base_address = CFG_BASE;
 +      prop->device_dma_offset_for_host_access = HOST_PHYS_BASE_0;
 +      prop->host_base_address = HOST_PHYS_BASE_0;
 +      prop->host_end_address = prop->host_base_address + HOST_PHYS_SIZE_0;
 +      prop->max_pending_cs = GAUDI2_MAX_PENDING_CS;
 +      prop->completion_queues_count = GAUDI2_RESERVED_CQ_NUMBER;
 +      prop->user_dec_intr_count = NUMBER_OF_DEC;
 +      prop->user_interrupt_count = GAUDI2_IRQ_NUM_USER_LAST - GAUDI2_IRQ_NUM_USER_FIRST + 1;
 +      prop->completion_mode = HL_COMPLETION_MODE_CS;
 +      prop->sync_stream_first_sob = GAUDI2_RESERVED_SOB_NUMBER;
 +      prop->sync_stream_first_mon = GAUDI2_RESERVED_MON_NUMBER;
 +
 +      prop->sram_base_address = SRAM_BASE_ADDR;
 +      prop->sram_size = SRAM_SIZE;
 +      prop->sram_end_address = prop->sram_base_address + prop->sram_size;
 +      prop->sram_user_base_address = prop->sram_base_address + SRAM_USER_BASE_OFFSET;
 +
 +      prop->hints_range_reservation = true;
 +
 +      if (hdev->pldm)
 +              prop->mmu_pgt_size = 0x800000; /* 8MB */
 +      else
 +              prop->mmu_pgt_size = MMU_PAGE_TABLES_INITIAL_SIZE;
 +
 +      prop->mmu_pte_size = HL_PTE_SIZE;
 +      prop->mmu_hop_table_size = HOP_TABLE_SIZE_512_PTE;
 +      prop->mmu_hop0_tables_total_size = HOP0_512_PTE_TABLES_TOTAL_SIZE;
 +
 +      prop->dmmu.hop_shifts[MMU_HOP0] = DHOP0_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP1] = DHOP1_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP2] = DHOP2_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP3] = DHOP3_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP4] = DHOP4_SHIFT;
 +      prop->dmmu.hop_masks[MMU_HOP0] = DHOP0_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP1] = DHOP1_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP2] = DHOP2_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP3] = DHOP3_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP4] = DHOP4_MASK;
 +      prop->dmmu.page_size = PAGE_SIZE_1GB;
 +      prop->dmmu.num_hops = MMU_ARCH_6_HOPS;
 +      prop->dmmu.last_mask = LAST_MASK;
 +      prop->dmmu.host_resident = 1;
 +      /* TODO: will be duplicated until implementing per-MMU props */
 +      prop->dmmu.hop_table_size = prop->mmu_hop_table_size;
 +      prop->dmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
 +
 +      /*
 +       * this is done in order to be able to validate the FW descriptor (i.e. to validate
 +       * that the addresses and the space allocated for the FW image do not cross memory
 +       * bounds). For this reason we set the DRAM size to the minimum possible, and later
 +       * it will be modified according to what is reported in the cpucp info packet
 +       */
 +      prop->dram_size = (GAUDI2_HBM_NUM - 1) * SZ_16G;
 +
 +      hdev->pmmu_huge_range = true;
 +      prop->pmmu.host_resident = 1;
 +      prop->pmmu.num_hops = MMU_ARCH_6_HOPS;
 +      prop->pmmu.last_mask = LAST_MASK;
 +      /* TODO: will be duplicated until implementing per-MMU props */
 +      prop->pmmu.hop_table_size = prop->mmu_hop_table_size;
 +      prop->pmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
 +
 +      prop->hints_host_reserved_va_range.start_addr = RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START;
 +      prop->hints_host_reserved_va_range.end_addr = RESERVED_VA_RANGE_FOR_ARC_ON_HOST_END;
 +      prop->hints_host_hpage_reserved_va_range.start_addr =
 +                      RESERVED_VA_RANGE_FOR_ARC_ON_HOST_HPAGE_START;
 +      prop->hints_host_hpage_reserved_va_range.end_addr =
 +                      RESERVED_VA_RANGE_FOR_ARC_ON_HOST_HPAGE_END;
 +
 +      if (PAGE_SIZE == SZ_64K) {
 +              prop->pmmu.hop_shifts[MMU_HOP0] = HOP0_SHIFT_64K;
 +              prop->pmmu.hop_shifts[MMU_HOP1] = HOP1_SHIFT_64K;
 +              prop->pmmu.hop_shifts[MMU_HOP2] = HOP2_SHIFT_64K;
 +              prop->pmmu.hop_shifts[MMU_HOP3] = HOP3_SHIFT_64K;
 +              prop->pmmu.hop_shifts[MMU_HOP4] = HOP4_SHIFT_64K;
 +              prop->pmmu.hop_shifts[MMU_HOP5] = HOP5_SHIFT_64K;
 +              prop->pmmu.hop_masks[MMU_HOP0] = HOP0_MASK_64K;
 +              prop->pmmu.hop_masks[MMU_HOP1] = HOP1_MASK_64K;
 +              prop->pmmu.hop_masks[MMU_HOP2] = HOP2_MASK_64K;
 +              prop->pmmu.hop_masks[MMU_HOP3] = HOP3_MASK_64K;
 +              prop->pmmu.hop_masks[MMU_HOP4] = HOP4_MASK_64K;
 +              prop->pmmu.hop_masks[MMU_HOP5] = HOP5_MASK_64K;
 +              prop->pmmu.start_addr = VA_HOST_SPACE_PAGE_START;
 +              prop->pmmu.end_addr = VA_HOST_SPACE_PAGE_END;
 +              prop->pmmu.page_size = PAGE_SIZE_64KB;
 +
 +              /* shifts and masks are the same in PMMU and HPMMU */
 +              memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
 +              prop->pmmu_huge.page_size = PAGE_SIZE_16MB;
 +              prop->pmmu_huge.start_addr = VA_HOST_SPACE_HPAGE_START;
 +              prop->pmmu_huge.end_addr = VA_HOST_SPACE_HPAGE_END;
 +      } else {
 +              prop->pmmu.hop_shifts[MMU_HOP0] = HOP0_SHIFT_4K;
 +              prop->pmmu.hop_shifts[MMU_HOP1] = HOP1_SHIFT_4K;
 +              prop->pmmu.hop_shifts[MMU_HOP2] = HOP2_SHIFT_4K;
 +              prop->pmmu.hop_shifts[MMU_HOP3] = HOP3_SHIFT_4K;
 +              prop->pmmu.hop_shifts[MMU_HOP4] = HOP4_SHIFT_4K;
 +              prop->pmmu.hop_shifts[MMU_HOP5] = HOP5_SHIFT_4K;
 +              prop->pmmu.hop_masks[MMU_HOP0] = HOP0_MASK_4K;
 +              prop->pmmu.hop_masks[MMU_HOP1] = HOP1_MASK_4K;
 +              prop->pmmu.hop_masks[MMU_HOP2] = HOP2_MASK_4K;
 +              prop->pmmu.hop_masks[MMU_HOP3] = HOP3_MASK_4K;
 +              prop->pmmu.hop_masks[MMU_HOP4] = HOP4_MASK_4K;
 +              prop->pmmu.hop_masks[MMU_HOP5] = HOP5_MASK_4K;
 +              prop->pmmu.start_addr = VA_HOST_SPACE_PAGE_START;
 +              prop->pmmu.end_addr = VA_HOST_SPACE_PAGE_END;
 +              prop->pmmu.page_size = PAGE_SIZE_4KB;
 +
 +              /* shifts and masks are the same in PMMU and HPMMU */
 +              memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
 +              prop->pmmu_huge.page_size = PAGE_SIZE_2MB;
 +              prop->pmmu_huge.start_addr = VA_HOST_SPACE_HPAGE_START;
 +              prop->pmmu_huge.end_addr = VA_HOST_SPACE_HPAGE_END;
 +      }
 +
 +      prop->num_engine_cores = CPU_ID_MAX;
 +      prop->cfg_size = CFG_SIZE;
 +      prop->max_asid = MAX_ASID;
 +      prop->num_of_events = GAUDI2_EVENT_SIZE;
 +
 +      prop->dc_power_default = DC_POWER_DEFAULT;
 +
 +      prop->cb_pool_cb_cnt = GAUDI2_CB_POOL_CB_CNT;
 +      prop->cb_pool_cb_size = GAUDI2_CB_POOL_CB_SIZE;
 +      prop->pcie_dbi_base_address = CFG_BASE + mmPCIE_DBI_BASE;
 +      prop->pcie_aux_dbi_reg_addr = CFG_BASE + mmPCIE_AUX_DBI;
 +
 +      strncpy(prop->cpucp_info.card_name, GAUDI2_DEFAULT_CARD_NAME, CARD_NAME_MAX_LEN);
 +
 +      prop->mme_master_slave_mode = 1;
 +
 +      prop->first_available_user_sob[0] = GAUDI2_RESERVED_SOB_NUMBER +
 +                                      (num_sync_stream_queues * HL_RSVD_SOBS);
 +
 +      prop->first_available_user_mon[0] = GAUDI2_RESERVED_MON_NUMBER +
 +                                      (num_sync_stream_queues * HL_RSVD_MONS);
 +
 +      prop->first_available_user_interrupt = GAUDI2_IRQ_NUM_USER_FIRST;
 +
 +      prop->first_available_cq[0] = GAUDI2_RESERVED_CQ_NUMBER;
 +
 +      prop->fw_cpu_boot_dev_sts0_valid = false;
 +      prop->fw_cpu_boot_dev_sts1_valid = false;
 +      prop->hard_reset_done_by_fw = false;
 +      prop->gic_interrupts_enable = true;
 +
 +      prop->server_type = HL_SERVER_TYPE_UNKNOWN;
 +
 +      prop->max_dec = NUMBER_OF_DEC;
 +
 +      prop->clk_pll_index = HL_GAUDI2_MME_PLL;
 +
 +      prop->dma_mask = 64;
 +
 +      prop->hbw_flush_reg = mmPCIE_WRAP_SPECIAL_GLBL_SPARE_0;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_pci_bars_map(struct hl_device *hdev)
 +{
 +      static const char * const name[] = {"CFG_SRAM", "MSIX", "DRAM"};
 +      bool is_wc[3] = {false, false, true};
 +      int rc;
 +
 +      rc = hl_pci_bars_map(hdev, name, is_wc);
 +      if (rc)
 +              return rc;
 +
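 +      /* The CFG/SRAM BAR is expected to start at device address STM_FLASH_BASE_ADDR
 +       * (this is what gaudi2_init_iatu() programs when the driver owns the iATU), so
 +       * CFG_BASE sits at offset CFG_BASE - STM_FLASH_BASE_ADDR within the BAR.
 +       */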
 +      hdev->rmmio = hdev->pcie_bar[SRAM_CFG_BAR_ID] + (CFG_BASE - STM_FLASH_BASE_ADDR);
 +
 +      return 0;
 +}
 +
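 +/*
 + * Re-point the DRAM BAR to @addr. Returns the previous BAR base address, or
 + * U64_MAX if the iATU is owned by the FW or if reconfiguration failed.
 + */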
 +static u64 gaudi2_set_hbm_bar_base(struct hl_device *hdev, u64 addr)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct hl_inbound_pci_region pci_region;
 +      u64 old_addr = addr;
 +      int rc;
 +
 +      if (gaudi2 && (gaudi2->dram_bar_cur_addr == addr))
 +              return old_addr;
 +
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              return U64_MAX;
 +
 +      /* Inbound Region 2 - Bar 4 - Point to DRAM */
 +      pci_region.mode = PCI_BAR_MATCH_MODE;
 +      pci_region.bar = DRAM_BAR_ID;
 +      pci_region.addr = addr;
 +      rc = hl_pci_set_inbound_region(hdev, 2, &pci_region);
 +      if (rc)
 +              return U64_MAX;
 +
 +      if (gaudi2) {
 +              old_addr = gaudi2->dram_bar_cur_addr;
 +              gaudi2->dram_bar_cur_addr = addr;
 +      }
 +
 +      return old_addr;
 +}
 +
 +static int gaudi2_init_iatu(struct hl_device *hdev)
 +{
 +      struct hl_inbound_pci_region inbound_region;
 +      struct hl_outbound_pci_region outbound_region;
 +      u32 bar_addr_low, bar_addr_high;
 +      int rc;
 +
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              return 0;
 +
 +      /* Temporary inbound Region 0 - Bar 0 - Point to CFG
 +       * We must map this region in BAR match mode in order to
 +       * fetch BAR physical base address
 +       */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = SRAM_CFG_BAR_ID;
 +      /* Base address must be aligned to Bar size which is 256 MB */
 +      inbound_region.addr = STM_FLASH_BASE_ADDR - STM_FLASH_ALIGNED_OFF;
 +      rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
 +      if (rc)
 +              return rc;
 +
 +      /* Fetch physical BAR address */
 +      bar_addr_high = RREG32(mmPCIE_DBI_BAR1_REG + STM_FLASH_ALIGNED_OFF);
 +      bar_addr_low = RREG32(mmPCIE_DBI_BAR0_REG + STM_FLASH_ALIGNED_OFF) & ~0xF;
 +
 +      hdev->pcie_bar_phys[SRAM_CFG_BAR_ID] = (u64)bar_addr_high << 32 | bar_addr_low;
 +
 +      /* Inbound Region 0 - Bar 0 - Point to CFG */
 +      inbound_region.mode = PCI_ADDRESS_MATCH_MODE;
 +      inbound_region.bar = SRAM_CFG_BAR_ID;
 +      inbound_region.offset_in_bar = 0;
 +      inbound_region.addr = STM_FLASH_BASE_ADDR;
 +      inbound_region.size = CFG_REGION_SIZE;
 +      rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
 +      if (rc)
 +              return rc;
 +
 +      /* Inbound Region 1 - Bar 0 - Point to BAR0_RESERVED + SRAM */
 +      inbound_region.mode = PCI_ADDRESS_MATCH_MODE;
 +      inbound_region.bar = SRAM_CFG_BAR_ID;
 +      inbound_region.offset_in_bar = CFG_REGION_SIZE;
 +      inbound_region.addr = BAR0_RSRVD_BASE_ADDR;
 +      inbound_region.size = BAR0_RSRVD_SIZE + SRAM_SIZE;
 +      rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
 +      if (rc)
 +              return rc;
 +
 +      /* Inbound Region 2 - Bar 4 - Point to DRAM */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = DRAM_BAR_ID;
 +      inbound_region.addr = DRAM_PHYS_BASE;
 +      rc = hl_pci_set_inbound_region(hdev, 2, &inbound_region);
 +      if (rc)
 +              return rc;
 +
 +      /* Outbound Region 0 - Point to Host */
 +      outbound_region.addr = HOST_PHYS_BASE_0;
 +      outbound_region.size = HOST_PHYS_SIZE_0;
 +      rc = hl_pci_set_outbound_region(hdev, &outbound_region);
 +
 +      return rc;
 +}
 +
 +static enum hl_device_hw_state gaudi2_get_hw_state(struct hl_device *hdev)
 +{
 +      return RREG32(mmHW_STATE);
 +}
 +
 +static int gaudi2_tpc_binning_init_prop(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +
 +      /*
 +       * check for error condition in which number of binning candidates
 +       * is higher than the maximum supported by the driver
 +       */
 +      if (hweight64(hdev->tpc_binning) > MAX_CLUSTER_BINNING_FAULTY_TPCS) {
 +              dev_err(hdev->dev, "TPC binning is supported for max of %d faulty TPCs, provided mask 0x%llx\n",
 +                                      MAX_CLUSTER_BINNING_FAULTY_TPCS,
 +                                      hdev->tpc_binning);
 +              return -EINVAL;
 +      }
 +
 +      prop->tpc_binning_mask = hdev->tpc_binning;
 +      prop->tpc_enabled_mask = GAUDI2_TPC_FULL_MASK;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_tpc_binning_masks(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hw_queue_properties *q_props = prop->hw_queues_props;
 +      u64 tpc_binning_mask;
 +      u8 subst_idx = 0;
 +      int i, rc;
 +
 +      rc = gaudi2_tpc_binning_init_prop(hdev);
 +      if (rc)
 +              return rc;
 +
 +      tpc_binning_mask = prop->tpc_binning_mask;
 +
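 +      /*
 +       * For each faulty TPC in the mask, a predefined substitute TPC (first
 +       * DCORE0_TPC6, then DCORE3_TPC5) is removed from the enabled mask and
 +       * its queues are marked as binned.
 +       */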
 +      for (i = 0 ; i < MAX_FAULTY_TPCS ; i++) {
 +              u8 subst_seq, binned, qid_base;
 +
 +              if (tpc_binning_mask == 0)
 +                      break;
 +
 +              if (subst_idx == 0) {
 +                      subst_seq = TPC_ID_DCORE0_TPC6;
 +                      qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_6_0;
 +              } else {
 +                      subst_seq = TPC_ID_DCORE3_TPC5;
 +                      qid_base = GAUDI2_QUEUE_ID_DCORE3_TPC_5_0;
 +              }
 +
 +              /* clear bit from mask */
 +              binned = __ffs(tpc_binning_mask);
 +              /*
 +               * Coverity complains about possible out-of-bound access in
 +               * clear_bit
 +               */
 +              if (binned >= TPC_ID_SIZE) {
 +                      dev_err(hdev->dev,
 +                              "Invalid binned TPC (binning mask: %llx)\n",
 +                              tpc_binning_mask);
 +                      return -EINVAL;
 +              }
 +              clear_bit(binned, (unsigned long *)&tpc_binning_mask);
 +
 +              /* also clear replacing TPC bit from enabled mask */
 +              clear_bit(subst_seq, (unsigned long *)&prop->tpc_enabled_mask);
 +
 +              /* bin the substitute TPC's queues */
 +              q_props[qid_base].binned = 1;
 +              q_props[qid_base + 1].binned = 1;
 +              q_props[qid_base + 2].binned = 1;
 +              q_props[qid_base + 3].binned = 1;
 +
 +              subst_idx++;
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_dec_binning_masks(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u8 num_faulty;
 +
 +      num_faulty = hweight32(hdev->decoder_binning);
 +
 +      /*
 +       * check for error condition in which number of binning candidates
 +       * is higher than the maximum supported by the driver
 +       */
 +      if (num_faulty > MAX_FAULTY_DECODERS) {
 +              dev_err(hdev->dev, "decoder binning supports at most a single faulty decoder, provided mask 0x%x\n",
 +                                              hdev->decoder_binning);
 +              return -EINVAL;
 +      }
 +
 +      prop->decoder_binning_mask = (hdev->decoder_binning & GAUDI2_DECODER_FULL_MASK);
 +
 +      if (prop->decoder_binning_mask)
 +              prop->decoder_enabled_mask = (GAUDI2_DECODER_FULL_MASK & ~BIT(DEC_ID_PCIE_VDEC1));
 +      else
 +              prop->decoder_enabled_mask = GAUDI2_DECODER_FULL_MASK;
 +
 +      return 0;
 +}
 +
 +static void gaudi2_set_dram_binning_masks(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +
 +      /* check if we should override default binning */
 +      if (!hdev->dram_binning) {
 +              prop->dram_binning_mask = 0;
 +              prop->dram_enabled_mask = GAUDI2_DRAM_FULL_MASK;
 +              return;
 +      }
 +
 +      /* set DRAM binning constraints */
 +      prop->faulty_dram_cluster_map |= hdev->dram_binning;
 +      prop->dram_binning_mask = hdev->dram_binning;
 +      prop->dram_enabled_mask = GAUDI2_DRAM_FULL_MASK & ~BIT(HBM_ID5);
 +}
 +
 +static int gaudi2_set_edma_binning_masks(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hw_queue_properties *q_props;
 +      u8 seq, num_faulty;
 +
 +      num_faulty = hweight32(hdev->edma_binning);
 +
 +      /*
 +       * check for error condition in which number of binning candidates
 +       * is higher than the maximum supported by the driver
 +       */
 +      if (num_faulty > MAX_FAULTY_EDMAS) {
 +              dev_err(hdev->dev,
 +                      "EDMA binning supports at most a single faulty EDMA, provided mask 0x%x\n",
 +                      hdev->edma_binning);
 +              return -EINVAL;
 +      }
 +
 +      if (!hdev->edma_binning) {
 +              prop->edma_binning_mask = 0;
 +              prop->edma_enabled_mask = GAUDI2_EDMA_FULL_MASK;
 +              return 0;
 +      }
 +
 +      seq = __ffs((unsigned long)hdev->edma_binning);
 +
 +      /* set binning constraints */
 +      prop->faulty_dram_cluster_map |= BIT(edma_to_hbm_cluster[seq]);
 +      prop->edma_binning_mask = hdev->edma_binning;
 +      prop->edma_enabled_mask = GAUDI2_EDMA_FULL_MASK & ~BIT(EDMA_ID_DCORE3_INSTANCE1);
 +
 +      /* bin substitute EDMA's queue */
 +      q_props = prop->hw_queues_props;
 +      q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0].binned = 1;
 +      q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1].binned = 1;
 +      q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2].binned = 1;
 +      q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3].binned = 1;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_xbar_edge_enable_mask(struct hl_device *hdev, u32 xbar_edge_iso_mask)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u8 num_faulty, seq;
 +
 +      /* check if we should override default binning */
 +      if (!xbar_edge_iso_mask) {
 +              prop->xbar_edge_enabled_mask = GAUDI2_XBAR_EDGE_FULL_MASK;
 +              return 0;
 +      }
 +
 +      /*
 +       * Note that this mask can have a value other than 0 only after the cpucp
 +       * packet has been received (i.e. only the FW can set a redundancy value).
 +       * For the user it will always be 0.
 +       */
 +      num_faulty = hweight32(xbar_edge_iso_mask);
 +
 +      /*
 +       * check for error condition in which number of binning candidates
 +       * is higher than the maximum supported by the driver
 +       */
 +      if (num_faulty > MAX_FAULTY_XBARS) {
 +              dev_err(hdev->dev, "cannot have more than %d faulty XBAR edges\n",
 +                                                                      MAX_FAULTY_XBARS);
 +              return -EINVAL;
 +      }
 +
 +      seq = __ffs((unsigned long)xbar_edge_iso_mask);
 +
 +      /* set binning constraints */
 +      prop->faulty_dram_cluster_map |= BIT(xbar_edge_to_hbm_cluster[seq]);
 +      prop->xbar_edge_enabled_mask = (~xbar_edge_iso_mask) & GAUDI2_XBAR_EDGE_FULL_MASK;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_cluster_binning_masks_common(struct hl_device *hdev, u8 xbar_edge_iso_mask)
 +{
 +      int rc;
 +
 +      /*
 +       * Mark all clusters as good; each component will "fail" a cluster
 +       * based on eFuse/user values.
 +       * If more than a single cluster is faulty, the chip is unusable.
 +       */
 +      hdev->asic_prop.faulty_dram_cluster_map = 0;
 +
 +      gaudi2_set_dram_binning_masks(hdev);
 +
 +      rc = gaudi2_set_edma_binning_masks(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi2_set_xbar_edge_enable_mask(hdev, xbar_edge_iso_mask);
 +      if (rc)
 +              return rc;
 +
 +      /* always initially set to full mask */
 +      hdev->asic_prop.hmmu_hif_enabled_mask = GAUDI2_HIF_HMMU_FULL_MASK;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_cluster_binning_masks(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int rc;
 +
 +      rc = gaudi2_set_cluster_binning_masks_common(hdev, prop->cpucp_info.xbar_binning_mask);
 +      if (rc)
 +              return rc;
 +
 +      /* if the FW reported DRAM binning we should perform cluster configuration */
 +      if (prop->faulty_dram_cluster_map) {
 +              u8 cluster_seq = __ffs((unsigned long)prop->faulty_dram_cluster_map);
 +
 +              prop->hmmu_hif_enabled_mask = cluster_hmmu_hif_enabled_mask[cluster_seq];
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_binning_masks(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = gaudi2_set_cluster_binning_masks(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi2_set_tpc_binning_masks(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi2_set_dec_binning_masks(hdev);
 +      if (rc)
 +              return rc;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_cpucp_info_get(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      long max_power;
 +      u64 dram_size;
 +      int rc;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      /* No point in asking for this information again when not doing a hard reset,
 +       * as the device CPU hasn't been reset
 +       */
 +      if (hdev->reset_info.in_compute_reset)
 +              return 0;
 +
 +      rc = hl_fw_cpucp_handshake(hdev, mmCPU_BOOT_DEV_STS0, mmCPU_BOOT_DEV_STS1, mmCPU_BOOT_ERR0,
 +                                                                              mmCPU_BOOT_ERR1);
 +      if (rc)
 +              return rc;
 +
 +      dram_size = le64_to_cpu(prop->cpucp_info.dram_size);
 +      if (dram_size) {
 +              /* we can have either 5 or 6 HBMs; other values are invalid */
 +
 +              if ((dram_size != ((GAUDI2_HBM_NUM - 1) * SZ_16G)) &&
 +                                      (dram_size != (GAUDI2_HBM_NUM * SZ_16G))) {
 +                      dev_err(hdev->dev,
 +                              "F/W reported invalid DRAM size %llu. Trying to use default size %llu\n",
 +                              dram_size, prop->dram_size);
 +                      dram_size = prop->dram_size;
 +              }
 +
 +              prop->dram_size = dram_size;
 +              prop->dram_end_address = prop->dram_base_address + dram_size;
 +      }
 +
 +      if (!strlen(prop->cpucp_info.card_name))
 +              strncpy(prop->cpucp_info.card_name, GAUDI2_DEFAULT_CARD_NAME, CARD_NAME_MAX_LEN);
 +
 +      /* Overwrite binning masks with the actual binning values from F/W */
 +      hdev->dram_binning = prop->cpucp_info.dram_binning_mask;
 +      hdev->edma_binning = prop->cpucp_info.edma_binning_mask;
 +      hdev->tpc_binning = le64_to_cpu(prop->cpucp_info.tpc_binning_mask);
 +      hdev->decoder_binning = lower_32_bits(le64_to_cpu(prop->cpucp_info.decoder_binning_mask));
 +
 +      /*
 +       * at this point the DRAM parameters need to be updated according to data obtained
 +       * from the FW
 +       */
 +      rc = hdev->asic_funcs->set_dram_properties(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = hdev->asic_funcs->set_binning_masks(hdev);
 +      if (rc)
 +              return rc;
 +
 +      max_power = hl_fw_get_max_power(hdev);
 +      if (max_power < 0)
 +              return max_power;
 +
 +      prop->max_power_default = (u64) max_power;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_fetch_psoc_frequency(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS];
 +      int rc;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      rc = hl_fw_cpucp_pll_info_get(hdev, HL_GAUDI2_CPU_PLL, pll_freq_arr);
 +      if (rc)
 +              return rc;
 +
 +      hdev->asic_prop.psoc_timestamp_frequency = pll_freq_arr[3];
 +
 +      return 0;
 +}
 +
 +static int gaudi2_early_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_dev *pdev = hdev->pdev;
 +      resource_size_t pci_bar_size;
 +      int rc;
 +
 +      rc = gaudi2_set_fixed_properties(hdev);
 +      if (rc)
 +              return rc;
 +
 +      /* Check BAR sizes */
 +      pci_bar_size = pci_resource_len(pdev, SRAM_CFG_BAR_ID);
 +
 +      if (pci_bar_size != CFG_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      SRAM_CFG_BAR_ID, &pci_bar_size, CFG_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      pci_bar_size = pci_resource_len(pdev, MSIX_BAR_ID);
 +      if (pci_bar_size != MSIX_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      MSIX_BAR_ID, &pci_bar_size, MSIX_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      prop->dram_pci_bar_size = pci_resource_len(pdev, DRAM_BAR_ID);
 +      hdev->dram_pci_bar_start = pci_resource_start(pdev, DRAM_BAR_ID);
 +
 +      /*
 +       * Only on pldm does the driver configure the iATU; otherwise it is done by FW
 +       */
 +      if (hdev->pldm)
 +              hdev->asic_prop.iatu_done_by_fw = false;
 +      else
 +              hdev->asic_prop.iatu_done_by_fw = true;
 +
 +      rc = hl_pci_init(hdev);
 +      if (rc)
 +              goto free_queue_props;
 +
 +      /* Before continuing with the initialization, we need to read the preboot
 +       * version to determine whether we are running with security-enabled firmware
 +       */
 +      rc = hl_fw_read_preboot_status(hdev);
 +      if (rc) {
 +              if (hdev->reset_on_preboot_fail)
 +                      hdev->asic_funcs->hw_fini(hdev, true, false);
 +              goto pci_fini;
 +      }
 +
 +      if (gaudi2_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) {
 +              dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n");
 +              hdev->asic_funcs->hw_fini(hdev, true, false);
 +      }
 +
 +      return 0;
 +
 +pci_fini:
 +      hl_pci_fini(hdev);
 +free_queue_props:
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      return rc;
 +}
 +
 +static int gaudi2_early_fini(struct hl_device *hdev)
 +{
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      hl_pci_fini(hdev);
 +
 +      return 0;
 +}
 +
 +static bool gaudi2_is_arc_nic_owned(u64 arc_id)
 +{
 +      switch (arc_id) {
 +      case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
 +              return true;
 +      default:
 +              return false;
 +      }
 +}
 +
 +static bool gaudi2_is_arc_tpc_owned(u64 arc_id)
 +{
 +      switch (arc_id) {
 +      case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
 +              return true;
 +      default:
 +              return false;
 +      }
 +}
 +
 +static void gaudi2_init_arcs(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 arc_id;
 +      u32 i;
 +
 +      for (i = CPU_ID_SCHED_ARC0 ; i <= CPU_ID_SCHED_ARC3 ; i++) {
 +              if (gaudi2_is_arc_enabled(hdev, i))
 +                      continue;
 +
 +              gaudi2_set_arc_id_cap(hdev, i);
 +      }
 +
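 +      /*
 +       * Queue IDs come in groups of four per engine and all four entries of a
 +       * group map to the same ARC in gaudi2_queue_id_to_arc_id, so it is
 +       * enough to check one queue per group (hence the stride of 4).
 +       */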
 +      for (i = GAUDI2_QUEUE_ID_PDMA_0_0 ; i < GAUDI2_QUEUE_ID_CPU_PQ ; i += 4) {
 +              if (!gaudi2_is_queue_enabled(hdev, i))
 +                      continue;
 +
 +              arc_id = gaudi2_queue_id_to_arc_id[i];
 +              if (gaudi2_is_arc_enabled(hdev, arc_id))
 +                      continue;
 +
 +              if (gaudi2_is_arc_nic_owned(arc_id) &&
 +                              !(hdev->nic_ports_mask & BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0)))
 +                      continue;
 +
 +              if (gaudi2_is_arc_tpc_owned(arc_id) && !(gaudi2->tpc_hw_cap_initialized &
 +                                                      BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0)))
 +                      continue;
 +
 +              gaudi2_set_arc_id_cap(hdev, arc_id);
 +      }
 +}
 +
 +static int gaudi2_scrub_arc_dccm(struct hl_device *hdev, u32 cpu_id)
 +{
 +      u32 reg_base, reg_val;
 +      int rc;
 +
 +      switch (cpu_id) {
 +      case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC3:
 +              /* Each ARC scheduler has 2 consecutive DCCM blocks */
 +              rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
 +                                              ARC_DCCM_BLOCK_SIZE * 2, true);
 +              if (rc)
 +                      return rc;
 +              break;
 +      case CPU_ID_SCHED_ARC4:
 +      case CPU_ID_SCHED_ARC5:
 +      case CPU_ID_MME_QMAN_ARC0:
 +      case CPU_ID_MME_QMAN_ARC1:
 +              reg_base = gaudi2_arc_blocks_bases[cpu_id];
 +
 +              /* Scrub lower DCCM block */
 +              rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
 +                                              ARC_DCCM_BLOCK_SIZE, true);
 +              if (rc)
 +                      return rc;
 +
 +              /* Switch to upper DCCM block */
 +              reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_MME_ARC_UPPER_DCCM_EN_VAL_MASK, 1);
 +              WREG32(reg_base + ARC_DCCM_UPPER_EN_OFFSET, reg_val);
 +
 +              /* Scrub upper DCCM block */
 +              rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
 +                                              ARC_DCCM_BLOCK_SIZE, true);
 +              if (rc)
 +                      return rc;
 +
 +              /* Switch to lower DCCM block */
 +              reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_MME_ARC_UPPER_DCCM_EN_VAL_MASK, 0);
 +              WREG32(reg_base + ARC_DCCM_UPPER_EN_OFFSET, reg_val);
 +              break;
 +      default:
 +              rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
 +                                              ARC_DCCM_BLOCK_SIZE, true);
 +              if (rc)
 +                      return rc;
 +      }
 +
 +      return 0;
 +}
 +
 +static void gaudi2_scrub_arcs_dccm(struct hl_device *hdev)
 +{
 +      u16 arc_id;
 +
 +      for (arc_id = CPU_ID_SCHED_ARC0 ; arc_id < CPU_ID_MAX ; arc_id++) {
 +              if (!gaudi2_is_arc_enabled(hdev, arc_id))
 +                      continue;
 +
 +              gaudi2_scrub_arc_dccm(hdev, arc_id);
 +      }
 +}
 +
 +static int gaudi2_late_init(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int rc;
 +
 +      hdev->asic_prop.supports_advanced_cpucp_rc = true;
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS,
 +                                      gaudi2->virt_msix_db_dma_addr);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to enable PCI access from CPU\n");
 +              return rc;
 +      }
 +
 +      rc = gaudi2_fetch_psoc_frequency(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to fetch psoc frequency\n");
 +              goto disable_pci_access;
 +      }
 +
 +      gaudi2_init_arcs(hdev);
 +      gaudi2_scrub_arcs_dccm(hdev);
 +      gaudi2_init_security(hdev);
 +
 +      return 0;
 +
 +disable_pci_access:
 +      hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_late_fini(struct hl_device *hdev)
 +{
 +      hl_hwmon_release_resources(hdev);
 +}
 +
 +static void gaudi2_user_mapped_dec_init(struct gaudi2_device *gaudi2, u32 start_idx)
 +{
 +      struct user_mapped_block *blocks = gaudi2->mapped_blocks;
 +
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE0_DEC0_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE0_DEC1_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE1_DEC0_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE1_DEC1_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE2_DEC0_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE2_DEC1_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE3_DEC0_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE3_DEC1_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmPCIE_DEC0_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx], mmPCIE_DEC1_CMD_BASE, HL_BLOCK_SIZE);
 +}
 +
 +static void gaudi2_user_mapped_blocks_init(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct user_mapped_block *blocks = gaudi2->mapped_blocks;
 +      u32 block_size, umr_start_idx, num_umr_blocks;
 +      int i;
 +
 +      for (i = 0 ; i < NUM_ARC_CPUS ; i++) {
 +              if (i >= CPU_ID_SCHED_ARC0 && i <= CPU_ID_SCHED_ARC3)
 +                      block_size = ARC_DCCM_BLOCK_SIZE * 2;
 +              else
 +                      block_size = ARC_DCCM_BLOCK_SIZE;
 +
 +              blocks[i].address = gaudi2_arc_dccm_bases[i];
 +              blocks[i].size = block_size;
 +      }
 +
 +      blocks[NUM_ARC_CPUS].address = mmARC_FARM_ARC0_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 1].address = mmARC_FARM_ARC1_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 1].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 2].address = mmARC_FARM_ARC2_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 2].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 3].address = mmARC_FARM_ARC3_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 3].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 4].address = mmDCORE0_MME_QM_ARC_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 4].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 5].address = mmDCORE1_MME_QM_ARC_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 5].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 6].address = mmDCORE2_MME_QM_ARC_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 6].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 7].address = mmDCORE3_MME_QM_ARC_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 7].size = HL_BLOCK_SIZE;
 +
 +      umr_start_idx = NUM_ARC_CPUS + NUM_OF_USER_ACP_BLOCKS;
 +      num_umr_blocks = NIC_NUMBER_OF_ENGINES * NUM_OF_USER_NIC_UMR_BLOCKS;
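 +      /* Each NIC engine exposes NUM_OF_USER_NIC_UMR_BLOCKS unsecure doorbell (UMR) blocks.
 +       * The block address is derived from the NIC macro (nic_id / NIC_NUMBER_OF_QM_PER_MACRO),
 +       * the QMAN within the macro (nic_id % NIC_NUMBER_OF_QM_PER_MACRO) and the UMR index.
 +       */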
 +      for (i = 0 ; i < num_umr_blocks ; i++) {
 +              u8 nic_id, umr_block_id;
 +
 +              nic_id = i / NUM_OF_USER_NIC_UMR_BLOCKS;
 +              umr_block_id = i % NUM_OF_USER_NIC_UMR_BLOCKS;
 +
 +              blocks[umr_start_idx + i].address =
 +                      mmNIC0_UMR0_0_UNSECURE_DOORBELL0_BASE +
 +                      (nic_id / NIC_NUMBER_OF_QM_PER_MACRO) * NIC_OFFSET +
 +                      (nic_id % NIC_NUMBER_OF_QM_PER_MACRO) * NIC_QM_OFFSET +
 +                      umr_block_id * NIC_UMR_OFFSET;
 +              blocks[umr_start_idx + i].size = HL_BLOCK_SIZE;
 +      }
 +
 +      /* Expose decoder HW configuration blocks to the user */
 +      gaudi2_user_mapped_dec_init(gaudi2, USR_MAPPED_BLK_DEC_START_IDX);
 +
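 +      /* Expose the sync manager OBJS and GLBL blocks of DCORE1..DCORE3 (DCORE0 is skipped) */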
 +      for (i = 1; i < NUM_OF_DCORES; ++i) {
 +              blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1)].size = SM_OBJS_BLOCK_SIZE;
 +              blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1) + 1].size = HL_BLOCK_SIZE;
 +
 +              blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1)].address =
 +                                              mmDCORE0_SYNC_MNGR_OBJS_BASE + i * DCORE_OFFSET;
 +
 +              blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1) + 1].address =
 +                                              mmDCORE0_SYNC_MNGR_GLBL_BASE + i * DCORE_OFFSET;
 +      }
 +}
 +
 +static int gaudi2_alloc_cpu_accessible_dma_mem(struct hl_device *hdev)
 +{
 +      dma_addr_t dma_addr_arr[GAUDI2_ALLOC_CPU_MEM_RETRY_CNT] = {}, end_addr;
 +      void *virt_addr_arr[GAUDI2_ALLOC_CPU_MEM_RETRY_CNT] = {};
 +      int i, j, rc = 0;
 +
 +      /* The device ARC works with 32-bit addresses, and because there is a single HW register
 +       * that holds the extension bits (49..28), these bits must be identical across the entire
 +       * allocated range.
 +       */
 +
 +      for (i = 0 ; i < GAUDI2_ALLOC_CPU_MEM_RETRY_CNT ; i++) {
 +              virt_addr_arr[i] = hl_asic_dma_alloc_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE,
 +                                                      &dma_addr_arr[i], GFP_KERNEL | __GFP_ZERO);
 +              if (!virt_addr_arr[i]) {
 +                      rc = -ENOMEM;
 +                      goto free_dma_mem_arr;
 +              }
 +
 +              end_addr = dma_addr_arr[i] + HL_CPU_ACCESSIBLE_MEM_SIZE - 1;
 +              if (GAUDI2_ARC_PCI_MSB_ADDR(dma_addr_arr[i]) == GAUDI2_ARC_PCI_MSB_ADDR(end_addr))
 +                      break;
 +      }
 +
 +      if (i == GAUDI2_ALLOC_CPU_MEM_RETRY_CNT) {
 +              dev_err(hdev->dev,
 +                      "MSB of ARC accessible DMA memory is not identical across the allocated range\n");
 +              rc = -EFAULT;
 +              goto free_dma_mem_arr;
 +      }
 +
 +      hdev->cpu_accessible_dma_mem = virt_addr_arr[i];
 +      hdev->cpu_accessible_dma_address = dma_addr_arr[i];
 +
 +free_dma_mem_arr:
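 +      /* Free the allocations that failed the MSB check; on success, entry i is kept */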
 +      for (j = 0 ; j < i ; j++)
 +              hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, virt_addr_arr[j],
 +                                              dma_addr_arr[j]);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_set_pci_memory_regions(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_mem_region *region;
 +
 +      /* CFG */
 +      region = &hdev->pci_mem_region[PCI_REGION_CFG];
 +      region->region_base = CFG_BASE;
 +      region->region_size = CFG_SIZE;
 +      region->offset_in_bar = CFG_BASE - STM_FLASH_BASE_ADDR;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = SRAM_CFG_BAR_ID;
 +      region->used = 1;
 +
 +      /* SRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_SRAM];
 +      region->region_base = SRAM_BASE_ADDR;
 +      region->region_size = SRAM_SIZE;
 +      region->offset_in_bar = CFG_REGION_SIZE + BAR0_RSRVD_SIZE;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = SRAM_CFG_BAR_ID;
 +      region->used = 1;
 +
 +      /* DRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_DRAM];
 +      region->region_base = DRAM_PHYS_BASE;
 +      region->region_size = hdev->asic_prop.dram_size;
 +      region->offset_in_bar = 0;
 +      region->bar_size = prop->dram_pci_bar_size;
 +      region->bar_id = DRAM_BAR_ID;
 +      region->used = 1;
 +}
 +
 +static void gaudi2_user_interrupt_setup(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int i, j, k;
 +
 +      /* Initialize common user CQ interrupt */
 +      HL_USR_INTR_STRUCT_INIT(hdev->common_user_cq_interrupt, hdev,
 +                              HL_COMMON_USER_CQ_INTERRUPT_ID, HL_USR_INTERRUPT_CQ);
 +
 +      /* Initialize common decoder interrupt */
 +      HL_USR_INTR_STRUCT_INIT(hdev->common_decoder_interrupt, hdev,
 +                              HL_COMMON_DEC_INTERRUPT_ID, HL_USR_INTERRUPT_DECODER);
 +
 +      /* User interrupts structure holds both decoder and user interrupts from various engines.
 +       * We first initialize the decoder interrupts and then we add the user interrupts.
 +       * The only limitation is that the last decoder interrupt id must be smaller
 +       * than GAUDI2_IRQ_NUM_USER_FIRST. This is checked at compilation time.
 +       */
 +
 +      /* Initialize decoder interrupts. Only the normal interrupts are exposed to
 +       * the user; the abnormal (error) interrupts are handled by the driver.
 +       */
 +      for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM, j = 0 ; i <= GAUDI2_IRQ_NUM_SHARED_DEC1_NRM;
 +                                                                              i += 2, j++)
 +              HL_USR_INTR_STRUCT_INIT(hdev->user_interrupt[j], hdev, i,
 +                                              HL_USR_INTERRUPT_DECODER);
 +
 +      for (i = GAUDI2_IRQ_NUM_USER_FIRST, k = 0 ; k < prop->user_interrupt_count; i++, j++, k++)
 +              HL_USR_INTR_STRUCT_INIT(hdev->user_interrupt[j], hdev, i, HL_USR_INTERRUPT_CQ);
 +}
 +
 +static inline int gaudi2_get_non_zero_random_int(void)
 +{
 +      int rand = get_random_u32();
 +
 +      return rand ? rand : 1;
 +}
 +
 +static void gaudi2_special_blocks_free(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hl_skip_blocks_cfg *skip_special_blocks_cfg =
 +                      &prop->skip_special_blocks_cfg;
 +
 +      kfree(prop->special_blocks);
 +      kfree(skip_special_blocks_cfg->block_types);
 +      kfree(skip_special_blocks_cfg->block_ranges);
 +}
 +
 +static void gaudi2_special_blocks_iterator_free(struct hl_device *hdev)
 +{
 +      gaudi2_special_blocks_free(hdev);
 +}
 +
 +static bool gaudi2_special_block_skip(struct hl_device *hdev,
 +              struct hl_special_blocks_cfg *special_blocks_cfg,
 +              u32 blk_idx, u32 major, u32 minor, u32 sub_minor)
 +{
 +      return false;
 +}
 +
 +static int gaudi2_special_blocks_config(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int i, rc;
 +
 +      /* Configure Special blocks */
 +      prop->glbl_err_cause_num = GAUDI2_NUM_OF_GLBL_ERR_CAUSE;
 +      prop->num_of_special_blocks = ARRAY_SIZE(gaudi2_special_blocks);
 +      prop->special_blocks = kmalloc_array(prop->num_of_special_blocks,
 +                      sizeof(*prop->special_blocks), GFP_KERNEL);
 +      if (!prop->special_blocks)
 +              return -ENOMEM;
 +
 +      for (i = 0 ; i < prop->num_of_special_blocks ; i++)
 +              memcpy(&prop->special_blocks[i], &gaudi2_special_blocks[i],
 +                              sizeof(*prop->special_blocks));
 +
 +      /* Configure when to skip Special blocks */
 +      memset(&prop->skip_special_blocks_cfg, 0, sizeof(prop->skip_special_blocks_cfg));
 +      prop->skip_special_blocks_cfg.skip_block_hook = gaudi2_special_block_skip;
 +
 +      if (ARRAY_SIZE(gaudi2_iterator_skip_block_types)) {
 +              prop->skip_special_blocks_cfg.block_types =
 +                              kmalloc_array(ARRAY_SIZE(gaudi2_iterator_skip_block_types),
 +                                      sizeof(gaudi2_iterator_skip_block_types[0]), GFP_KERNEL);
 +              if (!prop->skip_special_blocks_cfg.block_types) {
 +                      rc = -ENOMEM;
 +                      goto free_special_blocks;
 +              }
 +
 +              memcpy(prop->skip_special_blocks_cfg.block_types, gaudi2_iterator_skip_block_types,
 +                              sizeof(gaudi2_iterator_skip_block_types));
 +
 +              prop->skip_special_blocks_cfg.block_types_len =
 +                                      ARRAY_SIZE(gaudi2_iterator_skip_block_types);
 +      }
 +
 +      if (ARRAY_SIZE(gaudi2_iterator_skip_block_ranges)) {
 +              prop->skip_special_blocks_cfg.block_ranges =
 +                              kmalloc_array(ARRAY_SIZE(gaudi2_iterator_skip_block_ranges),
 +                                      sizeof(gaudi2_iterator_skip_block_ranges[0]), GFP_KERNEL);
 +              if (!prop->skip_special_blocks_cfg.block_ranges) {
 +                      rc = -ENOMEM;
 +                      goto free_skip_special_blocks_types;
 +              }
 +
 +              for (i = 0 ; i < ARRAY_SIZE(gaudi2_iterator_skip_block_ranges) ; i++)
 +                      memcpy(&prop->skip_special_blocks_cfg.block_ranges[i],
 +                                      &gaudi2_iterator_skip_block_ranges[i],
 +                                      sizeof(struct range));
 +
 +              prop->skip_special_blocks_cfg.block_ranges_len =
 +                                      ARRAY_SIZE(gaudi2_iterator_skip_block_ranges);
 +      }
 +
 +      return 0;
 +
 +free_skip_special_blocks_types:
 +      kfree(prop->skip_special_blocks_cfg.block_types);
 +free_special_blocks:
 +      kfree(prop->special_blocks);
 +
 +      return rc;
 +}
 +
 +static int gaudi2_special_blocks_iterator_config(struct hl_device *hdev)
 +{
 +      return gaudi2_special_blocks_config(hdev);
 +}
 +
 +static int gaudi2_sw_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2;
 +      int i, rc;
 +
 +      /* Allocate device structure */
 +      gaudi2 = kzalloc(sizeof(*gaudi2), GFP_KERNEL);
 +      if (!gaudi2)
 +              return -ENOMEM;
 +
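 +      /* Gather valid, non-message events from the IRQ map table into the H/W events array */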
 +      for (i = 0 ; i < ARRAY_SIZE(gaudi2_irq_map_table) ; i++) {
 +              if (gaudi2_irq_map_table[i].msg || !gaudi2_irq_map_table[i].valid)
 +                      continue;
 +
 +              if (gaudi2->num_of_valid_hw_events == GAUDI2_EVENT_SIZE) {
 +                      dev_err(hdev->dev, "H/W events array exceeds the limit of %u events\n",
 +                              GAUDI2_EVENT_SIZE);
 +                      rc = -EINVAL;
 +                      goto free_gaudi2_device;
 +              }
 +
 +              gaudi2->hw_events[gaudi2->num_of_valid_hw_events++] = gaudi2_irq_map_table[i].fc_id;
 +      }
 +
 +      for (i = 0 ; i < MME_NUM_OF_LFSR_SEEDS ; i++)
 +              gaudi2->lfsr_rand_seeds[i] = gaudi2_get_non_zero_random_int();
 +
 +      gaudi2->cpucp_info_get = gaudi2_cpucp_info_get;
 +
 +      hdev->asic_specific = gaudi2;
 +
 +      /* Create DMA pool for small allocations.
 +       * Use DEVICE_CACHE_LINE_SIZE for alignment since the NIC memory-mapped
 +       * PI/CI registers allocated from this pool have this restriction
 +       */
 +      hdev->dma_pool = dma_pool_create(dev_name(hdev->dev), &hdev->pdev->dev,
 +                                      GAUDI2_DMA_POOL_BLK_SIZE, DEVICE_CACHE_LINE_SIZE, 0);
 +      if (!hdev->dma_pool) {
 +              dev_err(hdev->dev, "failed to create DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_gaudi2_device;
 +      }
 +
 +      rc = gaudi2_alloc_cpu_accessible_dma_mem(hdev);
 +      if (rc)
 +              goto free_dma_pool;
 +
 +      hdev->cpu_accessible_dma_pool = gen_pool_create(ilog2(32), -1);
 +      if (!hdev->cpu_accessible_dma_pool) {
 +              dev_err(hdev->dev, "Failed to create CPU accessible DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_cpu_dma_mem;
 +      }
 +
 +      rc = gen_pool_add(hdev->cpu_accessible_dma_pool, (uintptr_t) hdev->cpu_accessible_dma_mem,
 +                              HL_CPU_ACCESSIBLE_MEM_SIZE, -1);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to add memory to CPU accessible DMA pool\n");
 +              rc = -EFAULT;
 +              goto free_cpu_accessible_dma_pool;
 +      }
 +
 +      gaudi2->virt_msix_db_cpu_addr = hl_cpu_accessible_dma_pool_alloc(hdev, prop->pmmu.page_size,
 +                                                              &gaudi2->virt_msix_db_dma_addr);
 +      if (!gaudi2->virt_msix_db_cpu_addr) {
 +              dev_err(hdev->dev, "Failed to allocate DMA memory for virtual MSI-X doorbell\n");
 +              rc = -ENOMEM;
 +              goto free_cpu_accessible_dma_pool;
 +      }
 +
 +      spin_lock_init(&gaudi2->hw_queues_lock);
 +
 +      gaudi2->scratchpad_kernel_address = hl_asic_dma_alloc_coherent(hdev, PAGE_SIZE,
 +                                                      &gaudi2->scratchpad_bus_address,
 +                                                      GFP_KERNEL | __GFP_ZERO);
 +      if (!gaudi2->scratchpad_kernel_address) {
 +              rc = -ENOMEM;
 +              goto free_virt_msix_db_mem;
 +      }
 +
 +      gaudi2_user_mapped_blocks_init(hdev);
 +
 +      /* Initialize user interrupts */
 +      gaudi2_user_interrupt_setup(hdev);
 +
 +      hdev->supports_coresight = true;
 +      hdev->supports_sync_stream = true;
 +      hdev->supports_cb_mapping = true;
 +      hdev->supports_wait_for_multi_cs = false;
 +
 +      prop->supports_compute_reset = true;
 +
 +      hdev->asic_funcs->set_pci_memory_regions(hdev);
 +
 +      rc = gaudi2_special_blocks_iterator_config(hdev);
 +      if (rc)
 +              goto free_scratchpad_mem;
 +
 +      return 0;
 +
 +free_scratchpad_mem:
 +      hl_asic_dma_free_coherent(hdev, PAGE_SIZE, gaudi2->scratchpad_kernel_address,
 +                                      gaudi2->scratchpad_bus_address);
 +free_virt_msix_db_mem:
 +      hl_cpu_accessible_dma_pool_free(hdev, prop->pmmu.page_size, gaudi2->virt_msix_db_cpu_addr);
 +free_cpu_accessible_dma_pool:
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +free_cpu_dma_mem:
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                      hdev->cpu_accessible_dma_address);
 +free_dma_pool:
 +      dma_pool_destroy(hdev->dma_pool);
 +free_gaudi2_device:
 +      kfree(gaudi2);
 +      return rc;
 +}
 +
 +static int gaudi2_sw_fini(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      gaudi2_special_blocks_iterator_free(hdev);
 +
 +      hl_cpu_accessible_dma_pool_free(hdev, prop->pmmu.page_size, gaudi2->virt_msix_db_cpu_addr);
 +
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                              hdev->cpu_accessible_dma_address);
 +
 +      hl_asic_dma_free_coherent(hdev, PAGE_SIZE, gaudi2->scratchpad_kernel_address,
 +                                      gaudi2->scratchpad_bus_address);
 +
 +      dma_pool_destroy(hdev->dma_pool);
 +
 +      kfree(gaudi2);
 +
 +      return 0;
 +}
 +
 +static void gaudi2_stop_qman_common(struct hl_device *hdev, u32 reg_base)
 +{
 +      WREG32(reg_base + QM_GLBL_CFG1_OFFSET, QM_GLBL_CFG1_PQF_STOP |
 +                                              QM_GLBL_CFG1_CQF_STOP |
 +                                              QM_GLBL_CFG1_CP_STOP);
 +
 +      /* also stop the ARC */
 +      WREG32(reg_base + QM_GLBL_CFG2_OFFSET, QM_GLBL_CFG2_ARC_CQF_STOP);
 +}
 +
 +static void gaudi2_flush_qman_common(struct hl_device *hdev, u32 reg_base)
 +{
 +      WREG32(reg_base + QM_GLBL_CFG1_OFFSET, QM_GLBL_CFG1_PQF_FLUSH |
 +                                              QM_GLBL_CFG1_CQF_FLUSH |
 +                                              QM_GLBL_CFG1_CP_FLUSH);
 +}
 +
 +static void gaudi2_flush_qman_arc_common(struct hl_device *hdev, u32 reg_base)
 +{
 +      WREG32(reg_base + QM_GLBL_CFG2_OFFSET, QM_GLBL_CFG2_ARC_CQF_FLUSH);
 +}
 +
 +/**
 + * gaudi2_clear_qm_fence_counters_common - clear QM's fence counters
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @queue_id: queue whose fence counters should be cleared
 + * @skip_fence: if true, set the maximum fence value in all fence counters to
 + *              avoid getting stuck on any fence value. Otherwise set all fence
 + *              counters to 0 (standard clear of fence counters)
 + */
 +static void gaudi2_clear_qm_fence_counters_common(struct hl_device *hdev, u32 queue_id,
 +                                              bool skip_fence)
 +{
 +      u32 size, reg_base;
 +      u32 addr, val;
 +
 +      reg_base = gaudi2_qm_blocks_bases[queue_id];
 +
 +      addr = reg_base + QM_CP_FENCE0_CNT_0_OFFSET;
 +      size = mmPDMA0_QM_CP_BARRIER_CFG - mmPDMA0_QM_CP_FENCE0_CNT_0;
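 +      /* The memset below covers the whole fence-counter register range of this QMAN */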
 +
 +      /*
 +       * In case we want to make sure that a QM which is stuck on a fence will
 +       * be released, we should set the fence counter to a higher value than
 +       * the value the QM is waiting for. To comply with any fence counter of
 +       * any value, we set the maximum fence value in all counters.
 +       */
 +      val = skip_fence ? U32_MAX : 0;
 +      gaudi2_memset_device_lbw(hdev, addr, size, val);
 +}
 +
 +static void gaudi2_qman_manual_flush_common(struct hl_device *hdev, u32 queue_id)
 +{
 +      u32 reg_base = gaudi2_qm_blocks_bases[queue_id];
 +
 +      gaudi2_clear_qm_fence_counters_common(hdev, queue_id, true);
 +      gaudi2_flush_qman_common(hdev, reg_base);
 +      gaudi2_flush_qman_arc_common(hdev, reg_base);
 +}
 +
 +static void gaudi2_stop_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int dcore, inst;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK))
 +              goto stop_edma_qmans;
 +
 +      /* Stop CPs of PDMA QMANs */
 +      gaudi2_stop_qman_common(hdev, mmPDMA0_QM_BASE);
 +      gaudi2_stop_qman_common(hdev, mmPDMA1_QM_BASE);
 +
 +stop_edma_qmans:
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK))
 +              return;
 +
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
 +                      u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
 +                      u32 qm_base;
 +
 +                      if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_EDMA_SHIFT + seq)))
 +                              continue;
 +
 +                      qm_base = mmDCORE0_EDMA0_QM_BASE + dcore * DCORE_OFFSET +
 +                                      inst * DCORE_EDMA_OFFSET;
 +
 +                      /* Stop CPs of EDMA QMANs */
 +                      gaudi2_stop_qman_common(hdev, qm_base);
 +              }
 +      }
 +}
 +
 +static void gaudi2_stop_mme_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 offset, i;
 +
 +      offset = mmDCORE1_MME_QM_BASE - mmDCORE0_MME_QM_BASE;
 +
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++) {
 +              if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_MME_SHIFT + i)))
 +                      continue;
 +
 +              gaudi2_stop_qman_common(hdev, mmDCORE0_MME_QM_BASE + (i * offset));
 +      }
 +}
 +
 +static void gaudi2_stop_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +      int i;
 +
 +      if (!(gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      for (i = 0 ; i < TPC_ID_SIZE ; i++) {
 +              if (!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(HW_CAP_TPC_SHIFT + i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[gaudi2_tpc_id_to_queue_id[i]];
 +              gaudi2_stop_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_stop_rot_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +      int i;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_ROT_MASK))
 +              return;
 +
 +      for (i = 0 ; i < ROTATOR_ID_SIZE ; i++) {
 +              if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_ROT_SHIFT + i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[gaudi2_rot_id_to_queue_id[i]];
 +              gaudi2_stop_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_stop_nic_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base, queue_id;
 +      int i;
 +
 +      if (!(gaudi2->nic_hw_cap_initialized & HW_CAP_NIC_MASK))
 +              return;
 +
 +      queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
 +              if (!(hdev->nic_ports_mask & BIT(i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[queue_id];
 +              gaudi2_stop_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_stall_dma_common(struct hl_device *hdev, u32 reg_base)
 +{
 +      u32 reg_val;
 +
 +      reg_val = FIELD_PREP(PDMA0_CORE_CFG_1_HALT_MASK, 0x1);
 +      WREG32(reg_base + DMA_CORE_CFG_1_OFFSET, reg_val);
 +}
 +
 +static void gaudi2_dma_stall(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int dcore, inst;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK))
 +              goto stall_edma;
 +
 +      gaudi2_stall_dma_common(hdev, mmPDMA0_CORE_BASE);
 +      gaudi2_stall_dma_common(hdev, mmPDMA1_CORE_BASE);
 +
 +stall_edma:
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK))
 +              return;
 +
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
 +                      u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
 +                      u32 core_base;
 +
 +                      if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_EDMA_SHIFT + seq)))
 +                              continue;
 +
 +                      core_base = mmDCORE0_EDMA0_CORE_BASE + dcore * DCORE_OFFSET +
 +                                      inst * DCORE_EDMA_OFFSET;
 +
 +                      /* Stall the EDMA cores */
 +                      gaudi2_stall_dma_common(hdev, core_base);
 +              }
 +      }
 +}
 +
 +static void gaudi2_mme_stall(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 offset, i;
 +
 +      offset = mmDCORE1_MME_CTRL_LO_QM_STALL - mmDCORE0_MME_CTRL_LO_QM_STALL;
 +
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++)
 +              if (gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_MME_SHIFT + i))
 +                      WREG32(mmDCORE0_MME_CTRL_LO_QM_STALL + (i * offset), 1);
 +}
 +
 +static void gaudi2_tpc_stall(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +      int i;
 +
 +      if (!(gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      for (i = 0 ; i < TPC_ID_SIZE ; i++) {
 +              if (!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(HW_CAP_TPC_SHIFT + i)))
 +                      continue;
 +
 +              reg_base = gaudi2_tpc_cfg_blocks_bases[i];
 +              WREG32(reg_base + TPC_CFG_STALL_OFFSET, 1);
 +      }
 +}
 +
 +static void gaudi2_rotator_stall(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_val;
 +      int i;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_ROT_MASK))
 +              return;
 +
 +      reg_val = FIELD_PREP(ROT_MSS_HALT_WBC_MASK, 0x1) |
 +                      FIELD_PREP(ROT_MSS_HALT_RSB_MASK, 0x1) |
 +                      FIELD_PREP(ROT_MSS_HALT_MRSB_MASK, 0x1);
 +
 +      for (i = 0 ; i < ROTATOR_ID_SIZE ; i++) {
 +              if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_ROT_SHIFT + i)))
 +                      continue;
 +
 +              WREG32(mmROT0_MSS_HALT + i * ROT_OFFSET, reg_val);
 +      }
 +}
 +
 +static void gaudi2_disable_qman_common(struct hl_device *hdev, u32 reg_base)
 +{
 +      WREG32(reg_base + QM_GLBL_CFG0_OFFSET, 0);
 +}
 +
 +static void gaudi2_disable_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int dcore, inst;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK))
 +              goto stop_edma_qmans;
 +
 +      gaudi2_disable_qman_common(hdev, mmPDMA0_QM_BASE);
 +      gaudi2_disable_qman_common(hdev, mmPDMA1_QM_BASE);
 +
 +stop_edma_qmans:
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK))
 +              return;
 +
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
 +                      u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
 +                      u32 qm_base;
 +
 +                      if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_EDMA_SHIFT + seq)))
 +                              continue;
 +
 +                      qm_base = mmDCORE0_EDMA0_QM_BASE + dcore * DCORE_OFFSET +
 +                                      inst * DCORE_EDMA_OFFSET;
 +
 +                      /* Disable CPs of EDMA QMANs */
 +                      gaudi2_disable_qman_common(hdev, qm_base);
 +              }
 +      }
 +}
 +
 +static void gaudi2_disable_mme_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 offset, i;
 +
 +      offset = mmDCORE1_MME_QM_BASE - mmDCORE0_MME_QM_BASE;
 +
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++)
 +              if (gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_MME_SHIFT + i))
 +                      gaudi2_disable_qman_common(hdev, mmDCORE0_MME_QM_BASE + (i * offset));
 +}
 +
 +static void gaudi2_disable_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +      int i;
 +
 +      if (!(gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      for (i = 0 ; i < TPC_ID_SIZE ; i++) {
 +              if (!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(HW_CAP_TPC_SHIFT + i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[gaudi2_tpc_id_to_queue_id[i]];
 +              gaudi2_disable_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_disable_rot_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +      int i;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_ROT_MASK))
 +              return;
 +
 +      for (i = 0 ; i < ROTATOR_ID_SIZE ; i++) {
 +              if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_ROT_SHIFT + i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[gaudi2_rot_id_to_queue_id[i]];
 +              gaudi2_disable_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_disable_nic_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base, queue_id;
 +      int i;
 +
 +      if (!(gaudi2->nic_hw_cap_initialized & HW_CAP_NIC_MASK))
 +              return;
 +
 +      queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
 +              if (!(hdev->nic_ports_mask & BIT(i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[queue_id];
 +              gaudi2_disable_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_enable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE, 0);
 +
 +      /* Zero the lower/upper parts of the 64-bit counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE + 0xC, 0);
 +      WREG32(mmPSOC_TIMESTAMP_BASE + 0x8, 0);
 +
 +      /* Enable the counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE, 1);
 +}
 +
 +static void gaudi2_disable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE, 0);
 +}
 +
 +static const char *gaudi2_irq_name(u16 irq_number)
 +{
 +      switch (irq_number) {
 +      case GAUDI2_IRQ_NUM_EVENT_QUEUE:
 +              return "gaudi2 cpu eq";
 +      case GAUDI2_IRQ_NUM_COMPLETION:
 +              return "gaudi2 completion";
 +      case GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM ... GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM:
 +              return gaudi2_vdec_irq_name[irq_number - GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM];
 +      case GAUDI2_IRQ_NUM_USER_FIRST ... GAUDI2_IRQ_NUM_USER_LAST:
 +              return "gaudi2 user completion";
 +      default:
 +              return "invalid";
 +      }
 +}
 +
 +static void gaudi2_dec_disable_msix(struct hl_device *hdev, u32 max_irq_num)
 +{
 +      int i, irq, relative_idx;
 +      struct hl_dec *dec;
 +
 +      for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM ; i < max_irq_num ; i++) {
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              relative_idx = i - GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM;
 +
 +              dec = hdev->dec + relative_idx / 2;
 +
 +              /* We pass different structures depending on the irq handler. For the abnormal
 +               * interrupt we pass hl_dec and for the regular interrupt we pass the relevant
 +               * user_interrupt entry
 +               */
 +              free_irq(irq, ((relative_idx % 2) ?
 +                              (void *) dec :
 +                              (void *) &hdev->user_interrupt[dec->core_id]));
 +      }
 +}
 +
 +static int gaudi2_dec_enable_msix(struct hl_device *hdev)
 +{
 +      int rc, i, irq_init_cnt, irq, relative_idx;
 +      irq_handler_t irq_handler;
 +      struct hl_dec *dec;
 +
 +      for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM, irq_init_cnt = 0;
 +                      i <= GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM;
 +                      i++, irq_init_cnt++) {
 +
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              relative_idx = i - GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM;
 +
 +              irq_handler = (relative_idx % 2) ?
 +                              hl_irq_handler_dec_abnrm :
 +                              hl_irq_handler_user_interrupt;
 +
 +              dec = hdev->dec + relative_idx / 2;
 +
 +              /* We pass different structures depending on the irq handler. For the abnormal
 +               * interrupt we pass hl_dec and for the regular interrupt we pass the relevant
 +               * user_interrupt entry
 +               */
 +              rc = request_irq(irq, irq_handler, 0, gaudi2_irq_name(i),
 +                              ((relative_idx % 2) ?
 +                              (void *) dec :
 +                              (void *) &hdev->user_interrupt[dec->core_id]));
 +              if (rc) {
 +                      dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +                      goto free_dec_irqs;
 +              }
 +      }
 +
 +      return 0;
 +
 +free_dec_irqs:
 +      gaudi2_dec_disable_msix(hdev, (GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM + irq_init_cnt));
 +      return rc;
 +}
 +
 +static int gaudi2_enable_msix(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int rc, irq, i, j, user_irq_init_cnt;
 +      irq_handler_t irq_handler;
 +      struct hl_cq *cq;
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_MSIX)
 +              return 0;
 +
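 +      /* Request exactly GAUDI2_MSIX_ENTRIES vectors (min == max, all-or-nothing) */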
 +      rc = pci_alloc_irq_vectors(hdev->pdev, GAUDI2_MSIX_ENTRIES, GAUDI2_MSIX_ENTRIES,
 +                                      PCI_IRQ_MSIX);
 +      if (rc < 0) {
 +              dev_err(hdev->dev, "MSI-X: Failed to enable support -- %d/%d\n",
 +                      GAUDI2_MSIX_ENTRIES, rc);
 +              return rc;
 +      }
 +
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION);
 +      cq = &hdev->completion_queue[GAUDI2_RESERVED_CQ_CS_COMPLETION];
 +      rc = request_irq(irq, hl_irq_handler_cq, 0, gaudi2_irq_name(GAUDI2_IRQ_NUM_COMPLETION), cq);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +              goto free_irq_vectors;
 +      }
 +
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE);
 +      rc = request_irq(irq, hl_irq_handler_eq, 0, gaudi2_irq_name(GAUDI2_IRQ_NUM_EVENT_QUEUE),
 +                      &hdev->event_queue);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +              goto free_completion_irq;
 +      }
 +
 +      rc = gaudi2_dec_enable_msix(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to enable decoder IRQ");
 +              goto free_event_irq;
 +      }
 +
 +      for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = prop->user_dec_intr_count, user_irq_init_cnt = 0;
 +                      user_irq_init_cnt < prop->user_interrupt_count;
 +                      i++, j++, user_irq_init_cnt++) {
 +
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              irq_handler = hl_irq_handler_user_interrupt;
 +
 +              rc = request_irq(irq, irq_handler, 0, gaudi2_irq_name(i), &hdev->user_interrupt[j]);
 +              if (rc) {
 +                      dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +                      goto free_user_irq;
 +              }
 +      }
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_MSIX;
 +
 +      return 0;
 +
 +free_user_irq:
 +      for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = prop->user_dec_intr_count;
 +                      i < GAUDI2_IRQ_NUM_USER_FIRST + user_irq_init_cnt ; i++, j++) {
 +
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              free_irq(irq, &hdev->user_interrupt[j]);
 +      }
 +
 +      gaudi2_dec_disable_msix(hdev, GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM + 1);
 +
 +free_event_irq:
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE);
 +      free_irq(irq, &hdev->event_queue);
 +
 +free_completion_irq:
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION);
 +      free_irq(irq, cq);
 +
 +free_irq_vectors:
 +      pci_free_irq_vectors(hdev->pdev);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_sync_irqs(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int i, j;
 +      int irq;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_MSIX))
 +              return;
 +
 +      /* Wait for all pending IRQs to be finished */
 +      synchronize_irq(pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION));
 +
 +      for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM ; i <= GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM ; i++) {
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              synchronize_irq(irq);
 +      }
 +
 +      for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = 0 ; j < hdev->asic_prop.user_interrupt_count;
 +                                                                              i++, j++) {
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              synchronize_irq(irq);
 +      }
 +
 +      synchronize_irq(pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE));
 +}
 +
 +static void gaudi2_disable_msix(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct hl_cq *cq;
 +      int irq, i, j, k;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_MSIX))
 +              return;
 +
 +      gaudi2_sync_irqs(hdev);
 +
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE);
 +      free_irq(irq, &hdev->event_queue);
 +
 +      gaudi2_dec_disable_msix(hdev, GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM + 1);
 +
 +      for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = prop->user_dec_intr_count, k = 0;
 +                      k < hdev->asic_prop.user_interrupt_count ; i++, j++, k++) {
 +
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              free_irq(irq, &hdev->user_interrupt[j]);
 +      }
 +
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION);
 +      cq = &hdev->completion_queue[GAUDI2_RESERVED_CQ_CS_COMPLETION];
 +      free_irq(irq, cq);
 +
 +      pci_free_irq_vectors(hdev->pdev);
 +
 +      gaudi2->hw_cap_initialized &= ~HW_CAP_MSIX;
 +}
 +
 +static void gaudi2_stop_dcore_dec(struct hl_device *hdev, int dcore_id)
 +{
 +      u32 reg_val = FIELD_PREP(DCORE0_VDEC0_BRDG_CTRL_GRACEFUL_STOP_MASK, 0x1);
 +      u32 graceful_pend_mask = DCORE0_VDEC0_BRDG_CTRL_GRACEFUL_PEND_MASK;
 +      u32 timeout_usec, dec_id, dec_bit, offset, graceful;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI2_PLDM_VDEC_TIMEOUT_USEC;
 +      else
 +              timeout_usec = GAUDI2_VDEC_TIMEOUT_USEC;
 +
 +      for (dec_id = 0 ; dec_id < NUM_OF_DEC_PER_DCORE ; dec_id++) {
 +              dec_bit = dcore_id * NUM_OF_DEC_PER_DCORE + dec_id;
 +              if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
 +                      continue;
 +
 +              offset = dcore_id * DCORE_OFFSET + dec_id * DCORE_VDEC_OFFSET;
 +
 +              WREG32(mmDCORE0_DEC0_CMD_SWREG16 + offset, 0);
 +
 +              WREG32(mmDCORE0_VDEC0_BRDG_CTRL_GRACEFUL + offset, reg_val);
 +
 +              /* Wait till all traffic from the decoder stops
 +               * before applying core reset.
 +               */
 +              rc = hl_poll_timeout(
 +                              hdev,
 +                              mmDCORE0_VDEC0_BRDG_CTRL_GRACEFUL + offset,
 +                              graceful,
 +                              (graceful & graceful_pend_mask),
 +                              100,
 +                              timeout_usec);
 +              if (rc)
 +                      dev_err(hdev->dev,
 +                              "Failed to stop traffic from DCORE%d Decoder %d\n",
 +                              dcore_id, dec_id);
 +      }
 +}
 +
 +static void gaudi2_stop_pcie_dec(struct hl_device *hdev)
 +{
 +      u32 reg_val = FIELD_PREP(DCORE0_VDEC0_BRDG_CTRL_GRACEFUL_STOP_MASK, 0x1);
 +      u32 graceful_pend_mask = PCIE_VDEC0_BRDG_CTRL_GRACEFUL_PEND_MASK;
 +      u32 timeout_usec, dec_id, dec_bit, offset, graceful;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI2_PLDM_VDEC_TIMEOUT_USEC;
 +      else
 +              timeout_usec = GAUDI2_VDEC_TIMEOUT_USEC;
 +
 +      for (dec_id = 0 ; dec_id < NUM_OF_DEC_PER_DCORE ; dec_id++) {
 +              dec_bit = PCIE_DEC_SHIFT + dec_id;
 +              if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
 +                      continue;
 +
 +              offset = dec_id * PCIE_VDEC_OFFSET;
 +
 +              WREG32(mmPCIE_DEC0_CMD_SWREG16 + offset, 0);
 +
 +              WREG32(mmPCIE_VDEC0_BRDG_CTRL_GRACEFUL + offset, reg_val);
 +
 +              /* Wait till all traffic from the decoder stops
 +               * before applying core reset.
 +               */
 +              rc = hl_poll_timeout(
 +                              hdev,
 +                              mmPCIE_VDEC0_BRDG_CTRL_GRACEFUL + offset,
 +                              graceful,
 +                              (graceful & graceful_pend_mask),
 +                              100,
 +                              timeout_usec);
 +              if (rc)
 +                      dev_err(hdev->dev,
 +                              "Failed to stop traffic from PCIe Decoder %d\n",
 +                              dec_id);
 +      }
 +}
 +
 +static void gaudi2_stop_dec(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int dcore_id;
 +
 +      if ((gaudi2->dec_hw_cap_initialized & HW_CAP_DEC_MASK) == 0)
 +              return;
 +
 +      for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++)
 +              gaudi2_stop_dcore_dec(hdev, dcore_id);
 +
 +      gaudi2_stop_pcie_dec(hdev);
 +}
 +
 +static void gaudi2_set_arc_running_mode(struct hl_device *hdev, u32 cpu_id, u32 run_mode)
 +{
 +      u32 reg_base, reg_val;
 +
 +      reg_base = gaudi2_arc_blocks_bases[cpu_id];
 +      if (run_mode == HL_ENGINE_CORE_RUN)
 +              reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_RUN_HALT_REQ_RUN_REQ_MASK, 1);
 +      else
 +              reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_RUN_HALT_REQ_HALT_REQ_MASK, 1);
 +
 +      WREG32(reg_base + ARC_HALT_REQ_OFFSET, reg_val);
 +}
 +
 +static void gaudi2_halt_arcs(struct hl_device *hdev)
 +{
 +      u16 arc_id;
 +
 +      for (arc_id = CPU_ID_SCHED_ARC0; arc_id < CPU_ID_MAX; arc_id++) {
 +              if (gaudi2_is_arc_enabled(hdev, arc_id))
 +                      gaudi2_set_arc_running_mode(hdev, arc_id, HL_ENGINE_CORE_HALT);
 +      }
 +}
 +
 +static int gaudi2_verify_arc_running_mode(struct hl_device *hdev, u32 cpu_id, u32 run_mode)
 +{
 +      int rc;
 +      u32 reg_base, val, ack_mask, timeout_usec = 100000;
 +
 +      if (hdev->pldm)
 +              timeout_usec *= 100;
 +
 +      reg_base = gaudi2_arc_blocks_bases[cpu_id];
 +      if (run_mode == HL_ENGINE_CORE_RUN)
 +              ack_mask = ARC_FARM_ARC0_AUX_RUN_HALT_ACK_RUN_ACK_MASK;
 +      else
 +              ack_mask = ARC_FARM_ARC0_AUX_RUN_HALT_ACK_HALT_ACK_MASK;
 +
 +      rc = hl_poll_timeout(hdev, reg_base + ARC_HALT_ACK_OFFSET,
 +                              val, ((val & ack_mask) == ack_mask),
 +                              1000, timeout_usec);
 +
 +      if (!rc) {
 +              /* Clear */
 +              val = FIELD_PREP(ARC_FARM_ARC0_AUX_RUN_HALT_REQ_RUN_REQ_MASK, 0);
 +              WREG32(reg_base + ARC_HALT_REQ_OFFSET, val);
 +      }
 +
 +      return rc;
 +}
 +
 +static void gaudi2_reset_arcs(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u16 arc_id;
 +
 +      if (!gaudi2)
 +              return;
 +
 +      for (arc_id = CPU_ID_SCHED_ARC0; arc_id < CPU_ID_MAX; arc_id++)
 +              if (gaudi2_is_arc_enabled(hdev, arc_id))
 +                      gaudi2_clr_arc_id_cap(hdev, arc_id);
 +}
 +
 +static void gaudi2_nic_qmans_manual_flush(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 queue_id;
 +      int i;
 +
 +      if (!(gaudi2->nic_hw_cap_initialized & HW_CAP_NIC_MASK))
 +              return;
 +
 +      queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
 +              if (!(hdev->nic_ports_mask & BIT(i)))
 +                      continue;
 +
 +              gaudi2_qman_manual_flush_common(hdev, queue_id);
 +      }
 +}
 +
 +static int gaudi2_set_engine_cores(struct hl_device *hdev, u32 *core_ids,
 +                                      u32 num_cores, u32 core_command)
 +{
 +      int i, rc;
 +
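 +      /* Issue the run/halt request to all cores first, then wait for each to acknowledge */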
 +      for (i = 0 ; i < num_cores ; i++) {
 +              if (gaudi2_is_arc_enabled(hdev, core_ids[i]))
 +                      gaudi2_set_arc_running_mode(hdev, core_ids[i], core_command);
 +      }
 +
 +      for (i = 0 ; i < num_cores ; i++) {
 +              if (gaudi2_is_arc_enabled(hdev, core_ids[i])) {
 +                      rc = gaudi2_verify_arc_running_mode(hdev, core_ids[i], core_command);
 +
 +                      if (rc) {
 +                              dev_err(hdev->dev, "failed to %s arc: %d\n",
 +                                      (core_command == HL_ENGINE_CORE_HALT) ?
 +                                      "HALT" : "RUN", core_ids[i]);
 +                              return -1;
 +                      }
 +              }
 +      }
 +
 +      return 0;
 +}
 +
 +static void gaudi2_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      u32 wait_timeout_ms;
 +
 +      if (hdev->pldm)
 +              wait_timeout_ms = GAUDI2_PLDM_RESET_WAIT_MSEC;
 +      else
 +              wait_timeout_ms = GAUDI2_RESET_WAIT_MSEC;
 +
 +      if (fw_reset)
 +              goto skip_engines;
 +
 +      gaudi2_stop_dma_qmans(hdev);
 +      gaudi2_stop_mme_qmans(hdev);
 +      gaudi2_stop_tpc_qmans(hdev);
 +      gaudi2_stop_rot_qmans(hdev);
 +      gaudi2_stop_nic_qmans(hdev);
 +      msleep(wait_timeout_ms);
 +
 +      gaudi2_halt_arcs(hdev);
 +      gaudi2_dma_stall(hdev);
 +      gaudi2_mme_stall(hdev);
 +      gaudi2_tpc_stall(hdev);
 +      gaudi2_rotator_stall(hdev);
 +
 +      msleep(wait_timeout_ms);
 +
 +      gaudi2_stop_dec(hdev);
 +
 +      /*
 +       * In case of soft reset, do a manual flush for the QMANs (currently done
 +       * only for the NIC QMANs).
 +       */
 +      if (!hard_reset)
 +              gaudi2_nic_qmans_manual_flush(hdev);
 +
 +      gaudi2_disable_dma_qmans(hdev);
 +      gaudi2_disable_mme_qmans(hdev);
 +      gaudi2_disable_tpc_qmans(hdev);
 +      gaudi2_disable_rot_qmans(hdev);
 +      gaudi2_disable_nic_qmans(hdev);
 +      gaudi2_disable_timestamp(hdev);
 +
 +skip_engines:
 +      if (hard_reset) {
 +              gaudi2_disable_msix(hdev);
 +              return;
 +      }
 +
 +      gaudi2_sync_irqs(hdev);
 +}
 +
 +static void gaudi2_init_firmware_preload_params(struct hl_device *hdev)
 +{
 +      struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load;
 +
 +      pre_fw_load->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
 +      pre_fw_load->sts_boot_dev_sts0_reg = mmCPU_BOOT_DEV_STS0;
 +      pre_fw_load->sts_boot_dev_sts1_reg = mmCPU_BOOT_DEV_STS1;
 +      pre_fw_load->boot_err0_reg = mmCPU_BOOT_ERR0;
 +      pre_fw_load->boot_err1_reg = mmCPU_BOOT_ERR1;
 +      pre_fw_load->wait_for_preboot_timeout = GAUDI2_PREBOOT_REQ_TIMEOUT_USEC;
 +}
 +
 +static void gaudi2_init_firmware_loader(struct hl_device *hdev)
 +{
 +      struct fw_load_mgr *fw_loader = &hdev->fw_loader;
 +      struct dynamic_fw_load_mgr *dynamic_loader;
 +      struct cpu_dyn_regs *dyn_regs;
 +
 +      /* fill common fields */
 +      fw_loader->fw_comp_loaded = FW_TYPE_NONE;
 +      fw_loader->boot_fit_img.image_name = GAUDI2_BOOT_FIT_FILE;
 +      fw_loader->linux_img.image_name = GAUDI2_LINUX_FW_FILE;
 +      fw_loader->boot_fit_timeout = GAUDI2_BOOT_FIT_REQ_TIMEOUT_USEC;
 +      fw_loader->skip_bmc = false;
 +      fw_loader->sram_bar_id = SRAM_CFG_BAR_ID;
 +      fw_loader->dram_bar_id = DRAM_BAR_ID;
 +      fw_loader->cpu_timeout = GAUDI2_CPU_TIMEOUT_USEC;
 +
 +      /* Here we set initial values for a few specific dynamic regs (as
 +       * before reading the first descriptor from FW, those values have to be
 +       * hard-coded). In later stages of the protocol those values will be
 +       * updated automatically by reading the FW descriptor, so the data there
 +       * will always be up-to-date.
 +       */
 +      dynamic_loader = &hdev->fw_loader.dynamic_loader;
 +      dyn_regs = &dynamic_loader->comm_desc.cpu_dyn_regs;
 +      dyn_regs->kmd_msg_to_cpu = cpu_to_le32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU);
 +      dyn_regs->cpu_cmd_status_to_host = cpu_to_le32(mmCPU_CMD_STATUS_TO_HOST);
 +      dynamic_loader->wait_for_bl_timeout = GAUDI2_WAIT_FOR_BL_TIMEOUT_USEC;
 +}
 +
 +static int gaudi2_init_cpu(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int rc;
 +
 +      if (!(hdev->fw_components & FW_TYPE_PREBOOT_CPU))
 +              return 0;
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_CPU)
 +              return 0;
 +
 +      rc = hl_fw_init_cpu(hdev);
 +      if (rc)
 +              return rc;
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_CPU;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_init_cpu_queues(struct hl_device *hdev, u32 cpu_timeout)
 +{
 +      struct hl_hw_queue *cpu_pq = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ];
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct cpu_dyn_regs *dyn_regs;
 +      struct hl_eq *eq;
 +      u32 status;
 +      int err;
 +
 +      if (!hdev->cpu_queues_enable)
 +              return 0;
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_CPU_Q)
 +              return 0;
 +
 +      eq = &hdev->event_queue;
 +
 +      dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +
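 +      /* Publish the PQ, EQ and CQ base addresses and sizes to the device CPU via CPU_IF regs */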
 +      WREG32(mmCPU_IF_PQ_BASE_ADDR_LOW, lower_32_bits(cpu_pq->bus_address));
 +      WREG32(mmCPU_IF_PQ_BASE_ADDR_HIGH, upper_32_bits(cpu_pq->bus_address));
 +
 +      WREG32(mmCPU_IF_EQ_BASE_ADDR_LOW, lower_32_bits(eq->bus_address));
 +      WREG32(mmCPU_IF_EQ_BASE_ADDR_HIGH, upper_32_bits(eq->bus_address));
 +
 +      WREG32(mmCPU_IF_CQ_BASE_ADDR_LOW, lower_32_bits(hdev->cpu_accessible_dma_address));
 +      WREG32(mmCPU_IF_CQ_BASE_ADDR_HIGH, upper_32_bits(hdev->cpu_accessible_dma_address));
 +
 +      WREG32(mmCPU_IF_PQ_LENGTH, HL_QUEUE_SIZE_IN_BYTES);
 +      WREG32(mmCPU_IF_EQ_LENGTH, HL_EQ_SIZE_IN_BYTES);
 +      WREG32(mmCPU_IF_CQ_LENGTH, HL_CPU_ACCESSIBLE_MEM_SIZE);
 +
 +      /* Used for EQ CI */
 +      WREG32(mmCPU_IF_EQ_RD_OFFS, 0);
 +
 +      WREG32(mmCPU_IF_PF_PQ_PI, 0);
 +
 +      WREG32(mmCPU_IF_QUEUE_INIT, PQ_INIT_STATUS_READY_FOR_CP);
 +
 +      /* Let the ARC know we are ready, as it is now handling those queues */
 +
 +      WREG32(le32_to_cpu(dyn_regs->gic_host_pi_upd_irq),
 +              gaudi2_irq_map_table[GAUDI2_EVENT_CPU_PI_UPDATE].cpu_id);
 +
 +      err = hl_poll_timeout(
 +              hdev,
 +              mmCPU_IF_QUEUE_INIT,
 +              status,
 +              (status == PQ_INIT_STATUS_READY_FOR_HOST),
 +              1000,
 +              cpu_timeout);
 +
 +      if (err) {
 +              dev_err(hdev->dev, "Failed to communicate with device CPU (timeout)\n");
 +              return -EIO;
 +      }
 +
 +      /* update FW application security bits */
 +      if (prop->fw_cpu_boot_dev_sts0_valid)
 +              prop->fw_app_cpu_boot_dev_sts0 = RREG32(mmCPU_BOOT_DEV_STS0);
 +
 +      if (prop->fw_cpu_boot_dev_sts1_valid)
 +              prop->fw_app_cpu_boot_dev_sts1 = RREG32(mmCPU_BOOT_DEV_STS1);
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_CPU_Q;
 +      return 0;
 +}
 +
 +static void gaudi2_init_qman_pq(struct hl_device *hdev, u32 reg_base,
 +                              u32 queue_id_base)
 +{
 +      struct hl_hw_queue *q;
 +      u32 pq_id, pq_offset;
 +
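 +      /* Program each PQ's base address and size (log2 of queue length), and reset its PI/CI */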
 +      for (pq_id = 0 ; pq_id < NUM_OF_PQ_PER_QMAN ; pq_id++) {
 +              q = &hdev->kernel_queues[queue_id_base + pq_id];
 +              pq_offset = pq_id * 4;
 +
 +              WREG32(reg_base + QM_PQ_BASE_LO_0_OFFSET + pq_offset,
 +                              lower_32_bits(q->bus_address));
 +              WREG32(reg_base + QM_PQ_BASE_HI_0_OFFSET + pq_offset,
 +                              upper_32_bits(q->bus_address));
 +              WREG32(reg_base + QM_PQ_SIZE_0_OFFSET + pq_offset, ilog2(HL_QUEUE_LENGTH));
 +              WREG32(reg_base + QM_PQ_PI_0_OFFSET + pq_offset, 0);
 +              WREG32(reg_base + QM_PQ_CI_0_OFFSET + pq_offset, 0);
 +      }
 +}
 +
 +static void gaudi2_init_qman_cp(struct hl_device *hdev, u32 reg_base)
 +{
 +      u32 cp_id, cp_offset, mtr_base_lo, mtr_base_hi, so_base_lo, so_base_hi;
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      for (cp_id = 0 ; cp_id < NUM_OF_CP_PER_QMAN; cp_id++) {
 +              cp_offset = cp_id * 4;
 +
 +              WREG32(reg_base + QM_CP_MSG_BASE0_ADDR_LO_0_OFFSET + cp_offset, mtr_base_lo);
 +              WREG32(reg_base + QM_CP_MSG_BASE0_ADDR_HI_0_OFFSET + cp_offset, mtr_base_hi);
 +              WREG32(reg_base + QM_CP_MSG_BASE1_ADDR_LO_0_OFFSET + cp_offset, so_base_lo);
 +              WREG32(reg_base + QM_CP_MSG_BASE1_ADDR_HI_0_OFFSET + cp_offset, so_base_hi);
 +      }
 +
 +      /* allow QMANs to accept work from ARC CQF */
 +      WREG32(reg_base + QM_CP_CFG_OFFSET, FIELD_PREP(PDMA0_QM_CP_CFG_SWITCH_EN_MASK, 0x1));
 +}
 +
 +static void gaudi2_init_qman_pqc(struct hl_device *hdev, u32 reg_base,
 +                              u32 queue_id_base)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 pq_id, pq_offset, so_base_lo, so_base_hi;
 +
 +      so_base_lo = lower_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      for (pq_id = 0 ; pq_id < NUM_OF_PQ_PER_QMAN ; pq_id++) {
 +              pq_offset = pq_id * 4;
 +
 +              /* Route QMAN HBW completion writes to the scratchpad, as they are not needed */
 +              WREG32(reg_base + QM_PQC_HBW_BASE_LO_0_OFFSET + pq_offset,
 +                              lower_32_bits(gaudi2->scratchpad_bus_address));
 +              WREG32(reg_base + QM_PQC_HBW_BASE_HI_0_OFFSET + pq_offset,
 +                              upper_32_bits(gaudi2->scratchpad_bus_address));
 +              WREG32(reg_base + QM_PQC_SIZE_0_OFFSET + pq_offset,
 +                              ilog2(PAGE_SIZE / sizeof(struct hl_cq_entry)));
 +
 +              WREG32(reg_base + QM_PQC_PI_0_OFFSET + pq_offset, 0);
 +              WREG32(reg_base + QM_PQC_LBW_WDATA_0_OFFSET + pq_offset, QM_PQC_LBW_WDATA);
 +              WREG32(reg_base + QM_PQC_LBW_BASE_LO_0_OFFSET + pq_offset, so_base_lo);
 +              WREG32(reg_base + QM_PQC_LBW_BASE_HI_0_OFFSET + pq_offset, so_base_hi);
 +      }
 +
 +      /* Enable QMAN H/W completion */
 +      WREG32(reg_base + QM_PQC_CFG_OFFSET, 1 << PDMA0_QM_PQC_CFG_EN_SHIFT);
 +}
 +
 +static u32 gaudi2_get_dyn_sp_reg(struct hl_device *hdev, u32 queue_id_base)
 +{
 +      struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 sp_reg_addr;
 +
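 +      /* Map the queue's engine type to the GIC QM IRQ control register published by the FW */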
 +      switch (queue_id_base) {
 +      case GAUDI2_QUEUE_ID_PDMA_0_0...GAUDI2_QUEUE_ID_PDMA_1_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3:
 +              sp_reg_addr = le32_to_cpu(dyn_regs->gic_dma_qm_irq_ctrl);
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE0_MME_0_0...GAUDI2_QUEUE_ID_DCORE0_MME_0_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE1_MME_0_0...GAUDI2_QUEUE_ID_DCORE1_MME_0_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE2_MME_0_0...GAUDI2_QUEUE_ID_DCORE2_MME_0_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE3_MME_0_0...GAUDI2_QUEUE_ID_DCORE3_MME_0_3:
 +              sp_reg_addr = le32_to_cpu(dyn_regs->gic_mme_qm_irq_ctrl);
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE0_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE0_TPC_6_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE1_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE1_TPC_5_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE2_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE2_TPC_5_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE3_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE3_TPC_5_3:
 +              sp_reg_addr = le32_to_cpu(dyn_regs->gic_tpc_qm_irq_ctrl);
 +              break;
 +      case GAUDI2_QUEUE_ID_ROT_0_0...GAUDI2_QUEUE_ID_ROT_1_3:
 +              sp_reg_addr = le32_to_cpu(dyn_regs->gic_rot_qm_irq_ctrl);
 +              break;
 +      case GAUDI2_QUEUE_ID_NIC_0_0...GAUDI2_QUEUE_ID_NIC_23_3:
 +              sp_reg_addr = le32_to_cpu(dyn_regs->gic_nic_qm_irq_ctrl);
 +              break;
 +      default:
 +              dev_err(hdev->dev, "Unexpected h/w queue %d\n", queue_id_base);
 +              return 0;
 +      }
 +
 +      return sp_reg_addr;
 +}
 +
 +static void gaudi2_init_qman_common(struct hl_device *hdev, u32 reg_base,
 +                                      u32 queue_id_base)
 +{
 +      u32 glbl_prot = QMAN_MAKE_TRUSTED, irq_handler_offset;
 +      int map_table_entry;
 +
 +      WREG32(reg_base + QM_GLBL_PROT_OFFSET, glbl_prot);
 +
 +      irq_handler_offset = gaudi2_get_dyn_sp_reg(hdev, queue_id_base);
 +      WREG32(reg_base + QM_GLBL_ERR_ADDR_LO_OFFSET, lower_32_bits(CFG_BASE + irq_handler_offset));
 +      WREG32(reg_base + QM_GLBL_ERR_ADDR_HI_OFFSET, upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +      map_table_entry = gaudi2_qman_async_event_id[queue_id_base];
 +      WREG32(reg_base + QM_GLBL_ERR_WDATA_OFFSET,
 +              gaudi2_irq_map_table[map_table_entry].cpu_id);
 +
 +      WREG32(reg_base + QM_ARB_ERR_MSG_EN_OFFSET, QM_ARB_ERR_MSG_EN_MASK);
 +
 +      WREG32(reg_base + QM_ARB_SLV_CHOISE_WDT_OFFSET, GAUDI2_ARB_WDT_TIMEOUT);
 +      WREG32(reg_base + QM_GLBL_CFG1_OFFSET, 0);
 +      WREG32(reg_base + QM_GLBL_CFG2_OFFSET, 0);
 +
 +      /* Enable the QMAN channel.
 +       * The PDMA QMAN configuration is different, as we do not allow the
 +       * user to access some of the CPs.
 +       * PDMA0: CP2/3 are reserved for the ARC usage.
 +       * PDMA1: CP1/2/3 are reserved for the ARC usage.
 +       */
 +      if (reg_base == gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_1_0])
 +              WREG32(reg_base + QM_GLBL_CFG0_OFFSET, PDMA1_QMAN_ENABLE);
 +      else if (reg_base == gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_0_0])
 +              WREG32(reg_base + QM_GLBL_CFG0_OFFSET, PDMA0_QMAN_ENABLE);
 +      else
 +              WREG32(reg_base + QM_GLBL_CFG0_OFFSET, QMAN_ENABLE);
 +}
 +
 +static void gaudi2_init_qman(struct hl_device *hdev, u32 reg_base,
 +              u32 queue_id_base)
 +{
 +      u32 pq_id;
 +
 +      for (pq_id = 0 ; pq_id < NUM_OF_PQ_PER_QMAN ; pq_id++)
 +              hdev->kernel_queues[queue_id_base + pq_id].cq_id = GAUDI2_RESERVED_CQ_CS_COMPLETION;
 +
 +      gaudi2_init_qman_pq(hdev, reg_base, queue_id_base);
 +      gaudi2_init_qman_cp(hdev, reg_base);
 +      gaudi2_init_qman_pqc(hdev, reg_base, queue_id_base);
 +      gaudi2_init_qman_common(hdev, reg_base, queue_id_base);
 +}
 +
 +static void gaudi2_init_dma_core(struct hl_device *hdev, u32 reg_base,
 +                              u32 dma_core_id, bool is_secure)
 +{
 +      u32 prot, irq_handler_offset;
 +      struct cpu_dyn_regs *dyn_regs;
 +      int map_table_entry;
 +
 +      prot = 1 << ARC_FARM_KDMA_PROT_ERR_VAL_SHIFT;
 +      if (is_secure)
 +              prot |= 1 << ARC_FARM_KDMA_PROT_VAL_SHIFT;
 +
 +      WREG32(reg_base + DMA_CORE_PROT_OFFSET, prot);
 +
 +      dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      irq_handler_offset = le32_to_cpu(dyn_regs->gic_dma_core_irq_ctrl);
 +
 +      WREG32(reg_base + DMA_CORE_ERRMSG_ADDR_LO_OFFSET,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +
 +      WREG32(reg_base + DMA_CORE_ERRMSG_ADDR_HI_OFFSET,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +      map_table_entry = gaudi2_dma_core_async_event_id[dma_core_id];
 +      WREG32(reg_base + DMA_CORE_ERRMSG_WDATA_OFFSET,
 +              gaudi2_irq_map_table[map_table_entry].cpu_id);
 +
 +      /* Enable the DMA channel */
 +      WREG32(reg_base + DMA_CORE_CFG_0_OFFSET, 1 << ARC_FARM_KDMA_CFG_0_EN_SHIFT);
 +}
 +
 +static void gaudi2_init_kdma(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +
 +      if ((gaudi2->hw_cap_initialized & HW_CAP_KDMA) == HW_CAP_KDMA)
 +              return;
 +
 +      reg_base = gaudi2_dma_core_blocks_bases[DMA_CORE_ID_KDMA];
 +
 +      gaudi2_init_dma_core(hdev, reg_base, DMA_CORE_ID_KDMA, true);
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_KDMA;
 +}
 +
 +static void gaudi2_init_pdma(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +
 +      if ((gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK) == HW_CAP_PDMA_MASK)
 +              return;
 +
 +      reg_base = gaudi2_dma_core_blocks_bases[DMA_CORE_ID_PDMA0];
 +      gaudi2_init_dma_core(hdev, reg_base, DMA_CORE_ID_PDMA0, false);
 +
 +      reg_base = gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_0_0];
 +      gaudi2_init_qman(hdev, reg_base, GAUDI2_QUEUE_ID_PDMA_0_0);
 +
 +      reg_base = gaudi2_dma_core_blocks_bases[DMA_CORE_ID_PDMA1];
 +      gaudi2_init_dma_core(hdev, reg_base, DMA_CORE_ID_PDMA1, false);
 +
 +      reg_base = gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_1_0];
 +      gaudi2_init_qman(hdev, reg_base, GAUDI2_QUEUE_ID_PDMA_1_0);
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_PDMA_MASK;
 +}
 +
 +static void gaudi2_init_edma_instance(struct hl_device *hdev, u8 seq)
 +{
 +      u32 reg_base, base_edma_core_id, base_edma_qman_id;
 +
 +      base_edma_core_id = DMA_CORE_ID_EDMA0 + seq;
 +      base_edma_qman_id = edma_stream_base[seq];
 +
 +      reg_base = gaudi2_dma_core_blocks_bases[base_edma_core_id];
 +      gaudi2_init_dma_core(hdev, reg_base, base_edma_core_id, false);
 +
 +      reg_base = gaudi2_qm_blocks_bases[base_edma_qman_id];
 +      gaudi2_init_qman(hdev, reg_base, base_edma_qman_id);
 +}
 +
 +static void gaudi2_init_edma(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int dcore, inst;
 +
 +      if ((gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK) == HW_CAP_EDMA_MASK)
 +              return;
 +
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
 +                      u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
 +
 +                      if (!(prop->edma_enabled_mask & BIT(seq)))
 +                              continue;
 +
 +                      gaudi2_init_edma_instance(hdev, seq);
 +
 +                      gaudi2->hw_cap_initialized |= BIT_ULL(HW_CAP_EDMA_SHIFT + seq);
 +              }
 +      }
 +}
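
The loop above flattens each enabled (dcore, instance) pair into a single sequence number and records it as one bit of a 64-bit capability word. A minimal standalone sketch of that bookkeeping, using invented stand-in values rather than the driver's real NUM_OF_EDMA_PER_DCORE and HW_CAP_EDMA_SHIFT constants:

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-ins; the real values come from the gaudi2 headers. */
#define NUM_OF_DCORES_EX         4
#define NUM_OF_EDMA_PER_DCORE_EX 2
#define HW_CAP_EDMA_SHIFT_EX     8

static uint64_t edma_cap_bit(int dcore, int inst)
{
        /* Flatten (dcore, instance) into one sequence number, then shift it
         * into the EDMA region of the capability word.
         */
        int seq = dcore * NUM_OF_EDMA_PER_DCORE_EX + inst;

        return 1ULL << (HW_CAP_EDMA_SHIFT_EX + seq);
}

int main(void)
{
        uint64_t caps = 0;
        int dcore, inst;

        for (dcore = 0; dcore < NUM_OF_DCORES_EX; dcore++)
                for (inst = 0; inst < NUM_OF_EDMA_PER_DCORE_EX; inst++)
                        caps |= edma_cap_bit(dcore, inst);

        printf("capability word with all EDMAs enabled: 0x%llx\n",
               (unsigned long long)caps);
        return 0;
}
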
 +
 +/*
 + * gaudi2_arm_monitors_for_virt_msix_db() - Arm monitors for writing to the virtual MSI-X doorbell.
 + * @hdev: pointer to habanalabs device structure.
 + * @sob_id: sync object ID.
 + * @first_mon_id: ID of first monitor out of 3 consecutive monitors.
 + * @interrupt_id: interrupt ID.
 + *
 + * Some initiators cannot have HBW address in their completion address registers, and thus cannot
 + * write directly to the HBW host memory of the virtual MSI-X doorbell.
 + * Instead, they are configured to LBW write to a sync object, and a monitor will do the HBW write.
 + *
 + * The mechanism in the sync manager block is composed of a master monitor with 3 messages.
 + * In addition to the HBW write, the other 2 messages prepare the monitor for the next
 + * completion, by decrementing the sync object value and re-arming the monitor.
 + */
 +static void gaudi2_arm_monitors_for_virt_msix_db(struct hl_device *hdev, u32 sob_id,
 +                                                      u32 first_mon_id, u32 interrupt_id)
 +{
 +      u32 sob_offset, first_mon_offset, mon_offset, payload, sob_group, mode, arm, config;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 addr;
 +      u8 mask;
 +
 +      /* Reset the SOB value */
 +      sob_offset = sob_id * sizeof(u32);
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset, 0);
 +
 +      /* Configure 3 monitors:
 +       * 1. Write interrupt ID to the virtual MSI-X doorbell (master monitor)
 +       * 2. Decrement SOB value by 1.
 +       * 3. Re-arm the master monitor.
 +       */
 +
 +      first_mon_offset = first_mon_id * sizeof(u32);
 +
 +      /* 2nd monitor: Decrement SOB value by 1 */
 +      mon_offset = first_mon_offset + sizeof(u32);
 +
 +      addr = CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset;
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, lower_32_bits(addr));
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_offset, upper_32_bits(addr));
 +
 +      payload = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_VAL_MASK, 0x7FFF) | /* "-1" */
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_SIGN_MASK, 1) |
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_INC_MASK, 1);
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, payload);
 +
 +      /* 3rd monitor: Re-arm the master monitor */
 +      mon_offset = first_mon_offset + 2 * sizeof(u32);
 +
 +      addr = CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + first_mon_offset;
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, lower_32_bits(addr));
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_offset, upper_32_bits(addr));
 +
 +      sob_group = sob_id / 8;
 +      mask = ~BIT(sob_id & 0x7);
 +      mode = 0; /* comparison mode is "greater than or equal to" */
 +      arm = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SID_MASK, sob_group) |
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_MASK_MASK, mask) |
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOP_MASK, mode) |
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOD_MASK, 1);
 +
 +      payload = arm;
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, payload);
 +
 +      /* 1st monitor (master): Write interrupt ID to the virtual MSI-X doorbell */
 +      mon_offset = first_mon_offset;
 +
 +      config = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_WR_NUM_MASK, 2); /* "2": 3 writes */
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + mon_offset, config);
 +
 +      addr = gaudi2->virt_msix_db_dma_addr;
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, lower_32_bits(addr));
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_offset, upper_32_bits(addr));
 +
 +      payload = interrupt_id;
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, payload);
 +
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + mon_offset, arm);
 +}
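
The monitor payload and ARM words here are assembled with the kernel's FIELD_PREP() helper, which shifts a value into the bit field described by a mask. A rough userspace approximation with invented example masks (the driver's real masks are the DCORE0_SYNC_MNGR_OBJS_* definitions, and the real ARM word also carries a comparison-mode field):

#include <stdint.h>
#include <stdio.h>

/* Invented example masks, for illustration only. */
#define EX_MON_ARM_SID_MASK  0x000000ffu        /* sync object group     */
#define EX_MON_ARM_MASK_MASK 0x0000ff00u        /* 8-bit object sub-mask */
#define EX_MON_ARM_SOD_MASK  0xffff0000u        /* value to compare with */

/* Rough userspace equivalent of the kernel's FIELD_PREP(): shift the value
 * up to the mask's lowest set bit, then clamp it to the mask.
 */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
        return (val << __builtin_ctz(mask)) & mask;
}

int main(void)
{
        uint32_t sob_id = 42;
        uint32_t arm;

        arm = field_prep(EX_MON_ARM_SID_MASK, sob_id / 8) |
              field_prep(EX_MON_ARM_MASK_MASK, (uint8_t)~(1u << (sob_id & 0x7))) |
              field_prep(EX_MON_ARM_SOD_MASK, 1);

        printf("example MON_ARM word: 0x%08x\n", arm);
        return 0;
}
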
 +
 +static void gaudi2_prepare_sm_for_virt_msix_db(struct hl_device *hdev)
 +{
 +      u32 decoder_id, sob_id, first_mon_id, interrupt_id;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +
 +      /* Decoder normal/abnormal interrupts */
 +      for (decoder_id = 0 ; decoder_id < NUMBER_OF_DEC ; ++decoder_id) {
 +              if (!(prop->decoder_enabled_mask & BIT(decoder_id)))
 +                      continue;
 +
 +              sob_id = GAUDI2_RESERVED_SOB_DEC_NRM_FIRST + decoder_id;
 +              first_mon_id = GAUDI2_RESERVED_MON_DEC_NRM_FIRST + 3 * decoder_id;
 +              interrupt_id = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM + 2 * decoder_id;
 +              gaudi2_arm_monitors_for_virt_msix_db(hdev, sob_id, first_mon_id, interrupt_id);
 +
 +              sob_id = GAUDI2_RESERVED_SOB_DEC_ABNRM_FIRST + decoder_id;
 +              first_mon_id = GAUDI2_RESERVED_MON_DEC_ABNRM_FIRST + 3 * decoder_id;
 +              interrupt_id += 1;
 +              gaudi2_arm_monitors_for_virt_msix_db(hdev, sob_id, first_mon_id, interrupt_id);
 +      }
 +}
 +
 +static void gaudi2_init_sm(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 cq_address;
 +      u32 reg_val;
 +      int i;
 +
 +      /* Enable HBW/LBW CQ for completion monitors */
 +      reg_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_CQ_EN_MASK, 1);
 +      reg_val |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_LBW_EN_MASK, 1);
 +
 +      for (i = 0 ; i < GAUDI2_MAX_PENDING_CS ; i++)
 +              WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + (4 * i), reg_val);
 +
 +      /* Enable only HBW CQ for KDMA completion monitor */
 +      reg_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_CQ_EN_MASK, 1);
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + (4 * i), reg_val);
 +
 +      /* Init CQ0 DB - configure the monitor to trigger MSI-X interrupt */
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0, lower_32_bits(gaudi2->virt_msix_db_dma_addr));
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0, upper_32_bits(gaudi2->virt_msix_db_dma_addr));
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_0, GAUDI2_IRQ_NUM_COMPLETION);
 +
 +      for (i = 0 ; i < GAUDI2_RESERVED_CQ_NUMBER ; i++) {
 +              cq_address =
 +                      hdev->completion_queue[i].bus_address;
 +
 +              WREG32(mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0 + (4 * i),
 +                                                      lower_32_bits(cq_address));
 +              WREG32(mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0 + (4 * i),
 +                                                      upper_32_bits(cq_address));
 +              WREG32(mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0 + (4 * i),
 +                                                      ilog2(HL_CQ_SIZE_IN_BYTES));
 +      }
 +
 +      /* Configure kernel ASID and MMU BP */
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_ASID_SEC, 0x10000);
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_ASID_NONE_SEC_PRIV, 0);
 +
 +      /* Initialize sync objects and monitors which are used for the virtual MSI-X doorbell */
 +      gaudi2_prepare_sm_for_virt_msix_db(hdev);
 +}
 +
 +static void gaudi2_init_mme_acc(struct hl_device *hdev, u32 reg_base)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_val;
 +      int i;
 +
 +      reg_val = FIELD_PREP(MME_ACC_INTR_MASK_WBC_ERR_RESP_MASK, 0);
 +      reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_SRC_POS_INF_MASK, 1);
 +      reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_SRC_NEG_INF_MASK, 1);
 +      reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_SRC_NAN_MASK, 1);
 +      reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_RESULT_POS_INF_MASK, 1);
 +      reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_RESULT_NEG_INF_MASK, 1);
 +
 +      WREG32(reg_base + MME_ACC_INTR_MASK_OFFSET, reg_val);
 +      WREG32(reg_base + MME_ACC_AP_LFSR_POLY_OFFSET, 0x80DEADAF);
 +
 +      for (i = 0 ; i < MME_NUM_OF_LFSR_SEEDS ; i++) {
 +              WREG32(reg_base + MME_ACC_AP_LFSR_SEED_SEL_OFFSET, i);
 +              WREG32(reg_base + MME_ACC_AP_LFSR_SEED_WDATA_OFFSET, gaudi2->lfsr_rand_seeds[i]);
 +      }
 +}
 +
 +static void gaudi2_init_dcore_mme(struct hl_device *hdev, int dcore_id,
 +                                                      bool config_qman_only)
 +{
 +      u32 queue_id_base, reg_base;
 +
 +      switch (dcore_id) {
 +      case 0:
 +              queue_id_base = GAUDI2_QUEUE_ID_DCORE0_MME_0_0;
 +              break;
 +      case 1:
 +              queue_id_base = GAUDI2_QUEUE_ID_DCORE1_MME_0_0;
 +              break;
 +      case 2:
 +              queue_id_base = GAUDI2_QUEUE_ID_DCORE2_MME_0_0;
 +              break;
 +      case 3:
 +              queue_id_base = GAUDI2_QUEUE_ID_DCORE3_MME_0_0;
 +              break;
 +      default:
 +              dev_err(hdev->dev, "Invalid dcore id %u\n", dcore_id);
 +              return;
 +      }
 +
 +      if (!config_qman_only) {
 +              reg_base = gaudi2_mme_acc_blocks_bases[dcore_id];
 +              gaudi2_init_mme_acc(hdev, reg_base);
 +      }
 +
 +      reg_base = gaudi2_qm_blocks_bases[queue_id_base];
 +      gaudi2_init_qman(hdev, reg_base, queue_id_base);
 +}
 +
 +static void gaudi2_init_mme(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int i;
 +
 +      if ((gaudi2->hw_cap_initialized & HW_CAP_MME_MASK) == HW_CAP_MME_MASK)
 +              return;
 +
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++) {
 +              gaudi2_init_dcore_mme(hdev, i, false);
 +
 +              gaudi2->hw_cap_initialized |= BIT_ULL(HW_CAP_MME_SHIFT + i);
 +      }
 +}
 +
 +static void gaudi2_init_tpc_cfg(struct hl_device *hdev, u32 reg_base)
 +{
 +      /* Mask arithmetic and QM interrupts in TPC */
 +      WREG32(reg_base + TPC_CFG_TPC_INTR_MASK_OFFSET, 0x23FFFE);
 +
 +      /* Set 16 cache lines */
 +      WREG32(reg_base + TPC_CFG_MSS_CONFIG_OFFSET,
 +                      2 << DCORE0_TPC0_CFG_MSS_CONFIG_ICACHE_FETCH_LINE_NUM_SHIFT);
 +}
 +
 +struct gaudi2_tpc_init_cfg_data {
 +      enum gaudi2_queue_id dcore_tpc_qid_base[NUM_OF_DCORES];
 +};
 +
 +static void gaudi2_init_tpc_config(struct hl_device *hdev, int dcore, int inst,
 +                                      u32 offset, struct iterate_module_ctx *ctx)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct gaudi2_tpc_init_cfg_data *cfg_data = ctx->data;
 +      u32 queue_id_base;
 +      u8 seq;
 +
 +      queue_id_base = cfg_data->dcore_tpc_qid_base[dcore] + (inst * NUM_OF_PQ_PER_QMAN);
 +
 +      if (dcore == 0 && inst == (NUM_DCORE0_TPC - 1))
 +              /* the additional DCORE0 TPC gets the last sequence number */
 +              seq = NUM_OF_DCORES * NUM_OF_TPC_PER_DCORE;
 +      else
 +              seq = dcore * NUM_OF_TPC_PER_DCORE + inst;
 +
 +      gaudi2_init_tpc_cfg(hdev, mmDCORE0_TPC0_CFG_BASE + offset);
 +      gaudi2_init_qman(hdev, mmDCORE0_TPC0_QM_BASE + offset, queue_id_base);
 +
 +      gaudi2->tpc_hw_cap_initialized |= BIT_ULL(HW_CAP_TPC_SHIFT + seq);
 +}
 +
 +static void gaudi2_init_tpc(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct gaudi2_tpc_init_cfg_data init_cfg_data;
 +      struct iterate_module_ctx tpc_iter;
 +
 +      if (!hdev->asic_prop.tpc_enabled_mask)
 +              return;
 +
 +      if ((gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK) == HW_CAP_TPC_MASK)
 +              return;
 +
 +      init_cfg_data.dcore_tpc_qid_base[0] = GAUDI2_QUEUE_ID_DCORE0_TPC_0_0;
 +      init_cfg_data.dcore_tpc_qid_base[1] = GAUDI2_QUEUE_ID_DCORE1_TPC_0_0;
 +      init_cfg_data.dcore_tpc_qid_base[2] = GAUDI2_QUEUE_ID_DCORE2_TPC_0_0;
 +      init_cfg_data.dcore_tpc_qid_base[3] = GAUDI2_QUEUE_ID_DCORE3_TPC_0_0;
 +      tpc_iter.fn = &gaudi2_init_tpc_config;
 +      tpc_iter.data = &init_cfg_data;
 +      gaudi2_iterate_tpcs(hdev, &tpc_iter);
 +}
 +
 +static void gaudi2_init_rotator(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 i, reg_base, queue_id;
 +
 +      queue_id = GAUDI2_QUEUE_ID_ROT_0_0;
 +
 +      for (i = 0 ; i < NUM_OF_ROT ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
 +              reg_base = gaudi2_qm_blocks_bases[queue_id];
 +              gaudi2_init_qman(hdev, reg_base, queue_id);
 +
 +              gaudi2->hw_cap_initialized |= BIT_ULL(HW_CAP_ROT_SHIFT + i);
 +      }
 +}
 +
 +static void gaudi2_init_vdec_brdg_ctrl(struct hl_device *hdev, u64 base_addr, u32 decoder_id)
 +{
 +      u32 sob_id;
 +
 +      /* VCMD normal interrupt */
 +      sob_id = GAUDI2_RESERVED_SOB_DEC_NRM_FIRST + decoder_id;
 +      WREG32(base_addr + BRDG_CTRL_NRM_MSIX_LBW_AWADDR,
 +                      mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_id * sizeof(u32));
 +      WREG32(base_addr + BRDG_CTRL_NRM_MSIX_LBW_WDATA, GAUDI2_SOB_INCREMENT_BY_ONE);
 +
 +      /* VCMD abnormal interrupt */
 +      sob_id = GAUDI2_RESERVED_SOB_DEC_ABNRM_FIRST + decoder_id;
 +      WREG32(base_addr + BRDG_CTRL_ABNRM_MSIX_LBW_AWADDR,
 +                      mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_id * sizeof(u32));
 +      WREG32(base_addr + BRDG_CTRL_ABNRM_MSIX_LBW_WDATA, GAUDI2_SOB_INCREMENT_BY_ONE);
 +}
 +
 +static void gaudi2_init_dec(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 dcore_id, dec_id, dec_bit;
 +      u64 base_addr;
 +
 +      if (!hdev->asic_prop.decoder_enabled_mask)
 +              return;
 +
 +      if ((gaudi2->dec_hw_cap_initialized & HW_CAP_DEC_MASK) == HW_CAP_DEC_MASK)
 +              return;
 +
 +      for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++)
 +              for (dec_id = 0 ; dec_id < NUM_OF_DEC_PER_DCORE ; dec_id++) {
 +                      dec_bit = dcore_id * NUM_OF_DEC_PER_DCORE + dec_id;
 +
 +                      if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
 +                              continue;
 +
 +                      base_addr =  mmDCORE0_DEC0_CMD_BASE +
 +                                      BRDG_CTRL_BLOCK_OFFSET +
 +                                      dcore_id * DCORE_OFFSET +
 +                                      dec_id * DCORE_VDEC_OFFSET;
 +
 +                      gaudi2_init_vdec_brdg_ctrl(hdev, base_addr, dec_bit);
 +
 +                      gaudi2->dec_hw_cap_initialized |= BIT_ULL(HW_CAP_DEC_SHIFT + dec_bit);
 +              }
 +
 +      for (dec_id = 0 ; dec_id < NUM_OF_PCIE_VDEC ; dec_id++) {
 +              dec_bit = PCIE_DEC_SHIFT + dec_id;
 +              if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
 +                      continue;
 +
 +              base_addr = mmPCIE_DEC0_CMD_BASE + BRDG_CTRL_BLOCK_OFFSET +
 +                              dec_id * DCORE_VDEC_OFFSET;
 +
 +              gaudi2_init_vdec_brdg_ctrl(hdev, base_addr, dec_bit);
 +
 +              gaudi2->dec_hw_cap_initialized |= BIT_ULL(HW_CAP_DEC_SHIFT + dec_bit);
 +      }
 +}
 +
 +static int gaudi2_mmu_update_asid_hop0_addr(struct hl_device *hdev,
 +                                      u32 stlb_base, u32 asid, u64 phys_addr)
 +{
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (hdev->pldm || !hdev->pdev)
 +              timeout_usec = GAUDI2_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
 +
 +      WREG32(stlb_base + STLB_ASID_OFFSET, asid);
 +      WREG32(stlb_base + STLB_HOP0_PA43_12_OFFSET, phys_addr >> MMU_HOP0_PA43_12_SHIFT);
 +      WREG32(stlb_base + STLB_HOP0_PA63_44_OFFSET, phys_addr >> MMU_HOP0_PA63_44_SHIFT);
 +      WREG32(stlb_base + STLB_BUSY_OFFSET, 0x80000000);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              stlb_base + STLB_BUSY_OFFSET,
 +              status,
 +              !(status & 0x80000000),
 +              1000,
 +              timeout_usec);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "Timeout during MMU hop0 config of asid %d\n", asid);
 +              return rc;
 +      }
 +
 +      return 0;
 +}
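
The function above follows a pattern used throughout this file: write the configuration, set a busy/start bit, then poll with hl_poll_timeout() until the hardware clears it or a timeout expires. A self-contained sketch of that poll-until-clear idiom, with a fake register standing in for MMIO and no claim to match the driver's helper:

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Fake "register" standing in for the STLB busy register; a real driver
 * would read this via MMIO.
 */
static uint32_t fake_busy_reg = 0x80000000u;

static uint32_t read_reg(void)
{
        /* Pretend the hardware finishes the operation after a few reads. */
        static int reads;

        if (++reads > 3)
                fake_busy_reg &= ~0x80000000u;
        return fake_busy_reg;
}

/* Poll until the busy bit clears or timeout_us elapses; returns 0 on success. */
static int poll_busy_clear(unsigned int sleep_us, unsigned int timeout_us)
{
        unsigned int waited_us = 0;

        while (read_reg() & 0x80000000u) {
                if (waited_us >= timeout_us)
                        return -1;
                usleep(sleep_us);
                waited_us += sleep_us;
        }

        return 0;
}

int main(void)
{
        if (poll_busy_clear(1000, 100000))
                printf("timeout waiting for the busy bit\n");
        else
                printf("busy bit cleared\n");

        return 0;
}
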
 +
 +static void gaudi2_mmu_send_invalidate_cache_cmd(struct hl_device *hdev, u32 stlb_base,
 +                                      u32 start_offset, u32 inv_start_val,
 +                                      u32 flags)
 +{
 +      /* clear PMMU mem line cache (only needed in mmu range invalidation) */
 +      if (flags & MMU_OP_CLEAR_MEMCACHE)
 +              WREG32(mmPMMU_HBW_STLB_MEM_CACHE_INVALIDATION, 0x1);
 +
 +      if (flags & MMU_OP_SKIP_LOW_CACHE_INV)
 +              return;
 +
 +      WREG32(stlb_base + start_offset, inv_start_val);
 +}
 +
 +static int gaudi2_mmu_invalidate_cache_status_poll(struct hl_device *hdev, u32 stlb_base,
 +                                              struct gaudi2_cache_invld_params *inv_params)
 +{
 +      u32 status, timeout_usec, start_offset;
 +      int rc;
 +
 +      timeout_usec = (hdev->pldm) ? GAUDI2_PLDM_MMU_TIMEOUT_USEC :
 +                                      GAUDI2_MMU_CACHE_INV_TIMEOUT_USEC;
 +
 +      /* poll PMMU mem line cache (only needed in mmu range invalidation) */
 +      if (inv_params->flags & MMU_OP_CLEAR_MEMCACHE) {
 +              rc = hl_poll_timeout(
 +                      hdev,
 +                      mmPMMU_HBW_STLB_MEM_CACHE_INV_STATUS,
 +                      status,
 +                      status & 0x1,
 +                      1000,
 +                      timeout_usec);
 +
 +              if (rc)
 +                      return rc;
 +
 +              /* Need to manually reset the status to 0 */
 +              WREG32(mmPMMU_HBW_STLB_MEM_CACHE_INV_STATUS, 0x0);
 +      }
 +
 +      /* Lower cache does not work with cache lines, hence we can skip its
 +       * invalidation upon map and invalidate only upon unmap
 +       */
 +      if (inv_params->flags & MMU_OP_SKIP_LOW_CACHE_INV)
 +              return 0;
 +
 +      start_offset = inv_params->range_invalidation ?
 +                      STLB_RANGE_CACHE_INVALIDATION_OFFSET : STLB_INV_ALL_START_OFFSET;
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              stlb_base + start_offset,
 +              status,
 +              !(status & 0x1),
 +              1000,
 +              timeout_usec);
 +
 +      return rc;
 +}
 +
 +bool gaudi2_is_hmmu_enabled(struct hl_device *hdev, int dcore_id, int hmmu_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 hw_cap;
 +
 +      hw_cap = HW_CAP_DCORE0_DMMU0 << (NUM_OF_HMMU_PER_DCORE * dcore_id + hmmu_id);
 +
 +      if (gaudi2->hw_cap_initialized & hw_cap)
 +              return true;
 +
 +      return false;
 +}
 +
 +/* this function shall be called only for HMMUs for which the capability bit is set */
 +static inline u32 get_hmmu_stlb_base(int dcore_id, int hmmu_id)
 +{
 +      u32 offset;
 +
 +      offset =  (u32) (dcore_id * DCORE_OFFSET + hmmu_id * DCORE_HMMU_OFFSET);
 +      return (u32)(mmDCORE0_HMMU0_STLB_BASE + offset);
 +}
 +
 +static void gaudi2_mmu_invalidate_cache_trigger(struct hl_device *hdev, u32 stlb_base,
 +                                              struct gaudi2_cache_invld_params *inv_params)
 +{
 +      u32 start_offset;
 +
 +      if (inv_params->range_invalidation) {
 +              /* Set the address range.
 +               * Note that the start address we write to the register is not
 +               * included in the invalidation range, by design.
 +               * That is why we program an address lower than the first one we
 +               * actually want to be included in the range invalidation.
 +               */
 +              u64 start = inv_params->start_va - 1;
 +
 +              start_offset = STLB_RANGE_CACHE_INVALIDATION_OFFSET;
 +
 +              WREG32(stlb_base + STLB_RANGE_INV_START_LSB_OFFSET,
 +                              start >> MMU_RANGE_INV_VA_LSB_SHIFT);
 +
 +              WREG32(stlb_base + STLB_RANGE_INV_START_MSB_OFFSET,
 +                              start >> MMU_RANGE_INV_VA_MSB_SHIFT);
 +
 +              WREG32(stlb_base + STLB_RANGE_INV_END_LSB_OFFSET,
 +                              inv_params->end_va >> MMU_RANGE_INV_VA_LSB_SHIFT);
 +
 +              WREG32(stlb_base + STLB_RANGE_INV_END_MSB_OFFSET,
 +                              inv_params->end_va >> MMU_RANGE_INV_VA_MSB_SHIFT);
 +      } else {
 +              start_offset = STLB_INV_ALL_START_OFFSET;
 +      }
 +
 +      gaudi2_mmu_send_invalidate_cache_cmd(hdev, stlb_base, start_offset,
 +                                              inv_params->inv_start_val, inv_params->flags);
 +}
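
As the comment in the range-invalidation branch notes, the hardware treats the start address as exclusive, so the driver programs start_va - 1, and each 64-bit VA is split across LSB/MSB registers by shifting. A small sketch of that packing, with invented shift values standing in for MMU_RANGE_INV_VA_LSB_SHIFT/MSB_SHIFT:

#include <stdint.h>
#include <stdio.h>

/* Invented shift values, for illustration only. */
#define EX_RANGE_INV_VA_LSB_SHIFT 12
#define EX_RANGE_INV_VA_MSB_SHIFT 44

struct range_inv_regs {
        uint32_t start_lsb, start_msb;
        uint32_t end_lsb, end_msb;
};

/* The hardware start address is exclusive, so write start_va - 1 in order
 * for start_va itself to be covered, mirroring what the driver does above.
 */
static void fill_range_regs(struct range_inv_regs *r, uint64_t start_va,
                            uint64_t end_va)
{
        uint64_t start = start_va - 1;

        r->start_lsb = (uint32_t)(start >> EX_RANGE_INV_VA_LSB_SHIFT);
        r->start_msb = (uint32_t)(start >> EX_RANGE_INV_VA_MSB_SHIFT);
        r->end_lsb = (uint32_t)(end_va >> EX_RANGE_INV_VA_LSB_SHIFT);
        r->end_msb = (uint32_t)(end_va >> EX_RANGE_INV_VA_MSB_SHIFT);
}

int main(void)
{
        struct range_inv_regs r;

        fill_range_regs(&r, 0x1000000000ULL, 0x1000200000ULL);
        printf("start lsb=0x%x msb=0x%x, end lsb=0x%x msb=0x%x\n",
               r.start_lsb, r.start_msb, r.end_lsb, r.end_msb);

        return 0;
}
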
 +
 +static inline void gaudi2_hmmu_invalidate_cache_trigger(struct hl_device *hdev,
 +                                              int dcore_id, int hmmu_id,
 +                                              struct gaudi2_cache_invld_params *inv_params)
 +{
 +      u32 stlb_base = get_hmmu_stlb_base(dcore_id, hmmu_id);
 +
 +      gaudi2_mmu_invalidate_cache_trigger(hdev, stlb_base, inv_params);
 +}
 +
 +static inline int gaudi2_hmmu_invalidate_cache_status_poll(struct hl_device *hdev,
 +                                              int dcore_id, int hmmu_id,
 +                                              struct gaudi2_cache_invld_params *inv_params)
 +{
 +      u32 stlb_base = get_hmmu_stlb_base(dcore_id, hmmu_id);
 +
 +      return gaudi2_mmu_invalidate_cache_status_poll(hdev, stlb_base, inv_params);
 +}
 +
 +static int gaudi2_hmmus_invalidate_cache(struct hl_device *hdev,
 +                                              struct gaudi2_cache_invld_params *inv_params)
 +{
 +      int dcore_id, hmmu_id;
 +
 +      /* first send all invalidation commands */
 +      for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
 +              for (hmmu_id = 0 ; hmmu_id < NUM_OF_HMMU_PER_DCORE ; hmmu_id++) {
 +                      if (!gaudi2_is_hmmu_enabled(hdev, dcore_id, hmmu_id))
 +                              continue;
 +
 +                      gaudi2_hmmu_invalidate_cache_trigger(hdev, dcore_id, hmmu_id, inv_params);
 +              }
 +      }
 +
 +      /* next, poll all invalidations status */
 +      for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
 +              for (hmmu_id = 0 ; hmmu_id < NUM_OF_HMMU_PER_DCORE ; hmmu_id++) {
 +                      int rc;
 +
 +                      if (!gaudi2_is_hmmu_enabled(hdev, dcore_id, hmmu_id))
 +                              continue;
 +
 +                      rc = gaudi2_hmmu_invalidate_cache_status_poll(hdev, dcore_id, hmmu_id,
 +                                                                              inv_params);
 +                      if (rc)
 +                              return rc;
 +              }
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard, u32 flags)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct gaudi2_cache_invld_params invld_params;
 +      int rc = 0;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return rc;
 +
 +      invld_params.range_invalidation = false;
 +      invld_params.inv_start_val = 1;
 +
 +      if ((flags & MMU_OP_USERPTR) && (gaudi2->hw_cap_initialized & HW_CAP_PMMU)) {
 +              invld_params.flags = flags;
 +              gaudi2_mmu_invalidate_cache_trigger(hdev, mmPMMU_HBW_STLB_BASE, &invld_params);
 +              rc = gaudi2_mmu_invalidate_cache_status_poll(hdev, mmPMMU_HBW_STLB_BASE,
 +                                                                              &invld_params);
 +      } else if (flags & MMU_OP_PHYS_PACK) {
 +              invld_params.flags = 0;
 +              rc = gaudi2_hmmus_invalidate_cache(hdev, &invld_params);
 +      }
 +
 +      return rc;
 +}
 +
 +static int gaudi2_mmu_invalidate_cache_range(struct hl_device *hdev, bool is_hard,
 +                              u32 flags, u32 asid, u64 va, u64 size)
 +{
 +      struct gaudi2_cache_invld_params invld_params = {0};
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 start_va, end_va;
 +      u32 inv_start_val;
 +      int rc = 0;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return 0;
 +
 +      inv_start_val = (1 << MMU_RANGE_INV_EN_SHIFT |
 +                      1 << MMU_RANGE_INV_ASID_EN_SHIFT |
 +                      asid << MMU_RANGE_INV_ASID_SHIFT);
 +      start_va = va;
 +      end_va = start_va + size;
 +
 +      if ((flags & MMU_OP_USERPTR) && (gaudi2->hw_cap_initialized & HW_CAP_PMMU)) {
 +              /* As range invalidation does not support a zero address, we do
 +               * a full invalidation in this case
 +               */
 +              if (start_va) {
 +                      invld_params.range_invalidation = true;
 +                      invld_params.start_va = start_va;
 +                      invld_params.end_va = end_va;
 +                      invld_params.inv_start_val = inv_start_val;
 +                      invld_params.flags = flags | MMU_OP_CLEAR_MEMCACHE;
 +              } else {
 +                      invld_params.range_invalidation = false;
 +                      invld_params.inv_start_val = 1;
 +                      invld_params.flags = flags;
 +              }
 +
 +              gaudi2_mmu_invalidate_cache_trigger(hdev, mmPMMU_HBW_STLB_BASE, &invld_params);
 +              rc = gaudi2_mmu_invalidate_cache_status_poll(hdev, mmPMMU_HBW_STLB_BASE,
 +                                                                              &invld_params);
 +              if (rc)
 +                      return rc;
 +
 +      } else if (flags & MMU_OP_PHYS_PACK) {
 +              invld_params.start_va = gaudi2_mmu_scramble_addr(hdev, start_va);
 +              invld_params.end_va = gaudi2_mmu_scramble_addr(hdev, end_va);
 +              invld_params.inv_start_val = inv_start_val;
 +              invld_params.flags = flags;
 +              rc = gaudi2_hmmus_invalidate_cache(hdev, &invld_params);
 +      }
 +
 +      return rc;
 +}
 +
 +static int gaudi2_mmu_update_hop0_addr(struct hl_device *hdev, u32 stlb_base)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 hop0_addr;
 +      u32 asid, max_asid = prop->max_asid;
 +      int rc;
 +
 +      /* it takes too much time to init all of the ASIDs on palladium */
 +      if (hdev->pldm)
 +              max_asid = min((u32) 8, max_asid);
 +
 +      for (asid = 0 ; asid < max_asid ; asid++) {
 +              hop0_addr = hdev->mmu_priv.hr.mmu_asid_hop0[asid].phys_addr;
 +              rc = gaudi2_mmu_update_asid_hop0_addr(hdev, stlb_base, asid, hop0_addr);
 +              if (rc) {
 +                      dev_err(hdev->dev, "failed to set hop0 addr for asid %d\n", asid);
 +                      return rc;
 +              }
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_mmu_init_common(struct hl_device *hdev, u32 mmu_base, u32 stlb_base)
 +{
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (hdev->pldm || !hdev->pdev)
 +              timeout_usec = GAUDI2_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = GAUDI2_MMU_CACHE_INV_TIMEOUT_USEC;
 +
 +      WREG32(stlb_base + STLB_INV_ALL_START_OFFSET, 1);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              stlb_base + STLB_SRAM_INIT_OFFSET,
 +              status,
 +              !status,
 +              1000,
 +              timeout_usec);
 +
 +      if (rc)
 +              dev_notice_ratelimited(hdev->dev, "Timeout when waiting for MMU SRAM init\n");
 +
 +      rc = gaudi2_mmu_update_hop0_addr(hdev, stlb_base);
 +      if (rc)
 +              return rc;
 +
 +      WREG32(mmu_base + MMU_BYPASS_OFFSET, 0);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              stlb_base + STLB_INV_ALL_START_OFFSET,
 +              status,
 +              !status,
 +              1000,
 +              timeout_usec);
 +
 +      if (rc)
 +              dev_notice_ratelimited(hdev->dev, "Timeout when waiting for MMU invalidate all\n");
 +
 +      WREG32(mmu_base + MMU_ENABLE_OFFSET, 1);
 +
 +      return rc;
 +}
 +
 +static int gaudi2_pci_mmu_init(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 mmu_base, stlb_base;
 +      int rc;
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_PMMU)
 +              return 0;
 +
 +      mmu_base = mmPMMU_HBW_MMU_BASE;
 +      stlb_base = mmPMMU_HBW_STLB_BASE;
 +
 +      RMWREG32_SHIFTED(stlb_base + STLB_HOP_CONFIGURATION_OFFSET,
 +              (0 << PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_HOP_SHIFT) |
 +              (5 << PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_SHIFT) |
 +              (4 << PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_SHIFT) |
 +              (5 << PMMU_HBW_STLB_HOP_CONFIGURATION_LAST_HOP_SHIFT) |
 +              (5 << PMMU_HBW_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_SHIFT),
 +              PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_HOP_MASK |
 +              PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_MASK |
 +              PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_MASK |
 +              PMMU_HBW_STLB_HOP_CONFIGURATION_LAST_HOP_MASK |
 +              PMMU_HBW_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_MASK);
 +
 +      WREG32(stlb_base + STLB_LL_LOOKUP_MASK_63_32_OFFSET, 0);
 +
 +      if (PAGE_SIZE == SZ_64K) {
 +              /* Set page sizes to 64K on hop5 and 16M on hop4 + enable 8 bit hops */
 +              RMWREG32_SHIFTED(mmu_base + MMU_STATIC_MULTI_PAGE_SIZE_OFFSET,
 +                      FIELD_PREP(DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP5_PAGE_SIZE_MASK, 4) |
 +                      FIELD_PREP(DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP4_PAGE_SIZE_MASK, 3) |
 +                      FIELD_PREP(
 +                              DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_CFG_8_BITS_HOP_MODE_EN_MASK,
 +                              1),
 +                      DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP5_PAGE_SIZE_MASK |
 +                      DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP4_PAGE_SIZE_MASK |
 +                      DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_CFG_8_BITS_HOP_MODE_EN_MASK);
 +      }
 +
 +      WREG32(mmu_base + MMU_SPI_SEI_MASK_OFFSET, GAUDI2_PMMU_SPI_SEI_ENABLE_MASK);
 +
 +      rc = gaudi2_mmu_init_common(hdev, mmu_base, stlb_base);
 +      if (rc)
 +              return rc;
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_PMMU;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_dcore_hmmu_init(struct hl_device *hdev, int dcore_id,
 +                              int hmmu_id)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 offset, mmu_base, stlb_base, hw_cap;
 +      u8 dmmu_seq;
 +      int rc;
 +
 +      dmmu_seq = NUM_OF_HMMU_PER_DCORE * dcore_id + hmmu_id;
 +      hw_cap = HW_CAP_DCORE0_DMMU0 << dmmu_seq;
 +
 +      /*
 +       * return if DMMU is already initialized or if it's not out of
 +       * isolation (due to cluster binning)
 +       */
 +      if ((gaudi2->hw_cap_initialized & hw_cap) || !(prop->hmmu_hif_enabled_mask & BIT(dmmu_seq)))
 +              return 0;
 +
 +      offset = (u32) (dcore_id * DCORE_OFFSET + hmmu_id * DCORE_HMMU_OFFSET);
 +      mmu_base = mmDCORE0_HMMU0_MMU_BASE + offset;
 +      stlb_base = mmDCORE0_HMMU0_STLB_BASE + offset;
 +
 +      RMWREG32(mmu_base + MMU_STATIC_MULTI_PAGE_SIZE_OFFSET, 5 /* 64MB */,
 +                      MMU_STATIC_MULTI_PAGE_SIZE_HOP4_PAGE_SIZE_MASK);
 +
 +      RMWREG32_SHIFTED(stlb_base + STLB_HOP_CONFIGURATION_OFFSET,
 +              FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_HOP_MASK, 0) |
 +              FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_MASK, 3) |
 +              FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_MASK, 3) |
 +              FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_LAST_HOP_MASK, 3) |
 +              FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_MASK, 3),
 +                      DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_HOP_MASK |
 +                      DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_MASK |
 +                      DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_MASK |
 +                      DCORE0_HMMU0_STLB_HOP_CONFIGURATION_LAST_HOP_MASK |
 +                      DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_MASK);
 +
 +      RMWREG32(stlb_base + STLB_HOP_CONFIGURATION_OFFSET, 1,
 +                      STLB_HOP_CONFIGURATION_ONLY_LARGE_PAGE_MASK);
 +
 +      WREG32(mmu_base + MMU_SPI_SEI_MASK_OFFSET, GAUDI2_HMMU_SPI_SEI_ENABLE_MASK);
 +
 +      rc = gaudi2_mmu_init_common(hdev, mmu_base, stlb_base);
 +      if (rc)
 +              return rc;
 +
 +      gaudi2->hw_cap_initialized |= hw_cap;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_hbm_mmu_init(struct hl_device *hdev)
 +{
 +      int rc, dcore_id, hmmu_id;
 +
 +      for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++)
 +              for (hmmu_id = 0 ; hmmu_id < NUM_OF_HMMU_PER_DCORE; hmmu_id++) {
 +                      rc = gaudi2_dcore_hmmu_init(hdev, dcore_id, hmmu_id);
 +                      if (rc)
 +                              return rc;
 +              }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_mmu_init(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = gaudi2_pci_mmu_init(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi2_hbm_mmu_init(hdev);
 +      if (rc)
 +              return rc;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_hw_init(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int rc;
 +
 +      /* Let's mark in the H/W that we have reached this point. We check
 +       * this value in the reset_before_init function to understand whether
 +       * we need to reset the chip before doing H/W init. This register is
 +       * cleared by the H/W upon H/W reset
 +       */
 +      WREG32(mmHW_STATE, HL_DEVICE_HW_STATE_DIRTY);
 +
 +      /* Perform read from the device to make sure device is up */
 +      RREG32(mmHW_STATE);
 +
 +      /* If iATU is done by FW, the HBM bar ALWAYS points to DRAM_PHYS_BASE.
 +       * So we set it here and if anyone tries to move it later to
 +       * a different address, there will be an error
 +       */
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              gaudi2->dram_bar_cur_addr = DRAM_PHYS_BASE;
 +
 +      /*
 +       * Before pushing u-boot/Linux to the device, we need to set the HBM
 +       * BAR to the base address of DRAM
 +       */
 +      if (gaudi2_set_hbm_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
 +              dev_err(hdev->dev, "failed to map HBM bar to DRAM base address\n");
 +              return -EIO;
 +      }
 +
 +      rc = gaudi2_init_cpu(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to initialize CPU\n");
 +              return rc;
 +      }
 +
 +      gaudi2_init_scrambler_hbm(hdev);
 +      gaudi2_init_kdma(hdev);
 +
 +      rc = gaudi2_init_cpu_queues(hdev, GAUDI2_CPU_TIMEOUT_USEC);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to initialize CPU H/W queues %d\n", rc);
 +              return rc;
 +      }
 +
 +      rc = gaudi2->cpucp_info_get(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to get cpucp info\n");
 +              return rc;
 +      }
 +
 +      rc = gaudi2_mmu_init(hdev);
 +      if (rc)
 +              return rc;
 +
 +      gaudi2_init_pdma(hdev);
 +      gaudi2_init_edma(hdev);
 +      gaudi2_init_sm(hdev);
 +      gaudi2_init_tpc(hdev);
 +      gaudi2_init_mme(hdev);
 +      gaudi2_init_rotator(hdev);
 +      gaudi2_init_dec(hdev);
 +      gaudi2_enable_timestamp(hdev);
 +
 +      rc = gaudi2_coresight_init(hdev);
 +      if (rc)
 +              goto disable_queues;
 +
 +      rc = gaudi2_enable_msix(hdev);
 +      if (rc)
 +              goto disable_queues;
 +
 +      /* Perform read from the device to flush all configuration */
 +      RREG32(mmHW_STATE);
 +
 +      return 0;
 +
 +disable_queues:
 +      gaudi2_disable_dma_qmans(hdev);
 +      gaudi2_disable_mme_qmans(hdev);
 +      gaudi2_disable_tpc_qmans(hdev);
 +      gaudi2_disable_rot_qmans(hdev);
 +      gaudi2_disable_nic_qmans(hdev);
 +
 +      gaudi2_disable_timestamp(hdev);
 +
 +      return rc;
 +}
 +
 +/**
 + * gaudi2_send_hard_reset_cmd - common function to handle reset
 + *
 + * @hdev: pointer to the habanalabs device structure
 + *
 + * This function handles the various possible scenarios for reset.
 + * It considers whether the reset is handled by the driver or the FW, and which FW components are loaded
 + */
 +static void gaudi2_send_hard_reset_cmd(struct hl_device *hdev)
 +{
 +      struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      bool heartbeat_reset, preboot_only, cpu_initialized = false;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 cpu_boot_status;
 +
 +      preboot_only = (hdev->fw_loader.fw_comp_loaded == FW_TYPE_PREBOOT_CPU);
 +      heartbeat_reset = (hdev->reset_info.curr_reset_cause == HL_RESET_CAUSE_HEARTBEAT);
 +
 +      /*
 +       * Handle the corner case where the failure occurred while loading the
 +       * CPU management app, yet the driver did not detect any failure while
 +       * loading the FW. In that scenario the driver will send only
 +       * HALT_MACHINE, and no one will respond to the request since the FW is
 +       * already back in preboot and cannot handle such a command.
 +       * In this case, the next time the management app loads it will check
 +       * the events register, which will still hold the halt indication, and
 +       * will reboot the device.
 +       * The solution is to let preboot clear all relevant registers before
 +       * the next boot, once the driver sends COMMS_RST_DEV.
 +       */
 +      cpu_boot_status = RREG32(mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS);
 +
 +      if (gaudi2 && (gaudi2->hw_cap_initialized & HW_CAP_CPU) &&
 +                      (cpu_boot_status == CPU_BOOT_STATUS_SRAM_AVAIL))
 +              cpu_initialized = true;
 +
 +      /*
 +       * When Linux/Bootfit exists, this write to the SP can be interpreted in 2 ways:
 +       * 1. FW reset: FW initiates the reset sequence
 +       * 2. driver reset: FW will start the HALT sequence (the preparations for the
 +       *                  reset but not the reset itself, as it is not implemented
 +       *                  on its side) and LKD will wait to let FW complete the
 +       *                  sequence before issuing the reset
 +       */
 +      if (!preboot_only && cpu_initialized) {
 +              WREG32(le32_to_cpu(dyn_regs->gic_host_halt_irq),
 +                      gaudi2_irq_map_table[GAUDI2_EVENT_CPU_HALT_MACHINE].cpu_id);
 +
 +              msleep(GAUDI2_CPU_RESET_WAIT_MSEC);
 +      }
 +
 +      /*
 +       * When working with preboot (without Linux/Boot fit) we can
 +       * communicate only using the COMMS commands to issue halt/reset.
 +       *
 +       * For the case in which we are working with Linux/Bootfit, this is a hail-mary
 +       * attempt to revive the card on the small chance that the f/w has
 +       * experienced a watchdog event, which caused it to fall back to preboot.
 +       * In that case, triggering reset through GIC won't help. We need to
 +       * trigger the reset as if Linux wasn't loaded.
 +       *
 +       * We do it only if the reset cause was HB, because that would be the
 +       * indication of such an event.
 +       *
 +       * If the watchdog hasn't expired but we still got a HB, then this won't
 +       * do any damage.
 +       */
 +
 +      if (heartbeat_reset || preboot_only || !cpu_initialized) {
 +              if (hdev->asic_prop.hard_reset_done_by_fw)
 +                      hl_fw_ask_hard_reset_without_linux(hdev);
 +              else
 +                      hl_fw_ask_halt_machine_without_linux(hdev);
 +      }
 +}
 +
 +/**
 + * gaudi2_execute_hard_reset - execute hard reset by driver/FW
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @reset_sleep_ms: sleep time in msec after reset
 + *
 + * This function executes a hard reset, based on whether the driver or the FW should perform it
 + */
 +static void gaudi2_execute_hard_reset(struct hl_device *hdev, u32 reset_sleep_ms)
 +{
 +      if (hdev->asic_prop.hard_reset_done_by_fw) {
 +              gaudi2_send_hard_reset_cmd(hdev);
 +              return;
 +      }
 +
 +      /* Set the device to handle FLR by H/W, as we will put the device
 +       * CPU into halt mode
 +       */
 +      WREG32(mmPCIE_AUX_FLR_CTRL,
 +                      (PCIE_AUX_FLR_CTRL_HW_CTRL_MASK | PCIE_AUX_FLR_CTRL_INT_MASK_MASK));
 +
 +      gaudi2_send_hard_reset_cmd(hdev);
 +
 +      WREG32(mmPSOC_RESET_CONF_SW_ALL_RST, 1);
 +}
 +
 +/**
 + * gaudi2_execute_soft_reset - execute soft reset by driver/FW
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @reset_sleep_ms: sleep time in msec after reset
 + * @driver_performs_reset: true if driver should perform reset instead of f/w.
 + *
 + * This function executes a soft reset, based on whether the driver or the FW should perform it
 + */
 +static void gaudi2_execute_soft_reset(struct hl_device *hdev, u32 reset_sleep_ms,
 +                                              bool driver_performs_reset)
 +{
 +      struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +
 +      if (!driver_performs_reset) {
 +              /* set SP to indicate reset request sent to FW */
 +              if (dyn_regs->cpu_rst_status)
 +                      WREG32(le32_to_cpu(dyn_regs->cpu_rst_status), CPU_RST_STATUS_NA);
 +              else
 +                      WREG32(mmCPU_RST_STATUS_TO_HOST, CPU_RST_STATUS_NA);
 +
 +              WREG32(le32_to_cpu(dyn_regs->gic_host_soft_rst_irq),
 +                      gaudi2_irq_map_table[GAUDI2_EVENT_CPU_SOFT_RESET].cpu_id);
 +              return;
 +      }
 +
 +      /* Block access to engines, QMANs and SM during reset; these
 +       * RRs will be reconfigured after the soft reset.
 +       * PCIE_MSIX is left unsecured to allow NIC packet processing during the reset.
 +       */
 +      gaudi2_write_rr_to_all_lbw_rtrs(hdev, RR_TYPE_LONG, NUM_LONG_LBW_RR - 1,
 +                                      mmDCORE0_TPC0_QM_DCCM_BASE, mmPCIE_MSIX_BASE);
 +
 +      gaudi2_write_rr_to_all_lbw_rtrs(hdev, RR_TYPE_LONG, NUM_LONG_LBW_RR - 2,
 +                              mmPCIE_MSIX_BASE + HL_BLOCK_SIZE,
 +                              mmPCIE_VDEC1_MSTR_IF_RR_SHRD_HBW_BASE + HL_BLOCK_SIZE);
 +
 +      WREG32(mmPSOC_RESET_CONF_SOFT_RST, 1);
 +}
 +
 +static void gaudi2_poll_btm_indication(struct hl_device *hdev, u32 reset_sleep_ms,
 +                                                              u32 poll_timeout_us)
 +{
 +      int i, rc = 0;
 +      u32 reg_val;
 +
 +      /* Without this sleep, the reset will not work */
 +      msleep(reset_sleep_ms);
 +
 +      /* We poll the BTM done indication multiple times after reset due to
 +       * the HW erratum 'GAUDI2_0300'
 +       */
 +      for (i = 0 ; i < GAUDI2_RESET_POLL_CNT ; i++)
 +              rc = hl_poll_timeout(
 +                      hdev,
 +                      mmPSOC_GLOBAL_CONF_BTM_FSM,
 +                      reg_val,
 +                      reg_val == 0,
 +                      1000,
 +                      poll_timeout_us);
 +
 +      if (rc)
 +              dev_err(hdev->dev, "Timeout while waiting for device to reset 0x%x\n", reg_val);
 +}
 +
 +static void gaudi2_get_soft_rst_done_indication(struct hl_device *hdev, u32 poll_timeout_us)
 +{
 +      int i, rc = 0;
 +      u32 reg_val;
 +
 +      for (i = 0 ; i < GAUDI2_RESET_POLL_CNT ; i++)
 +              rc = hl_poll_timeout(
 +                      hdev,
 +                      mmCPU_RST_STATUS_TO_HOST,
 +                      reg_val,
 +                      reg_val == CPU_RST_STATUS_SOFT_RST_DONE,
 +                      1000,
 +                      poll_timeout_us);
 +
 +      if (rc)
 +              dev_err(hdev->dev, "Timeout while waiting for FW to complete soft reset (0x%x)\n",
 +                              reg_val);
 +}
 +
 +static void gaudi2_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 poll_timeout_us, reset_sleep_ms;
 +      bool driver_performs_reset = false;
 +
 +      if (hdev->pldm) {
 +              reset_sleep_ms = hard_reset ? GAUDI2_PLDM_HRESET_TIMEOUT_MSEC :
 +                                              GAUDI2_PLDM_SRESET_TIMEOUT_MSEC;
 +              poll_timeout_us = GAUDI2_PLDM_RESET_POLL_TIMEOUT_USEC;
 +      } else {
 +              reset_sleep_ms = GAUDI2_RESET_TIMEOUT_MSEC;
 +              poll_timeout_us = GAUDI2_RESET_POLL_TIMEOUT_USEC;
 +      }
 +
 +      if (fw_reset)
 +              goto skip_reset;
 +
 +      gaudi2_reset_arcs(hdev);
 +
 +      if (hard_reset) {
 +              driver_performs_reset = !hdev->asic_prop.hard_reset_done_by_fw;
 +              gaudi2_execute_hard_reset(hdev, reset_sleep_ms);
 +      } else {
 +              /*
 +               * As we also have to support working with preboot only (which does not support
 +               * soft reset), we have to make sure that security is disabled before letting the
 +               * driver do the reset. The user shall control the BFE flags to avoid requesting a
 +               * soft reset on a secured device with preboot only.
 +               */
 +              driver_performs_reset = (hdev->fw_components == FW_TYPE_PREBOOT_CPU &&
 +                                                      !hdev->asic_prop.fw_security_enabled);
 +              gaudi2_execute_soft_reset(hdev, reset_sleep_ms, driver_performs_reset);
 +      }
 +
 +skip_reset:
 +      if (driver_performs_reset || hard_reset)
 +              /*
 +               * Instead of waiting for BTM indication we should wait for preboot ready:
 +               * Consider the below scenario:
 +               * 1. FW update is being triggered
 +               *        - setting the dirty bit
 +               * 2. hard reset will be triggered due to the dirty bit
 +               * 3. FW initiates the reset:
 +               *        - dirty bit cleared
 +               *        - BTM indication cleared
 +               *        - preboot ready indication cleared
 +               * 4. during hard reset:
 +               *        - BTM indication will be set
 +               *        - BIST test performed and another reset triggered
 +               * 5. only after this reset the preboot will set the preboot ready
 +               *
 +               * When polling on the BTM indication alone we can lose sync with the FW while
 +               * trying to communicate with it while it is in reset.
 +               * To overcome this we always wait for the preboot ready indication.
 +               */
 +              if ((hdev->fw_components & FW_TYPE_PREBOOT_CPU)) {
 +                      msleep(reset_sleep_ms);
 +                      hl_fw_wait_preboot_ready(hdev);
 +              } else {
 +                      gaudi2_poll_btm_indication(hdev, reset_sleep_ms, poll_timeout_us);
 +              }
 +      else
 +              gaudi2_get_soft_rst_done_indication(hdev, poll_timeout_us);
 +
 +      if (!gaudi2)
 +              return;
 +
 +      gaudi2->dec_hw_cap_initialized &= ~(HW_CAP_DEC_MASK);
 +      gaudi2->tpc_hw_cap_initialized &= ~(HW_CAP_TPC_MASK);
 +
 +      /*
 +       * Clear the NIC capability mask in order for the driver to re-configure
 +       * NIC QMANs. NIC ports will not be re-configured during soft
 +       * reset as we call gaudi2_nic_init only during hard reset
 +       */
 +      gaudi2->nic_hw_cap_initialized &= ~(HW_CAP_NIC_MASK);
 +
 +      if (hard_reset) {
 +              gaudi2->hw_cap_initialized &=
 +                      ~(HW_CAP_DRAM | HW_CAP_CLK_GATE | HW_CAP_HBM_SCRAMBLER_MASK |
 +                      HW_CAP_PMMU | HW_CAP_CPU | HW_CAP_CPU_Q |
 +                      HW_CAP_SRAM_SCRAMBLER | HW_CAP_DMMU_MASK |
 +                      HW_CAP_PDMA_MASK | HW_CAP_EDMA_MASK | HW_CAP_KDMA |
 +                      HW_CAP_MME_MASK | HW_CAP_ROT_MASK);
 +
 +              memset(gaudi2->events_stat, 0, sizeof(gaudi2->events_stat));
 +      } else {
 +              gaudi2->hw_cap_initialized &=
 +                      ~(HW_CAP_CLK_GATE | HW_CAP_HBM_SCRAMBLER_SW_RESET |
 +                      HW_CAP_PDMA_MASK | HW_CAP_EDMA_MASK | HW_CAP_MME_MASK |
 +                      HW_CAP_ROT_MASK);
 +      }
 +}
 +
 +static int gaudi2_suspend(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to disable PCI access from CPU\n");
 +
 +      return rc;
 +}
 +
 +static int gaudi2_resume(struct hl_device *hdev)
 +{
 +      return gaudi2_init_iatu(hdev);
 +}
 +
 +static int gaudi2_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
 +              void *cpu_addr, dma_addr_t dma_addr, size_t size)
 +{
 +      int rc;
 +
-       vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
-                       VM_DONTCOPY | VM_NORESERVE;
++      vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++                      VM_DONTCOPY | VM_NORESERVE);
 +
 +#ifdef _HAS_DMA_MMAP_COHERENT
 +
 +      rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr, dma_addr, size);
 +      if (rc)
 +              dev_err(hdev->dev, "dma_mmap_coherent error %d", rc);
 +
 +#else
 +
 +      rc = remap_pfn_range(vma, vma->vm_start,
 +                              virt_to_phys(cpu_addr) >> PAGE_SHIFT,
 +                              size, vma->vm_page_prot);
 +      if (rc)
 +              dev_err(hdev->dev, "remap_pfn_range error %d", rc);
 +
 +#endif
 +
 +      return rc;
 +}
 +
 +static bool gaudi2_is_queue_enabled(struct hl_device *hdev, u32 hw_queue_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 hw_cap_mask = 0;
 +      u64 hw_tpc_cap_bit = 0;
 +      u64 hw_nic_cap_bit = 0;
 +      u64 hw_test_cap_bit = 0;
 +
 +      switch (hw_queue_id) {
 +      case GAUDI2_QUEUE_ID_PDMA_0_0:
 +      case GAUDI2_QUEUE_ID_PDMA_0_1:
 +      case GAUDI2_QUEUE_ID_PDMA_1_0:
 +              hw_cap_mask = HW_CAP_PDMA_MASK;
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3:
 +              hw_test_cap_bit = HW_CAP_EDMA_SHIFT +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0) >> 2);
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3:
 +              hw_test_cap_bit = HW_CAP_EDMA_SHIFT + NUM_OF_EDMA_PER_DCORE +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0) >> 2);
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3:
 +              hw_test_cap_bit = HW_CAP_EDMA_SHIFT + 2 * NUM_OF_EDMA_PER_DCORE +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0) >> 2);
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3:
 +              hw_test_cap_bit = HW_CAP_EDMA_SHIFT + 3 * NUM_OF_EDMA_PER_DCORE +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0) >> 2);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE0_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE0_MME_0_3:
 +              hw_test_cap_bit = HW_CAP_MME_SHIFT;
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE1_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE1_MME_0_3:
 +              hw_test_cap_bit = HW_CAP_MME_SHIFT + 1;
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE2_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE2_MME_0_3:
 +              hw_test_cap_bit = HW_CAP_MME_SHIFT + 2;
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE3_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE3_MME_0_3:
 +              hw_test_cap_bit = HW_CAP_MME_SHIFT + 3;
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE0_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE0_TPC_5_3:
 +              hw_tpc_cap_bit = HW_CAP_TPC_SHIFT +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE0_TPC_0_0) >> 2);
 +
 +              /* special case where cap bit refers to the first queue id */
 +              if (!hw_tpc_cap_bit)
 +                      return !!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(0));
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE1_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE1_TPC_5_3:
 +              hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + NUM_OF_TPC_PER_DCORE +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE1_TPC_0_0) >> 2);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE2_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE2_TPC_5_3:
 +              hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + (2 * NUM_OF_TPC_PER_DCORE) +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE2_TPC_0_0) >> 2);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE3_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE3_TPC_5_3:
 +              hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + (3 * NUM_OF_TPC_PER_DCORE) +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE3_TPC_0_0) >> 2);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE0_TPC_6_0 ... GAUDI2_QUEUE_ID_DCORE0_TPC_6_3:
 +              hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + (4 * NUM_OF_TPC_PER_DCORE);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_ROT_0_0 ... GAUDI2_QUEUE_ID_ROT_1_3:
 +              hw_test_cap_bit = HW_CAP_ROT_SHIFT + ((hw_queue_id - GAUDI2_QUEUE_ID_ROT_0_0) >> 2);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_NIC_0_0 ... GAUDI2_QUEUE_ID_NIC_23_3:
 +              hw_nic_cap_bit = HW_CAP_NIC_SHIFT + ((hw_queue_id - GAUDI2_QUEUE_ID_NIC_0_0) >> 2);
 +
 +              /* special case where cap bit refers to the first queue id */
 +              if (!hw_nic_cap_bit)
 +                      return !!(gaudi2->nic_hw_cap_initialized & BIT_ULL(0));
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_CPU_PQ:
 +              return !!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q);
 +
 +      default:
 +              return false;
 +      }
 +
 +      if (hw_tpc_cap_bit)
 +              return  !!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(hw_tpc_cap_bit));
 +
 +      if (hw_nic_cap_bit)
 +              return  !!(gaudi2->nic_hw_cap_initialized & BIT_ULL(hw_nic_cap_bit));
 +
 +      if (hw_test_cap_bit)
 +              hw_cap_mask = BIT_ULL(hw_test_cap_bit);
 +
 +      return !!(gaudi2->hw_cap_initialized & hw_cap_mask);
 +}
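
Each gaudi2 engine in the function above exposes four hardware queues with consecutive IDs, which is why the capability bit is derived with (hw_queue_id - first_queue_id) >> 2. A small standalone sketch of that mapping; the queue-ID constant below is invented for the example and is not a real GAUDI2_QUEUE_ID_* value.

#include <assert.h>
#include <stdio.h>

#define QUEUE_ID_EDMA_0_0	8	/* hypothetical ID of the first queue of EDMA engine 0 */
#define QUEUES_PER_ENGINE	4

/* engine index relative to the first EDMA engine; the driver uses >> 2 */
static unsigned int queue_to_engine(unsigned int hw_queue_id)
{
	return (hw_queue_id - QUEUE_ID_EDMA_0_0) / QUEUES_PER_ENGINE;
}

int main(void)
{
	/* queues 8..11 -> engine 0, queues 12..15 -> engine 1, ... */
	assert(queue_to_engine(8) == 0);
	assert(queue_to_engine(11) == 0);
	assert(queue_to_engine(12) == 1);
	printf("queue 13 belongs to engine %u\n", queue_to_engine(13));
	return 0;
}
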
 +
 +static bool gaudi2_is_arc_enabled(struct hl_device *hdev, u64 arc_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      switch (arc_id) {
 +      case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC5:
 +      case CPU_ID_MME_QMAN_ARC0...CPU_ID_ROT_QMAN_ARC1:
 +              return !!(gaudi2->active_hw_arc & BIT_ULL(arc_id));
 +
 +      case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
 +              return !!(gaudi2->active_tpc_arc & BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0));
 +
 +      case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
 +              return !!(gaudi2->active_nic_arc & BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0));
 +
 +      default:
 +              return false;
 +      }
 +}
 +
 +static void gaudi2_clr_arc_id_cap(struct hl_device *hdev, u64 arc_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      switch (arc_id) {
 +      case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC5:
 +      case CPU_ID_MME_QMAN_ARC0...CPU_ID_ROT_QMAN_ARC1:
 +              gaudi2->active_hw_arc &= ~(BIT_ULL(arc_id));
 +              break;
 +
 +      case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
 +              gaudi2->active_tpc_arc &= ~(BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0));
 +              break;
 +
 +      case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
 +              gaudi2->active_nic_arc &= ~(BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0));
 +              break;
 +
 +      default:
 +              return;
 +      }
 +}
 +
 +static void gaudi2_set_arc_id_cap(struct hl_device *hdev, u64 arc_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      switch (arc_id) {
 +      case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC5:
 +      case CPU_ID_MME_QMAN_ARC0...CPU_ID_ROT_QMAN_ARC1:
 +              gaudi2->active_hw_arc |= BIT_ULL(arc_id);
 +              break;
 +
 +      case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
 +              gaudi2->active_tpc_arc |= BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0);
 +              break;
 +
 +      case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
 +              gaudi2->active_nic_arc |= BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0);
 +              break;
 +
 +      default:
 +              return;
 +      }
 +}
 +
 +static void gaudi2_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
 +{
 +      struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 pq_offset, reg_base, db_reg_offset, db_value;
 +
 +      if (hw_queue_id != GAUDI2_QUEUE_ID_CPU_PQ) {
 +              /*
 +               * QMAN has 4 successive PQ_PI registers, 1 for each of the QMAN PQs.
 +               * Masking the H/W queue ID with 0x3 extracts the QMAN internal PQ
 +               * number.
 +               */
 +              pq_offset = (hw_queue_id & 0x3) * 4;
 +              reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
 +              db_reg_offset = reg_base + QM_PQ_PI_0_OFFSET + pq_offset;
 +      } else {
 +              db_reg_offset = mmCPU_IF_PF_PQ_PI;
 +      }
 +
 +      db_value = pi;
 +
 +      /* ring the doorbell */
 +      WREG32(db_reg_offset, db_value);
 +
 +      if (hw_queue_id == GAUDI2_QUEUE_ID_CPU_PQ) {
 +              /* make sure device CPU will read latest data from host */
 +              mb();
 +              WREG32(le32_to_cpu(dyn_regs->gic_host_pi_upd_irq),
 +                      gaudi2_irq_map_table[GAUDI2_EVENT_CPU_PI_UPDATE].cpu_id);
 +      }
 +}
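
Since every QMAN owns four PQs with consecutive queue IDs and four consecutive 32-bit PQ_PI registers, masking with 0x3 selects the PQ inside the QMAN and multiplying by 4 turns that into a register byte offset, as the comment in gaudi2_ring_doorbell() explains. A minimal illustration of the arithmetic; the register offset below is hypothetical.

#include <stdio.h>

#define QM_PQ_PI_0_OFFSET	0x100	/* hypothetical offset of the PQ_PI_0 register */

static unsigned int pq_pi_reg_offset(unsigned int hw_queue_id)
{
	/* low two bits select one of the QMAN's 4 PQs; each register is 4 bytes wide */
	unsigned int pq = hw_queue_id & 0x3;

	return QM_PQ_PI_0_OFFSET + pq * 4;
}

int main(void)
{
	unsigned int id;

	/* the four queues of one QMAN map to four consecutive PQ_PI registers */
	for (id = 20; id < 24; id++)
		printf("queue %u -> PQ_PI register offset %#x\n", id, pq_pi_reg_offset(id));
	return 0;
}
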
 +
 +static void gaudi2_pqe_write(struct hl_device *hdev, __le64 *pqe, struct hl_bd *bd)
 +{
 +      __le64 *pbd = (__le64 *) bd;
 +
 +      /* The QMAN PQs are in host memory, so a simple copy suffices */
 +      pqe[0] = pbd[0];
 +      pqe[1] = pbd[1];
 +}
 +
 +static void *gaudi2_dma_alloc_coherent(struct hl_device *hdev, size_t size,
 +                              dma_addr_t *dma_handle, gfp_t flags)
 +{
 +      return dma_alloc_coherent(&hdev->pdev->dev, size, dma_handle, flags);
 +}
 +
 +static void gaudi2_dma_free_coherent(struct hl_device *hdev, size_t size,
 +                              void *cpu_addr, dma_addr_t dma_handle)
 +{
 +      dma_free_coherent(&hdev->pdev->dev, size, cpu_addr, dma_handle);
 +}
 +
 +static int gaudi2_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
 +                              u32 timeout, u64 *result)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q)) {
 +              if (result)
 +                      *result = 0;
 +              return 0;
 +      }
 +
 +      if (!timeout)
 +              timeout = GAUDI2_MSG_TO_CPU_TIMEOUT_USEC;
 +
 +      return hl_fw_send_cpu_message(hdev, GAUDI2_QUEUE_ID_CPU_PQ, msg, len, timeout, result);
 +}
 +
 +static void *gaudi2_dma_pool_zalloc(struct hl_device *hdev, size_t size,
 +                              gfp_t mem_flags, dma_addr_t *dma_handle)
 +{
 +      if (size > GAUDI2_DMA_POOL_BLK_SIZE)
 +              return NULL;
 +
 +      return dma_pool_zalloc(hdev->dma_pool, mem_flags, dma_handle);
 +}
 +
 +static void gaudi2_dma_pool_free(struct hl_device *hdev, void *vaddr, dma_addr_t dma_addr)
 +{
 +      dma_pool_free(hdev->dma_pool, vaddr, dma_addr);
 +}
 +
 +static void *gaudi2_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
 +                                              dma_addr_t *dma_handle)
 +{
 +      return hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
 +}
 +
 +static void gaudi2_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size, void *vaddr)
 +{
 +      hl_fw_cpu_accessible_dma_pool_free(hdev, size, vaddr);
 +}
 +
 +static dma_addr_t gaudi2_dma_map_single(struct hl_device *hdev, void *addr, int len,
 +                                      enum dma_data_direction dir)
 +{
 +      dma_addr_t dma_addr;
 +
 +      dma_addr = dma_map_single(&hdev->pdev->dev, addr, len, dir);
 +      if (unlikely(dma_mapping_error(&hdev->pdev->dev, dma_addr)))
 +              return 0;
 +
 +      return dma_addr;
 +}
 +
 +static void gaudi2_dma_unmap_single(struct hl_device *hdev, dma_addr_t addr, int len,
 +                                      enum dma_data_direction dir)
 +{
 +      dma_unmap_single(&hdev->pdev->dev, addr, len, dir);
 +}
 +
 +static int gaudi2_validate_cb_address(struct hl_device *hdev, struct hl_cs_parser *parser)
 +{
 +      struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!gaudi2_is_queue_enabled(hdev, parser->hw_queue_id)) {
 +              dev_err(hdev->dev, "h/w queue %d is disabled\n", parser->hw_queue_id);
 +              return -EINVAL;
 +      }
 +
 +      /* Just check if CB address is valid */
 +
 +      if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->sram_user_base_address,
 +                                      asic_prop->sram_end_address))
 +              return 0;
 +
 +      if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->dram_user_base_address,
 +                                      asic_prop->dram_end_address))
 +              return 0;
 +
 +      if ((gaudi2->hw_cap_initialized & HW_CAP_DMMU_MASK) &&
 +              hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                              parser->user_cb_size,
 +                                              asic_prop->dmmu.start_addr,
 +                                              asic_prop->dmmu.end_addr))
 +              return 0;
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_PMMU) {
 +              if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->pmmu.start_addr,
 +                                      asic_prop->pmmu.end_addr) ||
 +                      hl_mem_area_inside_range(
 +                                      (u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->pmmu_huge.start_addr,
 +                                      asic_prop->pmmu_huge.end_addr))
 +                      return 0;
 +
 +      } else if (gaudi2_host_phys_addr_valid((u64) (uintptr_t) parser->user_cb)) {
 +              if (!hdev->pdev)
 +                      return 0;
 +
 +              if (!device_iommu_mapped(&hdev->pdev->dev))
 +                      return 0;
 +      }
 +
 +      dev_err(hdev->dev, "CB address %p + 0x%x for internal QMAN is not valid\n",
 +              parser->user_cb, parser->user_cb_size);
 +
 +      return -EFAULT;
 +}
 +
 +static int gaudi2_cs_parser(struct hl_device *hdev, struct hl_cs_parser *parser)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!parser->is_kernel_allocated_cb)
 +              return gaudi2_validate_cb_address(hdev, parser);
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PMMU)) {
 +              dev_err(hdev->dev, "PMMU not initialized - Unsupported mode in Gaudi2\n");
 +              return -EINVAL;
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_send_heartbeat(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_send_heartbeat(hdev);
 +}
 +
 +/* This is an internal helper function, used to update the KDMA MMU properties
 + * (MMU bypass and ASID). It must be called with the KDMA lock held.
 + */
 +static void gaudi2_kdma_set_mmbp_asid(struct hl_device *hdev,
 +                                         bool mmu_bypass, u32 asid)
 +{
 +      u32 rw_asid, rw_mmu_bp;
 +
 +      rw_asid = (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_SHIFT) |
 +                    (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_SHIFT);
 +
 +      rw_mmu_bp = (!!mmu_bypass << ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_RD_SHIFT) |
 +                      (!!mmu_bypass << ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_WR_SHIFT);
 +
 +      WREG32(mmARC_FARM_KDMA_CTX_AXUSER_HB_ASID, rw_asid);
 +      WREG32(mmARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP, rw_mmu_bp);
 +}
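
gaudi2_kdma_set_mmbp_asid() replicates the same ASID (and the same bypass bit) into both the read and the write sub-fields of the AXUSER registers by shifting one value to two positions. A hedged sketch of that packing; the shift values below are invented for the example and do not match the real register layout.

#include <stdio.h>

/* Hypothetical field positions, for illustration only */
#define ASID_RD_SHIFT	0
#define ASID_WR_SHIFT	16
#define MMU_BP_RD_SHIFT	0
#define MMU_BP_WR_SHIFT	1

int main(void)
{
	unsigned int asid = 5;
	int mmu_bypass = 1;

	/* the same value is replicated into the RD and WR sub-fields */
	unsigned int rw_asid = (asid << ASID_RD_SHIFT) | (asid << ASID_WR_SHIFT);
	unsigned int rw_mmu_bp = (!!mmu_bypass << MMU_BP_RD_SHIFT) |
				 (!!mmu_bypass << MMU_BP_WR_SHIFT);

	printf("rw_asid = %#x, rw_mmu_bp = %#x\n", rw_asid, rw_mmu_bp);
	return 0;
}
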
 +
 +static void gaudi2_arm_cq_monitor(struct hl_device *hdev, u32 sob_id, u32 mon_id, u32 cq_id,
 +                                              u32 mon_payload, u32 sync_value)
 +{
 +      u32 sob_offset, mon_offset, sync_group_id, mode, mon_arm;
 +      u8 mask;
 +
 +      sob_offset = sob_id * 4;
 +      mon_offset = mon_id * 4;
 +
 +      /* Reset the SOB value */
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset, 0);
 +
 +      /* Configure this address with CQ_ID 0 because CQ_EN is set */
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, cq_id);
 +
 +      /* Configure this address with CS index because CQ_EN is set */
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, mon_payload);
 +
 +      sync_group_id = sob_id / 8;
 +      mask = ~(1 << (sob_id & 0x7));
 +      mode = 1; /* comparison mode is "equal to" */
 +
 +      mon_arm = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOD_MASK, sync_value);
 +      mon_arm |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOP_MASK, mode);
 +      mon_arm |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_MASK_MASK, mask);
 +      mon_arm |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SID_MASK, sync_group_id);
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + mon_offset, mon_arm);
 +}
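
The monitor in gaudi2_arm_cq_monitor() is armed against a group of eight SOBs: sob_id / 8 selects the sync group and ~(1 << (sob_id & 0x7)) builds a mask whose single zero bit marks the SOB to watch. A tiny standalone check of that arithmetic, outside the driver:

#include <stdio.h>

int main(void)
{
	unsigned int sob_id = 21;

	unsigned int sync_group_id = sob_id / 8;		/* SOBs are grouped by 8 */
	unsigned int mask = (~(1u << (sob_id & 0x7))) & 0xff;	/* zero bit = watched SOB */

	/* sob 21 -> group 2, bit 5 cleared in the 8-bit mask */
	printf("sob %u: group %u, mask %#x\n", sob_id, sync_group_id, mask);
	return 0;
}
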
 +
 +/* Submit a single copy/memset job to the KDMA engine and wait for its completion */
 +static int gaudi2_send_job_to_kdma(struct hl_device *hdev,
 +                                      u64 src_addr, u64 dst_addr,
 +                                      u32 size, bool is_memset)
 +{
 +      u32 comp_val, commit_mask, *polling_addr, timeout, status = 0;
 +      struct hl_cq_entry *cq_base;
 +      struct hl_cq *cq;
 +      u64 comp_addr;
 +      int rc;
 +
 +      gaudi2_arm_cq_monitor(hdev, GAUDI2_RESERVED_SOB_KDMA_COMPLETION,
 +                              GAUDI2_RESERVED_MON_KDMA_COMPLETION,
 +                              GAUDI2_RESERVED_CQ_KDMA_COMPLETION, 1, 1);
 +
 +      comp_addr = CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 +
 +                      (GAUDI2_RESERVED_SOB_KDMA_COMPLETION * sizeof(u32));
 +
 +      comp_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_INC_MASK, 1) |
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_VAL_MASK, 1);
 +
 +      WREG32(mmARC_FARM_KDMA_CTX_SRC_BASE_LO, lower_32_bits(src_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_SRC_BASE_HI, upper_32_bits(src_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_DST_BASE_LO, lower_32_bits(dst_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_DST_BASE_HI, upper_32_bits(dst_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_WR_COMP_ADDR_LO, lower_32_bits(comp_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_WR_COMP_ADDR_HI, upper_32_bits(comp_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_WR_COMP_WDATA, comp_val);
 +      WREG32(mmARC_FARM_KDMA_CTX_DST_TSIZE_0, size);
 +
 +      commit_mask = FIELD_PREP(ARC_FARM_KDMA_CTX_COMMIT_LIN_MASK, 1) |
 +                              FIELD_PREP(ARC_FARM_KDMA_CTX_COMMIT_WR_COMP_EN_MASK, 1);
 +
 +      if (is_memset)
 +              commit_mask |= FIELD_PREP(ARC_FARM_KDMA_CTX_COMMIT_MEM_SET_MASK, 1);
 +
 +      WREG32(mmARC_FARM_KDMA_CTX_COMMIT, commit_mask);
 +
 +      /* Wait for completion */
 +      cq = &hdev->completion_queue[GAUDI2_RESERVED_CQ_KDMA_COMPLETION];
 +      cq_base = cq->kernel_address;
 +      polling_addr = (u32 *)&cq_base[cq->ci];
 +
 +      if (hdev->pldm)
 +              /* allow 20 seconds of timeout for each 1MB of data */
 +              timeout = ((size / SZ_1M) + 1) * USEC_PER_SEC * 20;
 +      else
 +              timeout = KDMA_TIMEOUT_USEC;
 +
 +      /* Polling */
 +      rc = hl_poll_timeout_memory(
 +                      hdev,
 +                      polling_addr,
 +                      status,
 +                      (status == 1),
 +                      1000,
 +                      timeout,
 +                      true);
 +
 +      *polling_addr = 0;
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "Timeout while waiting for KDMA to be idle\n");
 +              WREG32(mmARC_FARM_KDMA_CFG_1, 1 << ARC_FARM_KDMA_CFG_1_HALT_SHIFT);
 +              return rc;
 +      }
 +
 +      cq->ci = hl_cq_inc_ptr(cq->ci);
 +
 +      return 0;
 +}
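
Completion of the KDMA job above is detected by polling a completion-queue entry in host memory until it reads 1 or a timeout expires (hl_poll_timeout_memory() in the driver). A hedged userspace sketch of that poll-with-timeout shape, with a plain variable standing in for the CQ entry; the helper name and timing here are illustrative only.

#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Illustration only: poll *addr until it equals 'expected' or timeout_ms expires.
 * The real driver uses the hl_poll_timeout_memory() macro instead.
 */
static int poll_timeout_memory(volatile unsigned int *addr, unsigned int expected,
			       unsigned int timeout_ms)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000 * 1000 };	/* 1 ms */
	unsigned int waited_ms = 0;

	while (*addr != expected) {
		if (waited_ms++ >= timeout_ms)
			return -ETIMEDOUT;
		nanosleep(&delay, NULL);
	}
	return 0;
}

int main(void)
{
	volatile unsigned int cq_entry = 1;	/* pretend the engine already completed */

	if (poll_timeout_memory(&cq_entry, 1, 100) == 0)
		printf("completion observed\n");
	else
		printf("timed out\n");
	return 0;
}
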
 +
 +static void gaudi2_memset_device_lbw(struct hl_device *hdev, u32 addr, u32 size, u32 val)
 +{
 +      u32 i;
 +
 +      for (i = 0 ; i < size ; i += sizeof(u32))
 +              WREG32(addr + i, val);
 +}
 +
 +static void gaudi2_qman_set_test_mode(struct hl_device *hdev, u32 hw_queue_id, bool enable)
 +{
 +      u32 reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
 +
 +      if (enable) {
 +              WREG32(reg_base + QM_GLBL_PROT_OFFSET, QMAN_MAKE_TRUSTED_TEST_MODE);
 +              WREG32(reg_base + QM_PQC_CFG_OFFSET, 0);
 +      } else {
 +              WREG32(reg_base + QM_GLBL_PROT_OFFSET, QMAN_MAKE_TRUSTED);
 +              WREG32(reg_base + QM_PQC_CFG_OFFSET, 1 << PDMA0_QM_PQC_CFG_EN_SHIFT);
 +      }
 +}
 +
 +static int gaudi2_test_queue(struct hl_device *hdev, u32 hw_queue_id)
 +{
 +      u32 sob_offset = hdev->asic_prop.first_available_user_sob[0] * 4;
 +      u32 sob_addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset;
 +      u32 timeout_usec, tmp, sob_base = 1, sob_val = 0x5a5a;
 +      struct packet_msg_short *msg_short_pkt;
 +      dma_addr_t pkt_dma_addr;
 +      size_t pkt_size;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI2_PLDM_TEST_QUEUE_WAIT_USEC;
 +      else
 +              timeout_usec = GAUDI2_TEST_QUEUE_WAIT_USEC;
 +
 +      pkt_size = sizeof(*msg_short_pkt);
 +      msg_short_pkt = hl_asic_dma_pool_zalloc(hdev, pkt_size, GFP_KERNEL, &pkt_dma_addr);
 +      if (!msg_short_pkt) {
 +              dev_err(hdev->dev, "Failed to allocate packet for H/W queue %d testing\n",
 +                      hw_queue_id);
 +              return -ENOMEM;
 +      }
 +
 +      tmp = (PACKET_MSG_SHORT << GAUDI2_PKT_CTL_OPCODE_SHIFT) |
 +              (1 << GAUDI2_PKT_CTL_EB_SHIFT) |
 +              (1 << GAUDI2_PKT_CTL_MB_SHIFT) |
 +              (sob_base << GAUDI2_PKT_SHORT_CTL_BASE_SHIFT) |
 +              (sob_offset << GAUDI2_PKT_SHORT_CTL_ADDR_SHIFT);
 +
 +      msg_short_pkt->value = cpu_to_le32(sob_val);
 +      msg_short_pkt->ctl = cpu_to_le32(tmp);
 +
 +      /* Reset the SOB value */
 +      WREG32(sob_addr, 0);
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id, pkt_size, pkt_dma_addr);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to send msg_short packet to H/W queue %d\n",
 +                      hw_queue_id);
 +              goto free_pkt;
 +      }
 +
 +      rc = hl_poll_timeout(
 +                      hdev,
 +                      sob_addr,
 +                      tmp,
 +                      (tmp == sob_val),
 +                      1000,
 +                      timeout_usec);
 +
 +      if (rc == -ETIMEDOUT) {
 +              dev_err(hdev->dev, "H/W queue %d test failed (SOB_OBJ_0 == 0x%x)\n",
 +                      hw_queue_id, tmp);
 +              rc = -EIO;
 +      }
 +
 +      /* Reset the SOB value */
 +      WREG32(sob_addr, 0);
 +
 +free_pkt:
 +      hl_asic_dma_pool_free(hdev, (void *) msg_short_pkt, pkt_dma_addr);
 +      return rc;
 +}
 +
 +static int gaudi2_test_cpu_queue(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      /*
 +       * Check the capability here, as send_cpu_message() won't update the
 +       * result value when the CPU queue capability is missing
 +       */
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_test_cpu_queue(hdev);
 +}
 +
 +static int gaudi2_test_queues(struct hl_device *hdev)
 +{
 +      int i, rc, ret_val = 0;
 +
 +      for (i = GAUDI2_QUEUE_ID_PDMA_0_0 ; i < GAUDI2_QUEUE_ID_CPU_PQ; i++) {
 +              if (!gaudi2_is_queue_enabled(hdev, i))
 +                      continue;
 +
 +              gaudi2_qman_set_test_mode(hdev, i, true);
 +              rc = gaudi2_test_queue(hdev, i);
 +              gaudi2_qman_set_test_mode(hdev, i, false);
 +
 +              if (rc) {
 +                      ret_val = -EINVAL;
 +                      goto done;
 +              }
 +      }
 +
 +      rc = gaudi2_test_cpu_queue(hdev);
 +      if (rc) {
 +              ret_val = -EINVAL;
 +              goto done;
 +      }
 +
 +done:
 +      return ret_val;
 +}
 +
 +static int gaudi2_compute_reset_late_init(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      size_t irq_arr_size;
 +
 +      /* TODO: missing gaudi2_nic_resume.
 +       * Until it is implemented, nic_hw_cap_initialized will remain zeroed
 +       */
 +      gaudi2_init_arcs(hdev);
 +      gaudi2_scrub_arcs_dccm(hdev);
 +      gaudi2_init_security(hdev);
 +
 +      /* Unmask all IRQs since some could have been received during the soft reset */
 +      irq_arr_size = gaudi2->num_of_valid_hw_events * sizeof(gaudi2->hw_events[0]);
 +      return hl_fw_unmask_irq_arr(hdev, gaudi2->hw_events, irq_arr_size);
 +}
 +
 +static void gaudi2_is_tpc_engine_idle(struct hl_device *hdev, int dcore, int inst, u32 offset,
 +                                      struct iterate_module_ctx *ctx)
 +{
 +      struct gaudi2_tpc_idle_data *idle_data = ctx->data;
 +      u32 tpc_cfg_sts, qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts;
 +      bool is_eng_idle;
 +      int engine_idx;
 +
 +      if ((dcore == 0) && (inst == (NUM_DCORE0_TPC - 1)))
 +              engine_idx = GAUDI2_DCORE0_ENGINE_ID_TPC_6;
 +      else
 +              engine_idx = GAUDI2_DCORE0_ENGINE_ID_TPC_0 +
 +                              dcore * GAUDI2_ENGINE_ID_DCORE_OFFSET + inst;
 +
 +      tpc_cfg_sts = RREG32(mmDCORE0_TPC0_CFG_STATUS + offset);
 +      qm_glbl_sts0 = RREG32(mmDCORE0_TPC0_QM_GLBL_STS0 + offset);
 +      qm_glbl_sts1 = RREG32(mmDCORE0_TPC0_QM_GLBL_STS1 + offset);
 +      qm_cgm_sts = RREG32(mmDCORE0_TPC0_QM_CGM_STS + offset);
 +
 +      is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
 +                                              IS_TPC_IDLE(tpc_cfg_sts);
 +      *(idle_data->is_idle) &= is_eng_idle;
 +
 +      if (idle_data->mask && !is_eng_idle)
 +              set_bit(engine_idx, idle_data->mask);
 +
 +      if (idle_data->e)
 +              hl_engine_data_sprintf(idle_data->e,
 +                                      idle_data->tpc_fmt, dcore, inst,
 +                                      is_eng_idle ? "Y" : "N",
 +                                      qm_glbl_sts0, qm_cgm_sts, tpc_cfg_sts);
 +}
 +
 +static bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
 +                                      struct engines_data *e)
 +{
 +      u32 qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts, dma_core_idle_ind_mask,
 +              mme_arch_sts, dec_swreg15, dec_enabled_bit;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      const char *rot_fmt = "%-6d%-5d%-9s%#-14x%#-12x%s\n";
 +      unsigned long *mask = (unsigned long *) mask_arr;
 +      const char *edma_fmt = "%-6d%-6d%-9s%#-14x%#x\n";
 +      const char *mme_fmt = "%-5d%-6s%-9s%#-14x%#x\n";
 +      const char *nic_fmt = "%-5d%-9s%#-14x%#-12x\n";
 +      const char *pdma_fmt = "%-6d%-9s%#-14x%#x\n";
 +      const char *pcie_dec_fmt = "%-10d%-9s%#x\n";
 +      const char *dec_fmt = "%-6d%-5d%-9s%#x\n";
 +      bool is_idle = true, is_eng_idle;
 +      u64 offset;
 +
 +      struct gaudi2_tpc_idle_data tpc_idle_data = {
 +              .tpc_fmt = "%-6d%-5d%-9s%#-14x%#-12x%#x\n",
 +              .e = e,
 +              .mask = mask,
 +              .is_idle = &is_idle,
 +      };
 +      struct iterate_module_ctx tpc_iter = {
 +              .fn = &gaudi2_is_tpc_engine_idle,
 +              .data = &tpc_idle_data,
 +      };
 +
 +      int engine_idx, i, j;
 +
 +      /* EDMA, Two engines per Dcore */
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nCORE  EDMA  is_idle  QM_GLBL_STS0  DMA_CORE_IDLE_IND_MASK\n"
 +                      "----  ----  -------  ------------  ----------------------\n");
 +
 +      for (i = 0; i < NUM_OF_DCORES; i++) {
 +              for (j = 0 ; j < NUM_OF_EDMA_PER_DCORE ; j++) {
 +                      int seq = i * NUM_OF_EDMA_PER_DCORE + j;
 +
 +                      if (!(prop->edma_enabled_mask & BIT(seq)))
 +                              continue;
 +
 +                      engine_idx = GAUDI2_DCORE0_ENGINE_ID_EDMA_0 +
 +                                      i * GAUDI2_ENGINE_ID_DCORE_OFFSET + j;
 +                      offset = i * DCORE_OFFSET + j * DCORE_EDMA_OFFSET;
 +
 +                      dma_core_idle_ind_mask =
 +                      RREG32(mmDCORE0_EDMA0_CORE_IDLE_IND_MASK + offset);
 +
 +                      qm_glbl_sts0 = RREG32(mmDCORE0_EDMA0_QM_GLBL_STS0 + offset);
 +                      qm_glbl_sts1 = RREG32(mmDCORE0_EDMA0_QM_GLBL_STS1 + offset);
 +                      qm_cgm_sts = RREG32(mmDCORE0_EDMA0_QM_CGM_STS + offset);
 +
 +                      is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
 +                                      IS_DMA_IDLE(dma_core_idle_ind_mask);
 +                      is_idle &= is_eng_idle;
 +
 +                      if (mask && !is_eng_idle)
 +                              set_bit(engine_idx, mask);
 +
 +                      if (e)
 +                              hl_engine_data_sprintf(e, edma_fmt, i, j,
 +                                                      is_eng_idle ? "Y" : "N",
 +                                                      qm_glbl_sts0,
 +                                                      dma_core_idle_ind_mask);
 +              }
 +      }
 +
 +      /* PDMA, Two engines in Full chip */
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                                      "\nPDMA  is_idle  QM_GLBL_STS0  DMA_CORE_IDLE_IND_MASK\n"
 +                                      "----  -------  ------------  ----------------------\n");
 +
 +      for (i = 0 ; i < NUM_OF_PDMA ; i++) {
 +              engine_idx = GAUDI2_ENGINE_ID_PDMA_0 + i;
 +              offset = i * PDMA_OFFSET;
 +              dma_core_idle_ind_mask = RREG32(mmPDMA0_CORE_IDLE_IND_MASK + offset);
 +
 +              qm_glbl_sts0 = RREG32(mmPDMA0_QM_GLBL_STS0 + offset);
 +              qm_glbl_sts1 = RREG32(mmPDMA0_QM_GLBL_STS1 + offset);
 +              qm_cgm_sts = RREG32(mmPDMA0_QM_CGM_STS + offset);
 +
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
 +                              IS_DMA_IDLE(dma_core_idle_ind_mask);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(engine_idx, mask);
 +
 +              if (e)
 +                      hl_engine_data_sprintf(e, pdma_fmt, i, is_eng_idle ? "Y" : "N",
 +                                              qm_glbl_sts0, dma_core_idle_ind_mask);
 +      }
 +
 +      /* NIC, twelve macros in Full chip */
 +      if (e && hdev->nic_ports_mask)
 +              hl_engine_data_sprintf(e,
 +                                      "\nNIC  is_idle  QM_GLBL_STS0  QM_CGM_STS\n"
 +                                      "---  -------  ------------  ----------\n");
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
 +              if (!(i & 1))
 +                      offset = i / 2 * NIC_OFFSET;
 +              else
 +                      offset += NIC_QM_OFFSET;
 +
 +              if (!(hdev->nic_ports_mask & BIT(i)))
 +                      continue;
 +
 +              engine_idx = GAUDI2_ENGINE_ID_NIC0_0 + i;
 +
 +              qm_glbl_sts0 = RREG32(mmNIC0_QM0_GLBL_STS0 + offset);
 +              qm_glbl_sts1 = RREG32(mmNIC0_QM0_GLBL_STS1 + offset);
 +              qm_cgm_sts = RREG32(mmNIC0_QM0_CGM_STS + offset);
 +
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(engine_idx, mask);
 +
 +              if (e)
 +                      hl_engine_data_sprintf(e, nic_fmt, i, is_eng_idle ? "Y" : "N",
 +                                              qm_glbl_sts0, qm_cgm_sts);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                                      "\nMME  Stub  is_idle  QM_GLBL_STS0  MME_ARCH_STATUS\n"
 +                                      "---  ----  -------  ------------  ---------------\n");
 +      /* MME, one per Dcore */
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++) {
 +              engine_idx = GAUDI2_DCORE0_ENGINE_ID_MME + i * GAUDI2_ENGINE_ID_DCORE_OFFSET;
 +              offset = i * DCORE_OFFSET;
 +
 +              qm_glbl_sts0 = RREG32(mmDCORE0_MME_QM_GLBL_STS0 + offset);
 +              qm_glbl_sts1 = RREG32(mmDCORE0_MME_QM_GLBL_STS1 + offset);
 +              qm_cgm_sts = RREG32(mmDCORE0_MME_QM_CGM_STS + offset);
 +
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts);
 +              is_idle &= is_eng_idle;
 +
 +              mme_arch_sts = RREG32(mmDCORE0_MME_CTRL_LO_ARCH_STATUS + offset);
 +              is_eng_idle &= IS_MME_IDLE(mme_arch_sts);
 +              is_idle &= is_eng_idle;
 +
 +              if (e)
 +                      hl_engine_data_sprintf(e, mme_fmt, i, "N",
 +                              is_eng_idle ? "Y" : "N",
 +                              qm_glbl_sts0,
 +                              mme_arch_sts);
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(engine_idx, mask);
 +      }
 +
 +      /*
 +       * TPC
 +       */
 +      if (e && prop->tpc_enabled_mask)
 +              hl_engine_data_sprintf(e,
 +                      "\nCORE  TPC   is_idle  QM_GLBL_STS0  QM_CGM_STS  DMA_CORE_IDLE_IND_MASK\n"
 +                      "----  ---  --------  ------------  ----------  ----------------------\n");
 +
 +      gaudi2_iterate_tpcs(hdev, &tpc_iter);
 +
 +      /* Decoders, two each Dcore and two shared PCIe decoders */
 +      if (e && (prop->decoder_enabled_mask & (~PCIE_DEC_EN_MASK)))
 +              hl_engine_data_sprintf(e,
 +                      "\nCORE  DEC  is_idle  VSI_CMD_SWREG15\n"
 +                      "----  ---  -------  ---------------\n");
 +
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++) {
 +              for (j = 0 ; j < NUM_OF_DEC_PER_DCORE ; j++) {
 +                      dec_enabled_bit = 1 << (i * NUM_OF_DEC_PER_DCORE + j);
 +                      if (!(prop->decoder_enabled_mask & dec_enabled_bit))
 +                              continue;
 +
 +                      engine_idx = GAUDI2_DCORE0_ENGINE_ID_DEC_0 +
 +                                      i * GAUDI2_ENGINE_ID_DCORE_OFFSET + j;
 +                      offset = i * DCORE_OFFSET + j * DCORE_DEC_OFFSET;
 +
 +                      dec_swreg15 = RREG32(mmDCORE0_DEC0_CMD_SWREG15 + offset);
 +                      is_eng_idle = IS_DEC_IDLE(dec_swreg15);
 +                      is_idle &= is_eng_idle;
 +
 +                      if (mask && !is_eng_idle)
 +                              set_bit(engine_idx, mask);
 +
 +                      if (e)
 +                              hl_engine_data_sprintf(e, dec_fmt, i, j,
 +                                                      is_eng_idle ? "Y" : "N", dec_swreg15);
 +              }
 +      }
 +
 +      if (e && (prop->decoder_enabled_mask & PCIE_DEC_EN_MASK))
 +              hl_engine_data_sprintf(e,
 +                      "\nPCIe DEC  is_idle  VSI_CMD_SWREG15\n"
 +                      "--------  -------  ---------------\n");
 +
 +      /* Check shared(PCIe) decoders */
 +      for (i = 0 ; i < NUM_OF_DEC_PER_DCORE ; i++) {
 +              dec_enabled_bit = PCIE_DEC_SHIFT + i;
 +              if (!(prop->decoder_enabled_mask & BIT(dec_enabled_bit)))
 +                      continue;
 +
 +              engine_idx = GAUDI2_PCIE_ENGINE_ID_DEC_0 + i;
 +              offset = i * DCORE_DEC_OFFSET;
 +              dec_swreg15 = RREG32(mmPCIE_DEC0_CMD_SWREG15 + offset);
 +              is_eng_idle = IS_DEC_IDLE(dec_swreg15);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(engine_idx, mask);
 +
 +              if (e)
 +                      hl_engine_data_sprintf(e, pcie_dec_fmt, i,
 +                                              is_eng_idle ? "Y" : "N", dec_swreg15);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nCORE  ROT  is_idle  QM_GLBL_STS0  QM_CGM_STS  DMA_CORE_STS0\n"
 +                      "----  ----  -------  ------------  ----------  -------------\n");
 +
 +      for (i = 0 ; i < NUM_OF_ROT ; i++) {
 +              engine_idx = GAUDI2_ENGINE_ID_ROT_0 + i;
 +
 +              offset = i * ROT_OFFSET;
 +
 +              qm_glbl_sts0 = RREG32(mmROT0_QM_GLBL_STS0 + offset);
 +              qm_glbl_sts1 = RREG32(mmROT0_QM_GLBL_STS1 + offset);
 +              qm_cgm_sts = RREG32(mmROT0_QM_CGM_STS + offset);
 +
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(engine_idx, mask);
 +
 +              if (e)
 +                      hl_engine_data_sprintf(e, rot_fmt, i, 0, is_eng_idle ? "Y" : "N",
 +                                      qm_glbl_sts0, qm_cgm_sts, "-");
 +      }
 +
 +      return is_idle;
 +}
 +
 +static void gaudi2_hw_queues_lock(struct hl_device *hdev)
 +      __acquires(&gaudi2->hw_queues_lock)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      spin_lock(&gaudi2->hw_queues_lock);
 +}
 +
 +static void gaudi2_hw_queues_unlock(struct hl_device *hdev)
 +      __releases(&gaudi2->hw_queues_lock)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      spin_unlock(&gaudi2->hw_queues_lock);
 +}
 +
 +static u32 gaudi2_get_pci_id(struct hl_device *hdev)
 +{
 +      return hdev->pdev->device;
 +}
 +
 +static int gaudi2_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_get_eeprom_data(hdev, data, max_size);
 +}
 +
 +static void gaudi2_update_eq_ci(struct hl_device *hdev, u32 val)
 +{
 +      WREG32(mmCPU_IF_EQ_RD_OFFS, val);
 +}
 +
 +static void *gaudi2_get_events_stat(struct hl_device *hdev, bool aggregate, u32 *size)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (aggregate) {
 +              *size = (u32) sizeof(gaudi2->events_stat_aggregate);
 +              return gaudi2->events_stat_aggregate;
 +      }
 +
 +      *size = (u32) sizeof(gaudi2->events_stat);
 +      return gaudi2->events_stat;
 +}
 +
 +static void gaudi2_mmu_vdec_dcore_prepare(struct hl_device *hdev, int dcore_id,
 +                              int dcore_vdec_id, u32 rw_asid, u32 rw_mmu_bp)
 +{
 +      u32 offset = (mmDCORE0_VDEC1_BRDG_CTRL_BASE - mmDCORE0_VDEC0_BRDG_CTRL_BASE) *
 +                      dcore_vdec_id + DCORE_OFFSET * dcore_id;
 +
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_ASID + offset, rw_asid);
 +}
 +
 +static void gaudi2_mmu_dcore_prepare(struct hl_device *hdev, int dcore_id, u32 asid)
 +{
 +      u32 rw_asid = (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_SHIFT) |
 +                      (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_SHIFT);
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 dcore_offset = dcore_id * DCORE_OFFSET;
 +      u32 vdec_id, i, ports_offset, reg_val;
 +      u8 edma_seq_base;
 +
 +      /* EDMA */
 +      edma_seq_base = dcore_id * NUM_OF_EDMA_PER_DCORE;
 +      if (prop->edma_enabled_mask & BIT(edma_seq_base)) {
 +              WREG32(mmDCORE0_EDMA0_QM_AXUSER_NONSECURED_HB_MMU_BP + dcore_offset, 0);
 +              WREG32(mmDCORE0_EDMA0_QM_AXUSER_NONSECURED_HB_ASID + dcore_offset, rw_asid);
 +              WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP + dcore_offset, 0);
 +              WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_ASID + dcore_offset, rw_asid);
 +      }
 +
 +      if (prop->edma_enabled_mask & BIT(edma_seq_base + 1)) {
 +              WREG32(mmDCORE0_EDMA1_QM_AXUSER_NONSECURED_HB_MMU_BP + dcore_offset, 0);
 +              WREG32(mmDCORE0_EDMA1_QM_AXUSER_NONSECURED_HB_ASID + dcore_offset, rw_asid);
 +              WREG32(mmDCORE0_EDMA1_CORE_CTX_AXUSER_HB_ASID + dcore_offset, rw_asid);
 +              WREG32(mmDCORE0_EDMA1_CORE_CTX_AXUSER_HB_MMU_BP + dcore_offset, 0);
 +      }
 +
 +      /* Sync Mngr */
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_ASID_NONE_SEC_PRIV + dcore_offset, asid);
 +      /*
 +       * Sync Mngrs on dcores 1 - 3 are exposed to user, so must use user ASID
 +       * for any access type
 +       */
 +      if (dcore_id > 0) {
 +              reg_val = (asid << DCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_ASID_RD_SHIFT) |
 +                        (asid << DCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_ASID_WR_SHIFT);
 +              WREG32(mmDCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_ASID + dcore_offset, reg_val);
 +              WREG32(mmDCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_MMU_BP + dcore_offset, 0);
 +      }
 +
 +      WREG32(mmDCORE0_MME_CTRL_LO_MME_AXUSER_HB_MMU_BP + dcore_offset, 0);
 +      WREG32(mmDCORE0_MME_CTRL_LO_MME_AXUSER_HB_ASID + dcore_offset, rw_asid);
 +
 +      for (i = 0 ; i < NUM_OF_MME_SBTE_PORTS ; i++) {
 +              ports_offset = i * DCORE_MME_SBTE_OFFSET;
 +              WREG32(mmDCORE0_MME_SBTE0_MSTR_IF_AXUSER_HB_MMU_BP +
 +                              dcore_offset + ports_offset, 0);
 +              WREG32(mmDCORE0_MME_SBTE0_MSTR_IF_AXUSER_HB_ASID +
 +                              dcore_offset + ports_offset, rw_asid);
 +      }
 +
 +      for (i = 0 ; i < NUM_OF_MME_WB_PORTS ; i++) {
 +              ports_offset = i * DCORE_MME_WB_OFFSET;
 +              WREG32(mmDCORE0_MME_WB0_MSTR_IF_AXUSER_HB_MMU_BP +
 +                              dcore_offset + ports_offset, 0);
 +              WREG32(mmDCORE0_MME_WB0_MSTR_IF_AXUSER_HB_ASID +
 +                              dcore_offset + ports_offset, rw_asid);
 +      }
 +
 +      WREG32(mmDCORE0_MME_QM_AXUSER_NONSECURED_HB_MMU_BP + dcore_offset, 0);
 +      WREG32(mmDCORE0_MME_QM_AXUSER_NONSECURED_HB_ASID + dcore_offset, rw_asid);
 +
 +      /*
 +       * Decoders
 +       */
 +      for (vdec_id = 0 ; vdec_id < NUM_OF_DEC_PER_DCORE ; vdec_id++) {
 +              if (prop->decoder_enabled_mask & BIT(dcore_id * NUM_OF_DEC_PER_DCORE + vdec_id))
 +                      gaudi2_mmu_vdec_dcore_prepare(hdev, dcore_id, vdec_id, rw_asid, 0);
 +      }
 +}
 +
 +static void gaudi2_mmu_vdec_shared_prepare(struct hl_device *hdev,
 +                              int shared_vdec_id, u32 rw_asid, u32 rw_mmu_bp)
 +{
 +      u32 offset = (mmPCIE_VDEC1_BRDG_CTRL_BASE - mmPCIE_VDEC0_BRDG_CTRL_BASE) * shared_vdec_id;
 +
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_ASID + offset, rw_asid);
 +}
 +
 +static void gaudi2_mmu_arc_farm_arc_dup_eng_prepare(struct hl_device *hdev, int arc_farm_id,
 +                                                      u32 rw_asid, u32 rw_mmu_bp)
 +{
 +      u32 offset = (mmARC_FARM_ARC1_DUP_ENG_BASE - mmARC_FARM_ARC0_DUP_ENG_BASE) * arc_farm_id;
 +
 +      WREG32(mmARC_FARM_ARC0_DUP_ENG_AXUSER_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmARC_FARM_ARC0_DUP_ENG_AXUSER_HB_ASID + offset, rw_asid);
 +}
 +
 +static void gaudi2_arc_mmu_prepare(struct hl_device *hdev, u32 cpu_id, u32 asid)
 +{
 +      u32 reg_base, reg_offset, reg_val = 0;
 +
 +      reg_base = gaudi2_arc_blocks_bases[cpu_id];
 +
 +      /* Enable MMU and configure asid for all relevant ARC regions */
 +      reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_ARC_REGION_CFG_MMU_BP_MASK, 0);
 +      reg_val |= FIELD_PREP(ARC_FARM_ARC0_AUX_ARC_REGION_CFG_0_ASID_MASK, asid);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION3_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION4_HBM0_FW);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION5_HBM1_GC_DATA);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION6_HBM2_GC_DATA);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION7_HBM3_GC_DATA);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION9_PCIE);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION10_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION11_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION12_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION13_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION14_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +}
 +
 +static int gaudi2_arc_mmu_prepare_all(struct hl_device *hdev, u32 asid)
 +{
 +      int i;
 +
 +      if (hdev->fw_components & FW_TYPE_BOOT_CPU)
 +              return hl_fw_cpucp_engine_core_asid_set(hdev, asid);
 +
 +      for (i = CPU_ID_SCHED_ARC0 ; i < NUM_OF_ARC_FARMS_ARC ; i++)
 +              gaudi2_arc_mmu_prepare(hdev, i, asid);
 +
 +      for (i = GAUDI2_QUEUE_ID_PDMA_0_0 ; i < GAUDI2_QUEUE_ID_CPU_PQ ; i += 4) {
 +              if (!gaudi2_is_queue_enabled(hdev, i))
 +                      continue;
 +
 +              gaudi2_arc_mmu_prepare(hdev, gaudi2_queue_id_to_arc_id[i], asid);
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_mmu_shared_prepare(struct hl_device *hdev, u32 asid)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 rw_asid, offset;
 +      int rc, i;
 +
 +      rw_asid = FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_MASK, asid) |
 +                      FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_MASK, asid);
 +
 +      WREG32(mmPDMA0_QM_AXUSER_NONSECURED_HB_ASID, rw_asid);
 +      WREG32(mmPDMA0_QM_AXUSER_NONSECURED_HB_MMU_BP, 0);
 +      WREG32(mmPDMA0_CORE_CTX_AXUSER_HB_ASID, rw_asid);
 +      WREG32(mmPDMA0_CORE_CTX_AXUSER_HB_MMU_BP, 0);
 +
 +      WREG32(mmPDMA1_QM_AXUSER_NONSECURED_HB_ASID, rw_asid);
 +      WREG32(mmPDMA1_QM_AXUSER_NONSECURED_HB_MMU_BP, 0);
 +      WREG32(mmPDMA1_CORE_CTX_AXUSER_HB_ASID, rw_asid);
 +      WREG32(mmPDMA1_CORE_CTX_AXUSER_HB_MMU_BP, 0);
 +
 +      /* ROT */
 +      for (i = 0 ; i < NUM_OF_ROT ; i++) {
 +              offset = i * ROT_OFFSET;
 +              WREG32(mmROT0_QM_AXUSER_NONSECURED_HB_ASID + offset, rw_asid);
 +              WREG32(mmROT0_QM_AXUSER_NONSECURED_HB_MMU_BP + offset, 0);
 +              RMWREG32(mmROT0_CPL_QUEUE_AWUSER + offset, asid, MMUBP_ASID_MASK);
 +              RMWREG32(mmROT0_DESC_HBW_ARUSER_LO + offset, asid, MMUBP_ASID_MASK);
 +              RMWREG32(mmROT0_DESC_HBW_AWUSER_LO + offset, asid, MMUBP_ASID_MASK);
 +      }
 +
 +      /* Shared Decoders are the last bits in the decoders mask */
 +      if (prop->decoder_enabled_mask & BIT(NUM_OF_DCORES * NUM_OF_DEC_PER_DCORE + 0))
 +              gaudi2_mmu_vdec_shared_prepare(hdev, 0, rw_asid, 0);
 +
 +      if (prop->decoder_enabled_mask & BIT(NUM_OF_DCORES * NUM_OF_DEC_PER_DCORE + 1))
 +              gaudi2_mmu_vdec_shared_prepare(hdev, 1, rw_asid, 0);
 +
 +      /* arc farm arc dup eng */
 +      for (i = 0 ; i < NUM_OF_ARC_FARMS_ARC ; i++)
 +              gaudi2_mmu_arc_farm_arc_dup_eng_prepare(hdev, i, rw_asid, 0);
 +
 +      rc = gaudi2_arc_mmu_prepare_all(hdev, asid);
 +      if (rc)
 +              return rc;
 +
 +      return 0;
 +}
 +
 +static void gaudi2_tpc_mmu_prepare(struct hl_device *hdev, int dcore, int inst, u32 offset,
 +                                      struct iterate_module_ctx *ctx)
 +{
 +      struct gaudi2_tpc_mmu_data *mmu_data = ctx->data;
 +
 +      WREG32(mmDCORE0_TPC0_CFG_AXUSER_HB_MMU_BP + offset, 0);
 +      WREG32(mmDCORE0_TPC0_CFG_AXUSER_HB_ASID + offset, mmu_data->rw_asid);
 +      WREG32(mmDCORE0_TPC0_QM_AXUSER_NONSECURED_HB_MMU_BP + offset, 0);
 +      WREG32(mmDCORE0_TPC0_QM_AXUSER_NONSECURED_HB_ASID + offset, mmu_data->rw_asid);
 +}
 +
 +/* zero the MMUBP and set the ASID */
 +static int gaudi2_mmu_prepare(struct hl_device *hdev, u32 asid)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct gaudi2_tpc_mmu_data tpc_mmu_data;
 +      struct iterate_module_ctx tpc_iter = {
 +              .fn = &gaudi2_tpc_mmu_prepare,
 +              .data = &tpc_mmu_data,
 +      };
 +      int rc, i;
 +
 +      if (asid & ~DCORE0_HMMU0_STLB_ASID_ASID_MASK) {
 +              dev_crit(hdev->dev, "asid %u is too big\n", asid);
 +              return -EINVAL;
 +      }
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_MMU_MASK))
 +              return 0;
 +
 +      rc = gaudi2_mmu_shared_prepare(hdev, asid);
 +      if (rc)
 +              return rc;
 +
 +      /* configure DCORE MMUs */
 +      tpc_mmu_data.rw_asid = (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_SHIFT) |
 +                              (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_SHIFT);
 +      gaudi2_iterate_tpcs(hdev, &tpc_iter);
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++)
 +              gaudi2_mmu_dcore_prepare(hdev, i, asid);
 +
 +      return 0;
 +}
 +
 +static inline bool is_info_event(u32 event)
 +{
 +      switch (event) {
 +      case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_CAUSE:
 +      case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S ... GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
 +
 +      /* Return true for NIC status events - these events are received periodically and are
 +       * not an indication of an error.
 +       */
 +      case GAUDI2_EVENT_CPU0_STATUS_NIC0_ENG0 ... GAUDI2_EVENT_CPU11_STATUS_NIC11_ENG1:
 +              return true;
 +      default:
 +              return false;
 +      }
 +}
 +
 +static void gaudi2_print_event(struct hl_device *hdev, u16 event_type,
 +                      bool ratelimited, const char *fmt, ...)
 +{
 +      struct va_format vaf;
 +      va_list args;
 +
 +      va_start(args, fmt);
 +      vaf.fmt = fmt;
 +      vaf.va = &args;
 +
 +      if (ratelimited)
 +              dev_err_ratelimited(hdev->dev, "%s: %pV\n",
 +                      gaudi2_irq_map_table[event_type].valid ?
 +                      gaudi2_irq_map_table[event_type].name : "N/A Event", &vaf);
 +      else
 +              dev_err(hdev->dev, "%s: %pV\n",
 +                      gaudi2_irq_map_table[event_type].valid ?
 +                      gaudi2_irq_map_table[event_type].name : "N/A Event", &vaf);
 +
 +      va_end(args);
 +}
 +
 +static bool gaudi2_handle_ecc_event(struct hl_device *hdev, u16 event_type,
 +              struct hl_eq_ecc_data *ecc_data)
 +{
 +      u64 ecc_address = 0, ecc_syndrom = 0;
 +      u8 memory_wrapper_idx = 0;
 +
 +      ecc_address = le64_to_cpu(ecc_data->ecc_address);
 +      ecc_syndrom = le64_to_cpu(ecc_data->ecc_syndrom);
 +      memory_wrapper_idx = ecc_data->memory_wrapper_idx;
 +
 +      gaudi2_print_event(hdev, event_type, !ecc_data->is_critical,
 +              "ECC error detected. address: %#llx. Syndrome: %#llx. block id %u. critical %u.\n",
 +              ecc_address, ecc_syndrom, memory_wrapper_idx, ecc_data->is_critical);
 +
 +      return !!ecc_data->is_critical;
 +}
 +
 +/*
 + * gaudi2_queue_idx_dec - decrement queue index (pi/ci) and handle wrap
 + *
 + * @idx: the current pi/ci value
 + * @q_len: the queue length (power of 2)
 + *
 + * @return the cyclically decremented index
 + */
 +static inline u32 gaudi2_queue_idx_dec(u32 idx, u32 q_len)
 +{
 +      u32 mask = q_len - 1;
 +
 +      /*
 +       * Modular decrement is equivalent to adding (queue_size - 1);
 +       * we then take the LSBs to make sure the value stays in the
 +       * range [0, queue_len - 1]
 +       */
 +      return (idx + q_len - 1) & mask;
 +}
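
Because q_len is a power of two, adding q_len - 1 and then masking with q_len - 1 gives a branch-free decrement-with-wrap, which is exactly what gaudi2_queue_idx_dec() relies on. A quick standalone check of the edge cases:

#include <assert.h>
#include <stdio.h>

static unsigned int queue_idx_dec(unsigned int idx, unsigned int q_len)
{
	/* q_len must be a power of 2: (idx - 1) mod q_len, without a branch */
	return (idx + q_len - 1) & (q_len - 1);
}

int main(void)
{
	assert(queue_idx_dec(5, 8) == 4);
	assert(queue_idx_dec(1, 8) == 0);
	assert(queue_idx_dec(0, 8) == 7);	/* wraps around to the last slot */
	printf("wrap-around decrement behaves as expected\n");
	return 0;
}
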
 +
 +/**
 + * gaudi2_print_sw_config_stream_data - print SW config stream data
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + */
 +static void gaudi2_print_sw_config_stream_data(struct hl_device *hdev,
 +                                              u32 stream, u64 qman_base)
 +{
 +      u64 cq_ptr_lo, cq_ptr_hi, cq_tsize, cq_ptr;
 +      u32 cq_ptr_lo_off, size;
 +
 +      cq_ptr_lo_off = mmDCORE0_TPC0_QM_CQ_PTR_LO_1 - mmDCORE0_TPC0_QM_CQ_PTR_LO_0;
 +
 +      cq_ptr_lo = qman_base + (mmDCORE0_TPC0_QM_CQ_PTR_LO_0 - mmDCORE0_TPC0_QM_BASE) +
 +                                                                      stream * cq_ptr_lo_off;
 +
 +      cq_ptr_hi = cq_ptr_lo + (mmDCORE0_TPC0_QM_CQ_PTR_HI_0 - mmDCORE0_TPC0_QM_CQ_PTR_LO_0);
 +
 +      cq_tsize = cq_ptr_lo + (mmDCORE0_TPC0_QM_CQ_TSIZE_0 - mmDCORE0_TPC0_QM_CQ_PTR_LO_0);
 +
 +      cq_ptr = (((u64) RREG32(cq_ptr_hi)) << 32) | RREG32(cq_ptr_lo);
 +      size = RREG32(cq_tsize);
 +      dev_info(hdev->dev, "stop on err: stream: %u, addr: %#llx, size: %x\n",
 +              stream, cq_ptr, size);
 +}
 +
 +/**
 + * gaudi2_print_last_pqes_on_err - print last PQEs on error
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @qid_base: first QID of the QMAN (out of 4 streams)
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + * @pr_sw_conf: if true print the SW config stream data (CQ PTR and SIZE)
 + */
 +static void gaudi2_print_last_pqes_on_err(struct hl_device *hdev, u32 qid_base, u32 stream,
 +                                              u64 qman_base, bool pr_sw_conf)
 +{
 +      u32 ci, qm_ci_stream_off;
 +      struct hl_hw_queue *q;
 +      u64 pq_ci;
 +      int i;
 +
 +      q = &hdev->kernel_queues[qid_base + stream];
 +
 +      qm_ci_stream_off = mmDCORE0_TPC0_QM_PQ_CI_1 - mmDCORE0_TPC0_QM_PQ_CI_0;
 +      pq_ci = qman_base + (mmDCORE0_TPC0_QM_PQ_CI_0 - mmDCORE0_TPC0_QM_BASE) +
 +                                              stream * qm_ci_stream_off;
 +
 +      hdev->asic_funcs->hw_queues_lock(hdev);
 +
 +      if (pr_sw_conf)
 +              gaudi2_print_sw_config_stream_data(hdev, stream, qman_base);
 +
 +      ci = RREG32(pq_ci);
 +
 +      /* we should start printing from ci - 1 */
 +      ci = gaudi2_queue_idx_dec(ci, HL_QUEUE_LENGTH);
 +
 +      for (i = 0; i < PQ_FETCHER_CACHE_SIZE; i++) {
 +              struct hl_bd *bd;
 +              u64 addr;
 +              u32 len;
 +
 +              bd = q->kernel_address;
 +              bd += ci;
 +
 +              len = le32_to_cpu(bd->len);
 +              /* len 0 means an uninitialized entry - break */
 +              if (!len)
 +                      break;
 +
 +              addr = le64_to_cpu(bd->ptr);
 +
 +              dev_info(hdev->dev, "stop on err PQE(stream %u): ci: %u, addr: %#llx, size: %x\n",
 +                      stream, ci, addr, len);
 +
 +              /* get previous ci, wrap if needed */
 +              ci = gaudi2_queue_idx_dec(ci, HL_QUEUE_LENGTH);
 +      }
 +
 +      hdev->asic_funcs->hw_queues_unlock(hdev);
 +}
 +
 +/**
 + * print_qman_data_on_err - extract QMAN data on error
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @qid_base: first QID of the QMAN (out of 4 streams)
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + *
 + * This function attempts to extract as much data as possible on a QMAN error.
 + * For an upper CP, print the SW config stream data and the last 8 PQEs.
 + * For the lower CP, print the SW config data and the last PQEs of ALL 4 upper CPs.
 + */
 +static void print_qman_data_on_err(struct hl_device *hdev, u32 qid_base, u32 stream, u64 qman_base)
 +{
 +      u32 i;
 +
 +      if (stream != QMAN_STREAMS) {
 +              gaudi2_print_last_pqes_on_err(hdev, qid_base, stream, qman_base, true);
 +              return;
 +      }
 +
 +      gaudi2_print_sw_config_stream_data(hdev, stream, qman_base);
 +
 +      for (i = 0 ; i < QMAN_STREAMS ; i++)
 +              gaudi2_print_last_pqes_on_err(hdev, qid_base, i, qman_base, false);
 +}
 +
 +static int gaudi2_handle_qman_err_generic(struct hl_device *hdev, u16 event_type,
 +                                                      u64 qman_base, u32 qid_base)
 +{
 +      u32 i, j, glbl_sts_val, arb_err_val, num_error_causes, error_count = 0;
 +      u64 glbl_sts_addr, arb_err_addr;
 +      char reg_desc[32];
 +
 +      glbl_sts_addr = qman_base + (mmDCORE0_TPC0_QM_GLBL_ERR_STS_0 - mmDCORE0_TPC0_QM_BASE);
 +      arb_err_addr = qman_base + (mmDCORE0_TPC0_QM_ARB_ERR_CAUSE - mmDCORE0_TPC0_QM_BASE);
 +
 +      /* Iterate through all stream GLBL_ERR_STS registers + Lower CP */
 +      for (i = 0 ; i < QMAN_STREAMS + 1 ; i++) {
 +              glbl_sts_val = RREG32(glbl_sts_addr + 4 * i);
 +
 +              if (!glbl_sts_val)
 +                      continue;
 +
 +              if (i == QMAN_STREAMS) {
 +                      snprintf(reg_desc, ARRAY_SIZE(reg_desc), "LowerCP");
 +                      num_error_causes = GAUDI2_NUM_OF_QM_LCP_ERR_CAUSE;
 +              } else {
 +                      snprintf(reg_desc, ARRAY_SIZE(reg_desc), "stream%u", i);
 +                      num_error_causes = GAUDI2_NUM_OF_QM_ERR_CAUSE;
 +              }
 +
 +              for (j = 0 ; j < num_error_causes ; j++)
 +                      if (glbl_sts_val & BIT(j)) {
 +                              gaudi2_print_event(hdev, event_type, true,
 +                                      "%s. err cause: %s", reg_desc,
 +                                      i == QMAN_STREAMS ?
 +                                      gaudi2_qman_lower_cp_error_cause[j] :
 +                                      gaudi2_qman_error_cause[j]);
 +                              error_count++;
 +                      }
 +
 +              print_qman_data_on_err(hdev, qid_base, i, qman_base);
 +      }
 +
 +      arb_err_val = RREG32(arb_err_addr);
 +
 +      if (!arb_err_val)
 +              goto out;
 +
 +      for (j = 0 ; j < GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE ; j++) {
 +              if (arb_err_val & BIT(j)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "ARB_ERR. err cause: %s",
 +                              gaudi2_qman_arb_error_cause[j]);
 +                      error_count++;
 +              }
 +      }
 +
 +out:
 +      return error_count;
 +}
 +
 +static void gaudi2_razwi_rr_hbw_shared_printf_info(struct hl_device *hdev,
 +                      u64 rtr_mstr_if_base_addr, bool is_write, char *name,
 +                      enum gaudi2_engine_id id, u64 *event_mask)
 +{
 +      u32 razwi_hi, razwi_lo, razwi_xy;
 +      u16 eng_id = id;
 +      u8 rd_wr_flag;
 +
 +      if (is_write) {
 +              razwi_hi = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HI);
 +              razwi_lo = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_LO);
 +              razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_XY);
 +              rd_wr_flag = HL_RAZWI_WRITE;
 +      } else {
 +              razwi_hi = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HI);
 +              razwi_lo = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_LO);
 +              razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_XY);
 +              rd_wr_flag = HL_RAZWI_READ;
 +      }
 +
 +      hl_handle_razwi(hdev, (u64)razwi_hi << 32 | razwi_lo, &eng_id, 1,
 +                              rd_wr_flag | HL_RAZWI_HBW, event_mask);
 +
 +      dev_err_ratelimited(hdev->dev,
 +              "%s-RAZWI SHARED RR HBW %s error, address %#llx, Initiator coordinates 0x%x\n",
 +              name, is_write ? "WR" : "RD", (u64)razwi_hi << 32 | razwi_lo, razwi_xy);
 +}
 +
 +static void gaudi2_razwi_rr_lbw_shared_printf_info(struct hl_device *hdev,
 +                      u64 rtr_mstr_if_base_addr, bool is_write, char *name,
 +                      enum gaudi2_engine_id id, u64 *event_mask)
 +{
 +      u64 razwi_addr = CFG_BASE;
 +      u32 razwi_xy;
 +      u16 eng_id = id;
 +      u8 rd_wr_flag;
 +
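 +      /* The captured LBW address is an offset relative to CFG_BASE, so it is
 +       * rebased on CFG_BASE before being reported
 +       */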
 +      if (is_write) {
 +              razwi_addr += RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI);
 +              razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_XY);
 +              rd_wr_flag = HL_RAZWI_WRITE;
 +      } else {
 +              razwi_addr += RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI);
 +              razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_XY);
 +              rd_wr_flag = HL_RAZWI_READ;
 +      }
 +
 +      hl_handle_razwi(hdev, razwi_addr, &eng_id, 1, rd_wr_flag | HL_RAZWI_LBW, event_mask);
 +      dev_err_ratelimited(hdev->dev,
 +                              "%s-RAZWI SHARED RR LBW %s error, mstr_if 0x%llx, captured address 0x%llX Initiator coordinates 0x%x\n",
 +                              name, is_write ? "WR" : "RD", rtr_mstr_if_base_addr, razwi_addr,
 +                                              razwi_xy);
 +}
 +
 +static enum gaudi2_engine_id gaudi2_razwi_calc_engine_id(struct hl_device *hdev,
 +                                              enum razwi_event_sources module, u8 module_idx)
 +{
 +      switch (module) {
 +      case RAZWI_TPC:
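 +              /*
 +               * The index right after the per-DCORE TPCs maps to the standalone
 +               * TPC (TPC24), which is exposed as DCORE0 TPC6.
 +               */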
 +              if (module_idx == (NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES))
 +                      return GAUDI2_DCORE0_ENGINE_ID_TPC_6;
 +              return (((module_idx / NUM_OF_TPC_PER_DCORE) * ENGINE_ID_DCORE_OFFSET) +
 +                              (module_idx % NUM_OF_TPC_PER_DCORE) +
 +                              (GAUDI2_DCORE0_ENGINE_ID_TPC_0 - GAUDI2_DCORE0_ENGINE_ID_EDMA_0));
 +
 +      case RAZWI_MME:
 +              return ((GAUDI2_DCORE0_ENGINE_ID_MME - GAUDI2_DCORE0_ENGINE_ID_EDMA_0) +
 +                      (module_idx * ENGINE_ID_DCORE_OFFSET));
 +
 +      case RAZWI_EDMA:
 +              return (((module_idx / NUM_OF_EDMA_PER_DCORE) * ENGINE_ID_DCORE_OFFSET) +
 +                      (module_idx % NUM_OF_EDMA_PER_DCORE));
 +
 +      case RAZWI_PDMA:
 +              return (GAUDI2_ENGINE_ID_PDMA_0 + module_idx);
 +
 +      case RAZWI_NIC:
 +              return (GAUDI2_ENGINE_ID_NIC0_0 + (NIC_NUMBER_OF_QM_PER_MACRO * module_idx));
 +
 +      case RAZWI_DEC:
 +              if (module_idx == 8)
 +                      return GAUDI2_PCIE_ENGINE_ID_DEC_0;
 +
 +              if (module_idx == 9)
 +                      return GAUDI2_PCIE_ENGINE_ID_DEC_1;
 +              return (((module_idx / NUM_OF_DEC_PER_DCORE) * ENGINE_ID_DCORE_OFFSET) +
 +                              (module_idx % NUM_OF_DEC_PER_DCORE) +
 +                              (GAUDI2_DCORE0_ENGINE_ID_DEC_0 - GAUDI2_DCORE0_ENGINE_ID_EDMA_0));
 +
 +      case RAZWI_ROT:
 +              return GAUDI2_ENGINE_ID_ROT_0 + module_idx;
 +
 +      default:
 +              return GAUDI2_ENGINE_ID_SIZE;
 +      }
 +}
 +
 +/*
 + * This function handles RR (Range Register) hit events
 + * raised by initiators, not by the PSOC RAZWI.
 + */
 +static void gaudi2_ack_module_razwi_event_handler(struct hl_device *hdev,
 +                              enum razwi_event_sources module, u8 module_idx,
 +                              u8 module_sub_idx, u64 *event_mask)
 +{
 +      bool via_sft = false;
 +      u32 hbw_rtr_id, lbw_rtr_id, dcore_id, dcore_rtr_id, eng_id;
 +      u64 hbw_rtr_mstr_if_base_addr, lbw_rtr_mstr_if_base_addr;
 +      u32 hbw_shrd_aw = 0, hbw_shrd_ar = 0;
 +      u32 lbw_shrd_aw = 0, lbw_shrd_ar = 0;
 +      char initiator_name[64];
 +
 +      switch (module) {
 +      case RAZWI_TPC:
 +              hbw_rtr_id = gaudi2_tpc_initiator_hbw_rtr_id[module_idx];
 +
 +              /* TODO : remove this check and depend only on tpc routers table
 +               * when SW-118828 is resolved
 +               */
 +              if (!hdev->asic_prop.fw_security_enabled &&
 +                              ((module_idx == 0) || (module_idx == 1)))
 +                      lbw_rtr_id = DCORE0_RTR0;
 +              else
 +                      lbw_rtr_id = gaudi2_tpc_initiator_lbw_rtr_id[module_idx];
 +              sprintf(initiator_name, "TPC_%u", module_idx);
 +              break;
 +      case RAZWI_MME:
 +              sprintf(initiator_name, "MME_%u", module_idx);
 +              switch (module_sub_idx) {
 +              case MME_WAP0:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].wap0;
 +                      break;
 +              case MME_WAP1:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].wap1;
 +                      break;
 +              case MME_WRITE:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].write;
 +                      break;
 +              case MME_READ:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].read;
 +                      break;
 +              case MME_SBTE0:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte0;
 +                      break;
 +              case MME_SBTE1:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte1;
 +                      break;
 +              case MME_SBTE2:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte2;
 +                      break;
 +              case MME_SBTE3:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte3;
 +                      break;
 +              case MME_SBTE4:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte4;
 +                      break;
 +              default:
 +                      return;
 +              }
 +              lbw_rtr_id = hbw_rtr_id;
 +              break;
 +      case RAZWI_EDMA:
 +              hbw_rtr_mstr_if_base_addr = gaudi2_edma_initiator_hbw_sft[module_idx];
 +              dcore_id = module_idx / NUM_OF_EDMA_PER_DCORE;
 +              /* SFT has a separate MSTR_IF for LBW; only there can we
 +               * read the LBW RAZWI related registers
 +               */
 +              lbw_rtr_mstr_if_base_addr = mmSFT0_LBW_RTR_IF_MSTR_IF_RR_SHRD_HBW_BASE +
 +                                                              dcore_id * SFT_DCORE_OFFSET;
 +              via_sft = true;
 +              sprintf(initiator_name, "EDMA_%u", module_idx);
 +              break;
 +      case RAZWI_PDMA:
 +              hbw_rtr_id = gaudi2_pdma_initiator_hbw_rtr_id[module_idx];
 +              lbw_rtr_id = gaudi2_pdma_initiator_lbw_rtr_id[module_idx];
 +              sprintf(initiator_name, "PDMA_%u", module_idx);
 +              break;
 +      case RAZWI_NIC:
 +              hbw_rtr_id = gaudi2_nic_initiator_hbw_rtr_id[module_idx];
 +              lbw_rtr_id = gaudi2_nic_initiator_lbw_rtr_id[module_idx];
 +              sprintf(initiator_name, "NIC_%u", module_idx);
 +              break;
 +      case RAZWI_DEC:
 +              hbw_rtr_id = gaudi2_dec_initiator_hbw_rtr_id[module_idx];
 +              lbw_rtr_id = gaudi2_dec_initiator_lbw_rtr_id[module_idx];
 +              sprintf(initiator_name, "DEC_%u", module_idx);
 +              break;
 +      case RAZWI_ROT:
 +              hbw_rtr_id = gaudi2_rot_initiator_hbw_rtr_id[module_idx];
 +              lbw_rtr_id = gaudi2_rot_initiator_lbw_rtr_id[module_idx];
 +              sprintf(initiator_name, "ROT_%u", module_idx);
 +              break;
 +      default:
 +              return;
 +      }
 +
 +      /* Find router mstr_if register base */
 +      if (!via_sft) {
 +              dcore_id = hbw_rtr_id / NUM_OF_RTR_PER_DCORE;
 +              dcore_rtr_id = hbw_rtr_id % NUM_OF_RTR_PER_DCORE;
 +              hbw_rtr_mstr_if_base_addr = mmDCORE0_RTR0_CTRL_BASE +
 +                              dcore_id * DCORE_OFFSET +
 +                              dcore_rtr_id * DCORE_RTR_OFFSET +
 +                              RTR_MSTR_IF_OFFSET;
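 +              /* The LBW MSTR_IF base is derived from the HBW one by offsetting
 +               * with the router-ID delta (routers sit at a fixed DCORE_RTR_OFFSET
 +               * stride)
 +               */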
 +              lbw_rtr_mstr_if_base_addr = hbw_rtr_mstr_if_base_addr +
 +                              (((s32)lbw_rtr_id - hbw_rtr_id) * DCORE_RTR_OFFSET);
 +      }
 +
 +      /* Find out event cause by reading "RAZWI_HAPPENED" registers */
 +      hbw_shrd_aw = RREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HAPPENED);
 +      hbw_shrd_ar = RREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HAPPENED);
 +      lbw_shrd_aw = RREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED);
 +      lbw_shrd_ar = RREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED);
 +
 +      eng_id = gaudi2_razwi_calc_engine_id(hdev, module, module_idx);
 +      if (hbw_shrd_aw) {
 +              gaudi2_razwi_rr_hbw_shared_printf_info(hdev, hbw_rtr_mstr_if_base_addr, true,
 +                                              initiator_name, eng_id, event_mask);
 +
 +              /* Clear event indication */
 +              WREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HAPPENED, hbw_shrd_aw);
 +      }
 +
 +      if (hbw_shrd_ar) {
 +              gaudi2_razwi_rr_hbw_shared_printf_info(hdev, hbw_rtr_mstr_if_base_addr, false,
 +                                              initiator_name, eng_id, event_mask);
 +
 +              /* Clear event indication */
 +              WREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HAPPENED, hbw_shrd_ar);
 +      }
 +
 +      if (lbw_shrd_aw) {
 +              gaudi2_razwi_rr_lbw_shared_printf_info(hdev, lbw_rtr_mstr_if_base_addr, true,
 +                                              initiator_name, eng_id, event_mask);
 +
 +              /* Clear event indication */
 +              WREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED, lbw_shrd_aw);
 +      }
 +
 +      if (lbw_shrd_ar) {
 +              gaudi2_razwi_rr_lbw_shared_printf_info(hdev, lbw_rtr_mstr_if_base_addr, false,
 +                                              initiator_name, eng_id, event_mask);
 +
 +              /* Clear event indication */
 +              WREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED, lbw_shrd_ar);
 +      }
 +}
 +
 +static void gaudi2_check_if_razwi_happened(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u8 mod_idx, sub_mod;
 +
 +      /* check all TPCs */
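 +      /* the +1 covers the standalone TPC (TPC24) on top of the per-DCORE TPCs */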
 +      for (mod_idx = 0 ; mod_idx < (NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES + 1) ; mod_idx++) {
 +              if (prop->tpc_enabled_mask & BIT(mod_idx))
 +                      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_TPC, mod_idx, 0, NULL);
 +      }
 +
 +      /* check all MMEs */
 +      for (mod_idx = 0 ; mod_idx < (NUM_OF_MME_PER_DCORE * NUM_OF_DCORES) ; mod_idx++)
 +              for (sub_mod = MME_WAP0 ; sub_mod < MME_INITIATORS_MAX ; sub_mod++)
 +                      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mod_idx,
 +                                                                      sub_mod, NULL);
 +
 +      /* check all EDMAs */
 +      for (mod_idx = 0 ; mod_idx < (NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES) ; mod_idx++)
 +              if (prop->edma_enabled_mask & BIT(mod_idx))
 +                      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_EDMA, mod_idx, 0, NULL);
 +
 +      /* check all PDMAs */
 +      for (mod_idx = 0 ; mod_idx < NUM_OF_PDMA ; mod_idx++)
 +              gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_PDMA, mod_idx, 0, NULL);
 +
 +      /* check all NICs */
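 +      /* RAZWI is tracked per NIC macro rather than per port, hence the
 +       * mod_idx >> 1 conversion of port index to macro index below
 +       */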
 +      for (mod_idx = 0 ; mod_idx < NIC_NUMBER_OF_PORTS ; mod_idx++)
 +              if (hdev->nic_ports_mask & BIT(mod_idx))
 +                      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_NIC, mod_idx >> 1, 0,
 +                                                              NULL);
 +
 +      /* check all DECs */
 +      for (mod_idx = 0 ; mod_idx < NUMBER_OF_DEC ; mod_idx++)
 +              if (prop->decoder_enabled_mask & BIT(mod_idx))
 +                      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_DEC, mod_idx, 0, NULL);
 +
 +      /* check all ROTs */
 +      for (mod_idx = 0 ; mod_idx < NUM_OF_ROT ; mod_idx++)
 +              gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_ROT, mod_idx, 0, NULL);
 +}
 +
 +static const char *gaudi2_get_initiators_name(u32 rtr_id)
 +{
 +      switch (rtr_id) {
 +      case DCORE0_RTR0:
 +              return "DEC0/1/8/9, TPC24, PDMA0/1, PMMU, PCIE_IF, EDMA0/2, HMMU0/2/4/6, CPU";
 +      case DCORE0_RTR1:
 +              return "TPC0/1";
 +      case DCORE0_RTR2:
 +              return "TPC2/3";
 +      case DCORE0_RTR3:
 +              return "TPC4/5";
 +      case DCORE0_RTR4:
 +              return "MME0_SBTE0/1";
 +      case DCORE0_RTR5:
 +              return "MME0_WAP0/SBTE2";
 +      case DCORE0_RTR6:
 +              return "MME0_CTRL_WR/SBTE3";
 +      case DCORE0_RTR7:
 +              return "MME0_WAP1/CTRL_RD/SBTE4";
 +      case DCORE1_RTR0:
 +              return "MME1_WAP1/CTRL_RD/SBTE4";
 +      case DCORE1_RTR1:
 +              return "MME1_CTRL_WR/SBTE3";
 +      case DCORE1_RTR2:
 +              return "MME1_WAP0/SBTE2";
 +      case DCORE1_RTR3:
 +              return "MME1_SBTE0/1";
 +      case DCORE1_RTR4:
 +              return "TPC10/11";
 +      case DCORE1_RTR5:
 +              return "TPC8/9";
 +      case DCORE1_RTR6:
 +              return "TPC6/7";
 +      case DCORE1_RTR7:
 +              return "DEC2/3, NIC0/1/2/3/4, ARC_FARM, KDMA, EDMA1/3, HMMU1/3/5/7";
 +      case DCORE2_RTR0:
 +              return "DEC4/5, NIC5/6/7/8, EDMA4/6, HMMU8/10/12/14, ROT0";
 +      case DCORE2_RTR1:
 +              return "TPC16/17";
 +      case DCORE2_RTR2:
 +              return "TPC14/15";
 +      case DCORE2_RTR3:
 +              return "TPC12/13";
 +      case DCORE2_RTR4:
 +              return "MME2_SBTE0/1";
 +      case DCORE2_RTR5:
 +              return "MME2_WAP0/SBTE2";
 +      case DCORE2_RTR6:
 +              return "MME2_CTRL_WR/SBTE3";
 +      case DCORE2_RTR7:
 +              return "MME2_WAP1/CTRL_RD/SBTE4";
 +      case DCORE3_RTR0:
 +              return "MME3_WAP1/CTRL_RD/SBTE4";
 +      case DCORE3_RTR1:
 +              return "MME3_CTRL_WR/SBTE3";
 +      case DCORE3_RTR2:
 +              return "MME3_WAP0/SBTE2";
 +      case DCORE3_RTR3:
 +              return "MME3_SBTE0/1";
 +      case DCORE3_RTR4:
 +              return "TPC18/19";
 +      case DCORE3_RTR5:
 +              return "TPC20/21";
 +      case DCORE3_RTR6:
 +              return "TPC22/23";
 +      case DCORE3_RTR7:
 +              return "DEC6/7, NIC9/10/11, EDMA5/7, HMMU9/11/13/15, ROT1, PSOC";
 +      default:
 +              return "N/A";
 +      }
 +}
 +
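 +/*
 + * Fill 'engines' with the engine IDs that sit behind the given router and
 + * return how many entries were written.
 + */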
 +static u16 gaudi2_get_razwi_initiators(u32 rtr_id, u16 *engines)
 +{
 +      switch (rtr_id) {
 +      case DCORE0_RTR0:
 +              engines[0] = GAUDI2_DCORE0_ENGINE_ID_DEC_0;
 +              engines[1] = GAUDI2_DCORE0_ENGINE_ID_DEC_1;
 +              engines[2] = GAUDI2_PCIE_ENGINE_ID_DEC_0;
 +              engines[3] = GAUDI2_PCIE_ENGINE_ID_DEC_1;
 +              engines[4] = GAUDI2_DCORE0_ENGINE_ID_TPC_6;
 +              engines[5] = GAUDI2_ENGINE_ID_PDMA_0;
 +              engines[6] = GAUDI2_ENGINE_ID_PDMA_1;
 +              engines[7] = GAUDI2_ENGINE_ID_PCIE;
 +              engines[8] = GAUDI2_DCORE0_ENGINE_ID_EDMA_0;
 +              engines[9] = GAUDI2_DCORE1_ENGINE_ID_EDMA_0;
 +              engines[10] = GAUDI2_ENGINE_ID_PSOC;
 +              return 11;
 +
 +      case DCORE0_RTR1:
 +              engines[0] = GAUDI2_DCORE0_ENGINE_ID_TPC_0;
 +              engines[1] = GAUDI2_DCORE0_ENGINE_ID_TPC_1;
 +              return 2;
 +
 +      case DCORE0_RTR2:
 +              engines[0] = GAUDI2_DCORE0_ENGINE_ID_TPC_2;
 +              engines[1] = GAUDI2_DCORE0_ENGINE_ID_TPC_3;
 +              return 2;
 +
 +      case DCORE0_RTR3:
 +              engines[0] = GAUDI2_DCORE0_ENGINE_ID_TPC_4;
 +              engines[1] = GAUDI2_DCORE0_ENGINE_ID_TPC_5;
 +              return 2;
 +
 +      case DCORE0_RTR4:
 +      case DCORE0_RTR5:
 +      case DCORE0_RTR6:
 +      case DCORE0_RTR7:
 +              engines[0] = GAUDI2_DCORE0_ENGINE_ID_MME;
 +              return 1;
 +
 +      case DCORE1_RTR0:
 +      case DCORE1_RTR1:
 +      case DCORE1_RTR2:
 +      case DCORE1_RTR3:
 +              engines[0] = GAUDI2_DCORE1_ENGINE_ID_MME;
 +              return 1;
 +
 +      case DCORE1_RTR4:
 +              engines[0] = GAUDI2_DCORE1_ENGINE_ID_TPC_4;
 +              engines[1] = GAUDI2_DCORE1_ENGINE_ID_TPC_5;
 +              return 2;
 +
 +      case DCORE1_RTR5:
 +              engines[0] = GAUDI2_DCORE1_ENGINE_ID_TPC_2;
 +              engines[1] = GAUDI2_DCORE1_ENGINE_ID_TPC_3;
 +              return 2;
 +
 +      case DCORE1_RTR6:
 +              engines[0] = GAUDI2_DCORE1_ENGINE_ID_TPC_0;
 +              engines[1] = GAUDI2_DCORE1_ENGINE_ID_TPC_1;
 +              return 2;
 +
 +      case DCORE1_RTR7:
 +              engines[0] = GAUDI2_DCORE1_ENGINE_ID_DEC_0;
 +              engines[1] = GAUDI2_DCORE1_ENGINE_ID_DEC_1;
 +              engines[2] = GAUDI2_ENGINE_ID_NIC0_0;
 +              engines[3] = GAUDI2_ENGINE_ID_NIC1_0;
 +              engines[4] = GAUDI2_ENGINE_ID_NIC2_0;
 +              engines[5] = GAUDI2_ENGINE_ID_NIC3_0;
 +              engines[6] = GAUDI2_ENGINE_ID_NIC4_0;
 +              engines[7] = GAUDI2_ENGINE_ID_ARC_FARM;
 +              engines[8] = GAUDI2_ENGINE_ID_KDMA;
 +              engines[9] = GAUDI2_DCORE0_ENGINE_ID_EDMA_1;
 +              engines[10] = GAUDI2_DCORE1_ENGINE_ID_EDMA_1;
 +              return 11;
 +
 +      case DCORE2_RTR0:
 +              engines[0] = GAUDI2_DCORE2_ENGINE_ID_DEC_0;
 +              engines[1] = GAUDI2_DCORE2_ENGINE_ID_DEC_1;
 +              engines[2] = GAUDI2_ENGINE_ID_NIC5_0;
 +              engines[3] = GAUDI2_ENGINE_ID_NIC6_0;
 +              engines[4] = GAUDI2_ENGINE_ID_NIC7_0;
 +              engines[5] = GAUDI2_ENGINE_ID_NIC8_0;
 +              engines[6] = GAUDI2_DCORE2_ENGINE_ID_EDMA_0;
 +              engines[7] = GAUDI2_DCORE3_ENGINE_ID_EDMA_0;
 +              engines[8] = GAUDI2_ENGINE_ID_ROT_0;
 +              return 9;
 +
 +      case DCORE2_RTR1:
 +              engines[0] = GAUDI2_DCORE2_ENGINE_ID_TPC_4;
 +              engines[1] = GAUDI2_DCORE2_ENGINE_ID_TPC_5;
 +              return 2;
 +
 +      case DCORE2_RTR2:
 +              engines[0] = GAUDI2_DCORE2_ENGINE_ID_TPC_2;
 +              engines[1] = GAUDI2_DCORE2_ENGINE_ID_TPC_3;
 +              return 2;
 +
 +      case DCORE2_RTR3:
 +              engines[0] = GAUDI2_DCORE2_ENGINE_ID_TPC_0;
 +              engines[1] = GAUDI2_DCORE2_ENGINE_ID_TPC_1;
 +              return 2;
 +
 +      case DCORE2_RTR4:
 +      case DCORE2_RTR5:
 +      case DCORE2_RTR6:
 +      case DCORE2_RTR7:
 +              engines[0] = GAUDI2_DCORE2_ENGINE_ID_MME;
 +              return 1;
 +      case DCORE3_RTR0:
 +      case DCORE3_RTR1:
 +      case DCORE3_RTR2:
 +      case DCORE3_RTR3:
 +              engines[0] = GAUDI2_DCORE3_ENGINE_ID_MME;
 +              return 1;
 +      case DCORE3_RTR4:
 +              engines[0] = GAUDI2_DCORE3_ENGINE_ID_TPC_0;
 +              engines[1] = GAUDI2_DCORE3_ENGINE_ID_TPC_1;
 +              return 2;
 +      case DCORE3_RTR5:
 +              engines[0] = GAUDI2_DCORE3_ENGINE_ID_TPC_2;
 +              engines[1] = GAUDI2_DCORE3_ENGINE_ID_TPC_3;
 +              return 2;
 +      case DCORE3_RTR6:
 +              engines[0] = GAUDI2_DCORE3_ENGINE_ID_TPC_4;
 +              engines[1] = GAUDI2_DCORE3_ENGINE_ID_TPC_5;
 +              return 2;
 +      case DCORE3_RTR7:
 +              engines[0] = GAUDI2_DCORE3_ENGINE_ID_DEC_0;
 +              engines[1] = GAUDI2_DCORE3_ENGINE_ID_DEC_1;
 +              engines[2] = GAUDI2_ENGINE_ID_NIC9_0;
 +              engines[3] = GAUDI2_ENGINE_ID_NIC10_0;
 +              engines[4] = GAUDI2_ENGINE_ID_NIC11_0;
 +              engines[5] = GAUDI2_DCORE2_ENGINE_ID_EDMA_1;
 +              engines[6] = GAUDI2_DCORE3_ENGINE_ID_EDMA_1;
 +              engines[7] = GAUDI2_ENGINE_ID_ROT_1;
 +              engines[8] = GAUDI2_ENGINE_ID_ROT_0;
 +              return 9;
 +      default:
 +              return 0;
 +      }
 +}
 +
 +static void gaudi2_razwi_unmapped_addr_hbw_printf_info(struct hl_device *hdev, u32 rtr_id,
 +                                                      u64 rtr_ctrl_base_addr, bool is_write,
 +                                                      u64 *event_mask)
 +{
 +      u16 engines[HL_RAZWI_MAX_NUM_OF_ENGINES_PER_RTR], num_of_eng;
 +      u32 razwi_hi, razwi_lo;
 +      u8 rd_wr_flag;
 +
 +      num_of_eng = gaudi2_get_razwi_initiators(rtr_id, &engines[0]);
 +
 +      if (is_write) {
 +              razwi_hi = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_ADDR_HI);
 +              razwi_lo = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_ADDR_LO);
 +              rd_wr_flag = HL_RAZWI_WRITE;
 +
 +              /* Clear set indication */
 +              WREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_SET, 0x1);
 +      } else {
 +              razwi_hi = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_ADDR_HI);
 +              razwi_lo = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_ADDR_LO);
 +              rd_wr_flag = HL_RAZWI_READ;
 +
 +              /* Clear set indication */
 +              WREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_SET, 0x1);
 +      }
 +
 +      hl_handle_razwi(hdev, (u64)razwi_hi << 32 | razwi_lo, &engines[0], num_of_eng,
 +                              rd_wr_flag | HL_RAZWI_HBW, event_mask);
 +      dev_err_ratelimited(hdev->dev,
 +              "RAZWI PSOC unmapped HBW %s error, rtr id %u, address %#llx\n",
 +              is_write ? "WR" : "RD", rtr_id, (u64)razwi_hi << 32 | razwi_lo);
 +
 +      dev_err_ratelimited(hdev->dev,
 +              "Initiators: %s\n", gaudi2_get_initiators_name(rtr_id));
 +}
 +
 +static void gaudi2_razwi_unmapped_addr_lbw_printf_info(struct hl_device *hdev, u32 rtr_id,
 +                                                      u64 rtr_ctrl_base_addr, bool is_write,
 +                                                      u64 *event_mask)
 +{
 +      u16 engines[HL_RAZWI_MAX_NUM_OF_ENGINES_PER_RTR], num_of_eng;
 +      u64 razwi_addr = CFG_BASE;
 +      u8 rd_wr_flag;
 +
 +      num_of_eng = gaudi2_get_razwi_initiators(rtr_id, &engines[0]);
 +
 +      if (is_write) {
 +              razwi_addr += RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AW_ADDR);
 +              rd_wr_flag = HL_RAZWI_WRITE;
 +
 +              /* Clear set indication */
 +              WREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AW_SET, 0x1);
 +      } else {
 +              razwi_addr += RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AR_ADDR);
 +              rd_wr_flag = HL_RAZWI_READ;
 +
 +              /* Clear set indication */
 +              WREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AR_SET, 0x1);
 +      }
 +
 +      hl_handle_razwi(hdev, razwi_addr, &engines[0], num_of_eng, rd_wr_flag | HL_RAZWI_LBW,
 +                      event_mask);
 +      dev_err_ratelimited(hdev->dev,
 +              "RAZWI PSOC unmapped LBW %s error, rtr id %u, address 0x%llX\n",
 +              is_write ? "WR" : "RD", rtr_id, razwi_addr);
 +
 +      dev_err_ratelimited(hdev->dev,
 +              "Initiators: %s\n", gaudi2_get_initiators_name(rtr_id));
 +}
 +
 +/* PSOC RAZWI interrupt occurs only when trying to access a bad address */
 +static int gaudi2_ack_psoc_razwi_event_handler(struct hl_device *hdev, u64 *event_mask)
 +{
 +      u32 hbw_aw_set, hbw_ar_set, lbw_aw_set, lbw_ar_set, rtr_id, dcore_id, dcore_rtr_id, xy,
 +                                              razwi_mask_info, razwi_intr = 0, error_count = 0;
 +      int rtr_map_arr_len = NUM_OF_RTR_PER_DCORE * NUM_OF_DCORES;
 +      u64 rtr_ctrl_base_addr;
 +
 +      if (hdev->pldm || !(hdev->fw_components & FW_TYPE_LINUX)) {
 +              razwi_intr = RREG32(mmPSOC_GLOBAL_CONF_RAZWI_INTERRUPT);
 +              if (!razwi_intr)
 +                      return 0;
 +      }
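 +      /* Otherwise the f/w owns the RAZWI interrupt register, so it is neither
 +       * read nor cleared here (see the matching check under the 'clear' label)
 +       */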
 +
 +      razwi_mask_info = RREG32(mmPSOC_GLOBAL_CONF_RAZWI_MASK_INFO);
 +      xy = FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_AXUSER_L_MASK, razwi_mask_info);
 +
 +      dev_err_ratelimited(hdev->dev,
 +              "PSOC RAZWI interrupt: Mask %d, AR %d, AW %d, AXUSER_L 0x%x AXUSER_H 0x%x\n",
 +              FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_MASK_MASK, razwi_mask_info),
 +              FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_WAS_AR_MASK, razwi_mask_info),
 +              FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_WAS_AW_MASK, razwi_mask_info),
 +              xy,
 +              FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_AXUSER_H_MASK, razwi_mask_info));
 +
 +      if (xy == 0) {
 +              dev_err_ratelimited(hdev->dev,
 +                              "PSOC RAZWI interrupt: received event from 0 rtr coordinates\n");
 +              goto clear;
 +      }
 +
 +      /* Find router id by router coordinates */
 +      for (rtr_id = 0 ; rtr_id < rtr_map_arr_len ; rtr_id++)
 +              if (rtr_coordinates_to_rtr_id[rtr_id] == xy)
 +                      break;
 +
 +      if (rtr_id == rtr_map_arr_len) {
 +              dev_err_ratelimited(hdev->dev,
 +                              "PSOC RAZWI interrupt: invalid rtr coordinates (0x%x)\n", xy);
 +              goto clear;
 +      }
 +
 +      /* Find router mstr_if register base */
 +      dcore_id = rtr_id / NUM_OF_RTR_PER_DCORE;
 +      dcore_rtr_id = rtr_id % NUM_OF_RTR_PER_DCORE;
 +      rtr_ctrl_base_addr = mmDCORE0_RTR0_CTRL_BASE + dcore_id * DCORE_OFFSET +
 +                              dcore_rtr_id * DCORE_RTR_OFFSET;
 +
 +      hbw_aw_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_SET);
 +      hbw_ar_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_SET);
 +      lbw_aw_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AW_SET);
 +      lbw_ar_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AR_SET);
 +
 +      if (hbw_aw_set)
 +              gaudi2_razwi_unmapped_addr_hbw_printf_info(hdev, rtr_id,
 +                                              rtr_ctrl_base_addr, true, event_mask);
 +
 +      if (hbw_ar_set)
 +              gaudi2_razwi_unmapped_addr_hbw_printf_info(hdev, rtr_id,
 +                                              rtr_ctrl_base_addr, false, event_mask);
 +
 +      if (lbw_aw_set)
 +              gaudi2_razwi_unmapped_addr_lbw_printf_info(hdev, rtr_id,
 +                                              rtr_ctrl_base_addr, true, event_mask);
 +
 +      if (lbw_ar_set)
 +              gaudi2_razwi_unmapped_addr_lbw_printf_info(hdev, rtr_id,
 +                                              rtr_ctrl_base_addr, false, event_mask);
 +
 +      error_count++;
 +
 +clear:
 +      /* Clear interrupts only on pldm or if the f/w doesn't handle interrupts */
 +      if (hdev->pldm || !(hdev->fw_components & FW_TYPE_LINUX))
 +              WREG32(mmPSOC_GLOBAL_CONF_RAZWI_INTERRUPT, razwi_intr);
 +
 +      return error_count;
 +}
 +
 +static int _gaudi2_handle_qm_sei_err(struct hl_device *hdev, u64 qman_base, u16 event_type)
 +{
 +      u32 i, sts_val, sts_clr_val = 0, error_count = 0;
 +
 +      sts_val = RREG32(qman_base + QM_SEI_STATUS_OFFSET);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_QM_SEI_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_qm_sei_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      WREG32(qman_base + QM_SEI_STATUS_OFFSET, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_qm_sei_err(struct hl_device *hdev, u16 event_type,
 +                                      bool extended_err_check, u64 *event_mask)
 +{
 +      enum razwi_event_sources module;
 +      u32 error_count = 0;
 +      u64 qman_base;
 +      u8 index;
 +
 +      switch (event_type) {
 +      case GAUDI2_EVENT_TPC0_AXI_ERR_RSP ... GAUDI2_EVENT_TPC23_AXI_ERR_RSP:
 +              index = event_type - GAUDI2_EVENT_TPC0_AXI_ERR_RSP;
 +              qman_base = mmDCORE0_TPC0_QM_BASE +
 +                              (index / NUM_OF_TPC_PER_DCORE) * DCORE_OFFSET +
 +                              (index % NUM_OF_TPC_PER_DCORE) * DCORE_TPC_OFFSET;
 +              module = RAZWI_TPC;
 +              break;
 +      case GAUDI2_EVENT_TPC24_AXI_ERR_RSP:
 +              qman_base = mmDCORE0_TPC6_QM_BASE;
 +              module = RAZWI_TPC;
 +              break;
 +      case GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME2_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME3_CTRL_AXI_ERROR_RESPONSE:
 +              index = (event_type - GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE) /
 +                              (GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE -
 +                                              GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE);
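 +              /* Normalize by the per-MME event spacing so the index computation
 +               * does not depend on how the MME events are laid out in the enum
 +               */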
 +              qman_base = mmDCORE0_MME_QM_BASE + index * DCORE_OFFSET;
 +              module = RAZWI_MME;
 +              break;
 +      case GAUDI2_EVENT_PDMA_CH0_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_PDMA_CH1_AXI_ERR_RSP:
 +              index = event_type - GAUDI2_EVENT_PDMA_CH0_AXI_ERR_RSP;
 +              qman_base = mmPDMA0_QM_BASE + index * PDMA_OFFSET;
 +              module = RAZWI_PDMA;
 +              break;
 +      case GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_ROTATOR1_AXI_ERROR_RESPONSE:
 +              index = event_type - GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE;
 +              qman_base = mmROT0_QM_BASE + index * ROT_OFFSET;
 +              module = RAZWI_ROT;
 +              break;
 +      default:
 +              return 0;
 +      }
 +
 +      error_count = _gaudi2_handle_qm_sei_err(hdev, qman_base, event_type);
 +
 +      /* There is a single event per NIC macro, so we should check both of its QMAN blocks */
 +      if (event_type >= GAUDI2_EVENT_NIC0_AXI_ERROR_RESPONSE &&
 +                      event_type <= GAUDI2_EVENT_NIC11_AXI_ERROR_RESPONSE)
 +              error_count += _gaudi2_handle_qm_sei_err(hdev,
 +                                      qman_base + NIC_QM_OFFSET, event_type);
 +
 +      if (extended_err_check) {
 +              /* check if RAZWI happened */
 +              gaudi2_ack_module_razwi_event_handler(hdev, module, 0, 0, event_mask);
 +              hl_check_for_glbl_errors(hdev);
 +      }
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 +{
 +      u32 qid_base, error_count = 0;
 +      u64 qman_base;
 +      u8 index;
 +
 +      switch (event_type) {
 +      case GAUDI2_EVENT_TPC0_QM ... GAUDI2_EVENT_TPC5_QM:
 +              index = event_type - GAUDI2_EVENT_TPC0_QM;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_0_0 + index * QMAN_STREAMS;
 +              qman_base = mmDCORE0_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_TPC6_QM ... GAUDI2_EVENT_TPC11_QM:
 +              index = event_type - GAUDI2_EVENT_TPC6_QM;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE1_TPC_0_0 + index * QMAN_STREAMS;
 +              qman_base = mmDCORE1_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_TPC12_QM ... GAUDI2_EVENT_TPC17_QM:
 +              index = event_type - GAUDI2_EVENT_TPC12_QM;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE2_TPC_0_0 + index * QMAN_STREAMS;
 +              qman_base = mmDCORE2_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_TPC18_QM ... GAUDI2_EVENT_TPC23_QM:
 +              index = event_type - GAUDI2_EVENT_TPC18_QM;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE3_TPC_0_0 + index * QMAN_STREAMS;
 +              qman_base = mmDCORE3_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_TPC24_QM:
 +              qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_6_0;
 +              qman_base = mmDCORE0_TPC6_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_MME0_QM:
 +              qid_base = GAUDI2_QUEUE_ID_DCORE0_MME_0_0;
 +              qman_base = mmDCORE0_MME_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_MME1_QM:
 +              qid_base = GAUDI2_QUEUE_ID_DCORE1_MME_0_0;
 +              qman_base = mmDCORE1_MME_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_MME2_QM:
 +              qid_base = GAUDI2_QUEUE_ID_DCORE2_MME_0_0;
 +              qman_base = mmDCORE2_MME_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_MME3_QM:
 +              qid_base = GAUDI2_QUEUE_ID_DCORE3_MME_0_0;
 +              qman_base = mmDCORE3_MME_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA0_QM:
 +              index = 0;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0;
 +              qman_base = mmDCORE0_EDMA0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA1_QM:
 +              index = 1;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0;
 +              qman_base = mmDCORE0_EDMA1_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA2_QM:
 +              index = 2;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0;
 +              qman_base = mmDCORE1_EDMA0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA3_QM:
 +              index = 3;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0;
 +              qman_base = mmDCORE1_EDMA1_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA4_QM:
 +              index = 4;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0;
 +              qman_base = mmDCORE2_EDMA0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA5_QM:
 +              index = 5;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0;
 +              qman_base = mmDCORE2_EDMA1_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA6_QM:
 +              index = 6;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0;
 +              qman_base = mmDCORE3_EDMA0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA7_QM:
 +              index = 7;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0;
 +              qman_base = mmDCORE3_EDMA1_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_PDMA0_QM:
 +              qid_base = GAUDI2_QUEUE_ID_PDMA_0_0;
 +              qman_base = mmPDMA0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_PDMA1_QM:
 +              qid_base = GAUDI2_QUEUE_ID_PDMA_1_0;
 +              qman_base = mmPDMA1_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_ROTATOR0_ROT0_QM:
 +              qid_base = GAUDI2_QUEUE_ID_ROT_0_0;
 +              qman_base = mmROT0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_ROTATOR1_ROT1_QM:
 +              qid_base = GAUDI2_QUEUE_ID_ROT_1_0;
 +              qman_base = mmROT1_QM_BASE;
 +              break;
 +      default:
 +              return 0;
 +      }
 +
 +      error_count = gaudi2_handle_qman_err_generic(hdev, event_type, qman_base, qid_base);
 +
 +      /* Handle EDMA QM SEI here because there is no AXI error response event for EDMA */
 +      if (event_type >= GAUDI2_EVENT_HDMA2_QM && event_type <= GAUDI2_EVENT_HDMA5_QM) {
 +              error_count += _gaudi2_handle_qm_sei_err(hdev, qman_base, event_type);
 +              gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_EDMA, index, 0, event_mask);
 +      }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_arc_farm_sei_err(struct hl_device *hdev, u16 event_type)
 +{
 +      u32 i, sts_val, sts_clr_val = 0, error_count = 0;
 +
 +      sts_val = RREG32(mmARC_FARM_ARC0_AUX_ARC_SEI_INTR_STS);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_ARC_SEI_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_arc_sei_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      WREG32(mmARC_FARM_ARC0_AUX_ARC_SEI_INTR_CLR, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_cpu_sei_err(struct hl_device *hdev, u16 event_type)
 +{
 +      u32 i, sts_val, sts_clr_val = 0, error_count = 0;
 +
 +      sts_val = RREG32(mmCPU_IF_CPU_SEI_INTR_STS);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_CPU_SEI_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_cpu_sei_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      WREG32(mmCPU_IF_CPU_SEI_INTR_CLR, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_rot_err(struct hl_device *hdev, u8 rot_index, u16 event_type,
 +                                      struct hl_eq_razwi_with_intr_cause *razwi_with_intr_cause,
 +                                      u64 *event_mask)
 +{
 +      u64 intr_cause_data = le64_to_cpu(razwi_with_intr_cause->intr_cause.intr_cause_data);
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_ROT_ERR_CAUSE ; i++)
 +              if (intr_cause_data & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", guadi2_rot_error_cause[i]);
 +                      error_count++;
 +              }
 +
 +      /* check if RAZWI happened */
 +      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_ROT, rot_index, 0, event_mask);
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_tpc_ack_interrupts(struct hl_device *hdev, u8 tpc_index, u16 event_type,
 +                                      struct hl_eq_razwi_with_intr_cause *razwi_with_intr_cause,
 +                                      u64 *event_mask)
 +{
 +      u64 intr_cause_data = le64_to_cpu(razwi_with_intr_cause->intr_cause.intr_cause_data);
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_TPC_INTR_CAUSE ; i++)
 +              if (intr_cause_data & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "interrupt cause: %s",  gaudi2_tpc_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +
 +      /* check if RAZWI happened */
 +      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_TPC, tpc_index, 0, event_mask);
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, u16 event_type,
 +                                      u64 *event_mask)
 +{
 +      u32 sts_addr, sts_val, sts_clr_val = 0, error_count = 0;
 +      int i;
 +
 +      if (dec_index < NUM_OF_VDEC_PER_DCORE * NUM_OF_DCORES)
 +              /* DCORE DEC */
 +              sts_addr = mmDCORE0_VDEC0_BRDG_CTRL_CAUSE_INTR +
 +                              DCORE_OFFSET * (dec_index / NUM_OF_DEC_PER_DCORE) +
 +                              DCORE_VDEC_OFFSET * (dec_index % NUM_OF_DEC_PER_DCORE);
 +      else
 +              /* PCIE DEC */
 +              sts_addr = mmPCIE_VDEC0_BRDG_CTRL_CAUSE_INTR + PCIE_VDEC_OFFSET *
 +                              (dec_index - NUM_OF_VDEC_PER_DCORE * NUM_OF_DCORES);
 +
 +      sts_val = RREG32(sts_addr);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_DEC_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_dec_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      /* check if RAZWI happened */
 +      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_DEC, dec_index, 0, event_mask);
 +      hl_check_for_glbl_errors(hdev);
 +
 +      /* Write 1 clear errors */
 +      WREG32(sts_addr, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, u16 event_type,
 +                                      u64 *event_mask)
 +{
 +      u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0, error_count = 0;
 +      int i;
 +
 +      sts_addr = mmDCORE0_MME_CTRL_LO_INTR_CAUSE + DCORE_OFFSET * mme_index;
 +      sts_clr_addr = mmDCORE0_MME_CTRL_LO_INTR_CLEAR + DCORE_OFFSET * mme_index;
 +
 +      sts_val = RREG32(sts_addr);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_MME_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", guadi2_mme_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      /* check if RAZWI happened */
 +      for (i = MME_WRITE ; i < MME_INITIATORS_MAX ; i++)
 +              gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mme_index, i, event_mask);
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      WREG32(sts_clr_addr, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_mme_sbte_err(struct hl_device *hdev, u16 event_type,
 +                                      u64 intr_cause_data)
 +{
 +      int i, error_count = 0;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE ; i++)
 +              if (intr_cause_data & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", guadi2_mme_sbte_error_cause[i]);
 +                      error_count++;
 +              }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_mme_wap_err(struct hl_device *hdev, u8 mme_index, u16 event_type,
 +                                      u64 *event_mask)
 +{
 +      u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0, error_count = 0;
 +      int i;
 +
 +      sts_addr = mmDCORE0_MME_ACC_INTR_CAUSE + DCORE_OFFSET * mme_index;
 +      sts_clr_addr = mmDCORE0_MME_ACC_INTR_CLEAR + DCORE_OFFSET * mme_index;
 +
 +      sts_val = RREG32(sts_addr);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_MME_WAP_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", guadi2_mme_wap_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      /* check if RAZWI happened on WAP0/1 */
 +      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mme_index, MME_WAP0, event_mask);
 +      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mme_index, MME_WAP1, event_mask);
 +      hl_check_for_glbl_errors(hdev);
 +
 +      WREG32(sts_clr_addr, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_kdma_core_event(struct hl_device *hdev, u16 event_type,
 +                                      u64 intr_cause_data)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      /* If an AXI read or write error is received, an error is reported and
 +       * an interrupt message is sent. Due to a HW erratum, when reading the cause
 +       * register of the KDMA engine, the reported error is always HBW even if
 +       * the actual error was caused by an LBW KDMA transaction.
 +       */
 +      for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++)
 +              if (intr_cause_data & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_kdma_core_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u16 event_type,
 +                                      u64 intr_cause_data)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++)
 +              if (intr_cause_data & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_dma_core_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static void gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(struct hl_device *hdev, u64 *event_mask)
 +{
 +      u32 mstr_if_base_addr = mmPCIE_MSTR_RR_MSTR_IF_RR_SHRD_HBW_BASE, razwi_happened_addr;
 +
 +      razwi_happened_addr = mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HAPPENED;
 +      if (RREG32(razwi_happened_addr)) {
 +              gaudi2_razwi_rr_hbw_shared_printf_info(hdev, mstr_if_base_addr, true, "PCIE",
 +                                                      GAUDI2_ENGINE_ID_PCIE, event_mask);
 +              WREG32(razwi_happened_addr, 0x1);
 +      }
 +
 +      razwi_happened_addr = mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HAPPENED;
 +      if (RREG32(razwi_happened_addr)) {
 +              gaudi2_razwi_rr_hbw_shared_printf_info(hdev, mstr_if_base_addr, false, "PCIE",
 +                                                      GAUDI2_ENGINE_ID_PCIE, event_mask);
 +              WREG32(razwi_happened_addr, 0x1);
 +      }
 +
 +      razwi_happened_addr = mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED;
 +      if (RREG32(razwi_happened_addr)) {
 +              gaudi2_razwi_rr_lbw_shared_printf_info(hdev, mstr_if_base_addr, true, "PCIE",
 +                                                      GAUDI2_ENGINE_ID_PCIE, event_mask);
 +              WREG32(razwi_happened_addr, 0x1);
 +      }
 +
 +      razwi_happened_addr = mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED;
 +      if (RREG32(razwi_happened_addr)) {
 +              gaudi2_razwi_rr_lbw_shared_printf_info(hdev, mstr_if_base_addr, false, "PCIE",
 +                                                      GAUDI2_ENGINE_ID_PCIE, event_mask);
 +              WREG32(razwi_happened_addr, 0x1);
 +      }
 +}
 +
 +static int gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u16 event_type,
 +                                      u64 intr_cause_data, u64 *event_mask)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_PCIE_ADDR_DEC_ERR_CAUSE ; i++) {
 +              if (!(intr_cause_data & BIT_ULL(i)))
 +                      continue;
 +
 +              gaudi2_print_event(hdev, event_type, true,
 +                      "err cause: %s", gaudi2_pcie_addr_dec_error_cause[i]);
 +              error_count++;
 +
 +              switch (intr_cause_data & BIT_ULL(i)) {
 +              case PCIE_WRAP_PCIE_IC_SEI_INTR_IND_AXI_LBW_ERR_INTR_MASK:
 +                      hl_check_for_glbl_errors(hdev);
 +                      break;
 +              case PCIE_WRAP_PCIE_IC_SEI_INTR_IND_BAD_ACCESS_INTR_MASK:
 +                      gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(hdev, event_mask);
 +                      break;
 +              }
 +      }
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_pif_fatal(struct hl_device *hdev, u16 event_type,
 +                              u64 intr_cause_data)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE ; i++) {
 +              if (intr_cause_data & BIT_ULL(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_pmmu_fatal_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +      }
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_hif_fatal(struct hl_device *hdev, u16 event_type, u64 intr_cause_data)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE ; i++) {
 +              if (intr_cause_data & BIT_ULL(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_hif_fatal_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +      }
 +
 +      return error_count;
 +}
 +
 +static void gaudi2_handle_page_error(struct hl_device *hdev, u64 mmu_base, bool is_pmmu,
 +                                      u64 *event_mask)
 +{
 +      u32 valid, val, axid_l, axid_h;
 +      u64 addr;
 +
 +      valid = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID));
 +
 +      if (!(valid & DCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID_PAGE_ERR_VALID_ENTRY_MASK))
 +              return;
 +
 +      val = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE));
 +      addr = val & DCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE_VA_63_32_MASK;
 +      addr <<= 32;
 +      addr |= RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE_VA));
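 +      /* addr now holds the full faulting VA: bits 63:32 come from the
 +       * PAGE_ERROR_CAPTURE register, bits 31:0 from PAGE_ERROR_CAPTURE_VA
 +       */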
 +
 +      axid_l = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_FAULT_ID_LSB));
 +      axid_h = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_FAULT_ID_MSB));
 +
 +      dev_err_ratelimited(hdev->dev, "%s page fault on va 0x%llx, transaction id 0x%llX\n",
 +                              is_pmmu ? "PMMU" : "HMMU", addr, ((u64)axid_h << 32) + axid_l);
 +      hl_handle_page_fault(hdev, addr, 0, is_pmmu, event_mask);
 +
 +      WREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE), 0);
 +}
 +
 +static void gaudi2_handle_access_error(struct hl_device *hdev, u64 mmu_base, bool is_pmmu)
 +{
 +      u32 valid, val;
 +      u64 addr;
 +
 +      valid = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID));
 +
 +      if (!(valid & DCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID_ACCESS_ERR_VALID_ENTRY_MASK))
 +              return;
 +
 +      val = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE));
 +      addr = val & DCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE_VA_63_32_MASK;
 +      addr <<= 32;
 +      addr |= RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE_VA));
 +
 +      dev_err_ratelimited(hdev->dev, "%s access error on va 0x%llx\n",
 +                              is_pmmu ? "PMMU" : "HMMU", addr);
 +      WREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE), 0);
 +}
 +
 +static int gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, u16 event_type,
 +                                              u64 mmu_base, bool is_pmmu, u64 *event_mask)
 +{
 +      u32 spi_sei_cause, interrupt_clr = 0x0, error_count = 0;
 +      int i;
 +
 +      spi_sei_cause = RREG32(mmu_base + MMU_SPI_SEI_CAUSE_OFFSET);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE ; i++) {
 +              if (spi_sei_cause & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_mmu_spi_sei[i].cause);
 +
 +                      if (i == 0)
 +                              gaudi2_handle_page_error(hdev, mmu_base, is_pmmu, event_mask);
 +                      else if (i == 1)
 +                              gaudi2_handle_access_error(hdev, mmu_base, is_pmmu);
 +
 +                      if (gaudi2_mmu_spi_sei[i].clear_bit >= 0)
 +                              interrupt_clr |= BIT(gaudi2_mmu_spi_sei[i].clear_bit);
 +
 +                      error_count++;
 +              }
 +      }
 +
 +      /* Clear cause */
 +      WREG32_AND(mmu_base + MMU_SPI_SEI_CAUSE_OFFSET, ~spi_sei_cause);
 +
 +      /* Clear interrupt */
 +      WREG32(mmu_base + MMU_INTERRUPT_CLR_OFFSET, interrupt_clr);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_sm_err(struct hl_device *hdev, u16 event_type, u8 sm_index)
 +{
 +      u32 sei_cause_addr, sei_cause_val, sei_cause_cause, sei_cause_log,
 +              cq_intr_addr, cq_intr_val, cq_intr_queue_index, error_count = 0;
 +      int i;
 +
 +      sei_cause_addr = mmDCORE0_SYNC_MNGR_GLBL_SM_SEI_CAUSE + DCORE_OFFSET * sm_index;
 +      cq_intr_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_INTR + DCORE_OFFSET * sm_index;
 +
 +      sei_cause_val = RREG32(sei_cause_addr);
 +      sei_cause_cause = FIELD_GET(DCORE0_SYNC_MNGR_GLBL_SM_SEI_CAUSE_CAUSE_MASK, sei_cause_val);
 +      cq_intr_val = RREG32(cq_intr_addr);
 +
 +      /* SEI interrupt */
 +      if (sei_cause_cause) {
 +              /* There are corresponding SEI_CAUSE_log bits for every SEI_CAUSE_cause bit */
 +              sei_cause_log = FIELD_GET(DCORE0_SYNC_MNGR_GLBL_SM_SEI_CAUSE_LOG_MASK,
 +                                      sei_cause_val);
 +
 +              for (i = 0 ; i < GAUDI2_NUM_OF_SM_SEI_ERR_CAUSE ; i++) {
 +                      if (!(sei_cause_cause & BIT(i)))
 +                              continue;
 +
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s. %s: 0x%X\n",
 +                              gaudi2_sm_sei_cause[i].cause_name,
 +                              gaudi2_sm_sei_cause[i].log_name,
 +                              sei_cause_log);
 +                      error_count++;
 +                      break;
 +              }
 +
 +              /* Clear SM_SEI_CAUSE */
 +              WREG32(sei_cause_addr, 0);
 +      }
 +
 +      /* CQ interrupt */
 +      if (cq_intr_val & DCORE0_SYNC_MNGR_GLBL_CQ_INTR_CQ_SEC_INTR_MASK) {
 +              cq_intr_queue_index =
 +                              FIELD_GET(DCORE0_SYNC_MNGR_GLBL_CQ_INTR_CQ_INTR_QUEUE_INDEX_MASK,
 +                                      cq_intr_val);
 +
 +              dev_err_ratelimited(hdev->dev, "SM%u err. err cause: CQ_INTR. queue index: %u\n",
 +                              sm_index, cq_intr_queue_index);
 +              error_count++;
 +
 +              /* Clear CQ_INTR */
 +              WREG32(cq_intr_addr, 0);
 +      }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_mmu_spi_sei_err(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 +{
 +      bool is_pmmu = false;
 +      u32 error_count = 0;
 +      u64 mmu_base;
 +      u8 index;
 +
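 +      /* Resolve the MMU block that raised the event: HMMU SPI events come in groups of 3 per
 +       * HMMU while AXI error responses are one per HMMU; PMMU events map to the single PMMU block
 +       */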
 +      switch (event_type) {
 +      case GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM ... GAUDI2_EVENT_HMMU3_SECURITY_ERROR:
 +              index = (event_type - GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM) / 3;
 +              mmu_base = mmDCORE0_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_3_AXI_ERR_RSP:
 +              index = (event_type - GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP);
 +              mmu_base = mmDCORE0_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU8_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU11_SECURITY_ERROR:
 +              index = (event_type - GAUDI2_EVENT_HMMU8_PAGE_FAULT_WR_PERM) / 3;
 +              mmu_base = mmDCORE1_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU_8_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_11_AXI_ERR_RSP:
 +              index = (event_type - GAUDI2_EVENT_HMMU_8_AXI_ERR_RSP);
 +              mmu_base = mmDCORE1_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU7_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU4_SECURITY_ERROR:
 +              index = (event_type - GAUDI2_EVENT_HMMU7_PAGE_FAULT_WR_PERM) / 3;
 +              mmu_base = mmDCORE2_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU_7_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_4_AXI_ERR_RSP:
 +              index = (event_type - GAUDI2_EVENT_HMMU_7_AXI_ERR_RSP);
 +              mmu_base = mmDCORE2_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU15_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU12_SECURITY_ERROR:
 +              index = (event_type - GAUDI2_EVENT_HMMU15_PAGE_FAULT_WR_PERM) / 3;
 +              mmu_base = mmDCORE3_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU_15_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_12_AXI_ERR_RSP:
 +              index = (event_type - GAUDI2_EVENT_HMMU_15_AXI_ERR_RSP);
 +              mmu_base = mmDCORE3_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_PMMU0_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_PMMU0_SECURITY_ERROR:
 +      case GAUDI2_EVENT_PMMU_AXI_ERR_RSP_0:
 +              is_pmmu = true;
 +              mmu_base = mmPMMU_HBW_MMU_BASE;
 +              break;
 +      default:
 +              return 0;
 +      }
 +
 +      error_count = gaudi2_handle_mmu_spi_sei_generic(hdev, event_type, mmu_base,
 +                                                      is_pmmu, event_mask);
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +
 +/* returns true if hard reset is required (ECC DERR or Read parity), false otherwise (ECC SERR) */
 +static bool gaudi2_hbm_sei_handle_read_err(struct hl_device *hdev,
 +                      struct hl_eq_hbm_sei_read_err_intr_info *rd_err_data, u32 err_cnt)
 +{
 +      u32 addr, beat, beat_shift;
 +      bool rc = false;
 +
 +      dev_err_ratelimited(hdev->dev,
 +                      "READ ERROR count: ECC SERR: %d, ECC DERR: %d, RD_PARITY: %d\n",
 +                      FIELD_GET(HBM_ECC_SERR_CNTR_MASK, err_cnt),
 +                      FIELD_GET(HBM_ECC_DERR_CNTR_MASK, err_cnt),
 +                      FIELD_GET(HBM_RD_PARITY_CNTR_MASK, err_cnt));
 +
 +      addr = le32_to_cpu(rd_err_data->dbg_rd_err_addr.rd_addr_val);
 +      dev_err_ratelimited(hdev->dev,
 +                      "READ ERROR address: sid(%u), bg(%u), ba(%u), col(%u), row(%u)\n",
 +                      FIELD_GET(HBM_RD_ADDR_SID_MASK, addr),
 +                      FIELD_GET(HBM_RD_ADDR_BG_MASK, addr),
 +                      FIELD_GET(HBM_RD_ADDR_BA_MASK, addr),
 +                      FIELD_GET(HBM_RD_ADDR_COL_MASK, addr),
 +                      FIELD_GET(HBM_RD_ADDR_ROW_MASK, addr));
 +
 +      /* For each beat (RDQS edge), look for possible errors and print relevant info */
 +      for (beat = 0 ; beat < 4 ; beat++) {
 +              if (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
 +                      (HBM_RD_ERR_SERR_BEAT0_MASK << beat))
 +                      dev_err_ratelimited(hdev->dev, "Beat%d ECC SERR: DM: %#x, Syndrome: %#x\n",
 +                                              beat,
 +                                              le32_to_cpu(rd_err_data->dbg_rd_err_dm),
 +                                              le32_to_cpu(rd_err_data->dbg_rd_err_syndrome));
 +
 +              if (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
 +                      (HBM_RD_ERR_DERR_BEAT0_MASK << beat)) {
 +                      dev_err_ratelimited(hdev->dev, "Beat%d ECC DERR: DM: %#x, Syndrome: %#x\n",
 +                                              beat,
 +                                              le32_to_cpu(rd_err_data->dbg_rd_err_dm),
 +                                              le32_to_cpu(rd_err_data->dbg_rd_err_syndrome));
 +                      rc |= true;
 +              }
 +
 +              beat_shift = beat * HBM_RD_ERR_BEAT_SHIFT;
 +              if (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
 +                      (HBM_RD_ERR_PAR_ERR_BEAT0_MASK << beat_shift)) {
 +                      dev_err_ratelimited(hdev->dev,
 +                                      "Beat%d read PARITY: DM: %#x, PAR data: %#x\n",
 +                                      beat,
 +                                      le32_to_cpu(rd_err_data->dbg_rd_err_dm),
 +                                      (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
 +                                              (HBM_RD_ERR_PAR_DATA_BEAT0_MASK << beat_shift)) >>
 +                                              (HBM_RD_ERR_PAR_DATA_BEAT0_SHIFT + beat_shift));
 +                      rc |= true;
 +              }
 +
 +              dev_err_ratelimited(hdev->dev, "Beat%d DQ data:\n", beat);
 +              dev_err_ratelimited(hdev->dev, "\t0x%08x\n",
 +                                      le32_to_cpu(rd_err_data->dbg_rd_err_data[beat * 2]));
 +              dev_err_ratelimited(hdev->dev, "\t0x%08x\n",
 +                                      le32_to_cpu(rd_err_data->dbg_rd_err_data[beat * 2 + 1]));
 +      }
 +
 +      return rc;
 +}
 +
 +static void gaudi2_hbm_sei_print_wr_par_info(struct hl_device *hdev,
 +                      struct hl_eq_hbm_sei_wr_par_intr_info *wr_par_err_data, u32 err_cnt)
 +{
 +      struct hbm_sei_wr_cmd_address *wr_cmd_addr = wr_par_err_data->dbg_last_wr_cmds;
 +      u32 i, curr_addr, derr = wr_par_err_data->dbg_derr;
 +
 +      dev_err_ratelimited(hdev->dev, "WRITE PARITY ERROR count: %d\n", err_cnt);
 +
 +      dev_err_ratelimited(hdev->dev, "CK-0 DERR: 0x%02x, CK-1 DERR: 0x%02x\n",
 +                              derr & 0x3, derr & 0xc);
 +
 +      /* JIRA H6-3286 - the following prints may not be valid */
 +      dev_err_ratelimited(hdev->dev, "Last latched write commands addresses:\n");
 +      for (i = 0 ; i < HBM_WR_PAR_CMD_LIFO_LEN ; i++) {
 +              curr_addr = le32_to_cpu(wr_cmd_addr[i].dbg_wr_cmd_addr);
 +              dev_err_ratelimited(hdev->dev,
 +                              "\twrite cmd[%u]: Address: SID(%u) BG(%u) BA(%u) COL(%u).\n",
 +                              i,
 +                              FIELD_GET(WR_PAR_LAST_CMD_SID_MASK, curr_addr),
 +                              FIELD_GET(WR_PAR_LAST_CMD_BG_MASK, curr_addr),
 +                              FIELD_GET(WR_PAR_LAST_CMD_BA_MASK, curr_addr),
 +                              FIELD_GET(WR_PAR_LAST_CMD_COL_MASK, curr_addr));
 +      }
 +}
 +
 +static void gaudi2_hbm_sei_print_ca_par_info(struct hl_device *hdev,
 +              struct hl_eq_hbm_sei_ca_par_intr_info *ca_par_err_data, u32 err_cnt)
 +{
 +      __le32 *col_cmd = ca_par_err_data->dbg_col;
 +      __le16 *row_cmd = ca_par_err_data->dbg_row;
 +      u32 i;
 +
 +      dev_err_ratelimited(hdev->dev, "CA ERROR count: %d\n", err_cnt);
 +
 +      dev_err_ratelimited(hdev->dev, "Last latched C&R bus commands:\n");
 +      for (i = 0 ; i < HBM_CA_ERR_CMD_LIFO_LEN ; i++)
 +              dev_err_ratelimited(hdev->dev, "cmd%u: ROW(0x%04x) COL(0x%05x)\n", i,
 +                      le16_to_cpu(row_cmd[i]) & (u16)GENMASK(13, 0),
 +                      le32_to_cpu(col_cmd[i]) & (u32)GENMASK(17, 0));
 +}
 +
 +/* Returns true if hard reset is needed or false otherwise */
 +static bool gaudi2_handle_hbm_mc_sei_err(struct hl_device *hdev, u16 event_type,
 +                                      struct hl_eq_hbm_sei_data *sei_data)
 +{
 +      bool require_hard_reset = false;
 +      u32 hbm_id, mc_id, cause_idx;
 +
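 +      /* 4 SEI events per HBM: 2 MCs, each with a severe and a non-severe event */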
 +      hbm_id = (event_type - GAUDI2_EVENT_HBM0_MC0_SEI_SEVERE) / 4;
 +      mc_id = ((event_type - GAUDI2_EVENT_HBM0_MC0_SEI_SEVERE) / 2) % 2;
 +
 +      cause_idx = sei_data->hdr.sei_cause;
 +      if (cause_idx > GAUDI2_NUM_OF_HBM_SEI_CAUSE - 1) {
 +              gaudi2_print_event(hdev, event_type, true,
 +                      "err cause: Invalid HBM SEI event cause (%d) provided by FW\n", cause_idx);
 +              return true;
 +      }
 +
 +      gaudi2_print_event(hdev, event_type, !sei_data->hdr.is_critical,
 +              "System %s Error Interrupt - HBM(%u) MC(%u) MC_CH(%u) MC_PC(%u). Error cause: %s\n",
 +              sei_data->hdr.is_critical ? "Critical" : "Non-critical",
 +              hbm_id, mc_id, sei_data->hdr.mc_channel, sei_data->hdr.mc_pseudo_channel,
 +              hbm_mc_sei_cause[cause_idx]);
 +
 +      /* Print error-specific info */
 +      switch (cause_idx) {
 +      case HBM_SEI_CATTRIP:
 +              require_hard_reset = true;
 +              break;
 +
 +      case HBM_SEI_CMD_PARITY_EVEN:
 +              gaudi2_hbm_sei_print_ca_par_info(hdev, &sei_data->ca_parity_even_info,
 +                                              le32_to_cpu(sei_data->hdr.cnt));
 +              require_hard_reset = true;
 +              break;
 +
 +      case HBM_SEI_CMD_PARITY_ODD:
 +              gaudi2_hbm_sei_print_ca_par_info(hdev, &sei_data->ca_parity_odd_info,
 +                                              le32_to_cpu(sei_data->hdr.cnt));
 +              require_hard_reset = true;
 +              break;
 +
 +      case HBM_SEI_WRITE_DATA_PARITY_ERR:
 +              gaudi2_hbm_sei_print_wr_par_info(hdev, &sei_data->wr_parity_info,
 +                                              le32_to_cpu(sei_data->hdr.cnt));
 +              require_hard_reset = true;
 +              break;
 +
 +      case HBM_SEI_READ_ERR:
 +              /* Unlike other SEI events, read error requires further processing of the
 +               * raw data in order to determine the root cause.
 +               */
 +              require_hard_reset = gaudi2_hbm_sei_handle_read_err(hdev,
 +                                                              &sei_data->read_err_info,
 +                                                              le32_to_cpu(sei_data->hdr.cnt));
 +              break;
 +
 +      default:
 +              break;
 +      }
 +
 +      require_hard_reset |= !!sei_data->hdr.is_critical;
 +
 +      return require_hard_reset;
 +}
 +
 +static int gaudi2_handle_hbm_cattrip(struct hl_device *hdev, u16 event_type,
 +                              u64 intr_cause_data)
 +{
 +      if (intr_cause_data) {
 +              gaudi2_print_event(hdev, event_type, true,
 +                      "temperature error cause: %#llx", intr_cause_data);
 +              return 1;
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_handle_hbm_mc_spi(struct hl_device *hdev, u64 intr_cause_data)
 +{
 +      u32 i, error_count = 0;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_HBM_MC_SPI_CAUSE ; i++)
 +              if (intr_cause_data & hbm_mc_spi[i].mask) {
 +                      dev_dbg(hdev->dev, "HBM spi event: notification cause(%s)\n",
 +                              hbm_mc_spi[i].cause);
 +                      error_count++;
 +              }
 +
 +      return error_count;
 +}
 +
 +static void gaudi2_print_clk_change_info(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 +{
 +      ktime_t zero_time = ktime_set(0, 0);
 +
 +      mutex_lock(&hdev->clk_throttling.lock);
 +
 +      switch (event_type) {
 +      case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = zero_time;
 +              dev_dbg_ratelimited(hdev->dev, "Clock throttling due to power consumption\n");
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_FIX_POWER_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = ktime_get();
 +              dev_dbg_ratelimited(hdev->dev, "Power envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
 +              *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              dev_info_ratelimited(hdev->dev, "Clock throttling due to overheating\n");
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
 +              *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              dev_info_ratelimited(hdev->dev, "Thermal envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      default:
 +              dev_err(hdev->dev, "Received invalid clock change event %d\n", event_type);
 +              break;
 +      }
 +
 +      mutex_unlock(&hdev->clk_throttling.lock);
 +}
 +
 +static void gaudi2_print_out_of_sync_info(struct hl_device *hdev, u16 event_type,
 +                                      struct cpucp_pkt_sync_err *sync_err)
 +{
 +      struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ];
 +
 +      gaudi2_print_event(hdev, event_type, false,
 +              "FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
 +              le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci),
 +              q->pi, atomic_read(&q->ci));
 +}
 +
 +static int gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev, u16 event_type)
 +{
 +      u32 p2p_intr, msix_gw_intr, error_count = 0;
 +
 +      p2p_intr = RREG32(mmPCIE_WRAP_P2P_INTR);
 +      msix_gw_intr = RREG32(mmPCIE_WRAP_MSIX_GW_INTR);
 +
 +      if (p2p_intr) {
 +              gaudi2_print_event(hdev, event_type, true,
 +                      "pcie p2p transaction terminated due to security, req_id(0x%x)\n",
 +                      RREG32(mmPCIE_WRAP_P2P_REQ_ID));
 +
 +              WREG32(mmPCIE_WRAP_P2P_INTR, 0x1);
 +              error_count++;
 +      }
 +
 +      if (msix_gw_intr) {
 +              gaudi2_print_event(hdev, event_type, true,
 +                      "pcie msi-x gen denied due to vector num check failure, vec(0x%X)\n",
 +                      RREG32(mmPCIE_WRAP_MSIX_GW_VEC));
 +
 +              WREG32(mmPCIE_WRAP_MSIX_GW_INTR, 0x1);
 +              error_count++;
 +      }
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_pcie_drain(struct hl_device *hdev,
 +                      struct hl_eq_pcie_drain_ind_data *drain_data)
 +{
 +      u64 lbw_rd, lbw_wr, hbw_rd, hbw_wr, cause, error_count = 0;
 +
 +      cause = le64_to_cpu(drain_data->intr_cause.intr_cause_data);
 +      lbw_rd = le64_to_cpu(drain_data->drain_rd_addr_lbw);
 +      lbw_wr = le64_to_cpu(drain_data->drain_wr_addr_lbw);
 +      hbw_rd = le64_to_cpu(drain_data->drain_rd_addr_hbw);
 +      hbw_wr = le64_to_cpu(drain_data->drain_wr_addr_hbw);
 +
 +      if (cause & BIT_ULL(0)) {
 +              dev_err_ratelimited(hdev->dev,
 +                      "PCIE AXI drain LBW completed, read_err %u, write_err %u\n",
 +                      !!lbw_rd, !!lbw_wr);
 +              error_count++;
 +      }
 +
 +      if (cause & BIT_ULL(1)) {
 +              dev_err_ratelimited(hdev->dev,
 +                      "PCIE AXI drain HBW completed, raddr %#llx, waddr %#llx\n",
 +                      hbw_rd, hbw_wr);
 +              error_count++;
 +      }
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_psoc_drain(struct hl_device *hdev, u64 intr_cause_data)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_AXI_DRAIN_ERR_CAUSE ; i++) {
 +              if (intr_cause_data & BIT_ULL(i)) {
 +                      dev_err_ratelimited(hdev->dev, "PSOC %s completed\n",
 +                              gaudi2_psoc_axi_drain_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +      }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static void gaudi2_print_cpu_pkt_failure_info(struct hl_device *hdev, u16 event_type,
 +                                      struct cpucp_pkt_sync_err *sync_err)
 +{
 +      struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ];
 +
 +      gaudi2_print_event(hdev, event_type, false,
 +              "FW reported sanity check failure, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
 +              le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci));
 +}
 +
 +static int hl_arc_event_handle(struct hl_device *hdev, u16 event_type,
 +                                      struct hl_eq_engine_arc_intr_data *data)
 +{
 +      struct hl_engine_arc_dccm_queue_full_irq *q;
 +      u32 intr_type, engine_id;
 +      u64 payload;
 +
 +      intr_type = le32_to_cpu(data->intr_type);
 +      engine_id = le32_to_cpu(data->engine_id);
 +      payload = le64_to_cpu(data->payload);
 +
 +      switch (intr_type) {
 +      case ENGINE_ARC_DCCM_QUEUE_FULL_IRQ:
 +              q = (struct hl_engine_arc_dccm_queue_full_irq *) &payload;
 +
 +              gaudi2_print_event(hdev, event_type, true,
 +                              "ARC DCCM Full event: EngId: %u, Intr_type: %u, Qidx: %u\n",
 +                              engine_id, intr_type, q->queue_index);
 +              return 1;
 +      default:
 +              gaudi2_print_event(hdev, event_type, true, "Unknown ARC event type\n");
 +              return 0;
 +      }
 +}
 +
 +static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      bool reset_required = false, is_critical = false;
 +      u32 index, ctl, reset_flags = HL_DRV_RESET_HARD, error_count = 0;
 +      u64 event_mask = 0;
 +      u16 event_type;
 +
 +      ctl = le32_to_cpu(eq_entry->hdr.ctl);
 +      event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK) >> EQ_CTL_EVENT_TYPE_SHIFT);
 +
 +      if (event_type >= GAUDI2_EVENT_SIZE) {
 +              dev_err(hdev->dev, "Event type %u exceeds maximum of %u",
 +                              event_type, GAUDI2_EVENT_SIZE - 1);
 +              return;
 +      }
 +
 +      gaudi2->events_stat[event_type]++;
 +      gaudi2->events_stat_aggregate[event_type]++;
 +
 +      switch (event_type) {
 +      case GAUDI2_EVENT_PCIE_CORE_SERR ... GAUDI2_EVENT_ARC0_ECC_DERR:
 +              fallthrough;
 +      case GAUDI2_EVENT_ROTATOR0_SERR ... GAUDI2_EVENT_ROTATOR1_DERR:
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              reset_required = gaudi2_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
 +              is_critical = eq_entry->ecc_data.is_critical;
 +              error_count++;
 +              break;
 +
 +      case GAUDI2_EVENT_TPC0_QM ... GAUDI2_EVENT_PDMA1_QM:
 +              fallthrough;
 +      case GAUDI2_EVENT_ROTATOR0_ROT0_QM ... GAUDI2_EVENT_ROTATOR1_ROT1_QM:
 +              fallthrough;
 +      case GAUDI2_EVENT_NIC0_QM0 ... GAUDI2_EVENT_NIC11_QM1:
 +              error_count = gaudi2_handle_qman_err(hdev, event_type, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_ARC_AXI_ERROR_RESPONSE_0:
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              error_count = gaudi2_handle_arc_farm_sei_err(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_AXI_ERR_RSP:
 +              error_count = gaudi2_handle_cpu_sei_err(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PDMA_CH0_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_PDMA_CH1_AXI_ERR_RSP:
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              error_count = gaudi2_handle_qm_sei_err(hdev, event_type, true, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_ROTATOR1_AXI_ERROR_RESPONSE:
 +              index = event_type - GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE;
 +              error_count = gaudi2_handle_rot_err(hdev, index, event_type,
 +                                      &eq_entry->razwi_with_intr_cause, &event_mask);
 +              error_count += gaudi2_handle_qm_sei_err(hdev, event_type, false, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_TPC0_AXI_ERR_RSP ... GAUDI2_EVENT_TPC24_AXI_ERR_RSP:
 +              index = event_type - GAUDI2_EVENT_TPC0_AXI_ERR_RSP;
 +              error_count = gaudi2_tpc_ack_interrupts(hdev, index, event_type,
 +                                              &eq_entry->razwi_with_intr_cause, &event_mask);
 +              error_count += gaudi2_handle_qm_sei_err(hdev, event_type, false, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_DEC0_AXI_ERR_RSPONSE ... GAUDI2_EVENT_DEC9_AXI_ERR_RSPONSE:
 +              index = event_type - GAUDI2_EVENT_DEC0_AXI_ERR_RSPONSE;
 +              error_count = gaudi2_handle_dec_err(hdev, index, event_type, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_TPC0_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC1_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC2_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC3_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC4_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC5_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC6_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC7_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC8_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC9_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC10_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC11_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC12_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC13_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC14_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC15_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC16_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC17_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC18_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC19_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC20_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC21_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC22_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC23_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC24_KERNEL_ERR:
 +              index = (event_type - GAUDI2_EVENT_TPC0_KERNEL_ERR) /
 +                      (GAUDI2_EVENT_TPC1_KERNEL_ERR - GAUDI2_EVENT_TPC0_KERNEL_ERR);
 +              error_count = gaudi2_tpc_ack_interrupts(hdev, index, event_type,
 +                                      &eq_entry->razwi_with_intr_cause, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_DEC0_SPI:
 +      case GAUDI2_EVENT_DEC1_SPI:
 +      case GAUDI2_EVENT_DEC2_SPI:
 +      case GAUDI2_EVENT_DEC3_SPI:
 +      case GAUDI2_EVENT_DEC4_SPI:
 +      case GAUDI2_EVENT_DEC5_SPI:
 +      case GAUDI2_EVENT_DEC6_SPI:
 +      case GAUDI2_EVENT_DEC7_SPI:
 +      case GAUDI2_EVENT_DEC8_SPI:
 +      case GAUDI2_EVENT_DEC9_SPI:
 +              index = (event_type - GAUDI2_EVENT_DEC0_SPI) /
 +                              (GAUDI2_EVENT_DEC1_SPI - GAUDI2_EVENT_DEC0_SPI);
 +              error_count = gaudi2_handle_dec_err(hdev, index, event_type, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME2_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME3_CTRL_AXI_ERROR_RESPONSE:
 +              index = (event_type - GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE) /
 +                              (GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE -
 +                                              GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE);
 +              error_count = gaudi2_handle_mme_err(hdev, index, event_type, &event_mask);
 +              error_count += gaudi2_handle_qm_sei_err(hdev, event_type, false, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_MME0_QMAN_SW_ERROR:
 +      case GAUDI2_EVENT_MME1_QMAN_SW_ERROR:
 +      case GAUDI2_EVENT_MME2_QMAN_SW_ERROR:
 +      case GAUDI2_EVENT_MME3_QMAN_SW_ERROR:
 +              index = (event_type - GAUDI2_EVENT_MME0_QMAN_SW_ERROR) /
 +                              (GAUDI2_EVENT_MME1_QMAN_SW_ERROR -
 +                                      GAUDI2_EVENT_MME0_QMAN_SW_ERROR);
 +              error_count = gaudi2_handle_mme_err(hdev, index, event_type, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID:
 +      case GAUDI2_EVENT_MME1_WAP_SOURCE_RESULT_INVALID:
 +      case GAUDI2_EVENT_MME2_WAP_SOURCE_RESULT_INVALID:
 +      case GAUDI2_EVENT_MME3_WAP_SOURCE_RESULT_INVALID:
 +              index = (event_type - GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID) /
 +                              (GAUDI2_EVENT_MME1_WAP_SOURCE_RESULT_INVALID -
 +                                      GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID);
 +              error_count = gaudi2_handle_mme_wap_err(hdev, index, event_type, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_KDMA_CH0_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_KDMA0_CORE:
 +              error_count = gaudi2_handle_kdma_core_event(hdev, event_type,
 +                                      le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_HDMA2_CORE ... GAUDI2_EVENT_PDMA1_CORE:
 +              error_count = gaudi2_handle_dma_core_event(hdev, event_type,
 +                                      le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PCIE_ADDR_DEC_ERR:
 +              error_count = gaudi2_print_pcie_addr_dec_info(hdev, event_type,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data), &event_mask);
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM ... GAUDI2_EVENT_HMMU12_SECURITY_ERROR:
 +      case GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_12_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_PMMU0_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_PMMU0_SECURITY_ERROR:
 +      case GAUDI2_EVENT_PMMU_AXI_ERR_RSP_0:
 +              error_count = gaudi2_handle_mmu_spi_sei_err(hdev, event_type, &event_mask);
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_HIF0_FATAL ... GAUDI2_EVENT_HIF12_FATAL:
 +              error_count = gaudi2_handle_hif_fatal(hdev, event_type,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PMMU_FATAL_0:
 +              error_count = gaudi2_handle_pif_fatal(hdev, event_type,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PSOC63_RAZWI_OR_PID_MIN_MAX_INTERRUPT:
 +              error_count = gaudi2_ack_psoc_razwi_event_handler(hdev, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_HBM0_MC0_SEI_SEVERE ... GAUDI2_EVENT_HBM5_MC1_SEI_NON_SEVERE:
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              if (gaudi2_handle_hbm_mc_sei_err(hdev, event_type, &eq_entry->sei_data)) {
 +                      reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +                      reset_required = true;
 +              }
 +              error_count++;
 +              break;
 +
 +      case GAUDI2_EVENT_HBM_CATTRIP_0 ... GAUDI2_EVENT_HBM_CATTRIP_5:
 +              error_count = gaudi2_handle_hbm_cattrip(hdev, event_type,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_HBM0_MC0_SPI ... GAUDI2_EVENT_HBM5_MC1_SPI:
 +              error_count = gaudi2_handle_hbm_mc_spi(hdev,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PCIE_DRAIN_COMPLETE:
 +              error_count = gaudi2_handle_pcie_drain(hdev, &eq_entry->pcie_drain_ind_data);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PSOC59_RPM_ERROR_OR_DRAIN:
 +              error_count = gaudi2_handle_psoc_drain(hdev,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_AXI_ECC:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_CPU_L2_RAM_ECC:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_MME0_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME0_SBTE4_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_MME1_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME1_SBTE4_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_MME2_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME2_SBTE4_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_MME3_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME3_SBTE4_AXI_ERR_RSP:
 +              error_count = gaudi2_handle_mme_sbte_err(hdev, event_type,
 +                                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +      case GAUDI2_EVENT_VM0_ALARM_A ... GAUDI2_EVENT_VM3_ALARM_B:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_PSOC_AXI_ERR_RSP:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_PSOC_PRSTN_FALL:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_PCIE_APB_TIMEOUT:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_PCIE_FATAL_ERR:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_TPC0_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC1_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC2_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC3_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC4_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC5_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC6_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC7_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC8_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC9_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC10_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC11_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC12_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC13_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC14_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC15_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC16_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC17_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC18_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC19_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC20_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC21_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC22_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC23_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC24_BMON_SPMU:
 +      case GAUDI2_EVENT_MME0_CTRL_BMON_SPMU:
 +      case GAUDI2_EVENT_MME0_SBTE_BMON_SPMU:
 +      case GAUDI2_EVENT_MME0_WAP_BMON_SPMU:
 +      case GAUDI2_EVENT_MME1_CTRL_BMON_SPMU:
 +      case GAUDI2_EVENT_MME1_SBTE_BMON_SPMU:
 +      case GAUDI2_EVENT_MME1_WAP_BMON_SPMU:
 +      case GAUDI2_EVENT_MME2_CTRL_BMON_SPMU:
 +      case GAUDI2_EVENT_MME2_SBTE_BMON_SPMU:
 +      case GAUDI2_EVENT_MME2_WAP_BMON_SPMU:
 +      case GAUDI2_EVENT_MME3_CTRL_BMON_SPMU:
 +      case GAUDI2_EVENT_MME3_SBTE_BMON_SPMU:
 +      case GAUDI2_EVENT_MME3_WAP_BMON_SPMU:
 +      case GAUDI2_EVENT_HDMA2_BM_SPMU ... GAUDI2_EVENT_PDMA1_BM_SPMU:
 +              fallthrough;
 +      case GAUDI2_EVENT_DEC0_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC1_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC2_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC3_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC4_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC5_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC6_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC7_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC8_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC9_BMON_SPMU:
 +      case GAUDI2_EVENT_ROTATOR0_BMON_SPMU ... GAUDI2_EVENT_SM3_BMON_SPMU:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S:
 +      case GAUDI2_EVENT_CPU_FIX_POWER_ENV_E:
 +      case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_S:
 +      case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
 +              gaudi2_print_clk_change_info(hdev, event_type, &event_mask);
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_PKT_QUEUE_OUT_SYNC:
 +              gaudi2_print_out_of_sync_info(hdev, event_type, &eq_entry->pkt_sync_err);
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PCIE_FLR_REQUESTED:
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              /* Do nothing - FW will handle it */
 +              break;
 +
 +      case GAUDI2_EVENT_PCIE_P2P_MSIX:
 +              error_count = gaudi2_handle_pcie_p2p_msix(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_SM0_AXI_ERROR_RESPONSE ... GAUDI2_EVENT_SM3_AXI_ERROR_RESPONSE:
 +              index = event_type - GAUDI2_EVENT_SM0_AXI_ERROR_RESPONSE;
 +              error_count = gaudi2_handle_sm_err(hdev, event_type, index);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PSOC_MME_PLL_LOCK_ERR ... GAUDI2_EVENT_DCORE2_HBM_PLL_LOCK_ERR:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_CAUSE:
 +              dev_info(hdev->dev, "CPLD shutdown cause, reset reason: 0x%llx\n",
 +                                              le64_to_cpu(eq_entry->data[0]));
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_EVENT:
 +              dev_err(hdev->dev, "CPLD shutdown event, reset reason: 0x%llx\n",
 +                                              le64_to_cpu(eq_entry->data[0]));
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_PKT_SANITY_FAILED:
 +              gaudi2_print_cpu_pkt_failure_info(hdev, event_type, &eq_entry->pkt_sync_err);
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_ARC_DCCM_FULL:
 +              error_count = hl_arc_event_handle(hdev, event_type, &eq_entry->arc_data);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_FP32_NOT_SUPPORTED:
 +      case GAUDI2_EVENT_DEV_RESET_REQ:
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              is_critical = true;
 +              break;
 +
 +      default:
 +              if (gaudi2_irq_map_table[event_type].valid) {
 +                      dev_err_ratelimited(hdev->dev, "Cannot find handler for event %d\n",
 +                                              event_type);
 +                      error_count = GAUDI2_NA_EVENT_CAUSE;
 +              }
 +      }
 +
 +      /* Make sure to dump an error in case no error cause was printed so far.
 +       * Note that although we have counted the errors, we use this number as
 +       * a boolean.
 +       */
 +      if (error_count == GAUDI2_NA_EVENT_CAUSE && !is_info_event(event_type))
 +              gaudi2_print_event(hdev, event_type, true, "%d", event_type);
 +      else if (error_count == 0)
 +              gaudi2_print_event(hdev, event_type, true,
 +                              "No error cause for H/W event %u\n", event_type);
 +
 +      if ((gaudi2_irq_map_table[event_type].reset || reset_required) &&
 +                              (hdev->hard_reset_on_fw_events ||
 +                              (hdev->asic_prop.fw_security_enabled && is_critical)))
 +              goto reset_device;
 +
 +      /* Send unmask irq only for interrupts not classified as MSG */
 +      if (!gaudi2_irq_map_table[event_type].msg)
 +              hl_fw_unmask_irq(hdev, event_type);
 +
 +      if (event_mask)
 +              hl_notifier_event_send_all(hdev, event_mask);
 +
 +      return;
 +
 +reset_device:
 +      if (hdev->asic_prop.fw_security_enabled && is_critical) {
 +              reset_flags |= HL_DRV_RESET_BYPASS_REQ_TO_FW;
 +              event_mask |= HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE;
 +      } else {
 +              reset_flags |= HL_DRV_RESET_DELAY;
 +      }
 +      event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
 +      hl_device_cond_reset(hdev, reset_flags, event_mask);
 +}
 +
 +static int gaudi2_memset_memory_chunk_using_edma_qm(struct hl_device *hdev,
 +                      struct packet_lin_dma *lin_dma_pkt, dma_addr_t pkt_dma_addr,
 +                      u32 hw_queue_id, u32 size, u64 addr, u32 val)
 +{
 +      u32 ctl, pkt_size;
 +      int rc = 0;
 +
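 +      /* Build a LIN_DMA packet in memset mode: the fill value is passed via src_addr and
 +       * write-completion is enabled so the engine hits the completion address set by the caller
 +       */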
 +      ctl = FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_LIN_DMA);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_LIN_DMA_CTL_MEMSET_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_LIN_DMA_CTL_WRCOMP_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 1);
 +
 +      lin_dma_pkt->ctl = cpu_to_le32(ctl);
 +      lin_dma_pkt->src_addr = cpu_to_le64(val);
 +      lin_dma_pkt->dst_addr = cpu_to_le64(addr);
 +      lin_dma_pkt->tsize = cpu_to_le32(size);
 +
 +      pkt_size = sizeof(struct packet_lin_dma);
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id, pkt_size, pkt_dma_addr);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to send lin dma packet to H/W queue %d\n",
 +                              hw_queue_id);
 +
 +      return rc;
 +}
 +
 +static int gaudi2_memset_device_memory(struct hl_device *hdev, u64 addr, u64 size, u64 val)
 +{
 +      u32 edma_queues_id[] = {GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0,
 +                                      GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0,
 +                                      GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0,
 +                                      GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0};
 +      u32 chunk_size, dcore, edma_idx, sob_offset, sob_addr, comp_val,
 +              old_mmubp, mmubp, num_of_pkts, busy, pkt_size;
 +      u64 comp_addr, cur_addr = addr, end_addr = addr + size;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      void *lin_dma_pkts_arr;
 +      dma_addr_t pkt_dma_addr;
 +      int rc = 0, dma_num = 0;
 +
 +      if (prop->edma_enabled_mask == 0) {
 +              dev_info(hdev->dev, "none of the EDMA engines is enabled - skip DRAM scrubbing\n");
 +              return -EIO;
 +      }
 +
 +      sob_offset = hdev->asic_prop.first_available_user_sob[0] * 4;
 +      sob_addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset;
 +      comp_addr = CFG_BASE + sob_addr;
 +      comp_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_INC_MASK, 1) |
 +              FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_VAL_MASK, 1);
 +      mmubp = FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_WR_MASK, 1) |
 +              FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_RD_MASK, 1);
 +
 +      /* Calculate how many lin dma pkts we'll need */
 +      num_of_pkts = div64_u64(round_up(size, SZ_2G), SZ_2G);
 +      pkt_size = sizeof(struct packet_lin_dma);
 +
 +      lin_dma_pkts_arr = hl_asic_dma_alloc_coherent(hdev, pkt_size * num_of_pkts,
 +                                      &pkt_dma_addr, GFP_KERNEL);
 +      if (!lin_dma_pkts_arr)
 +              return -ENOMEM;
 +
 +      /*
 +       * Set MMU bypass for the scrubbing - all EDMAs are configured the same, so save
 +       * only the first one to restore later.
 +       * Also set the SOB address for all EDMA cores for completion.
 +       * Set the QM as trusted to allow it to access physical addresses with MMU bypass.
 +       */
 +      old_mmubp = RREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP);
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (edma_idx = 0 ; edma_idx < NUM_OF_EDMA_PER_DCORE ; edma_idx++) {
 +                      u32 edma_offset = dcore * DCORE_OFFSET + edma_idx * DCORE_EDMA_OFFSET;
 +                      u32 edma_bit = dcore * NUM_OF_EDMA_PER_DCORE + edma_idx;
 +
 +                      if (!(prop->edma_enabled_mask & BIT(edma_bit)))
 +                              continue;
 +
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP +
 +                                      edma_offset, mmubp);
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_LO + edma_offset,
 +                                      lower_32_bits(comp_addr));
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_HI + edma_offset,
 +                                      upper_32_bits(comp_addr));
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_WDATA + edma_offset,
 +                                      comp_val);
 +                      gaudi2_qman_set_test_mode(hdev,
 +                                      edma_queues_id[dcore] + 4 * edma_idx, true);
 +              }
 +      }
 +
 +      WREG32(sob_addr, 0);
 +
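 +      /* Scrub in chunks of up to 2GB, spread round-robin over the enabled EDMA queues */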
 +      while (cur_addr < end_addr) {
 +              for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +                      for (edma_idx = 0 ; edma_idx < NUM_OF_EDMA_PER_DCORE ; edma_idx++) {
 +                              u32 edma_bit = dcore * NUM_OF_EDMA_PER_DCORE + edma_idx;
 +
 +                              if (!(prop->edma_enabled_mask & BIT(edma_bit)))
 +                                      continue;
 +
 +                              chunk_size = min_t(u64, SZ_2G, end_addr - cur_addr);
 +
 +                              rc = gaudi2_memset_memory_chunk_using_edma_qm(hdev,
 +                                      (struct packet_lin_dma *)lin_dma_pkts_arr + dma_num,
 +                                      pkt_dma_addr + dma_num * pkt_size,
 +                                      edma_queues_id[dcore] + edma_idx * 4,
 +                                      chunk_size, cur_addr, val);
 +                              if (rc)
 +                                      goto end;
 +
 +                              dma_num++;
 +                              cur_addr += chunk_size;
 +                              if (cur_addr == end_addr)
 +                                      break;
 +                      }
 +              }
 +      }
 +
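 +      /* Each completed LIN_DMA packet increments the SOB by 1, so wait until it equals dma_num */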
 +      rc = hl_poll_timeout(hdev, sob_addr, busy, (busy == dma_num), 1000, 1000000);
 +      if (rc) {
 +              dev_err(hdev->dev, "DMA Timeout during HBM scrubbing\n");
 +              goto end;
 +      }
 +end:
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (edma_idx = 0 ; edma_idx < NUM_OF_EDMA_PER_DCORE ; edma_idx++) {
 +                      u32 edma_offset = dcore * DCORE_OFFSET + edma_idx * DCORE_EDMA_OFFSET;
 +                      u32 edma_bit = dcore * NUM_OF_EDMA_PER_DCORE + edma_idx;
 +
 +                      if (!(prop->edma_enabled_mask & BIT(edma_bit)))
 +                              continue;
 +
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP + edma_offset, old_mmubp);
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_LO + edma_offset, 0);
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_HI + edma_offset, 0);
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_WDATA + edma_offset, 0);
 +                      gaudi2_qman_set_test_mode(hdev,
 +                                      edma_queues_id[dcore] + 4 * edma_idx, false);
 +              }
 +      }
 +
 +      WREG32(sob_addr, 0);
 +      hl_asic_dma_free_coherent(hdev, pkt_size * num_of_pkts, lin_dma_pkts_arr, pkt_dma_addr);
 +
 +      return rc;
 +}
 +
 +static int gaudi2_scrub_device_dram(struct hl_device *hdev, u64 val)
 +{
 +      int rc;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 size = prop->dram_end_address - prop->dram_user_base_address;
 +
 +      rc = gaudi2_memset_device_memory(hdev, prop->dram_user_base_address, size, val);
 +
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to scrub dram, address: 0x%llx size: %llu\n",
 +                              prop->dram_user_base_address, size);
 +      return rc;
 +}
 +
 +static int gaudi2_scrub_device_mem(struct hl_device *hdev)
 +{
 +      int rc;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 val = hdev->memory_scrub_val;
 +      u64 addr, size;
 +
 +      if (!hdev->memory_scrub)
 +              return 0;
 +
 +      /* scrub SRAM */
 +      addr = prop->sram_user_base_address;
 +      size = hdev->pldm ? 0x10000 : (prop->sram_size - SRAM_USER_BASE_OFFSET);
 +      dev_dbg(hdev->dev, "Scrubbing SRAM: 0x%09llx - 0x%09llx, val: 0x%llx\n",
 +                      addr, addr + size, val);
 +      rc = gaudi2_memset_device_memory(hdev, addr, size, val);
 +      if (rc) {
 +              dev_err(hdev->dev, "scrubbing SRAM failed (%d)\n", rc);
 +              return rc;
 +      }
 +
 +      /* scrub DRAM */
 +      rc = gaudi2_scrub_device_dram(hdev, val);
 +      if (rc) {
 +              dev_err(hdev->dev, "scrubbing DRAM failed (%d)\n", rc);
 +              return rc;
 +      }
 +      return 0;
 +}
 +
 +static void gaudi2_restore_user_sm_registers(struct hl_device *hdev)
 +{
 +      u64 addr, mon_sts_addr, mon_cfg_addr, cq_lbw_l_addr, cq_lbw_h_addr,
 +              cq_lbw_data_addr, cq_base_l_addr, cq_base_h_addr, cq_size_addr;
 +      u32 val, size, offset;
 +      int dcore_id;
 +
 +      offset = hdev->asic_prop.first_available_cq[0] * 4;
 +      cq_lbw_l_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0 + offset;
 +      cq_lbw_h_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 + offset;
 +      cq_lbw_data_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_0 + offset;
 +      cq_base_l_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0 + offset;
 +      cq_base_h_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0 + offset;
 +      cq_size_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0 + offset;
 +      size = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 -
 +                      (mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0 + offset);
 +
 +      /* memset dcore0 CQ registers */
 +      gaudi2_memset_device_lbw(hdev, cq_lbw_l_addr, size, 0);
 +      gaudi2_memset_device_lbw(hdev, cq_lbw_h_addr, size, 0);
 +      gaudi2_memset_device_lbw(hdev, cq_lbw_data_addr, size, 0);
 +      gaudi2_memset_device_lbw(hdev, cq_base_l_addr, size, 0);
 +      gaudi2_memset_device_lbw(hdev, cq_base_h_addr, size, 0);
 +      gaudi2_memset_device_lbw(hdev, cq_size_addr, size, 0);
 +
 +      cq_lbw_l_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0 + DCORE_OFFSET;
 +      cq_lbw_h_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 + DCORE_OFFSET;
 +      cq_lbw_data_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_0 + DCORE_OFFSET;
 +      cq_base_l_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0 + DCORE_OFFSET;
 +      cq_base_h_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0 + DCORE_OFFSET;
 +      cq_size_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0 + DCORE_OFFSET;
 +      size = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 - mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0;
 +
 +      for (dcore_id = 1 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
 +              gaudi2_memset_device_lbw(hdev, cq_lbw_l_addr, size, 0);
 +              gaudi2_memset_device_lbw(hdev, cq_lbw_h_addr, size, 0);
 +              gaudi2_memset_device_lbw(hdev, cq_lbw_data_addr, size, 0);
 +              gaudi2_memset_device_lbw(hdev, cq_base_l_addr, size, 0);
 +              gaudi2_memset_device_lbw(hdev, cq_base_h_addr, size, 0);
 +              gaudi2_memset_device_lbw(hdev, cq_size_addr, size, 0);
 +
 +              cq_lbw_l_addr += DCORE_OFFSET;
 +              cq_lbw_h_addr += DCORE_OFFSET;
 +              cq_lbw_data_addr += DCORE_OFFSET;
 +              cq_base_l_addr += DCORE_OFFSET;
 +              cq_base_h_addr += DCORE_OFFSET;
 +              cq_size_addr += DCORE_OFFSET;
 +      }
 +
 +      offset = hdev->asic_prop.first_available_user_mon[0] * 4;
 +      addr = mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0 + offset;
 +      val = 1 << DCORE0_SYNC_MNGR_OBJS_MON_STATUS_PROT_SHIFT;
 +      size = mmDCORE0_SYNC_MNGR_OBJS_SM_SEC_0 - (mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0 + offset);
 +
 +      /* memset dcore0 monitors */
 +      gaudi2_memset_device_lbw(hdev, addr, size, val);
 +
 +      addr = mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + offset;
 +      gaudi2_memset_device_lbw(hdev, addr, size, 0);
 +
 +      mon_sts_addr = mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0 + DCORE_OFFSET;
 +      mon_cfg_addr = mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + DCORE_OFFSET;
 +      size = mmDCORE0_SYNC_MNGR_OBJS_SM_SEC_0 - mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0;
 +
 +      for (dcore_id = 1 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
 +              gaudi2_memset_device_lbw(hdev, mon_sts_addr, size, val);
 +              gaudi2_memset_device_lbw(hdev, mon_cfg_addr, size, 0);
 +              mon_sts_addr += DCORE_OFFSET;
 +              mon_cfg_addr += DCORE_OFFSET;
 +      }
 +
 +      offset = hdev->asic_prop.first_available_user_sob[0] * 4;
 +      addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + offset;
 +      val = 0;
 +      size = mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 -
 +                      (mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + offset);
 +
 +      /* memset dcore0 sobs */
 +      gaudi2_memset_device_lbw(hdev, addr, size, val);
 +
 +      addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + DCORE_OFFSET;
 +      size = mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 - mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0;
 +
 +      for (dcore_id = 1 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
 +              gaudi2_memset_device_lbw(hdev, addr, size, val);
 +              addr += DCORE_OFFSET;
 +      }
 +
 +      /* Flush all WREG to prevent race */
 +      val = RREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + offset);
 +}
 +
 +static void gaudi2_restore_user_qm_registers(struct hl_device *hdev)
 +{
 +      u32 reg_base, hw_queue_id;
 +
 +      for (hw_queue_id = GAUDI2_QUEUE_ID_PDMA_0_0 ; hw_queue_id <= GAUDI2_QUEUE_ID_ROT_1_0;
 +                                                      hw_queue_id += NUM_OF_PQ_PER_QMAN) {
 +              if (!gaudi2_is_queue_enabled(hdev, hw_queue_id))
 +                      continue;
 +
 +              gaudi2_clear_qm_fence_counters_common(hdev, hw_queue_id, false);
 +
 +              reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
 +              WREG32(reg_base + QM_ARB_CFG_0_OFFSET, 0);
 +      }
 +
 +      /* Flush all WREG to prevent race */
 +      RREG32(mmPDMA0_QM_ARB_CFG_0);
 +}
 +
 +static void gaudi2_restore_nic_qm_registers(struct hl_device *hdev)
 +{
 +      u32 reg_base, hw_queue_id;
 +
 +      for (hw_queue_id = GAUDI2_QUEUE_ID_NIC_0_0 ; hw_queue_id <= GAUDI2_QUEUE_ID_NIC_23_3;
 +                                                      hw_queue_id += NUM_OF_PQ_PER_QMAN) {
 +              if (!gaudi2_is_queue_enabled(hdev, hw_queue_id))
 +                      continue;
 +
 +              gaudi2_clear_qm_fence_counters_common(hdev, hw_queue_id, false);
 +
 +              reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
 +              WREG32(reg_base + QM_ARB_CFG_0_OFFSET, 0);
 +      }
 +
 +      /* Flush all WREG to prevent race */
 +      RREG32(mmPDMA0_QM_ARB_CFG_0);
 +}
 +
 +static int gaudi2_context_switch(struct hl_device *hdev, u32 asid)
 +{
 +      return 0;
 +}
 +
 +static void gaudi2_restore_phase_topology(struct hl_device *hdev)
 +{
 +}
 +
 +static void gaudi2_init_block_instances(struct hl_device *hdev, u32 block_idx,
 +                                              struct dup_block_ctx *cfg_ctx)
 +{
 +      u64 block_base = cfg_ctx->base + block_idx * cfg_ctx->block_off;
 +      u8 seq;
 +      int i;
 +
 +      for (i = 0 ; i < cfg_ctx->instances ; i++) {
 +              seq = block_idx * cfg_ctx->instances + i;
 +
 +              /* skip disabled instance */
 +              if (!(cfg_ctx->enabled_mask & BIT_ULL(seq)))
 +                      continue;
 +
 +              cfg_ctx->instance_cfg_fn(hdev, block_base + i * cfg_ctx->instance_off,
 +                                      cfg_ctx->data);
 +      }
 +}
 +
 +static void gaudi2_init_blocks_with_mask(struct hl_device *hdev, struct dup_block_ctx *cfg_ctx,
 +                                              u64 mask)
 +{
 +      int i;
 +
 +      cfg_ctx->enabled_mask = mask;
 +
 +      for (i = 0 ; i < cfg_ctx->blocks ; i++)
 +              gaudi2_init_block_instances(hdev, i, cfg_ctx);
 +}
 +
 +void gaudi2_init_blocks(struct hl_device *hdev, struct dup_block_ctx *cfg_ctx)
 +{
 +      gaudi2_init_blocks_with_mask(hdev, cfg_ctx, U64_MAX);
 +}
 +
 +static int gaudi2_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size, void *blob_addr)
 +{
 +      void *host_mem_virtual_addr;
 +      dma_addr_t host_mem_dma_addr;
 +      u64 reserved_va_base;
 +      u32 pos, size_left, size_to_dma;
 +      struct hl_ctx *ctx;
 +      int rc = 0;
 +
 +      /* Fetch the ctx */
 +      ctx = hl_get_compute_ctx(hdev);
 +      if (!ctx) {
 +              dev_err(hdev->dev, "No ctx available\n");
 +              return -EINVAL;
 +      }
 +
 +      /* Allocate buffers for read and for poll */
 +      host_mem_virtual_addr = hl_asic_dma_alloc_coherent(hdev, SZ_2M, &host_mem_dma_addr,
 +                                                              GFP_KERNEL | __GFP_ZERO);
 +      if (host_mem_virtual_addr == NULL) {
 +              dev_err(hdev->dev, "Failed to allocate memory for KDMA read\n");
 +              rc = -ENOMEM;
 +              goto put_ctx;
 +      }
 +
 +      /* Reserve VM region on asic side */
 +      reserved_va_base = hl_reserve_va_block(hdev, ctx, HL_VA_RANGE_TYPE_HOST, SZ_2M,
 +                                              HL_MMU_VA_ALIGNMENT_NOT_NEEDED);
 +      if (!reserved_va_base) {
 +              dev_err(hdev->dev, "Failed to reserve vmem on asic\n");
 +              rc = -ENOMEM;
 +              goto free_data_buffer;
 +      }
 +
 +      /* Create mapping on asic side */
 +      mutex_lock(&hdev->mmu_lock);
 +      rc = hl_mmu_map_contiguous(ctx, reserved_va_base, host_mem_dma_addr, SZ_2M);
 +      hl_mmu_invalidate_cache_range(hdev, false,
 +                                    MMU_OP_USERPTR | MMU_OP_SKIP_LOW_CACHE_INV,
 +                                    ctx->asid, reserved_va_base, SZ_2M);
 +      mutex_unlock(&hdev->mmu_lock);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to create mapping on asic mmu\n");
 +              goto unreserve_va;
 +      }
 +
 +      /* Enable MMU on KDMA */
 +      gaudi2_kdma_set_mmbp_asid(hdev, false, ctx->asid);
 +
 +      pos = 0;
 +      size_left = size;
 +      size_to_dma = SZ_2M;
 +
 +      while (size_left > 0) {
 +              if (size_left < SZ_2M)
 +                      size_to_dma = size_left;
 +
 +              rc = gaudi2_send_job_to_kdma(hdev, addr, reserved_va_base, size_to_dma, false);
 +              if (rc)
 +                      break;
 +
 +              memcpy(blob_addr + pos, host_mem_virtual_addr, size_to_dma);
 +
 +              if (size_left <= SZ_2M)
 +                      break;
 +
 +              pos += SZ_2M;
 +              addr += SZ_2M;
 +              size_left -= SZ_2M;
 +      }
 +
 +      gaudi2_kdma_set_mmbp_asid(hdev, true, HL_KERNEL_ASID_ID);
 +
 +      mutex_lock(&hdev->mmu_lock);
 +      hl_mmu_unmap_contiguous(ctx, reserved_va_base, SZ_2M);
 +      hl_mmu_invalidate_cache_range(hdev, false, MMU_OP_USERPTR,
 +                                    ctx->asid, reserved_va_base, SZ_2M);
 +      mutex_unlock(&hdev->mmu_lock);
 +unreserve_va:
 +      hl_unreserve_va_block(hdev, ctx, reserved_va_base, SZ_2M);
 +free_data_buffer:
 +      hl_asic_dma_free_coherent(hdev, SZ_2M, host_mem_virtual_addr, host_mem_dma_addr);
 +put_ctx:
 +      hl_ctx_put(ctx);
 +
 +      return rc;
 +}
 +
 +static int gaudi2_internal_cb_pool_init(struct hl_device *hdev, struct hl_ctx *ctx)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int min_alloc_order, rc;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PMMU))
 +              return 0;
 +
 +      hdev->internal_cb_pool_virt_addr = hl_asic_dma_alloc_coherent(hdev,
 +                                                              HOST_SPACE_INTERNAL_CB_SZ,
 +                                                              &hdev->internal_cb_pool_dma_addr,
 +                                                              GFP_KERNEL | __GFP_ZERO);
 +
 +      if (!hdev->internal_cb_pool_virt_addr)
 +              return -ENOMEM;
 +
 +      min_alloc_order = ilog2(min(gaudi2_get_signal_cb_size(hdev),
 +                                      gaudi2_get_wait_cb_size(hdev)));
 +
 +      hdev->internal_cb_pool = gen_pool_create(min_alloc_order, -1);
 +      if (!hdev->internal_cb_pool) {
 +              dev_err(hdev->dev, "Failed to create internal CB pool\n");
 +              rc = -ENOMEM;
 +              goto free_internal_cb_pool;
 +      }
 +
 +      rc = gen_pool_add(hdev->internal_cb_pool, (uintptr_t) hdev->internal_cb_pool_virt_addr,
 +                              HOST_SPACE_INTERNAL_CB_SZ, -1);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to add memory to internal CB pool\n");
 +              rc = -EFAULT;
 +              goto destroy_internal_cb_pool;
 +      }
 +
 +      hdev->internal_cb_va_base = hl_reserve_va_block(hdev, ctx, HL_VA_RANGE_TYPE_HOST,
 +                                      HOST_SPACE_INTERNAL_CB_SZ, HL_MMU_VA_ALIGNMENT_NOT_NEEDED);
 +
 +      if (!hdev->internal_cb_va_base) {
 +              rc = -ENOMEM;
 +              goto destroy_internal_cb_pool;
 +      }
 +
 +      mutex_lock(&hdev->mmu_lock);
 +      rc = hl_mmu_map_contiguous(ctx, hdev->internal_cb_va_base, hdev->internal_cb_pool_dma_addr,
 +                                      HOST_SPACE_INTERNAL_CB_SZ);
 +      hl_mmu_invalidate_cache(hdev, false, MMU_OP_USERPTR);
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      if (rc)
 +              goto unreserve_internal_cb_pool;
 +
 +      return 0;
 +
 +unreserve_internal_cb_pool:
 +      hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ);
 +destroy_internal_cb_pool:
 +      gen_pool_destroy(hdev->internal_cb_pool);
 +free_internal_cb_pool:
 +      hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
 +                                      hdev->internal_cb_pool_dma_addr);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_internal_cb_pool_fini(struct hl_device *hdev, struct hl_ctx *ctx)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PMMU))
 +              return;
 +
 +      mutex_lock(&hdev->mmu_lock);
 +      hl_mmu_unmap_contiguous(ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ);
 +      hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ);
 +      hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR);
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      gen_pool_destroy(hdev->internal_cb_pool);
 +
 +      hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
 +                                      hdev->internal_cb_pool_dma_addr);
 +}
 +
 +static void gaudi2_restore_user_registers(struct hl_device *hdev)
 +{
 +      gaudi2_restore_user_sm_registers(hdev);
 +      gaudi2_restore_user_qm_registers(hdev);
 +}
 +
 +static int gaudi2_map_virtual_msix_doorbell_memory(struct hl_ctx *ctx)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int rc;
 +
 +      rc = hl_mmu_map_page(ctx, RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START,
 +                              gaudi2->virt_msix_db_dma_addr, prop->pmmu.page_size, true);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to map VA %#llx for virtual MSI-X doorbell memory\n",
 +                      RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_unmap_virtual_msix_doorbell_memory(struct hl_ctx *ctx)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int rc;
 +
 +      rc = hl_mmu_unmap_page(ctx, RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START,
 +                              prop->pmmu.page_size, true);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to unmap VA %#llx of virtual MSI-X doorbell memory\n",
 +                      RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START);
 +}
 +
 +static int gaudi2_ctx_init(struct hl_ctx *ctx)
 +{
 +      int rc;
 +
 +      rc = gaudi2_mmu_prepare(ctx->hdev, ctx->asid);
 +      if (rc)
 +              return rc;
 +
 +      /* No need to clear the user registers if the device has just
 +       * performed a reset; we restore only the NIC QM registers
 +       */
 +      if (ctx->hdev->reset_upon_device_release)
 +              gaudi2_restore_nic_qm_registers(ctx->hdev);
 +      else
 +              gaudi2_restore_user_registers(ctx->hdev);
 +
 +      rc = gaudi2_internal_cb_pool_init(ctx->hdev, ctx);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi2_map_virtual_msix_doorbell_memory(ctx);
 +      if (rc)
 +              gaudi2_internal_cb_pool_fini(ctx->hdev, ctx);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_ctx_fini(struct hl_ctx *ctx)
 +{
 +      if (ctx->asid == HL_KERNEL_ASID_ID)
 +              return;
 +
 +      gaudi2_internal_cb_pool_fini(ctx->hdev, ctx);
 +
 +      gaudi2_unmap_virtual_msix_doorbell_memory(ctx);
 +}
 +
 +static int gaudi2_pre_schedule_cs(struct hl_cs *cs)
 +{
 +      struct hl_device *hdev = cs->ctx->hdev;
 +      int index = cs->sequence & (hdev->asic_prop.max_pending_cs - 1);
 +      u32 mon_payload, sob_id, mon_id;
 +
 +      if (!cs_needs_completion(cs))
 +              return 0;
 +
 +      /*
 +       * The first 64 SOB/MON pairs are reserved for the driver's QMAN auto-completion
 +       * mechanism. Each SOB/MON pair is used for a pending CS with the same
 +       * cyclic index. The SOB value is increased when each of the CS jobs is
 +       * completed. When the SOB reaches the number of CS jobs, the monitor
 +       * generates an MSI-X interrupt.
 +       */
 +
 +      sob_id = mon_id = index;
 +      mon_payload = (1 << CQ_ENTRY_SHADOW_INDEX_VALID_SHIFT) |
 +                              (1 << CQ_ENTRY_READY_SHIFT) | index;
 +
 +      gaudi2_arm_cq_monitor(hdev, sob_id, mon_id, GAUDI2_RESERVED_CQ_CS_COMPLETION, mon_payload,
 +                              cs->jobs_cnt);
 +
 +      return 0;
 +}
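
Editor's note: the cyclic SOB/MON bookkeeping described in the comment above can be modelled in isolation. The sketch below assumes max_pending_cs is 64 (matching the "first 64 SOB/MON" reservation) and uses placeholder values for the CQ_ENTRY_* shifts rather than the real CQ entry layout; it only illustrates the index and payload composition, not the driver's monitor arming path.

/* Standalone sketch of the cyclic SOB/MON index and monitor payload above.
 * The shift values are illustrative placeholders, not the driver's CQ_ENTRY_*
 * definitions.
 */
#include <stdio.h>
#include <stdint.h>

#define MAX_PENDING_CS			64u	/* assumed power of two, as the mask math requires */
#define SHADOW_INDEX_VALID_SHIFT	31	/* placeholder */
#define READY_SHIFT			30	/* placeholder */

static uint32_t cs_completion_payload(uint64_t cs_sequence)
{
	/* cyclic index into the first 64 SOB/MON pairs reserved for the driver */
	uint32_t index = cs_sequence & (MAX_PENDING_CS - 1);

	/* payload the monitor writes when the SOB reaches the CS job count */
	return (1u << SHADOW_INDEX_VALID_SHIFT) | (1u << READY_SHIFT) | index;
}

int main(void)
{
	uint64_t seq;

	for (seq = 62; seq < 67; seq++)
		printf("cs %llu -> sob/mon %u, payload 0x%08x\n",
		       (unsigned long long)seq,
		       (unsigned int)(seq & (MAX_PENDING_CS - 1)),
		       cs_completion_payload(seq));
	return 0;
}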
 +
 +static u32 gaudi2_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
 +{
 +      return HL_INVALID_QUEUE;
 +}
 +
 +static u32 gaudi2_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id, u32 size, bool eb)
 +{
 +      struct hl_cb *cb = data;
 +      struct packet_msg_short *pkt;
 +      u32 value, ctl, pkt_size = sizeof(*pkt);
 +
 +      pkt = (struct packet_msg_short *) (uintptr_t) (cb->kernel_address + size);
 +      memset(pkt, 0, pkt_size);
 +
 +      /* Inc by 1, Mode ADD */
 +      value = FIELD_PREP(GAUDI2_PKT_SHORT_VAL_SOB_SYNC_VAL_MASK, 1);
 +      value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_SOB_MOD_MASK, 1);
 +
 +      ctl = FIELD_PREP(GAUDI2_PKT_SHORT_CTL_ADDR_MASK, sob_id * 4);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_SHORT_CTL_BASE_MASK, 1); /* SOB base */
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, eb);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return size + pkt_size;
 +}
 +
 +static u32 gaudi2_add_mon_msg_short(struct packet_msg_short *pkt, u32 value, u16 addr)
 +{
 +      u32 ctl, pkt_size = sizeof(*pkt);
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      ctl = FIELD_PREP(GAUDI2_PKT_SHORT_CTL_ADDR_MASK, addr);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_SHORT_CTL_BASE_MASK, 0);  /* MON base */
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 0);
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
 +static u32 gaudi2_add_arm_monitor_pkt(struct hl_device *hdev, struct packet_msg_short *pkt,
 +                                      u16 sob_base, u8 sob_mask, u16 sob_val, u16 addr)
 +{
 +      u32 ctl, value, pkt_size = sizeof(*pkt);
 +      u8 mask;
 +
 +      if (hl_gen_sob_mask(sob_base, sob_mask, &mask)) {
 +              dev_err(hdev->dev, "sob_base %u (mask %#x) is not valid\n", sob_base, sob_mask);
 +              return 0;
 +      }
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      value = FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_SYNC_GID_MASK, sob_base / 8);
 +      value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_SYNC_VAL_MASK, sob_val);
 +      value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_MODE_MASK, 0); /* GREATER OR EQUAL */
 +      value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_MASK_MASK, mask);
 +
 +      ctl = FIELD_PREP(GAUDI2_PKT_SHORT_CTL_ADDR_MASK, addr);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_SHORT_CTL_BASE_MASK, 0); /* MON base */
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
 +static u32 gaudi2_add_fence_pkt(struct packet_fence *pkt)
 +{
 +      u32 ctl, cfg, pkt_size = sizeof(*pkt);
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      cfg = FIELD_PREP(GAUDI2_PKT_FENCE_CFG_DEC_VAL_MASK, 1);
 +      cfg |= FIELD_PREP(GAUDI2_PKT_FENCE_CFG_TARGET_VAL_MASK, 1);
 +      cfg |= FIELD_PREP(GAUDI2_PKT_FENCE_CFG_ID_MASK, 2);
 +
 +      ctl = FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_FENCE);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->cfg = cpu_to_le32(cfg);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
 +static u32 gaudi2_gen_wait_cb(struct hl_device *hdev, struct hl_gen_wait_properties *prop)
 +{
 +      struct hl_cb *cb = prop->data;
 +      void *buf = (void *) (uintptr_t) (cb->kernel_address);
 +
 +      u64 monitor_base, fence_addr = 0;
 +      u32 stream_index, size = prop->size;
 +      u16 msg_addr_offset;
 +
 +      stream_index = prop->q_idx % 4;
 +      fence_addr = CFG_BASE + gaudi2_qm_blocks_bases[prop->q_idx] +
 +                      QM_FENCE2_OFFSET + stream_index * 4;
 +
 +      /*
 +       * monitor_base should be the content of the base0 address registers,
 +       * so it will be added to the msg short offsets
 +       */
 +      monitor_base = mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
 +
 +      /* First monitor config packet: low address of the sync */
 +      msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + prop->mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi2_add_mon_msg_short(buf + size, (u32) fence_addr, msg_addr_offset);
 +
 +      /* Second monitor config packet: high address of the sync */
 +      msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + prop->mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi2_add_mon_msg_short(buf + size, (u32) (fence_addr >> 32), msg_addr_offset);
 +
 +      /*
 +       * Third monitor config packet: the payload, i.e. what to write when the
 +       * sync triggers
 +       */
 +      msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + prop->mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi2_add_mon_msg_short(buf + size, 1, msg_addr_offset);
 +
 +      /* Fourth monitor config packet: bind the monitor to a sync object */
 +      msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + prop->mon_id * 4) - monitor_base;
 +
 +      size += gaudi2_add_arm_monitor_pkt(hdev, buf + size, prop->sob_base, prop->sob_mask,
 +                                              prop->sob_val, msg_addr_offset);
 +
 +      /* Fence packet */
 +      size += gaudi2_add_fence_pkt(buf + size);
 +
 +      return size;
 +}
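
Editor's note: the wait CB assembled above is five packets in total: four MSG_SHORT writes that program one monitor (payload address low/high, payload data, arm) relative to the base0 register, followed by a FENCE. The toy below recomputes the MSG_SHORT address offsets for an arbitrary mon_id; the register addresses are made-up placeholders, not the real DCORE0 sync-manager map.

/* Offset arithmetic of the four monitor-config MSG_SHORT packets above.
 * Register addresses are placeholders, not the DCORE0 sync-manager map.
 */
#include <stdio.h>

#define MON_PAY_ADDRL_0	0x1000u	/* placeholder for mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 */
#define MON_PAY_ADDRH_0	0x1200u	/* placeholder */
#define MON_PAY_DATA_0	0x1400u	/* placeholder */
#define MON_ARM_0	0x1600u	/* placeholder */

int main(void)
{
	unsigned int mon_id = 5;
	unsigned int base = MON_PAY_ADDRL_0;	/* MSG_SHORT offsets are relative to this base */

	printf("pkt1 fence addr low : offset 0x%x\n", MON_PAY_ADDRL_0 + mon_id * 4 - base);
	printf("pkt2 fence addr high: offset 0x%x\n", MON_PAY_ADDRH_0 + mon_id * 4 - base);
	printf("pkt3 payload (1)    : offset 0x%x\n", MON_PAY_DATA_0 + mon_id * 4 - base);
	printf("pkt4 arm monitor    : offset 0x%x\n", MON_ARM_0 + mon_id * 4 - base);
	return 0;
}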
 +
 +static void gaudi2_reset_sob(struct hl_device *hdev, void *data)
 +{
 +      struct hl_hw_sob *hw_sob = data;
 +
 +      dev_dbg(hdev->dev, "reset SOB, q_idx: %d, sob_id: %d\n", hw_sob->q_idx, hw_sob->sob_id);
 +
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + hw_sob->sob_id * 4, 0);
 +
 +      kref_init(&hw_sob->kref);
 +}
 +
 +static void gaudi2_reset_sob_group(struct hl_device *hdev, u16 sob_group)
 +{
 +}
 +
 +static u64 gaudi2_get_device_time(struct hl_device *hdev)
 +{
 +      u64 device_time = ((u64) RREG32(mmPSOC_TIMESTAMP_CNTCVU)) << 32;
 +
 +      return device_time | RREG32(mmPSOC_TIMESTAMP_CNTCVL);
 +}
 +
 +static int gaudi2_collective_wait_init_cs(struct hl_cs *cs)
 +{
 +      return 0;
 +}
 +
 +static int gaudi2_collective_wait_create_jobs(struct hl_device *hdev, struct hl_ctx *ctx,
 +                                      struct hl_cs *cs, u32 wait_queue_id,
 +                                      u32 collective_engine_id, u32 encaps_signal_offset)
 +{
 +      return -EINVAL;
 +}
 +
 +/*
 + * gaudi2_mmu_scramble_addr - converts a DRAM (non power of 2) page-size aligned
 + *                            address to a DMMU page-size address (64MB) before
 + *                            mapping it in the MMU.
 + * The operation is performed on both the virtual and physical addresses.
 + * For a device with 6 HBMs the scramble is:
 + * (addr[47:0] / 48M) * 64M + addr % 48M + addr[63:48]
 + *
 + * Example:
 + * =============================================================================
 + * Allocated DRAM  Reserved VA      scrambled VA for MMU mapping    Scrambled PA
 + * Phys address                                                     in MMU last
 + *                                                                    HOP
 + * =============================================================================
 + * PA1 0x3000000  VA1 0x9C000000  SVA1= (VA1/48M)*64M 0xD0000000  <- PA1/48M 0x1
 + * PA2 0x9000000  VA2 0x9F000000  SVA2= (VA2/48M)*64M 0xD4000000  <- PA2/48M 0x3
 + * =============================================================================
 + */
 +static u64 gaudi2_mmu_scramble_addr(struct hl_device *hdev, u64 raw_addr)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 divisor, mod_va;
 +      u64 div_va;
 +
 +      /* accept any address in the DRAM address space */
 +      if (hl_mem_area_inside_range(raw_addr, sizeof(raw_addr), DRAM_PHYS_BASE,
 +                                                                      VA_HBM_SPACE_END)) {
 +
 +              divisor = prop->num_functional_hbms * GAUDI2_HBM_MMU_SCRM_MEM_SIZE;
 +              div_va = div_u64_rem(raw_addr & GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK, divisor, &mod_va);
 +              return (raw_addr & ~GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK) |
 +                      (div_va << GAUDI2_HBM_MMU_SCRM_DIV_SHIFT) |
 +                      (mod_va << GAUDI2_HBM_MMU_SCRM_MOD_SHIFT);
 +      }
 +
 +      return raw_addr;
 +}
 +
 +static u64 gaudi2_mmu_descramble_addr(struct hl_device *hdev, u64 scrambled_addr)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 divisor, mod_va;
 +      u64 div_va;
 +
 +      /* accept any address in the DRAM address space */
 +      if (hl_mem_area_inside_range(scrambled_addr, sizeof(scrambled_addr), DRAM_PHYS_BASE,
 +                                                                      VA_HBM_SPACE_END)) {
 +
 +              divisor = prop->num_functional_hbms * GAUDI2_HBM_MMU_SCRM_MEM_SIZE;
 +              div_va = div_u64_rem(scrambled_addr & GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK,
 +                                      PAGE_SIZE_64MB, &mod_va);
 +
 +              return ((scrambled_addr & ~GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK) +
 +                                      (div_va * divisor + mod_va));
 +      }
 +
 +      return scrambled_addr;
 +}
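
Editor's note: because the scramble is pure arithmetic, it is easy to check in user space that descramble inverts scramble and that the worked example in the comment above holds. The constants below (48MB divisor, 64MB DMMU page, bits [47:0] scrambled) are assumptions taken from that comment for the 6-HBM case, not the GAUDI2_HBM_MMU_SCRM_* definitions, and the check that the address lies inside the DRAM range is omitted.

/* User-space check of the scramble/descramble arithmetic documented above.
 * Constants are assumptions derived from the 6-HBM example in the comment,
 * not the driver's GAUDI2_HBM_MMU_SCRM_* definitions.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SCRM_MASK	0x0000ffffffffffffULL	/* bits [47:0] take part in the scramble */
#define DIVISOR		(48ULL << 20)		/* num_functional_hbms (6) * 8MB */
#define DIV_SHIFT	26			/* quotient lands on 64MB boundaries */
#define PAGE_64MB	(64ULL << 20)

static uint64_t scramble(uint64_t raw)
{
	uint64_t low = raw & SCRM_MASK;

	return (raw & ~SCRM_MASK) | ((low / DIVISOR) << DIV_SHIFT) | (low % DIVISOR);
}

static uint64_t descramble(uint64_t scrambled)
{
	uint64_t low = scrambled & SCRM_MASK;

	return (scrambled & ~SCRM_MASK) + (low / PAGE_64MB) * DIVISOR + (low % PAGE_64MB);
}

int main(void)
{
	/* Reproduce the VA rows of the example table above */
	assert(scramble(0x9C000000ULL) == 0xD0000000ULL);
	assert(scramble(0x9F000000ULL) == 0xD4000000ULL);
	assert(descramble(scramble(0x9C000000ULL)) == 0x9C000000ULL);

	printf("scramble(0x9C000000) = 0x%llx\n",
	       (unsigned long long)scramble(0x9C000000ULL));
	return 0;
}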
 +
 +static u32 gaudi2_get_dec_base_addr(struct hl_device *hdev, u32 core_id)
 +{
 +      u32 base = 0, dcore_id, dec_id;
 +
 +      if (core_id >= NUMBER_OF_DEC) {
 +              dev_err(hdev->dev, "Unexpected core number %d for DEC\n", core_id);
 +              goto out;
 +      }
 +
 +      if (core_id < 8) {
 +              dcore_id = core_id / NUM_OF_DEC_PER_DCORE;
 +              dec_id = core_id % NUM_OF_DEC_PER_DCORE;
 +
 +              base = mmDCORE0_DEC0_CMD_BASE + dcore_id * DCORE_OFFSET +
 +                              dec_id * DCORE_VDEC_OFFSET;
 +      } else {
 +              /* PCIe Shared Decoder */
 +              base = mmPCIE_DEC0_CMD_BASE + ((core_id % 8) * PCIE_VDEC_OFFSET);
 +      }
 +out:
 +      return base;
 +}
 +
 +static int gaudi2_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
 +                              u32 *block_size, u32 *block_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int i;
 +
 +      for (i = 0 ; i < NUM_USER_MAPPED_BLOCKS ; i++) {
 +              if (block_addr == CFG_BASE + gaudi2->mapped_blocks[i].address) {
 +                      *block_id = i;
 +                      if (block_size)
 +                              *block_size = gaudi2->mapped_blocks[i].size;
 +                      return 0;
 +              }
 +      }
 +
 +      dev_err(hdev->dev, "Invalid block address %#llx", block_addr);
 +
 +      return -EINVAL;
 +}
 +
 +static int gaudi2_block_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
 +                      u32 block_id, u32 block_size)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 offset_in_bar;
 +      u64 address;
 +      int rc;
 +
 +      if (block_id >= NUM_USER_MAPPED_BLOCKS) {
 +              dev_err(hdev->dev, "Invalid block id %u", block_id);
 +              return -EINVAL;
 +      }
 +
 +      /* we allow mapping only an entire block */
 +      if (block_size != gaudi2->mapped_blocks[block_id].size) {
 +              dev_err(hdev->dev, "Invalid block size %u", block_size);
 +              return -EINVAL;
 +      }
 +
 +      offset_in_bar = CFG_BASE + gaudi2->mapped_blocks[block_id].address - STM_FLASH_BASE_ADDR;
 +
 +      address = pci_resource_start(hdev->pdev, SRAM_CFG_BAR_ID) + offset_in_bar;
 +
++      vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++                      VM_DONTCOPY | VM_NORESERVE);
 +
 +      rc = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT,
 +                      block_size, vma->vm_page_prot);
 +      if (rc)
 +              dev_err(hdev->dev, "remap_pfn_range error %d", rc);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_enable_events_from_fw(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 irq_handler_offset = le32_to_cpu(dyn_regs->gic_host_ints_irq);
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_CPU_Q)
 +              WREG32(irq_handler_offset,
 +                      gaudi2_irq_map_table[GAUDI2_EVENT_CPU_INTS_REGISTER].cpu_id);
 +}
 +
 +static int gaudi2_get_mmu_base(struct hl_device *hdev, u64 mmu_id, u32 *mmu_base)
 +{
 +      switch (mmu_id) {
 +      case HW_CAP_DCORE0_DMMU0:
 +              *mmu_base = mmDCORE0_HMMU0_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE0_DMMU1:
 +              *mmu_base = mmDCORE0_HMMU1_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE0_DMMU2:
 +              *mmu_base = mmDCORE0_HMMU2_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE0_DMMU3:
 +              *mmu_base = mmDCORE0_HMMU3_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE1_DMMU0:
 +              *mmu_base = mmDCORE1_HMMU0_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE1_DMMU1:
 +              *mmu_base = mmDCORE1_HMMU1_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE1_DMMU2:
 +              *mmu_base = mmDCORE1_HMMU2_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE1_DMMU3:
 +              *mmu_base = mmDCORE1_HMMU3_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE2_DMMU0:
 +              *mmu_base = mmDCORE2_HMMU0_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE2_DMMU1:
 +              *mmu_base = mmDCORE2_HMMU1_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE2_DMMU2:
 +              *mmu_base = mmDCORE2_HMMU2_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE2_DMMU3:
 +              *mmu_base = mmDCORE2_HMMU3_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE3_DMMU0:
 +              *mmu_base = mmDCORE3_HMMU0_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE3_DMMU1:
 +              *mmu_base = mmDCORE3_HMMU1_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE3_DMMU2:
 +              *mmu_base = mmDCORE3_HMMU2_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE3_DMMU3:
 +              *mmu_base = mmDCORE3_HMMU3_MMU_BASE;
 +              break;
 +      case HW_CAP_PMMU:
 +              *mmu_base = mmPMMU_HBW_MMU_BASE;
 +              break;
 +      default:
 +              return -EINVAL;
 +      }
 +
 +      return 0;
 +}
 +
 +static void gaudi2_ack_mmu_error(struct hl_device *hdev, u64 mmu_id)
 +{
 +      bool is_pmmu = (mmu_id == HW_CAP_PMMU);
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 mmu_base;
 +
 +      if (!(gaudi2->hw_cap_initialized & mmu_id))
 +              return;
 +
 +      if (gaudi2_get_mmu_base(hdev, mmu_id, &mmu_base))
 +              return;
 +
 +      gaudi2_handle_page_error(hdev, mmu_base, is_pmmu, NULL);
 +      gaudi2_handle_access_error(hdev, mmu_base, is_pmmu);
 +}
 +
 +static int gaudi2_ack_mmu_page_fault_or_access_error(struct hl_device *hdev, u64 mmu_cap_mask)
 +{
 +      u32 i, mmu_id, num_of_hmmus = NUM_OF_HMMU_PER_DCORE * NUM_OF_DCORES;
 +
 +      /* check all HMMUs */
 +      for (i = 0 ; i < num_of_hmmus ; i++) {
 +              mmu_id = HW_CAP_DCORE0_DMMU0 << i;
 +
 +              if (mmu_cap_mask & mmu_id)
 +                      gaudi2_ack_mmu_error(hdev, mmu_id);
 +      }
 +
 +      /* check PMMU */
 +      if (mmu_cap_mask & HW_CAP_PMMU)
 +              gaudi2_ack_mmu_error(hdev, HW_CAP_PMMU);
 +
 +      return 0;
 +}
 +
 +static void gaudi2_get_msi_info(__le32 *table)
 +{
 +      table[CPUCP_EVENT_QUEUE_MSI_TYPE] = cpu_to_le32(GAUDI2_EVENT_QUEUE_MSIX_IDX);
 +}
 +
 +static int gaudi2_map_pll_idx_to_fw_idx(u32 pll_idx)
 +{
 +      switch (pll_idx) {
 +      case HL_GAUDI2_CPU_PLL: return CPU_PLL;
 +      case HL_GAUDI2_PCI_PLL: return PCI_PLL;
 +      case HL_GAUDI2_NIC_PLL: return NIC_PLL;
 +      case HL_GAUDI2_DMA_PLL: return DMA_PLL;
 +      case HL_GAUDI2_MESH_PLL: return MESH_PLL;
 +      case HL_GAUDI2_MME_PLL: return MME_PLL;
 +      case HL_GAUDI2_TPC_PLL: return TPC_PLL;
 +      case HL_GAUDI2_IF_PLL: return IF_PLL;
 +      case HL_GAUDI2_SRAM_PLL: return SRAM_PLL;
 +      case HL_GAUDI2_HBM_PLL: return HBM_PLL;
 +      case HL_GAUDI2_VID_PLL: return VID_PLL;
 +      case HL_GAUDI2_MSS_PLL: return MSS_PLL;
 +      default: return -EINVAL;
 +      }
 +}
 +
 +static int gaudi2_gen_sync_to_engine_map(struct hl_device *hdev, struct hl_sync_to_engine_map *map)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +static int gaudi2_monitor_valid(struct hl_mon_state_dump *mon)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +static int gaudi2_print_single_monitor(char **buf, size_t *size, size_t *offset,
 +                              struct hl_device *hdev, struct hl_mon_state_dump *mon)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +
 +static int gaudi2_print_fences_single_engine(struct hl_device *hdev, u64 base_offset,
 +                              u64 status_base_offset, enum hl_sync_engine_type engine_type,
 +                              u32 engine_id, char **buf, size_t *size, size_t *offset)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +
 +static struct hl_state_dump_specs_funcs gaudi2_state_dump_funcs = {
 +      .monitor_valid = gaudi2_monitor_valid,
 +      .print_single_monitor = gaudi2_print_single_monitor,
 +      .gen_sync_to_engine_map = gaudi2_gen_sync_to_engine_map,
 +      .print_fences_single_engine = gaudi2_print_fences_single_engine,
 +};
 +
 +static void gaudi2_state_dump_init(struct hl_device *hdev)
 +{
 +      /* Not implemented */
 +      hdev->state_dump_specs.props = gaudi2_state_dump_specs_props;
 +      hdev->state_dump_specs.funcs = gaudi2_state_dump_funcs;
 +}
 +
 +static u32 gaudi2_get_sob_addr(struct hl_device *hdev, u32 sob_id)
 +{
 +      return 0;
 +}
 +
 +static u32 *gaudi2_get_stream_master_qid_arr(void)
 +{
 +      return NULL;
 +}
 +
 +static void gaudi2_add_device_attr(struct hl_device *hdev, struct attribute_group *dev_clk_attr_grp,
 +                              struct attribute_group *dev_vrm_attr_grp)
 +{
 +      hl_sysfs_add_dev_clk_attr(hdev, dev_clk_attr_grp);
 +      hl_sysfs_add_dev_vrm_attr(hdev, dev_vrm_attr_grp);
 +}
 +
 +static int gaudi2_mmu_get_real_page_size(struct hl_device *hdev, struct hl_mmu_properties *mmu_prop,
 +                                      u32 page_size, u32 *real_page_size, bool is_dram_addr)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +
 +      /* for host pages the page size must be a multiple of the MMU page size */
 +      if (!is_dram_addr) {
 +              if (page_size % mmu_prop->page_size)
 +                      goto page_size_err;
 +
 +              *real_page_size = mmu_prop->page_size;
 +              return 0;
 +      }
 +
 +      if ((page_size % prop->dram_page_size) || (prop->dram_page_size > mmu_prop->page_size))
 +              goto page_size_err;
 +
 +      /*
 +       * The MMU page size is different from the DRAM page size (more precisely, the DMMU
 +       * page is greater than the DRAM page size).
 +       * For this reason, work with the DRAM page size and let the MMU scrambling routine
 +       * handle this mismatch when calculating the address to place in the MMU page table.
 +       * (In that case, also make sure that the dram_page_size is not greater than the
 +       * MMU page size.)
 +       */
 +      *real_page_size = prop->dram_page_size;
 +
 +      return 0;
 +
 +page_size_err:
 +      dev_err(hdev->dev, "page size of %u is not %uKB aligned, can't map\n",
 +                                                      page_size, mmu_prop->page_size >> 10);
 +      return -EFAULT;
 +}
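
Editor's note: a compact model of the policy above: host mappings must be a multiple of the PMMU page size and are mapped at that size, while DRAM mappings are mapped at the DRAM page size and the scrambling routine bridges the gap to the (larger) DMMU page. The 48MB/64MB figures in the usage example are illustrative, borrowed from the scrambling comment earlier, not the driver's actual asic_prop values.

/* Sketch of the page-size policy above; sizes in main() are illustrative only. */
#include <stdint.h>
#include <stdio.h>

static int real_page_size(uint64_t page_size, uint64_t mmu_page_size,
			  uint64_t dram_page_size, int is_dram_addr,
			  uint64_t *real)
{
	if (!is_dram_addr) {
		/* host mappings: request must be a multiple of the PMMU page size */
		if (page_size % mmu_page_size)
			return -1;
		*real = mmu_page_size;
		return 0;
	}

	/* DRAM mappings: map in DRAM pages, scrambling bridges to the DMMU page */
	if ((page_size % dram_page_size) || (dram_page_size > mmu_page_size))
		return -1;
	*real = dram_page_size;
	return 0;
}

int main(void)
{
	uint64_t real;

	if (!real_page_size(96ULL << 20, 64ULL << 20, 48ULL << 20, 1, &real))
		printf("96MB DRAM request mapped in %lluMB pages\n",
		       (unsigned long long)(real >> 20));
	return 0;
}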
 +
 +static int gaudi2_get_monitor_dump(struct hl_device *hdev, void *data)
 +{
 +      return -EOPNOTSUPP;
 +}
 +
 +int gaudi2_send_device_activity(struct hl_device *hdev, bool open)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_send_device_activity(hdev, open);
 +}
 +
 +static const struct hl_asic_funcs gaudi2_funcs = {
 +      .early_init = gaudi2_early_init,
 +      .early_fini = gaudi2_early_fini,
 +      .late_init = gaudi2_late_init,
 +      .late_fini = gaudi2_late_fini,
 +      .sw_init = gaudi2_sw_init,
 +      .sw_fini = gaudi2_sw_fini,
 +      .hw_init = gaudi2_hw_init,
 +      .hw_fini = gaudi2_hw_fini,
 +      .halt_engines = gaudi2_halt_engines,
 +      .suspend = gaudi2_suspend,
 +      .resume = gaudi2_resume,
 +      .mmap = gaudi2_mmap,
 +      .ring_doorbell = gaudi2_ring_doorbell,
 +      .pqe_write = gaudi2_pqe_write,
 +      .asic_dma_alloc_coherent = gaudi2_dma_alloc_coherent,
 +      .asic_dma_free_coherent = gaudi2_dma_free_coherent,
 +      .scrub_device_mem = gaudi2_scrub_device_mem,
 +      .scrub_device_dram = gaudi2_scrub_device_dram,
 +      .get_int_queue_base = NULL,
 +      .test_queues = gaudi2_test_queues,
 +      .asic_dma_pool_zalloc = gaudi2_dma_pool_zalloc,
 +      .asic_dma_pool_free = gaudi2_dma_pool_free,
 +      .cpu_accessible_dma_pool_alloc = gaudi2_cpu_accessible_dma_pool_alloc,
 +      .cpu_accessible_dma_pool_free = gaudi2_cpu_accessible_dma_pool_free,
 +      .asic_dma_unmap_single = gaudi2_dma_unmap_single,
 +      .asic_dma_map_single = gaudi2_dma_map_single,
 +      .hl_dma_unmap_sgtable = hl_dma_unmap_sgtable,
 +      .cs_parser = gaudi2_cs_parser,
 +      .asic_dma_map_sgtable = hl_dma_map_sgtable,
 +      .add_end_of_cb_packets = NULL,
 +      .update_eq_ci = gaudi2_update_eq_ci,
 +      .context_switch = gaudi2_context_switch,
 +      .restore_phase_topology = gaudi2_restore_phase_topology,
 +      .debugfs_read_dma = gaudi2_debugfs_read_dma,
 +      .add_device_attr = gaudi2_add_device_attr,
 +      .handle_eqe = gaudi2_handle_eqe,
 +      .get_events_stat = gaudi2_get_events_stat,
 +      .read_pte = NULL,
 +      .write_pte = NULL,
 +      .mmu_invalidate_cache = gaudi2_mmu_invalidate_cache,
 +      .mmu_invalidate_cache_range = gaudi2_mmu_invalidate_cache_range,
 +      .mmu_prefetch_cache_range = NULL,
 +      .send_heartbeat = gaudi2_send_heartbeat,
 +      .debug_coresight = gaudi2_debug_coresight,
 +      .is_device_idle = gaudi2_is_device_idle,
 +      .compute_reset_late_init = gaudi2_compute_reset_late_init,
 +      .hw_queues_lock = gaudi2_hw_queues_lock,
 +      .hw_queues_unlock = gaudi2_hw_queues_unlock,
 +      .get_pci_id = gaudi2_get_pci_id,
 +      .get_eeprom_data = gaudi2_get_eeprom_data,
 +      .get_monitor_dump = gaudi2_get_monitor_dump,
 +      .send_cpu_message = gaudi2_send_cpu_message,
 +      .pci_bars_map = gaudi2_pci_bars_map,
 +      .init_iatu = gaudi2_init_iatu,
 +      .rreg = hl_rreg,
 +      .wreg = hl_wreg,
 +      .halt_coresight = gaudi2_halt_coresight,
 +      .ctx_init = gaudi2_ctx_init,
 +      .ctx_fini = gaudi2_ctx_fini,
 +      .pre_schedule_cs = gaudi2_pre_schedule_cs,
 +      .get_queue_id_for_cq = gaudi2_get_queue_id_for_cq,
 +      .load_firmware_to_device = NULL,
 +      .load_boot_fit_to_device = NULL,
 +      .get_signal_cb_size = gaudi2_get_signal_cb_size,
 +      .get_wait_cb_size = gaudi2_get_wait_cb_size,
 +      .gen_signal_cb = gaudi2_gen_signal_cb,
 +      .gen_wait_cb = gaudi2_gen_wait_cb,
 +      .reset_sob = gaudi2_reset_sob,
 +      .reset_sob_group = gaudi2_reset_sob_group,
 +      .get_device_time = gaudi2_get_device_time,
 +      .pb_print_security_errors = gaudi2_pb_print_security_errors,
 +      .collective_wait_init_cs = gaudi2_collective_wait_init_cs,
 +      .collective_wait_create_jobs = gaudi2_collective_wait_create_jobs,
 +      .get_dec_base_addr = gaudi2_get_dec_base_addr,
 +      .scramble_addr = gaudi2_mmu_scramble_addr,
 +      .descramble_addr = gaudi2_mmu_descramble_addr,
 +      .ack_protection_bits_errors = gaudi2_ack_protection_bits_errors,
 +      .get_hw_block_id = gaudi2_get_hw_block_id,
 +      .hw_block_mmap = gaudi2_block_mmap,
 +      .enable_events_from_fw = gaudi2_enable_events_from_fw,
 +      .ack_mmu_errors = gaudi2_ack_mmu_page_fault_or_access_error,
 +      .get_msi_info = gaudi2_get_msi_info,
 +      .map_pll_idx_to_fw_idx = gaudi2_map_pll_idx_to_fw_idx,
 +      .init_firmware_preload_params = gaudi2_init_firmware_preload_params,
 +      .init_firmware_loader = gaudi2_init_firmware_loader,
 +      .init_cpu_scrambler_dram = gaudi2_init_scrambler_hbm,
 +      .state_dump_init = gaudi2_state_dump_init,
 +      .get_sob_addr = &gaudi2_get_sob_addr,
 +      .set_pci_memory_regions = gaudi2_set_pci_memory_regions,
 +      .get_stream_master_qid_arr = gaudi2_get_stream_master_qid_arr,
 +      .check_if_razwi_happened = gaudi2_check_if_razwi_happened,
 +      .mmu_get_real_page_size = gaudi2_mmu_get_real_page_size,
 +      .access_dev_mem = hl_access_dev_mem,
 +      .set_dram_bar_base = gaudi2_set_hbm_bar_base,
 +      .set_engine_cores = gaudi2_set_engine_cores,
 +      .send_device_activity = gaudi2_send_device_activity,
 +      .set_dram_properties = gaudi2_set_dram_properties,
 +      .set_binning_masks = gaudi2_set_binning_masks,
 +};
 +
 +void gaudi2_set_asic_funcs(struct hl_device *hdev)
 +{
 +      hdev->asic_funcs = &gaudi2_funcs;
 +}
index 2b135e856607da737187cea7d796f56de5b6413a,0000000000000000000000000000000000000000..df65e9bdc18aa945b5fa7dcabdd2a7ec6ea7857e
mode 100644,000000..100644
--- /dev/null
@@@ -1,5544 -1,0 +1,5544 @@@
-       vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
-                       VM_DONTCOPY | VM_NORESERVE;
 +// SPDX-License-Identifier: GPL-2.0
 +
 +/*
 + * Copyright 2016-2022 HabanaLabs, Ltd.
 + * All Rights Reserved.
 + */
 +
 +#include "goyaP.h"
 +#include "../include/hw_ip/mmu/mmu_general.h"
 +#include "../include/hw_ip/mmu/mmu_v1_0.h"
 +#include "../include/goya/asic_reg/goya_masks.h"
 +#include "../include/goya/goya_reg_map.h"
 +
 +#include <linux/pci.h>
 +#include <linux/hwmon.h>
 +#include <linux/iommu.h>
 +#include <linux/seq_file.h>
 +
 +/*
 + * GOYA security scheme:
 + *
 + * 1. Host is protected by:
 + *        - Range registers (When MMU is enabled, DMA RR does NOT protect host)
 + *        - MMU
 + *
 + * 2. DRAM is protected by:
 + *        - Range registers (protect the first 512MB)
 + *        - MMU (isolation between users)
 + *
 + * 3. Configuration is protected by:
 + *        - Range registers
 + *        - Protection bits
 + *
 + * When MMU is disabled:
 + *
 + * QMAN DMA: PQ, CQ, CP, DMA are secured.
 + * PQ, CB and the data are on the host.
 + *
 + * QMAN TPC/MME:
 + * PQ, CQ and CP are not secured.
 + * PQ, CB and the data are on the SRAM/DRAM.
 + *
 + * Since QMAN DMA is secured, the driver is parsing the DMA CB:
 + *     - checks DMA pointer
 + *     - WREG, MSG_PROT are not allowed.
 + *     - MSG_LONG/SHORT are allowed.
 + *
 + * A read/write transaction by the QMAN to a protected area will succeed if
 + * and only if the QMAN's CP is secured and MSG_PROT is used
 + *
 + *
 + * When MMU is enabled:
 + *
 + * QMAN DMA: PQ, CQ and CP are secured.
 + * MMU is set to bypass on the Secure props register of the QMAN.
 + * The reasons we don't enable MMU for PQ, CQ and CP are:
 + *     - PQ entry is in kernel address space and the driver doesn't map it.
 + *     - CP writes to MSIX register and to kernel address space (completion
 + *       queue).
 + *
 + * DMA is not secured, but because CP is secured the driver still needs to
 + * parse the CB; it just doesn't need to check the DMA addresses.
 + *
 + * For QMAN DMA 0, DMA is also secured because only the driver uses this DMA and
 + * the driver doesn't map memory in MMU.
 + *
 + * QMAN TPC/MME: PQ, CQ and CP aren't secured (no change from MMU disabled mode)
 + *
 + * DMA RR does NOT protect host because DMA is not secured
 + *
 + */
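
Editor's note: the CB-parsing rule stated above for the secured QMAN DMA path amounts to a packet filter: WREG and MSG_PROT are rejected, while MSG_LONG/MSG_SHORT and the other user packets pass. The sketch below is a simplified illustration of that rule only, with placeholder enum values, not the driver's actual CB parser.

/* Simplified model of the rule above: in a user DMA CB parsed by the driver,
 * WREG and MSG_PROT are rejected while the other user packets pass.
 */
#include <stdbool.h>
#include <stdio.h>

enum pkt {
	PKT_WREG_32, PKT_MSG_PROT, PKT_MSG_LONG, PKT_MSG_SHORT,
	PKT_LIN_DMA, PKT_NOP, PKT_FENCE
};

static bool user_dma_pkt_allowed(enum pkt id)
{
	switch (id) {
	case PKT_WREG_32:
	case PKT_MSG_PROT:
		return false;	/* only the secured CP may issue these */
	default:
		return true;
	}
}

int main(void)
{
	printf("WREG_32  allowed: %d\n", user_dma_pkt_allowed(PKT_WREG_32));
	printf("MSG_LONG allowed: %d\n", user_dma_pkt_allowed(PKT_MSG_LONG));
	return 0;
}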
 +
 +#define GOYA_BOOT_FIT_FILE    "habanalabs/goya/goya-boot-fit.itb"
 +#define GOYA_LINUX_FW_FILE    "habanalabs/goya/goya-fit.itb"
 +
 +#define GOYA_MMU_REGS_NUM             63
 +
 +#define GOYA_DMA_POOL_BLK_SIZE                0x100           /* 256 bytes */
 +
 +#define GOYA_RESET_TIMEOUT_MSEC               500             /* 500ms */
 +#define GOYA_PLDM_RESET_TIMEOUT_MSEC  20000           /* 20s */
 +#define GOYA_RESET_WAIT_MSEC          1               /* 1ms */
 +#define GOYA_CPU_RESET_WAIT_MSEC      100             /* 100ms */
 +#define GOYA_PLDM_RESET_WAIT_MSEC     1000            /* 1s */
 +#define GOYA_TEST_QUEUE_WAIT_USEC     100000          /* 100ms */
 +#define GOYA_PLDM_MMU_TIMEOUT_USEC    (MMU_CONFIG_TIMEOUT_USEC * 100)
 +#define GOYA_PLDM_QMAN0_TIMEOUT_USEC  (HL_DEVICE_TIMEOUT_USEC * 30)
 +#define GOYA_BOOT_FIT_REQ_TIMEOUT_USEC        1000000         /* 1s */
 +#define GOYA_MSG_TO_CPU_TIMEOUT_USEC  4000000         /* 4s */
 +#define GOYA_WAIT_FOR_BL_TIMEOUT_USEC 15000000        /* 15s */
 +
 +#define GOYA_QMAN0_FENCE_VAL          0xD169B243
 +
 +#define GOYA_MAX_STRING_LEN           20
 +
 +#define GOYA_CB_POOL_CB_CNT           512
 +#define GOYA_CB_POOL_CB_SIZE          0x20000         /* 128KB */
 +
 +#define IS_QM_IDLE(engine, qm_glbl_sts0) \
 +      (((qm_glbl_sts0) & engine##_QM_IDLE_MASK) == engine##_QM_IDLE_MASK)
 +#define IS_DMA_QM_IDLE(qm_glbl_sts0)  IS_QM_IDLE(DMA, qm_glbl_sts0)
 +#define IS_TPC_QM_IDLE(qm_glbl_sts0)  IS_QM_IDLE(TPC, qm_glbl_sts0)
 +#define IS_MME_QM_IDLE(qm_glbl_sts0)  IS_QM_IDLE(MME, qm_glbl_sts0)
 +
 +#define IS_CMDQ_IDLE(engine, cmdq_glbl_sts0) \
 +      (((cmdq_glbl_sts0) & engine##_CMDQ_IDLE_MASK) == \
 +                      engine##_CMDQ_IDLE_MASK)
 +#define IS_TPC_CMDQ_IDLE(cmdq_glbl_sts0) \
 +      IS_CMDQ_IDLE(TPC, cmdq_glbl_sts0)
 +#define IS_MME_CMDQ_IDLE(cmdq_glbl_sts0) \
 +      IS_CMDQ_IDLE(MME, cmdq_glbl_sts0)
 +
 +#define IS_DMA_IDLE(dma_core_sts0) \
 +      !((dma_core_sts0) & DMA_CH_0_STS0_DMA_BUSY_MASK)
 +
 +#define IS_TPC_IDLE(tpc_cfg_sts) \
 +      (((tpc_cfg_sts) & TPC_CFG_IDLE_MASK) == TPC_CFG_IDLE_MASK)
 +
 +#define IS_MME_IDLE(mme_arch_sts) \
 +      (((mme_arch_sts) & MME_ARCH_IDLE_MASK) == MME_ARCH_IDLE_MASK)
 +
 +static const char goya_irq_name[GOYA_MSIX_ENTRIES][GOYA_MAX_STRING_LEN] = {
 +              "goya cq 0", "goya cq 1", "goya cq 2", "goya cq 3",
 +              "goya cq 4", "goya cpu eq"
 +};
 +
 +static u16 goya_packet_sizes[MAX_PACKET_ID] = {
 +      [PACKET_WREG_32]        = sizeof(struct packet_wreg32),
 +      [PACKET_WREG_BULK]      = sizeof(struct packet_wreg_bulk),
 +      [PACKET_MSG_LONG]       = sizeof(struct packet_msg_long),
 +      [PACKET_MSG_SHORT]      = sizeof(struct packet_msg_short),
 +      [PACKET_CP_DMA]         = sizeof(struct packet_cp_dma),
 +      [PACKET_MSG_PROT]       = sizeof(struct packet_msg_prot),
 +      [PACKET_FENCE]          = sizeof(struct packet_fence),
 +      [PACKET_LIN_DMA]        = sizeof(struct packet_lin_dma),
 +      [PACKET_NOP]            = sizeof(struct packet_nop),
 +      [PACKET_STOP]           = sizeof(struct packet_stop)
 +};
 +
 +static inline bool validate_packet_id(enum packet_id id)
 +{
 +      switch (id) {
 +      case PACKET_WREG_32:
 +      case PACKET_WREG_BULK:
 +      case PACKET_MSG_LONG:
 +      case PACKET_MSG_SHORT:
 +      case PACKET_CP_DMA:
 +      case PACKET_MSG_PROT:
 +      case PACKET_FENCE:
 +      case PACKET_LIN_DMA:
 +      case PACKET_NOP:
 +      case PACKET_STOP:
 +              return true;
 +      default:
 +              return false;
 +      }
 +}
 +
 +static u64 goya_mmu_regs[GOYA_MMU_REGS_NUM] = {
 +      mmDMA_QM_0_GLBL_NON_SECURE_PROPS,
 +      mmDMA_QM_1_GLBL_NON_SECURE_PROPS,
 +      mmDMA_QM_2_GLBL_NON_SECURE_PROPS,
 +      mmDMA_QM_3_GLBL_NON_SECURE_PROPS,
 +      mmDMA_QM_4_GLBL_NON_SECURE_PROPS,
 +      mmTPC0_QM_GLBL_SECURE_PROPS,
 +      mmTPC0_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC0_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC0_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC0_CFG_ARUSER,
 +      mmTPC0_CFG_AWUSER,
 +      mmTPC1_QM_GLBL_SECURE_PROPS,
 +      mmTPC1_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC1_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC1_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC1_CFG_ARUSER,
 +      mmTPC1_CFG_AWUSER,
 +      mmTPC2_QM_GLBL_SECURE_PROPS,
 +      mmTPC2_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC2_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC2_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC2_CFG_ARUSER,
 +      mmTPC2_CFG_AWUSER,
 +      mmTPC3_QM_GLBL_SECURE_PROPS,
 +      mmTPC3_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC3_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC3_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC3_CFG_ARUSER,
 +      mmTPC3_CFG_AWUSER,
 +      mmTPC4_QM_GLBL_SECURE_PROPS,
 +      mmTPC4_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC4_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC4_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC4_CFG_ARUSER,
 +      mmTPC4_CFG_AWUSER,
 +      mmTPC5_QM_GLBL_SECURE_PROPS,
 +      mmTPC5_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC5_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC5_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC5_CFG_ARUSER,
 +      mmTPC5_CFG_AWUSER,
 +      mmTPC6_QM_GLBL_SECURE_PROPS,
 +      mmTPC6_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC6_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC6_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC6_CFG_ARUSER,
 +      mmTPC6_CFG_AWUSER,
 +      mmTPC7_QM_GLBL_SECURE_PROPS,
 +      mmTPC7_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC7_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC7_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC7_CFG_ARUSER,
 +      mmTPC7_CFG_AWUSER,
 +      mmMME_QM_GLBL_SECURE_PROPS,
 +      mmMME_QM_GLBL_NON_SECURE_PROPS,
 +      mmMME_CMDQ_GLBL_SECURE_PROPS,
 +      mmMME_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmMME_SBA_CONTROL_DATA,
 +      mmMME_SBB_CONTROL_DATA,
 +      mmMME_SBC_CONTROL_DATA,
 +      mmMME_WBC_CONTROL_DATA,
 +      mmPCIE_WRAP_PSOC_ARUSER,
 +      mmPCIE_WRAP_PSOC_AWUSER
 +};
 +
 +static u32 goya_all_events[] = {
 +      GOYA_ASYNC_EVENT_ID_PCIE_IF,
 +      GOYA_ASYNC_EVENT_ID_TPC0_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC1_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC2_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC3_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC4_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC5_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC6_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC7_ECC,
 +      GOYA_ASYNC_EVENT_ID_MME_ECC,
 +      GOYA_ASYNC_EVENT_ID_MME_ECC_EXT,
 +      GOYA_ASYNC_EVENT_ID_MMU_ECC,
 +      GOYA_ASYNC_EVENT_ID_DMA_MACRO,
 +      GOYA_ASYNC_EVENT_ID_DMA_ECC,
 +      GOYA_ASYNC_EVENT_ID_CPU_IF_ECC,
 +      GOYA_ASYNC_EVENT_ID_PSOC_MEM,
 +      GOYA_ASYNC_EVENT_ID_PSOC_CORESIGHT,
 +      GOYA_ASYNC_EVENT_ID_SRAM0,
 +      GOYA_ASYNC_EVENT_ID_SRAM1,
 +      GOYA_ASYNC_EVENT_ID_SRAM2,
 +      GOYA_ASYNC_EVENT_ID_SRAM3,
 +      GOYA_ASYNC_EVENT_ID_SRAM4,
 +      GOYA_ASYNC_EVENT_ID_SRAM5,
 +      GOYA_ASYNC_EVENT_ID_SRAM6,
 +      GOYA_ASYNC_EVENT_ID_SRAM7,
 +      GOYA_ASYNC_EVENT_ID_SRAM8,
 +      GOYA_ASYNC_EVENT_ID_SRAM9,
 +      GOYA_ASYNC_EVENT_ID_SRAM10,
 +      GOYA_ASYNC_EVENT_ID_SRAM11,
 +      GOYA_ASYNC_EVENT_ID_SRAM12,
 +      GOYA_ASYNC_EVENT_ID_SRAM13,
 +      GOYA_ASYNC_EVENT_ID_SRAM14,
 +      GOYA_ASYNC_EVENT_ID_SRAM15,
 +      GOYA_ASYNC_EVENT_ID_SRAM16,
 +      GOYA_ASYNC_EVENT_ID_SRAM17,
 +      GOYA_ASYNC_EVENT_ID_SRAM18,
 +      GOYA_ASYNC_EVENT_ID_SRAM19,
 +      GOYA_ASYNC_EVENT_ID_SRAM20,
 +      GOYA_ASYNC_EVENT_ID_SRAM21,
 +      GOYA_ASYNC_EVENT_ID_SRAM22,
 +      GOYA_ASYNC_EVENT_ID_SRAM23,
 +      GOYA_ASYNC_EVENT_ID_SRAM24,
 +      GOYA_ASYNC_EVENT_ID_SRAM25,
 +      GOYA_ASYNC_EVENT_ID_SRAM26,
 +      GOYA_ASYNC_EVENT_ID_SRAM27,
 +      GOYA_ASYNC_EVENT_ID_SRAM28,
 +      GOYA_ASYNC_EVENT_ID_SRAM29,
 +      GOYA_ASYNC_EVENT_ID_GIC500,
 +      GOYA_ASYNC_EVENT_ID_PLL0,
 +      GOYA_ASYNC_EVENT_ID_PLL1,
 +      GOYA_ASYNC_EVENT_ID_PLL3,
 +      GOYA_ASYNC_EVENT_ID_PLL4,
 +      GOYA_ASYNC_EVENT_ID_PLL5,
 +      GOYA_ASYNC_EVENT_ID_PLL6,
 +      GOYA_ASYNC_EVENT_ID_AXI_ECC,
 +      GOYA_ASYNC_EVENT_ID_L2_RAM_ECC,
 +      GOYA_ASYNC_EVENT_ID_PSOC_GPIO_05_SW_RESET,
 +      GOYA_ASYNC_EVENT_ID_PSOC_GPIO_10_VRHOT_ICRIT,
 +      GOYA_ASYNC_EVENT_ID_PCIE_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC0_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC1_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC2_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC3_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC4_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC5_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC6_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC7_DEC,
 +      GOYA_ASYNC_EVENT_ID_MME_WACS,
 +      GOYA_ASYNC_EVENT_ID_MME_WACSD,
 +      GOYA_ASYNC_EVENT_ID_CPU_AXI_SPLITTER,
 +      GOYA_ASYNC_EVENT_ID_PSOC_AXI_DEC,
 +      GOYA_ASYNC_EVENT_ID_PSOC,
 +      GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC0_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC1_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC2_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC3_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC4_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC5_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC6_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC7_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC0_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC1_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC2_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC3_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC4_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC5_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC6_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC7_QM,
 +      GOYA_ASYNC_EVENT_ID_MME_QM,
 +      GOYA_ASYNC_EVENT_ID_MME_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_DMA0_QM,
 +      GOYA_ASYNC_EVENT_ID_DMA1_QM,
 +      GOYA_ASYNC_EVENT_ID_DMA2_QM,
 +      GOYA_ASYNC_EVENT_ID_DMA3_QM,
 +      GOYA_ASYNC_EVENT_ID_DMA4_QM,
 +      GOYA_ASYNC_EVENT_ID_DMA0_CH,
 +      GOYA_ASYNC_EVENT_ID_DMA1_CH,
 +      GOYA_ASYNC_EVENT_ID_DMA2_CH,
 +      GOYA_ASYNC_EVENT_ID_DMA3_CH,
 +      GOYA_ASYNC_EVENT_ID_DMA4_CH,
 +      GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_DMA_BM_CH0,
 +      GOYA_ASYNC_EVENT_ID_DMA_BM_CH1,
 +      GOYA_ASYNC_EVENT_ID_DMA_BM_CH2,
 +      GOYA_ASYNC_EVENT_ID_DMA_BM_CH3,
 +      GOYA_ASYNC_EVENT_ID_DMA_BM_CH4,
 +      GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S,
 +      GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E,
 +      GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S,
 +      GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E
 +};
 +
 +static s64 goya_state_dump_specs_props[SP_MAX] = {0};
 +
 +static int goya_mmu_clear_pgt_range(struct hl_device *hdev);
 +static int goya_mmu_set_dram_default_page(struct hl_device *hdev);
 +static int goya_mmu_add_mappings_for_device_cpu(struct hl_device *hdev);
 +static void goya_mmu_prepare(struct hl_device *hdev, u32 asid);
 +
 +int goya_set_fixed_properties(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int i;
 +
 +      prop->max_queues = GOYA_QUEUE_ID_SIZE;
 +      prop->hw_queues_props = kcalloc(prop->max_queues,
 +                      sizeof(struct hw_queue_properties),
 +                      GFP_KERNEL);
 +
 +      if (!prop->hw_queues_props)
 +              return -ENOMEM;
 +
 +      for (i = 0 ; i < NUMBER_OF_EXT_HW_QUEUES ; i++) {
 +              prop->hw_queues_props[i].type = QUEUE_TYPE_EXT;
 +              prop->hw_queues_props[i].driver_only = 0;
 +              prop->hw_queues_props[i].cb_alloc_flags = CB_ALLOC_KERNEL;
 +      }
 +
 +      for (; i < NUMBER_OF_EXT_HW_QUEUES + NUMBER_OF_CPU_HW_QUEUES ; i++) {
 +              prop->hw_queues_props[i].type = QUEUE_TYPE_CPU;
 +              prop->hw_queues_props[i].driver_only = 1;
 +              prop->hw_queues_props[i].cb_alloc_flags = CB_ALLOC_KERNEL;
 +      }
 +
 +      for (; i < NUMBER_OF_EXT_HW_QUEUES + NUMBER_OF_CPU_HW_QUEUES +
 +                      NUMBER_OF_INT_HW_QUEUES; i++) {
 +              prop->hw_queues_props[i].type = QUEUE_TYPE_INT;
 +              prop->hw_queues_props[i].driver_only = 0;
 +              prop->hw_queues_props[i].cb_alloc_flags = CB_ALLOC_USER;
 +      }
 +
 +      prop->cfg_base_address = CFG_BASE;
 +      prop->device_dma_offset_for_host_access = HOST_PHYS_BASE;
 +      prop->host_base_address = HOST_PHYS_BASE;
 +      prop->host_end_address = prop->host_base_address + HOST_PHYS_SIZE;
 +      prop->completion_queues_count = NUMBER_OF_CMPLT_QUEUES;
 +      prop->completion_mode = HL_COMPLETION_MODE_JOB;
 +      prop->dram_base_address = DRAM_PHYS_BASE;
 +      prop->dram_size = DRAM_PHYS_DEFAULT_SIZE;
 +      prop->dram_end_address = prop->dram_base_address + prop->dram_size;
 +      prop->dram_user_base_address = DRAM_BASE_ADDR_USER;
 +
 +      prop->sram_base_address = SRAM_BASE_ADDR;
 +      prop->sram_size = SRAM_SIZE;
 +      prop->sram_end_address = prop->sram_base_address + prop->sram_size;
 +      prop->sram_user_base_address = prop->sram_base_address +
 +                                              SRAM_USER_BASE_OFFSET;
 +
 +      prop->mmu_pgt_addr = MMU_PAGE_TABLES_ADDR;
 +      prop->mmu_dram_default_page_addr = MMU_DRAM_DEFAULT_PAGE_ADDR;
 +      if (hdev->pldm)
 +              prop->mmu_pgt_size = 0x800000; /* 8MB */
 +      else
 +              prop->mmu_pgt_size = MMU_PAGE_TABLES_SIZE;
 +      prop->mmu_pte_size = HL_PTE_SIZE;
 +      prop->mmu_hop_table_size = HOP_TABLE_SIZE_512_PTE;
 +      prop->mmu_hop0_tables_total_size = HOP0_512_PTE_TABLES_TOTAL_SIZE;
 +      prop->dram_page_size = PAGE_SIZE_2MB;
 +      prop->device_mem_alloc_default_page_size = prop->dram_page_size;
 +      prop->dram_supports_virtual_memory = true;
 +
 +      prop->dmmu.hop_shifts[MMU_HOP0] = MMU_V1_0_HOP0_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP1] = MMU_V1_0_HOP1_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP2] = MMU_V1_0_HOP2_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP3] = MMU_V1_0_HOP3_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP4] = MMU_V1_0_HOP4_SHIFT;
 +      prop->dmmu.hop_masks[MMU_HOP0] = MMU_V1_0_HOP0_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP1] = MMU_V1_0_HOP1_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP2] = MMU_V1_0_HOP2_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP3] = MMU_V1_0_HOP3_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP4] = MMU_V1_0_HOP4_MASK;
 +      prop->dmmu.start_addr = VA_DDR_SPACE_START;
 +      prop->dmmu.end_addr = VA_DDR_SPACE_END;
 +      prop->dmmu.page_size = PAGE_SIZE_2MB;
 +      prop->dmmu.num_hops = MMU_ARCH_5_HOPS;
 +      prop->dmmu.last_mask = LAST_MASK;
 +      /* TODO: will be duplicated until implementing per-MMU props */
 +      prop->dmmu.hop_table_size = prop->mmu_hop_table_size;
 +      prop->dmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
 +
 +      /* shifts and masks are the same in PMMU and DMMU */
 +      memcpy(&prop->pmmu, &prop->dmmu, sizeof(prop->dmmu));
 +      prop->pmmu.start_addr = VA_HOST_SPACE_START;
 +      prop->pmmu.end_addr = VA_HOST_SPACE_END;
 +      prop->pmmu.page_size = PAGE_SIZE_4KB;
 +      prop->pmmu.num_hops = MMU_ARCH_5_HOPS;
 +      prop->pmmu.last_mask = LAST_MASK;
 +      /* TODO: will be duplicated until implementing per-MMU props */
 +      prop->pmmu.hop_table_size = prop->mmu_hop_table_size;
 +      prop->pmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
 +
 +      /* PMMU and HPMMU are the same except for the page size */
 +      memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
 +      prop->pmmu_huge.page_size = PAGE_SIZE_2MB;
 +
 +      prop->dram_size_for_default_page_mapping = VA_DDR_SPACE_END;
 +      prop->cfg_size = CFG_SIZE;
 +      prop->max_asid = MAX_ASID;
 +      prop->num_of_events = GOYA_ASYNC_EVENT_ID_SIZE;
 +      prop->high_pll = PLL_HIGH_DEFAULT;
 +      prop->cb_pool_cb_cnt = GOYA_CB_POOL_CB_CNT;
 +      prop->cb_pool_cb_size = GOYA_CB_POOL_CB_SIZE;
 +      prop->max_power_default = MAX_POWER_DEFAULT;
 +      prop->dc_power_default = DC_POWER_DEFAULT;
 +      prop->tpc_enabled_mask = TPC_ENABLED_MASK;
 +      prop->pcie_dbi_base_address = mmPCIE_DBI_BASE;
 +      prop->pcie_aux_dbi_reg_addr = CFG_BASE + mmPCIE_AUX_DBI;
 +
 +      strncpy(prop->cpucp_info.card_name, GOYA_DEFAULT_CARD_NAME,
 +              CARD_NAME_MAX_LEN);
 +
 +      prop->max_pending_cs = GOYA_MAX_PENDING_CS;
 +
 +      prop->first_available_user_interrupt = USHRT_MAX;
 +
 +      for (i = 0 ; i < HL_MAX_DCORES ; i++)
 +              prop->first_available_cq[i] = USHRT_MAX;
 +
 +      prop->fw_cpu_boot_dev_sts0_valid = false;
 +      prop->fw_cpu_boot_dev_sts1_valid = false;
 +      prop->hard_reset_done_by_fw = false;
 +      prop->gic_interrupts_enable = true;
 +
 +      prop->server_type = HL_SERVER_TYPE_UNKNOWN;
 +
 +      prop->clk_pll_index = HL_GOYA_MME_PLL;
 +
 +      prop->use_get_power_for_reset_history = true;
 +
 +      prop->configurable_stop_on_err = true;
 +
 +      prop->set_max_power_on_device_init = true;
 +
 +      prop->dma_mask = 48;
 +
 +      return 0;
 +}
 +
 +/*
 + * goya_pci_bars_map - Map PCI BARS of Goya device
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Request PCI regions and map them to kernel virtual addresses.
 + * Returns 0 on success
 + *
 + */
 +static int goya_pci_bars_map(struct hl_device *hdev)
 +{
 +      static const char * const name[] = {"SRAM_CFG", "MSIX", "DDR"};
 +      bool is_wc[3] = {false, false, true};
 +      int rc;
 +
 +      rc = hl_pci_bars_map(hdev, name, is_wc);
 +      if (rc)
 +              return rc;
 +
 +      hdev->rmmio = hdev->pcie_bar[SRAM_CFG_BAR_ID] +
 +                      (CFG_BASE - SRAM_BASE_ADDR);
 +
 +      return 0;
 +}
 +
 +static u64 goya_set_ddr_bar_base(struct hl_device *hdev, u64 addr)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      struct hl_inbound_pci_region pci_region;
 +      u64 old_addr = addr;
 +      int rc;
 +
 +      if ((goya) && (goya->ddr_bar_cur_addr == addr))
 +              return old_addr;
 +
 +      /* Inbound Region 1 - Bar 4 - Point to DDR */
 +      pci_region.mode = PCI_BAR_MATCH_MODE;
 +      pci_region.bar = DDR_BAR_ID;
 +      pci_region.addr = addr;
 +      rc = hl_pci_set_inbound_region(hdev, 1, &pci_region);
 +      if (rc)
 +              return U64_MAX;
 +
 +      if (goya) {
 +              old_addr = goya->ddr_bar_cur_addr;
 +              goya->ddr_bar_cur_addr = addr;
 +      }
 +
 +      return old_addr;
 +}
 +
 +/*
 + * goya_init_iatu - Initialize the iATU unit inside the PCI controller
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * This is needed in case the firmware doesn't initialize the iATU
 + *
 + */
 +static int goya_init_iatu(struct hl_device *hdev)
 +{
 +      struct hl_inbound_pci_region inbound_region;
 +      struct hl_outbound_pci_region outbound_region;
 +      int rc;
 +
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              return 0;
 +
 +      /* Inbound Region 0 - Bar 0 - Point to SRAM and CFG */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = SRAM_CFG_BAR_ID;
 +      inbound_region.addr = SRAM_BASE_ADDR;
 +      rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
 +      if (rc)
 +              goto done;
 +
 +      /* Inbound Region 1 - Bar 4 - Point to DDR */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = DDR_BAR_ID;
 +      inbound_region.addr = DRAM_PHYS_BASE;
 +      rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
 +      if (rc)
 +              goto done;
 +
 +      /* Outbound Region 0 - Point to Host  */
 +      outbound_region.addr = HOST_PHYS_BASE;
 +      outbound_region.size = HOST_PHYS_SIZE;
 +      rc = hl_pci_set_outbound_region(hdev, &outbound_region);
 +
 +done:
 +      return rc;
 +}
 +
 +static enum hl_device_hw_state goya_get_hw_state(struct hl_device *hdev)
 +{
 +      return RREG32(mmHW_STATE);
 +}
 +
 +/*
 + * goya_early_init - GOYA early initialization code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Verify PCI bars
 + * Set DMA masks
 + * PCI controller initialization
 + * Map PCI bars
 + *
 + */
 +static int goya_early_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_dev *pdev = hdev->pdev;
 +      resource_size_t pci_bar_size;
 +      u32 fw_boot_status, val;
 +      int rc;
 +
 +      rc = goya_set_fixed_properties(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to set fixed properties\n");
 +              return rc;
 +      }
 +
 +      /* Check BAR sizes */
 +      pci_bar_size = pci_resource_len(pdev, SRAM_CFG_BAR_ID);
 +
 +      if (pci_bar_size != CFG_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      SRAM_CFG_BAR_ID, &pci_bar_size, CFG_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      pci_bar_size = pci_resource_len(pdev, MSIX_BAR_ID);
 +
 +      if (pci_bar_size != MSIX_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      MSIX_BAR_ID, &pci_bar_size, MSIX_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      prop->dram_pci_bar_size = pci_resource_len(pdev, DDR_BAR_ID);
 +      hdev->dram_pci_bar_start = pci_resource_start(pdev, DDR_BAR_ID);
 +
 +      /* If FW security is enabled at this point it means no access to ELBI */
 +      if (hdev->asic_prop.fw_security_enabled) {
 +              hdev->asic_prop.iatu_done_by_fw = true;
 +              goto pci_init;
 +      }
 +
 +      rc = hl_pci_elbi_read(hdev, CFG_BASE + mmCPU_BOOT_DEV_STS0,
 +                              &fw_boot_status);
 +      if (rc)
 +              goto free_queue_props;
 +
 +      /* Check whether FW is configuring iATU */
 +      if ((fw_boot_status & CPU_BOOT_DEV_STS0_ENABLED) &&
 +                      (fw_boot_status & CPU_BOOT_DEV_STS0_FW_IATU_CONF_EN))
 +              hdev->asic_prop.iatu_done_by_fw = true;
 +
 +pci_init:
 +      rc = hl_pci_init(hdev);
 +      if (rc)
 +              goto free_queue_props;
 +
 +      /* Before continuing with initialization, we need to read the preboot
 +       * version to determine whether we are running with security-enabled firmware
 +       */
 +      rc = hl_fw_read_preboot_status(hdev);
 +      if (rc) {
 +              if (hdev->reset_on_preboot_fail)
 +                      hdev->asic_funcs->hw_fini(hdev, true, false);
 +              goto pci_fini;
 +      }
 +
 +      if (goya_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) {
 +              dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n");
 +              hdev->asic_funcs->hw_fini(hdev, true, false);
 +      }
 +
 +      if (!hdev->pldm) {
 +              val = RREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS);
 +              if (val & PSOC_GLOBAL_CONF_BOOT_STRAP_PINS_SRIOV_EN_MASK)
 +                      dev_warn(hdev->dev,
 +                              "PCI strap is not configured correctly, PCI bus errors may occur\n");
 +      }
 +
 +      return 0;
 +
 +pci_fini:
 +      hl_pci_fini(hdev);
 +free_queue_props:
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      return rc;
 +}
 +
 +/*
 + * goya_early_fini - GOYA early finalization code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Free queue properties and unmap PCI BARs
 + *
 + */
 +static int goya_early_fini(struct hl_device *hdev)
 +{
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      hl_pci_fini(hdev);
 +
 +      return 0;
 +}
 +
 +static void goya_mmu_prepare_reg(struct hl_device *hdev, u64 reg, u32 asid)
 +{
 +      /* mask to zero the MMBP and ASID bits */
 +      WREG32_AND(reg, ~0x7FF);
 +      WREG32_OR(reg, asid);
 +}
 +
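 +/*
 + * goya_qman0_set_security - Set the protection level of DMA QMAN 0
 + *
 + * @hdev: pointer to hl_device structure
 + * @secure: true for fully trusted, false for partly trusted
 + *
 + * Only relevant while the MMU is enabled. The final read-back of the
 + * protection register likely serves to flush the posted write
 + *
 + */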
 +static void goya_qman0_set_security(struct hl_device *hdev, bool secure)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      if (secure)
 +              WREG32(mmDMA_QM_0_GLBL_PROT, QMAN_DMA_FULLY_TRUSTED);
 +      else
 +              WREG32(mmDMA_QM_0_GLBL_PROT, QMAN_DMA_PARTLY_TRUSTED);
 +
 +      RREG32(mmDMA_QM_0_GLBL_PROT);
 +}
 +
 +/*
 + * goya_fetch_psoc_frequency - Fetch PSOC frequency values
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static void goya_fetch_psoc_frequency(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 nr = 0, nf = 0, od = 0, div_fctr = 0, pll_clk, div_sel;
 +      u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS], freq;
 +      int rc;
 +
 +      if (hdev->asic_prop.fw_security_enabled) {
 +              struct goya_device *goya = hdev->asic_specific;
 +
 +              if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
 +                      return;
 +
 +              rc = hl_fw_cpucp_pll_info_get(hdev, HL_GOYA_PCI_PLL,
 +                              pll_freq_arr);
 +
 +              if (rc)
 +                      return;
 +
 +              freq = pll_freq_arr[1];
 +      } else {
 +              div_fctr = RREG32(mmPSOC_PCI_PLL_DIV_FACTOR_1);
 +              div_sel = RREG32(mmPSOC_PCI_PLL_DIV_SEL_1);
 +              nr = RREG32(mmPSOC_PCI_PLL_NR);
 +              nf = RREG32(mmPSOC_PCI_PLL_NF);
 +              od = RREG32(mmPSOC_PCI_PLL_OD);
 +
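 +              /*
 +               * The PLL output frequency is PLL_REF_CLK * (NF + 1) /
 +               * ((NR + 1) * (OD + 1)). div_sel chooses between the reference
 +               * clock and the PLL output, each optionally divided by
 +               * (div_fctr + 1)
 +               */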
 +              if (div_sel == DIV_SEL_REF_CLK ||
 +                              div_sel == DIV_SEL_DIVIDED_REF) {
 +                      if (div_sel == DIV_SEL_REF_CLK)
 +                              freq = PLL_REF_CLK;
 +                      else
 +                              freq = PLL_REF_CLK / (div_fctr + 1);
 +              } else if (div_sel == DIV_SEL_PLL_CLK ||
 +                              div_sel == DIV_SEL_DIVIDED_PLL) {
 +                      pll_clk = PLL_REF_CLK * (nf + 1) /
 +                                      ((nr + 1) * (od + 1));
 +                      if (div_sel == DIV_SEL_PLL_CLK)
 +                              freq = pll_clk;
 +                      else
 +                              freq = pll_clk / (div_fctr + 1);
 +              } else {
 +                      dev_warn(hdev->dev,
 +                              "Received invalid div select value: %d",
 +                              div_sel);
 +                      freq = 0;
 +              }
 +      }
 +
 +      prop->psoc_timestamp_frequency = freq;
 +      prop->psoc_pci_pll_nr = nr;
 +      prop->psoc_pci_pll_nf = nf;
 +      prop->psoc_pci_pll_od = od;
 +      prop->psoc_pci_pll_div_factor = div_fctr;
 +}
 +
 +/*
 + * goya_set_frequency - set the frequency of the device
 + *
 + * @hdev: pointer to habanalabs device structure
 + * @freq: the new frequency value
 + *
 + * Change the frequency if needed. This function provides no protection against
 + * concurrent use, so the caller is responsible for making sure it is not
 + * invoked from multiple threads with different values
 + *
 + * Returns 0 if no change was done, otherwise returns 1
 + */
 +int goya_set_frequency(struct hl_device *hdev, enum hl_pll_frequency freq)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if ((goya->pm_mng_profile == PM_MANUAL) ||
 +                      (goya->curr_pll_profile == freq))
 +              return 0;
 +
 +      dev_dbg(hdev->dev, "Changing device frequency to %s\n",
 +              freq == PLL_HIGH ? "high" : "low");
 +
 +      goya_set_pll_profile(hdev, freq);
 +
 +      goya->curr_pll_profile = freq;
 +
 +      return 1;
 +}
 +
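 +/*
 + * goya_set_freq_to_low_job - Periodic work to lower the device frequency
 + *
 + * @work: pointer to the delayed work_struct
 + *
 + * Drop the device to the low PLL profile when no compute context is active,
 + * then re-arm itself for the next HL_PLL_LOW_JOB_FREQ_USEC interval
 + *
 + */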
 +static void goya_set_freq_to_low_job(struct work_struct *work)
 +{
 +      struct goya_work_freq *goya_work = container_of(work,
 +                                              struct goya_work_freq,
 +                                              work_freq.work);
 +      struct hl_device *hdev = goya_work->hdev;
 +
 +      mutex_lock(&hdev->fpriv_list_lock);
 +
 +      if (!hdev->is_compute_ctx_active)
 +              goya_set_frequency(hdev, PLL_LOW);
 +
 +      mutex_unlock(&hdev->fpriv_list_lock);
 +
 +      schedule_delayed_work(&goya_work->work_freq,
 +                      usecs_to_jiffies(HL_PLL_LOW_JOB_FREQ_USEC));
 +}
 +
 +int goya_late_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +      int rc;
 +
 +      goya_fetch_psoc_frequency(hdev);
 +
 +      rc = goya_mmu_clear_pgt_range(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to clear MMU page tables range %d\n", rc);
 +              return rc;
 +      }
 +
 +      rc = goya_mmu_set_dram_default_page(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to set DRAM default page %d\n", rc);
 +              return rc;
 +      }
 +
 +      rc = goya_mmu_add_mappings_for_device_cpu(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = goya_init_cpu_queues(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = goya_test_cpu_queue(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = goya_cpucp_info_get(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to get cpucp info %d\n", rc);
 +              return rc;
 +      }
 +
 +      /* Now that the DRAM size is available in the ASIC properties, configure
 +       * the DMA_IF DDR wrap protection (which resides in the MMU block)
 +       * accordingly. The register value is the log2 of the DRAM size
 +       */
 +      WREG32(mmMMU_LOG2_DDR_SIZE, ilog2(prop->dram_size));
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS, 0x0);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to enable PCI access from CPU %d\n", rc);
 +              return rc;
 +      }
 +
 +      /* force setting to low frequency */
 +      goya->curr_pll_profile = PLL_LOW;
 +
 +      goya->pm_mng_profile = PM_AUTO;
 +
 +      goya_set_pll_profile(hdev, PLL_LOW);
 +
 +      schedule_delayed_work(&goya->goya_work->work_freq,
 +              usecs_to_jiffies(HL_PLL_LOW_JOB_FREQ_USEC));
 +
 +      return 0;
 +}
 +
 +/*
 + * goya_late_fini - GOYA late tear-down code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Cancel the frequency work and free the structures allocated for the sensors
 + */
 +void goya_late_fini(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      cancel_delayed_work_sync(&goya->goya_work->work_freq);
 +
 +      hl_hwmon_release_resources(hdev);
 +}
 +
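 +/*
 + * goya_set_pci_memory_regions - Describe the PCI-accessible memory regions
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Record, for the CFG, SRAM and DRAM apertures, the device base address,
 + * size, backing PCI BAR and offset within that BAR, presumably for use by
 + * the common driver code
 + *
 + */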
 +static void goya_set_pci_memory_regions(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_mem_region *region;
 +
 +      /* CFG */
 +      region = &hdev->pci_mem_region[PCI_REGION_CFG];
 +      region->region_base = CFG_BASE;
 +      region->region_size = CFG_SIZE;
 +      region->offset_in_bar = CFG_BASE - SRAM_BASE_ADDR;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = SRAM_CFG_BAR_ID;
 +      region->used = 1;
 +
 +      /* SRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_SRAM];
 +      region->region_base = SRAM_BASE_ADDR;
 +      region->region_size = SRAM_SIZE;
 +      region->offset_in_bar = 0;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = SRAM_CFG_BAR_ID;
 +      region->used = 1;
 +
 +      /* DRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_DRAM];
 +      region->region_base = DRAM_PHYS_BASE;
 +      region->region_size = hdev->asic_prop.dram_size;
 +      region->offset_in_bar = 0;
 +      region->bar_size = prop->dram_pci_bar_size;
 +      region->bar_id = DDR_BAR_ID;
 +      region->used = 1;
 +}
 +
 +/*
 + * goya_sw_init - Goya software initialization code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static int goya_sw_init(struct hl_device *hdev)
 +{
 +      struct goya_device *goya;
 +      int rc;
 +
 +      /* Allocate device structure */
 +      goya = kzalloc(sizeof(*goya), GFP_KERNEL);
 +      if (!goya)
 +              return -ENOMEM;
 +
 +      /* The DDR BAR points to DRAM_PHYS_BASE, as programmed by goya_init_iatu() */
 +      goya->ddr_bar_cur_addr = DRAM_PHYS_BASE;
 +
 +      goya->mme_clk = GOYA_PLL_FREQ_LOW;
 +      goya->tpc_clk = GOYA_PLL_FREQ_LOW;
 +      goya->ic_clk = GOYA_PLL_FREQ_LOW;
 +
 +      hdev->asic_specific = goya;
 +
 +      /* Create DMA pool for small allocations */
 +      hdev->dma_pool = dma_pool_create(dev_name(hdev->dev),
 +                      &hdev->pdev->dev, GOYA_DMA_POOL_BLK_SIZE, 8, 0);
 +      if (!hdev->dma_pool) {
 +              dev_err(hdev->dev, "failed to create DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_goya_device;
 +      }
 +
 +      hdev->cpu_accessible_dma_mem = hl_asic_dma_alloc_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE,
 +                                                      &hdev->cpu_accessible_dma_address,
 +                                                      GFP_KERNEL | __GFP_ZERO);
 +
 +      if (!hdev->cpu_accessible_dma_mem) {
 +              rc = -ENOMEM;
 +              goto free_dma_pool;
 +      }
 +
 +      dev_dbg(hdev->dev, "cpu accessible memory at bus address %pad\n",
 +              &hdev->cpu_accessible_dma_address);
 +
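 +      /*
 +       * Manage the CPU-accessible memory through a genpool with a 32-byte
 +       * allocation granularity and no NUMA node restriction
 +       */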
 +      hdev->cpu_accessible_dma_pool = gen_pool_create(ilog2(32), -1);
 +      if (!hdev->cpu_accessible_dma_pool) {
 +              dev_err(hdev->dev,
 +                      "Failed to create CPU accessible DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_cpu_dma_mem;
 +      }
 +
 +      rc = gen_pool_add(hdev->cpu_accessible_dma_pool,
 +                              (uintptr_t) hdev->cpu_accessible_dma_mem,
 +                              HL_CPU_ACCESSIBLE_MEM_SIZE, -1);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to add memory to CPU accessible DMA pool\n");
 +              rc = -EFAULT;
 +              goto free_cpu_accessible_dma_pool;
 +      }
 +
 +      spin_lock_init(&goya->hw_queues_lock);
 +      hdev->supports_coresight = true;
 +      hdev->asic_prop.supports_compute_reset = true;
 +      hdev->asic_prop.allow_inference_soft_reset = true;
 +      hdev->supports_wait_for_multi_cs = false;
 +      hdev->supports_ctx_switch = true;
 +
 +      hdev->asic_funcs->set_pci_memory_regions(hdev);
 +
 +      goya->goya_work = kmalloc(sizeof(struct goya_work_freq), GFP_KERNEL);
 +      if (!goya->goya_work) {
 +              rc = -ENOMEM;
 +              goto free_cpu_accessible_dma_pool;
 +      }
 +
 +      goya->goya_work->hdev = hdev;
 +      INIT_DELAYED_WORK(&goya->goya_work->work_freq, goya_set_freq_to_low_job);
 +
 +      return 0;
 +
 +free_cpu_accessible_dma_pool:
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +free_cpu_dma_mem:
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                      hdev->cpu_accessible_dma_address);
 +free_dma_pool:
 +      dma_pool_destroy(hdev->dma_pool);
 +free_goya_device:
 +      kfree(goya);
 +
 +      return rc;
 +}
 +
 +/*
 + * goya_sw_fini - Goya software tear-down code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static int goya_sw_fini(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                      hdev->cpu_accessible_dma_address);
 +
 +      dma_pool_destroy(hdev->dma_pool);
 +
 +      kfree(goya->goya_work);
 +      kfree(goya);
 +
 +      return 0;
 +}
 +
 +static void goya_init_dma_qman(struct hl_device *hdev, int dma_id,
 +              dma_addr_t bus_address)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 gic_base_lo, gic_base_hi;
 +      u32 reg_off = dma_id * (mmDMA_QM_1_PQ_PI - mmDMA_QM_0_PQ_PI);
 +      u32 dma_err_cfg = QMAN_DMA_ERR_MSG_EN;
 +
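 +      /*
 +       * All DMA QMANs expose an identical register block at a fixed stride,
 +       * so reg_off translates the QMAN 0 register addresses into those of
 +       * QMAN dma_id
 +       */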
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      WREG32(mmDMA_QM_0_PQ_BASE_LO + reg_off, lower_32_bits(bus_address));
 +      WREG32(mmDMA_QM_0_PQ_BASE_HI + reg_off, upper_32_bits(bus_address));
 +
 +      WREG32(mmDMA_QM_0_PQ_SIZE + reg_off, ilog2(HL_QUEUE_LENGTH));
 +      WREG32(mmDMA_QM_0_PQ_PI + reg_off, 0);
 +      WREG32(mmDMA_QM_0_PQ_CI + reg_off, 0);
 +
 +      WREG32(mmDMA_QM_0_CP_MSG_BASE0_ADDR_LO + reg_off, mtr_base_lo);
 +      WREG32(mmDMA_QM_0_CP_MSG_BASE0_ADDR_HI + reg_off, mtr_base_hi);
 +      WREG32(mmDMA_QM_0_CP_MSG_BASE1_ADDR_LO + reg_off, so_base_lo);
 +      WREG32(mmDMA_QM_0_CP_MSG_BASE1_ADDR_HI + reg_off, so_base_hi);
 +      WREG32(mmDMA_QM_0_GLBL_ERR_ADDR_LO + reg_off, gic_base_lo);
 +      WREG32(mmDMA_QM_0_GLBL_ERR_ADDR_HI + reg_off, gic_base_hi);
 +      WREG32(mmDMA_QM_0_GLBL_ERR_WDATA + reg_off,
 +                      GOYA_ASYNC_EVENT_ID_DMA0_QM + dma_id);
 +
 +      /* PQ has buffer of 2 cache lines, while CQ has 8 lines */
 +      WREG32(mmDMA_QM_0_PQ_CFG1 + reg_off, 0x00020002);
 +      WREG32(mmDMA_QM_0_CQ_CFG1 + reg_off, 0x00080008);
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MMU)
 +              WREG32(mmDMA_QM_0_GLBL_PROT + reg_off, QMAN_DMA_PARTLY_TRUSTED);
 +      else
 +              WREG32(mmDMA_QM_0_GLBL_PROT + reg_off, QMAN_DMA_FULLY_TRUSTED);
 +
 +      if (hdev->stop_on_err)
 +              dma_err_cfg |= 1 << DMA_QM_0_GLBL_ERR_CFG_DMA_STOP_ON_ERR_SHIFT;
 +
 +      WREG32(mmDMA_QM_0_GLBL_ERR_CFG + reg_off, dma_err_cfg);
 +      WREG32(mmDMA_QM_0_GLBL_CFG0 + reg_off, QMAN_DMA_ENABLE);
 +}
 +
 +static void goya_init_dma_ch(struct hl_device *hdev, int dma_id)
 +{
 +      u32 gic_base_lo, gic_base_hi;
 +      u64 sob_addr;
 +      u32 reg_off = dma_id * (mmDMA_CH_1_CFG1 - mmDMA_CH_0_CFG1);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      WREG32(mmDMA_CH_0_ERRMSG_ADDR_LO + reg_off, gic_base_lo);
 +      WREG32(mmDMA_CH_0_ERRMSG_ADDR_HI + reg_off, gic_base_hi);
 +      WREG32(mmDMA_CH_0_ERRMSG_WDATA + reg_off,
 +                      GOYA_ASYNC_EVENT_ID_DMA0_CH + dma_id);
 +
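 +      /*
 +       * Each DMA channel reports write completion by writing 0x80000001 to
 +       * its own sync object: channel 0 uses SOB 1007, while the other
 +       * channels use consecutive SOBs starting at SOB 1000
 +       */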
 +      if (dma_id)
 +              sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1000 +
 +                              (dma_id - 1) * 4;
 +      else
 +              sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
 +
 +      WREG32(mmDMA_CH_0_WR_COMP_ADDR_HI + reg_off, upper_32_bits(sob_addr));
 +      WREG32(mmDMA_CH_0_WR_COMP_WDATA + reg_off, 0x80000001);
 +}
 +
 +/*
 + * goya_init_dma_qmans - Initialize QMAN DMA registers
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Initialize the H/W registers of the QMAN DMA channels
 + *
 + */
 +void goya_init_dma_qmans(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      struct hl_hw_queue *q;
 +      int i;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_DMA)
 +              return;
 +
 +      q = &hdev->kernel_queues[0];
 +
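 +      /* External queue i is served by completion queue i and MSI-X vector i */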
 +      for (i = 0 ; i < NUMBER_OF_EXT_HW_QUEUES ; i++, q++) {
 +              q->cq_id = q->msi_vec = i;
 +              goya_init_dma_qman(hdev, i, q->bus_address);
 +              goya_init_dma_ch(hdev, i);
 +      }
 +
 +      goya->hw_cap_initialized |= HW_CAP_DMA;
 +}
 +
 +/*
 + * goya_disable_external_queues - Disable external queues
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static void goya_disable_external_queues(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_DMA))
 +              return;
 +
 +      WREG32(mmDMA_QM_0_GLBL_CFG0, 0);
 +      WREG32(mmDMA_QM_1_GLBL_CFG0, 0);
 +      WREG32(mmDMA_QM_2_GLBL_CFG0, 0);
 +      WREG32(mmDMA_QM_3_GLBL_CFG0, 0);
 +      WREG32(mmDMA_QM_4_GLBL_CFG0, 0);
 +}
 +
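 +/*
 + * goya_stop_queue - Stop a single QMAN
 + *
 + * @hdev: pointer to hl_device structure
 + * @cfg_reg: QMAN global configuration register
 + * @cp_sts_reg: QMAN CP status register
 + * @glbl_sts0_reg: QMAN global status register
 + *
 + * Assert the CP stop bit, wait for any fence the CP is currently waiting on,
 + * and then poll until the CP reports it has stopped. If the CP is stuck on a
 + * fence, return without polling for the stop indication
 + *
 + */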
 +static int goya_stop_queue(struct hl_device *hdev, u32 cfg_reg,
 +                              u32 cp_sts_reg, u32 glbl_sts0_reg)
 +{
 +      int rc;
 +      u32 status;
 +
 +      /* Use the TPC0 mask/shift definitions, as the bit layout is the same for all QMANs */
 +
 +      WREG32(cfg_reg, 1 << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +
 +      status = RREG32(cp_sts_reg);
 +      if (status & TPC0_QM_CP_STS_FENCE_IN_PROGRESS_MASK) {
 +              rc = hl_poll_timeout(
 +                      hdev,
 +                      cp_sts_reg,
 +                      status,
 +                      !(status & TPC0_QM_CP_STS_FENCE_IN_PROGRESS_MASK),
 +                      1000,
 +                      QMAN_FENCE_TIMEOUT_USEC);
 +
 +              /* If the QMAN is stuck on a fence, there is no need to check for stop */
 +              if (rc)
 +                      return 0;
 +      }
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              glbl_sts0_reg,
 +              status,
 +              (status & TPC0_QM_GLBL_STS0_CP_IS_STOP_MASK),
 +              1000,
 +              QMAN_STOP_TIMEOUT_USEC);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for QMAN to stop\n");
 +              return -EINVAL;
 +      }
 +
 +      return 0;
 +}
 +
 +/*
 + * goya_stop_external_queues - Stop external queues
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Returns 0 on success
 + *
 + */
 +static int goya_stop_external_queues(struct hl_device *hdev)
 +{
 +      int rc, retval = 0;
 +
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_DMA))
 +              return retval;
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmDMA_QM_0_GLBL_CFG1,
 +                      mmDMA_QM_0_CP_STS,
 +                      mmDMA_QM_0_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop DMA QMAN 0\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmDMA_QM_1_GLBL_CFG1,
 +                      mmDMA_QM_1_CP_STS,
 +                      mmDMA_QM_1_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop DMA QMAN 1\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmDMA_QM_2_GLBL_CFG1,
 +                      mmDMA_QM_2_CP_STS,
 +                      mmDMA_QM_2_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop DMA QMAN 2\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmDMA_QM_3_GLBL_CFG1,
 +                      mmDMA_QM_3_CP_STS,
 +                      mmDMA_QM_3_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop DMA QMAN 3\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmDMA_QM_4_GLBL_CFG1,
 +                      mmDMA_QM_4_CP_STS,
 +                      mmDMA_QM_4_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop DMA QMAN 4\n");
 +              retval = -EIO;
 +      }
 +
 +      return retval;
 +}
 +
 +/*
 + * goya_init_cpu_queues - Initialize PQ/CQ/EQ of CPU
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Returns 0 on success
 + *
 + */
 +int goya_init_cpu_queues(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hl_eq *eq;
 +      u32 status;
 +      struct hl_hw_queue *cpu_pq = &hdev->kernel_queues[GOYA_QUEUE_ID_CPU_PQ];
 +      int err;
 +
 +      if (!hdev->cpu_queues_enable)
 +              return 0;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_CPU_Q)
 +              return 0;
 +
 +      eq = &hdev->event_queue;
 +
 +      WREG32(mmCPU_PQ_BASE_ADDR_LOW, lower_32_bits(cpu_pq->bus_address));
 +      WREG32(mmCPU_PQ_BASE_ADDR_HIGH, upper_32_bits(cpu_pq->bus_address));
 +
 +      WREG32(mmCPU_EQ_BASE_ADDR_LOW, lower_32_bits(eq->bus_address));
 +      WREG32(mmCPU_EQ_BASE_ADDR_HIGH, upper_32_bits(eq->bus_address));
 +
 +      WREG32(mmCPU_CQ_BASE_ADDR_LOW,
 +                      lower_32_bits(VA_CPU_ACCESSIBLE_MEM_ADDR));
 +      WREG32(mmCPU_CQ_BASE_ADDR_HIGH,
 +                      upper_32_bits(VA_CPU_ACCESSIBLE_MEM_ADDR));
 +
 +      WREG32(mmCPU_PQ_LENGTH, HL_QUEUE_SIZE_IN_BYTES);
 +      WREG32(mmCPU_EQ_LENGTH, HL_EQ_SIZE_IN_BYTES);
 +      WREG32(mmCPU_CQ_LENGTH, HL_CPU_ACCESSIBLE_MEM_SIZE);
 +
 +      /* Used for EQ CI */
 +      WREG32(mmCPU_EQ_CI, 0);
 +
 +      WREG32(mmCPU_IF_PF_PQ_PI, 0);
 +
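 +      /*
 +       * Handshake with the device CPU: publish the ready-for-CP status,
 +       * ring the GIC doorbell with the PI_UPDATE event and wait for the CPU
 +       * to report it is ready for the host
 +       */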
 +      WREG32(mmCPU_PQ_INIT_STATUS, PQ_INIT_STATUS_READY_FOR_CP);
 +
 +      WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 +                      GOYA_ASYNC_EVENT_ID_PI_UPDATE);
 +
 +      err = hl_poll_timeout(
 +              hdev,
 +              mmCPU_PQ_INIT_STATUS,
 +              status,
 +              (status == PQ_INIT_STATUS_READY_FOR_HOST),
 +              1000,
 +              GOYA_CPU_TIMEOUT_USEC);
 +
 +      if (err) {
 +              dev_err(hdev->dev,
 +                      "Failed to setup communication with device CPU\n");
 +              return -EIO;
 +      }
 +
 +      /* update FW application security bits */
 +      if (prop->fw_cpu_boot_dev_sts0_valid)
 +              prop->fw_app_cpu_boot_dev_sts0 = RREG32(mmCPU_BOOT_DEV_STS0);
 +
 +      if (prop->fw_cpu_boot_dev_sts1_valid)
 +              prop->fw_app_cpu_boot_dev_sts1 = RREG32(mmCPU_BOOT_DEV_STS1);
 +
 +      goya->hw_cap_initialized |= HW_CAP_CPU_Q;
 +      return 0;
 +}
 +
 +static void goya_set_pll_refclk(struct hl_device *hdev)
 +{
 +      WREG32(mmCPU_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmCPU_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmCPU_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmCPU_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmIC_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmIC_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmIC_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmIC_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmMC_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmMC_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmMC_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmMC_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmPSOC_MME_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmPSOC_MME_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmPSOC_MME_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmPSOC_MME_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmPSOC_PCI_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmPSOC_PCI_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmPSOC_PCI_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmPSOC_PCI_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmPSOC_EMMC_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmPSOC_EMMC_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmPSOC_EMMC_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmPSOC_EMMC_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmTPC_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmTPC_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmTPC_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmTPC_PLL_DIV_SEL_3, 0x0);
 +}
 +
 +static void goya_disable_clk_rlx(struct hl_device *hdev)
 +{
 +      WREG32(mmPSOC_MME_PLL_CLK_RLX_0, 0x100010);
 +      WREG32(mmIC_PLL_CLK_RLX_0, 0x100010);
 +}
 +
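 +/*
 + * _goya_tpc_mbist_workaround - Per-TPC part of the H2 #2443 workaround
 + *
 + * @hdev: pointer to hl_device structure
 + * @tpc_id: index of the TPC to initialize
 + *
 + * Run the functional MBIST on the TPC memories, reset the TPC core and then
 + * clear the first 256 32-bit words of the TPC SLM
 + *
 + */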
 +static void _goya_tpc_mbist_workaround(struct hl_device *hdev, u8 tpc_id)
 +{
 +      u64 tpc_eml_address;
 +      u32 val, tpc_offset, tpc_eml_offset, tpc_slm_offset;
 +      int err, slm_index;
 +
 +      tpc_offset = tpc_id * 0x40000;
 +      tpc_eml_offset = tpc_id * 0x200000;
 +      tpc_eml_address = (mmTPC0_EML_CFG_BASE + tpc_eml_offset - CFG_BASE);
 +      tpc_slm_offset = tpc_eml_address + 0x100000;
 +
 +      /*
 +       * Workaround for Bug H2 #2443:
 +       * "TPC SB is not initialized on chip reset"
 +       */
 +
 +      val = RREG32(mmTPC0_CFG_FUNC_MBIST_CNTRL + tpc_offset);
 +      if (val & TPC0_CFG_FUNC_MBIST_CNTRL_MBIST_ACTIVE_MASK)
 +              dev_warn(hdev->dev, "TPC%d MBIST ACTIVE is not cleared\n",
 +                      tpc_id);
 +
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_PAT + tpc_offset, val & 0xFFFFF000);
 +
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_0 + tpc_offset, 0x37FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_1 + tpc_offset, 0x303F);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_2 + tpc_offset, 0x71FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_3 + tpc_offset, 0x71FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_4 + tpc_offset, 0x70FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_5 + tpc_offset, 0x70FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_6 + tpc_offset, 0x70FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_7 + tpc_offset, 0x70FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_8 + tpc_offset, 0x70FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_9 + tpc_offset, 0x70FF);
 +
 +      WREG32_OR(mmTPC0_CFG_FUNC_MBIST_CNTRL + tpc_offset,
 +              1 << TPC0_CFG_FUNC_MBIST_CNTRL_MBIST_START_SHIFT);
 +
 +      err = hl_poll_timeout(
 +              hdev,
 +              mmTPC0_CFG_FUNC_MBIST_CNTRL + tpc_offset,
 +              val,
 +              (val & TPC0_CFG_FUNC_MBIST_CNTRL_MBIST_DONE_MASK),
 +              1000,
 +              HL_DEVICE_TIMEOUT_USEC);
 +
 +      if (err)
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for TPC%d MBIST DONE\n", tpc_id);
 +
 +      WREG32_OR(mmTPC0_EML_CFG_DBG_CNT + tpc_eml_offset,
 +              1 << TPC0_EML_CFG_DBG_CNT_CORE_RST_SHIFT);
 +
 +      msleep(GOYA_RESET_WAIT_MSEC);
 +
 +      WREG32_AND(mmTPC0_EML_CFG_DBG_CNT + tpc_eml_offset,
 +              ~(1 << TPC0_EML_CFG_DBG_CNT_CORE_RST_SHIFT));
 +
 +      msleep(GOYA_RESET_WAIT_MSEC);
 +
 +      for (slm_index = 0 ; slm_index < 256 ; slm_index++)
 +              WREG32(tpc_slm_offset + (slm_index << 2), 0);
 +
 +      val = RREG32(tpc_slm_offset);
 +}
 +
 +static void goya_tpc_mbist_workaround(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int i;
 +
 +      if (hdev->pldm)
 +              return;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_TPC_MBIST)
 +              return;
 +
 +      /* Workaround for H2 #2443 */
 +
 +      for (i = 0 ; i < TPC_MAX_NUM ; i++)
 +              _goya_tpc_mbist_workaround(hdev, i);
 +
 +      goya->hw_cap_initialized |= HW_CAP_TPC_MBIST;
 +}
 +
 +/*
 + * goya_init_golden_registers - Initialize golden registers
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Initialize the H/W registers of the device
 + *
 + */
 +static void goya_init_golden_registers(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 polynom[10], tpc_intr_mask, offset;
 +      int i;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_GOLDEN)
 +              return;
 +
 +      polynom[0] = 0x00020080;
 +      polynom[1] = 0x00401000;
 +      polynom[2] = 0x00200800;
 +      polynom[3] = 0x00002000;
 +      polynom[4] = 0x00080200;
 +      polynom[5] = 0x00040100;
 +      polynom[6] = 0x00100400;
 +      polynom[7] = 0x00004000;
 +      polynom[8] = 0x00010000;
 +      polynom[9] = 0x00008000;
 +
 +      /* Mask all arithmetic interrupts from TPC */
 +      tpc_intr_mask = 0x7FFF;
 +
 +      for (i = 0, offset = 0 ; i < 6 ; i++, offset += 0x20000) {
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
 +
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_DATA_L_ARB + offset, 0x204);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_DATA_L_ARB + offset, 0x204);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_DATA_L_ARB + offset, 0x204);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_DATA_L_ARB + offset, 0x204);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_DATA_L_ARB + offset, 0x204);
 +
 +
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_DATA_E_ARB + offset, 0x206);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_DATA_E_ARB + offset, 0x206);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_DATA_E_ARB + offset, 0x206);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_DATA_E_ARB + offset, 0x207);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_DATA_E_ARB + offset, 0x207);
 +
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_DATA_W_ARB + offset, 0x207);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_DATA_W_ARB + offset, 0x207);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_DATA_W_ARB + offset, 0x206);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_DATA_W_ARB + offset, 0x206);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_DATA_W_ARB + offset, 0x206);
 +
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_WR_RS_E_ARB + offset, 0x101);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_WR_RS_E_ARB + offset, 0x102);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_WR_RS_E_ARB + offset, 0x103);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_WR_RS_E_ARB + offset, 0x104);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_WR_RS_E_ARB + offset, 0x105);
 +
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_WR_RS_W_ARB + offset, 0x105);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_WR_RS_W_ARB + offset, 0x104);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_WR_RS_W_ARB + offset, 0x103);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_WR_RS_W_ARB + offset, 0x102);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_WR_RS_W_ARB + offset, 0x101);
 +      }
 +
 +      WREG32(mmMME_STORE_MAX_CREDIT, 0x21);
 +      WREG32(mmMME_AGU, 0x0f0f0f10);
 +      WREG32(mmMME_SEI_MASK, ~0x0);
 +
 +      WREG32(mmMME6_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
 +      WREG32(mmMME5_RTR_HBW_RD_RQ_N_ARB, 0x01040101);
 +      WREG32(mmMME4_RTR_HBW_RD_RQ_N_ARB, 0x01030101);
 +      WREG32(mmMME3_RTR_HBW_RD_RQ_N_ARB, 0x01020101);
 +      WREG32(mmMME2_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
 +      WREG32(mmMME1_RTR_HBW_RD_RQ_N_ARB, 0x07010701);
 +      WREG32(mmMME6_RTR_HBW_RD_RQ_S_ARB, 0x04010401);
 +      WREG32(mmMME5_RTR_HBW_RD_RQ_S_ARB, 0x04050401);
 +      WREG32(mmMME4_RTR_HBW_RD_RQ_S_ARB, 0x03070301);
 +      WREG32(mmMME3_RTR_HBW_RD_RQ_S_ARB, 0x01030101);
 +      WREG32(mmMME2_RTR_HBW_RD_RQ_S_ARB, 0x01040101);
 +      WREG32(mmMME1_RTR_HBW_RD_RQ_S_ARB, 0x01050105);
 +      WREG32(mmMME6_RTR_HBW_RD_RQ_W_ARB, 0x01010501);
 +      WREG32(mmMME5_RTR_HBW_RD_RQ_W_ARB, 0x01010501);
 +      WREG32(mmMME4_RTR_HBW_RD_RQ_W_ARB, 0x01040301);
 +      WREG32(mmMME3_RTR_HBW_RD_RQ_W_ARB, 0x01030401);
 +      WREG32(mmMME2_RTR_HBW_RD_RQ_W_ARB, 0x01040101);
 +      WREG32(mmMME1_RTR_HBW_RD_RQ_W_ARB, 0x01050101);
 +      WREG32(mmMME6_RTR_HBW_WR_RQ_N_ARB, 0x02020202);
 +      WREG32(mmMME5_RTR_HBW_WR_RQ_N_ARB, 0x01070101);
 +      WREG32(mmMME4_RTR_HBW_WR_RQ_N_ARB, 0x02020201);
 +      WREG32(mmMME3_RTR_HBW_WR_RQ_N_ARB, 0x07020701);
 +      WREG32(mmMME2_RTR_HBW_WR_RQ_N_ARB, 0x01020101);
 +      WREG32(mmMME1_RTR_HBW_WR_RQ_S_ARB, 0x01010101);
 +      WREG32(mmMME6_RTR_HBW_WR_RQ_S_ARB, 0x01070101);
 +      WREG32(mmMME5_RTR_HBW_WR_RQ_S_ARB, 0x01070101);
 +      WREG32(mmMME4_RTR_HBW_WR_RQ_S_ARB, 0x07020701);
 +      WREG32(mmMME3_RTR_HBW_WR_RQ_S_ARB, 0x02020201);
 +      WREG32(mmMME2_RTR_HBW_WR_RQ_S_ARB, 0x01070101);
 +      WREG32(mmMME1_RTR_HBW_WR_RQ_S_ARB, 0x01020102);
 +      WREG32(mmMME6_RTR_HBW_WR_RQ_W_ARB, 0x01020701);
 +      WREG32(mmMME5_RTR_HBW_WR_RQ_W_ARB, 0x01020701);
 +      WREG32(mmMME4_RTR_HBW_WR_RQ_W_ARB, 0x07020707);
 +      WREG32(mmMME3_RTR_HBW_WR_RQ_W_ARB, 0x01020201);
 +      WREG32(mmMME2_RTR_HBW_WR_RQ_W_ARB, 0x01070201);
 +      WREG32(mmMME1_RTR_HBW_WR_RQ_W_ARB, 0x01070201);
 +      WREG32(mmMME6_RTR_HBW_RD_RS_N_ARB, 0x01070102);
 +      WREG32(mmMME5_RTR_HBW_RD_RS_N_ARB, 0x01070102);
 +      WREG32(mmMME4_RTR_HBW_RD_RS_N_ARB, 0x01060102);
 +      WREG32(mmMME3_RTR_HBW_RD_RS_N_ARB, 0x01040102);
 +      WREG32(mmMME2_RTR_HBW_RD_RS_N_ARB, 0x01020102);
 +      WREG32(mmMME1_RTR_HBW_RD_RS_N_ARB, 0x01020107);
 +      WREG32(mmMME6_RTR_HBW_RD_RS_S_ARB, 0x01020106);
 +      WREG32(mmMME5_RTR_HBW_RD_RS_S_ARB, 0x01020102);
 +      WREG32(mmMME4_RTR_HBW_RD_RS_S_ARB, 0x01040102);
 +      WREG32(mmMME3_RTR_HBW_RD_RS_S_ARB, 0x01060102);
 +      WREG32(mmMME2_RTR_HBW_RD_RS_S_ARB, 0x01070102);
 +      WREG32(mmMME1_RTR_HBW_RD_RS_S_ARB, 0x01070102);
 +      WREG32(mmMME6_RTR_HBW_RD_RS_E_ARB, 0x01020702);
 +      WREG32(mmMME5_RTR_HBW_RD_RS_E_ARB, 0x01020702);
 +      WREG32(mmMME4_RTR_HBW_RD_RS_E_ARB, 0x01040602);
 +      WREG32(mmMME3_RTR_HBW_RD_RS_E_ARB, 0x01060402);
 +      WREG32(mmMME2_RTR_HBW_RD_RS_E_ARB, 0x01070202);
 +      WREG32(mmMME1_RTR_HBW_RD_RS_E_ARB, 0x01070102);
 +      WREG32(mmMME6_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME5_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME4_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME3_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME2_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME1_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME6_RTR_HBW_WR_RS_N_ARB, 0x01050101);
 +      WREG32(mmMME5_RTR_HBW_WR_RS_N_ARB, 0x01040101);
 +      WREG32(mmMME4_RTR_HBW_WR_RS_N_ARB, 0x01030101);
 +      WREG32(mmMME3_RTR_HBW_WR_RS_N_ARB, 0x01020101);
 +      WREG32(mmMME2_RTR_HBW_WR_RS_N_ARB, 0x01010101);
 +      WREG32(mmMME1_RTR_HBW_WR_RS_N_ARB, 0x01010107);
 +      WREG32(mmMME6_RTR_HBW_WR_RS_S_ARB, 0x01010107);
 +      WREG32(mmMME5_RTR_HBW_WR_RS_S_ARB, 0x01010101);
 +      WREG32(mmMME4_RTR_HBW_WR_RS_S_ARB, 0x01020101);
 +      WREG32(mmMME3_RTR_HBW_WR_RS_S_ARB, 0x01030101);
 +      WREG32(mmMME2_RTR_HBW_WR_RS_S_ARB, 0x01040101);
 +      WREG32(mmMME1_RTR_HBW_WR_RS_S_ARB, 0x01050101);
 +      WREG32(mmMME6_RTR_HBW_WR_RS_E_ARB, 0x01010501);
 +      WREG32(mmMME5_RTR_HBW_WR_RS_E_ARB, 0x01010501);
 +      WREG32(mmMME4_RTR_HBW_WR_RS_E_ARB, 0x01040301);
 +      WREG32(mmMME3_RTR_HBW_WR_RS_E_ARB, 0x01030401);
 +      WREG32(mmMME2_RTR_HBW_WR_RS_E_ARB, 0x01040101);
 +      WREG32(mmMME1_RTR_HBW_WR_RS_E_ARB, 0x01050101);
 +      WREG32(mmMME6_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +      WREG32(mmMME5_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +      WREG32(mmMME4_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +      WREG32(mmMME3_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +      WREG32(mmMME2_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +      WREG32(mmMME1_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +
 +      WREG32(mmTPC1_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
 +      WREG32(mmTPC1_RTR_HBW_RD_RQ_S_ARB, 0x01010101);
 +      WREG32(mmTPC1_RTR_HBW_RD_RQ_E_ARB, 0x01060101);
 +      WREG32(mmTPC1_RTR_HBW_WR_RQ_N_ARB, 0x02020102);
 +      WREG32(mmTPC1_RTR_HBW_WR_RQ_S_ARB, 0x01010101);
 +      WREG32(mmTPC1_RTR_HBW_WR_RQ_E_ARB, 0x02070202);
 +      WREG32(mmTPC1_RTR_HBW_RD_RS_N_ARB, 0x01020201);
 +      WREG32(mmTPC1_RTR_HBW_RD_RS_S_ARB, 0x01070201);
 +      WREG32(mmTPC1_RTR_HBW_RD_RS_W_ARB, 0x01070202);
 +      WREG32(mmTPC1_RTR_HBW_WR_RS_N_ARB, 0x01010101);
 +      WREG32(mmTPC1_RTR_HBW_WR_RS_S_ARB, 0x01050101);
 +      WREG32(mmTPC1_RTR_HBW_WR_RS_W_ARB, 0x01050101);
 +
 +      WREG32(mmTPC2_RTR_HBW_RD_RQ_N_ARB, 0x01020101);
 +      WREG32(mmTPC2_RTR_HBW_RD_RQ_S_ARB, 0x01050101);
 +      WREG32(mmTPC2_RTR_HBW_RD_RQ_E_ARB, 0x01010201);
 +      WREG32(mmTPC2_RTR_HBW_WR_RQ_N_ARB, 0x02040102);
 +      WREG32(mmTPC2_RTR_HBW_WR_RQ_S_ARB, 0x01050101);
 +      WREG32(mmTPC2_RTR_HBW_WR_RQ_E_ARB, 0x02060202);
 +      WREG32(mmTPC2_RTR_HBW_RD_RS_N_ARB, 0x01020201);
 +      WREG32(mmTPC2_RTR_HBW_RD_RS_S_ARB, 0x01070201);
 +      WREG32(mmTPC2_RTR_HBW_RD_RS_W_ARB, 0x01070202);
 +      WREG32(mmTPC2_RTR_HBW_WR_RS_N_ARB, 0x01010101);
 +      WREG32(mmTPC2_RTR_HBW_WR_RS_S_ARB, 0x01040101);
 +      WREG32(mmTPC2_RTR_HBW_WR_RS_W_ARB, 0x01040101);
 +
 +      WREG32(mmTPC3_RTR_HBW_RD_RQ_N_ARB, 0x01030101);
 +      WREG32(mmTPC3_RTR_HBW_RD_RQ_S_ARB, 0x01040101);
 +      WREG32(mmTPC3_RTR_HBW_RD_RQ_E_ARB, 0x01040301);
 +      WREG32(mmTPC3_RTR_HBW_WR_RQ_N_ARB, 0x02060102);
 +      WREG32(mmTPC3_RTR_HBW_WR_RQ_S_ARB, 0x01040101);
 +      WREG32(mmTPC3_RTR_HBW_WR_RQ_E_ARB, 0x01040301);
 +      WREG32(mmTPC3_RTR_HBW_RD_RS_N_ARB, 0x01040201);
 +      WREG32(mmTPC3_RTR_HBW_RD_RS_S_ARB, 0x01060201);
 +      WREG32(mmTPC3_RTR_HBW_RD_RS_W_ARB, 0x01060402);
 +      WREG32(mmTPC3_RTR_HBW_WR_RS_N_ARB, 0x01020101);
 +      WREG32(mmTPC3_RTR_HBW_WR_RS_S_ARB, 0x01030101);
 +      WREG32(mmTPC3_RTR_HBW_WR_RS_W_ARB, 0x01030401);
 +
 +      WREG32(mmTPC4_RTR_HBW_RD_RQ_N_ARB, 0x01040101);
 +      WREG32(mmTPC4_RTR_HBW_RD_RQ_S_ARB, 0x01030101);
 +      WREG32(mmTPC4_RTR_HBW_RD_RQ_E_ARB, 0x01030401);
 +      WREG32(mmTPC4_RTR_HBW_WR_RQ_N_ARB, 0x02070102);
 +      WREG32(mmTPC4_RTR_HBW_WR_RQ_S_ARB, 0x01030101);
 +      WREG32(mmTPC4_RTR_HBW_WR_RQ_E_ARB, 0x02060702);
 +      WREG32(mmTPC4_RTR_HBW_RD_RS_N_ARB, 0x01060201);
 +      WREG32(mmTPC4_RTR_HBW_RD_RS_S_ARB, 0x01040201);
 +      WREG32(mmTPC4_RTR_HBW_RD_RS_W_ARB, 0x01040602);
 +      WREG32(mmTPC4_RTR_HBW_WR_RS_N_ARB, 0x01030101);
 +      WREG32(mmTPC4_RTR_HBW_WR_RS_S_ARB, 0x01020101);
 +      WREG32(mmTPC4_RTR_HBW_WR_RS_W_ARB, 0x01040301);
 +
 +      WREG32(mmTPC5_RTR_HBW_RD_RQ_N_ARB, 0x01050101);
 +      WREG32(mmTPC5_RTR_HBW_RD_RQ_S_ARB, 0x01020101);
 +      WREG32(mmTPC5_RTR_HBW_RD_RQ_E_ARB, 0x01200501);
 +      WREG32(mmTPC5_RTR_HBW_WR_RQ_N_ARB, 0x02070102);
 +      WREG32(mmTPC5_RTR_HBW_WR_RQ_S_ARB, 0x01020101);
 +      WREG32(mmTPC5_RTR_HBW_WR_RQ_E_ARB, 0x02020602);
 +      WREG32(mmTPC5_RTR_HBW_RD_RS_N_ARB, 0x01070201);
 +      WREG32(mmTPC5_RTR_HBW_RD_RS_S_ARB, 0x01020201);
 +      WREG32(mmTPC5_RTR_HBW_RD_RS_W_ARB, 0x01020702);
 +      WREG32(mmTPC5_RTR_HBW_WR_RS_N_ARB, 0x01040101);
 +      WREG32(mmTPC5_RTR_HBW_WR_RS_S_ARB, 0x01010101);
 +      WREG32(mmTPC5_RTR_HBW_WR_RS_W_ARB, 0x01010501);
 +
 +      WREG32(mmTPC6_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_RD_RQ_S_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_RD_RQ_E_ARB, 0x01010601);
 +      WREG32(mmTPC6_RTR_HBW_WR_RQ_N_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_WR_RQ_S_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_WR_RQ_E_ARB, 0x02020702);
 +      WREG32(mmTPC6_RTR_HBW_RD_RS_N_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_RD_RS_S_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_RD_RS_W_ARB, 0x01020702);
 +      WREG32(mmTPC6_RTR_HBW_WR_RS_N_ARB, 0x01050101);
 +      WREG32(mmTPC6_RTR_HBW_WR_RS_S_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_WR_RS_W_ARB, 0x01010501);
 +
 +      for (i = 0, offset = 0 ; i < 10 ; i++, offset += 4) {
 +              WREG32(mmMME1_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmMME2_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmMME3_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmMME4_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmMME5_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmMME6_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +
 +              WREG32(mmTPC0_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC1_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC2_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC3_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC4_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC5_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC6_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC7_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +
 +              WREG32(mmPCI_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmDMA_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +      }
 +
 +      for (i = 0, offset = 0 ; i < 6 ; i++, offset += 0x40000) {
 +              WREG32(mmMME1_RTR_SCRAMB_EN + offset,
 +                              1 << MME1_RTR_SCRAMB_EN_VAL_SHIFT);
 +              WREG32(mmMME1_RTR_NON_LIN_SCRAMB + offset,
 +                              1 << MME1_RTR_NON_LIN_SCRAMB_EN_SHIFT);
 +      }
 +
 +      for (i = 0, offset = 0 ; i < 8 ; i++, offset += 0x40000) {
 +              /*
 +               * Workaround for Bug H2 #2441:
 +               * "ST.NOP set trace event illegal opcode"
 +               */
 +              WREG32(mmTPC0_CFG_TPC_INTR_MASK + offset, tpc_intr_mask);
 +
 +              WREG32(mmTPC0_NRTR_SCRAMB_EN + offset,
 +                              1 << TPC0_NRTR_SCRAMB_EN_VAL_SHIFT);
 +              WREG32(mmTPC0_NRTR_NON_LIN_SCRAMB + offset,
 +                              1 << TPC0_NRTR_NON_LIN_SCRAMB_EN_SHIFT);
 +
 +              WREG32_FIELD(TPC0_CFG_MSS_CONFIG, offset,
 +                              ICACHE_FETCH_LINE_NUM, 2);
 +      }
 +
 +      WREG32(mmDMA_NRTR_SCRAMB_EN, 1 << DMA_NRTR_SCRAMB_EN_VAL_SHIFT);
 +      WREG32(mmDMA_NRTR_NON_LIN_SCRAMB,
 +                      1 << DMA_NRTR_NON_LIN_SCRAMB_EN_SHIFT);
 +
 +      WREG32(mmPCI_NRTR_SCRAMB_EN, 1 << PCI_NRTR_SCRAMB_EN_VAL_SHIFT);
 +      WREG32(mmPCI_NRTR_NON_LIN_SCRAMB,
 +                      1 << PCI_NRTR_NON_LIN_SCRAMB_EN_SHIFT);
 +
 +      /*
 +       * Workaround for H2 #HW-23 bug
 +       * Set DMA max outstanding read requests to 240 on DMA CH 1.
 +       * This limit is still large enough not to affect Gen4 bandwidth.
 +       * Only this DMA channel needs to be limited because the user can only
 +       * read from the Host using DMA CH 1
 +       */
 +      WREG32(mmDMA_CH_1_CFG0, 0x0fff00F0);
 +
 +      WREG32(mmTPC_PLL_CLK_RLX_0, 0x200020);
 +
 +      goya->hw_cap_initialized |= HW_CAP_GOLDEN;
 +}
 +
 +static void goya_init_mme_qman(struct hl_device *hdev)
 +{
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 gic_base_lo, gic_base_hi;
 +      u64 qman_base_addr;
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      qman_base_addr = hdev->asic_prop.sram_base_address +
 +                              MME_QMAN_BASE_OFFSET;
 +
 +      WREG32(mmMME_QM_PQ_BASE_LO, lower_32_bits(qman_base_addr));
 +      WREG32(mmMME_QM_PQ_BASE_HI, upper_32_bits(qman_base_addr));
 +      WREG32(mmMME_QM_PQ_SIZE, ilog2(MME_QMAN_LENGTH));
 +      WREG32(mmMME_QM_PQ_PI, 0);
 +      WREG32(mmMME_QM_PQ_CI, 0);
 +      WREG32(mmMME_QM_CP_LDMA_SRC_BASE_LO_OFFSET, 0x10C0);
 +      WREG32(mmMME_QM_CP_LDMA_SRC_BASE_HI_OFFSET, 0x10C4);
 +      WREG32(mmMME_QM_CP_LDMA_TSIZE_OFFSET, 0x10C8);
 +      WREG32(mmMME_QM_CP_LDMA_COMMIT_OFFSET, 0x10CC);
 +
 +      WREG32(mmMME_QM_CP_MSG_BASE0_ADDR_LO, mtr_base_lo);
 +      WREG32(mmMME_QM_CP_MSG_BASE0_ADDR_HI, mtr_base_hi);
 +      WREG32(mmMME_QM_CP_MSG_BASE1_ADDR_LO, so_base_lo);
 +      WREG32(mmMME_QM_CP_MSG_BASE1_ADDR_HI, so_base_hi);
 +
 +      /* QMAN CQ has 8 cache lines */
 +      WREG32(mmMME_QM_CQ_CFG1, 0x00080008);
 +
 +      WREG32(mmMME_QM_GLBL_ERR_ADDR_LO, gic_base_lo);
 +      WREG32(mmMME_QM_GLBL_ERR_ADDR_HI, gic_base_hi);
 +
 +      WREG32(mmMME_QM_GLBL_ERR_WDATA, GOYA_ASYNC_EVENT_ID_MME_QM);
 +
 +      WREG32(mmMME_QM_GLBL_ERR_CFG, QMAN_MME_ERR_MSG_EN);
 +
 +      WREG32(mmMME_QM_GLBL_PROT, QMAN_MME_ERR_PROT);
 +
 +      WREG32(mmMME_QM_GLBL_CFG0, QMAN_MME_ENABLE);
 +}
 +
 +static void goya_init_mme_cmdq(struct hl_device *hdev)
 +{
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 gic_base_lo, gic_base_hi;
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      WREG32(mmMME_CMDQ_CP_MSG_BASE0_ADDR_LO, mtr_base_lo);
 +      WREG32(mmMME_CMDQ_CP_MSG_BASE0_ADDR_HI, mtr_base_hi);
 +      WREG32(mmMME_CMDQ_CP_MSG_BASE1_ADDR_LO, so_base_lo);
 +      WREG32(mmMME_CMDQ_CP_MSG_BASE1_ADDR_HI, so_base_hi);
 +
 +      /* CMDQ CQ has 20 cache lines */
 +      WREG32(mmMME_CMDQ_CQ_CFG1, 0x00140014);
 +
 +      WREG32(mmMME_CMDQ_GLBL_ERR_ADDR_LO, gic_base_lo);
 +      WREG32(mmMME_CMDQ_GLBL_ERR_ADDR_HI, gic_base_hi);
 +
 +      WREG32(mmMME_CMDQ_GLBL_ERR_WDATA, GOYA_ASYNC_EVENT_ID_MME_CMDQ);
 +
 +      WREG32(mmMME_CMDQ_GLBL_ERR_CFG, CMDQ_MME_ERR_MSG_EN);
 +
 +      WREG32(mmMME_CMDQ_GLBL_PROT, CMDQ_MME_ERR_PROT);
 +
 +      WREG32(mmMME_CMDQ_GLBL_CFG0, CMDQ_MME_ENABLE);
 +}
 +
 +void goya_init_mme_qmans(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 so_base_lo, so_base_hi;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MME)
 +              return;
 +
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      WREG32(mmMME_SM_BASE_ADDRESS_LOW, so_base_lo);
 +      WREG32(mmMME_SM_BASE_ADDRESS_HIGH, so_base_hi);
 +
 +      goya_init_mme_qman(hdev);
 +      goya_init_mme_cmdq(hdev);
 +
 +      goya->hw_cap_initialized |= HW_CAP_MME;
 +}
 +
 +static void goya_init_tpc_qman(struct hl_device *hdev, u32 base_off, int tpc_id)
 +{
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 gic_base_lo, gic_base_hi;
 +      u64 qman_base_addr;
 +      u32 reg_off = tpc_id * (mmTPC1_QM_PQ_PI - mmTPC0_QM_PQ_PI);
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      qman_base_addr = hdev->asic_prop.sram_base_address + base_off;
 +
 +      WREG32(mmTPC0_QM_PQ_BASE_LO + reg_off, lower_32_bits(qman_base_addr));
 +      WREG32(mmTPC0_QM_PQ_BASE_HI + reg_off, upper_32_bits(qman_base_addr));
 +      WREG32(mmTPC0_QM_PQ_SIZE + reg_off, ilog2(TPC_QMAN_LENGTH));
 +      WREG32(mmTPC0_QM_PQ_PI + reg_off, 0);
 +      WREG32(mmTPC0_QM_PQ_CI + reg_off, 0);
 +      WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_LO_OFFSET + reg_off, 0x10C0);
 +      WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_HI_OFFSET + reg_off, 0x10C4);
 +      WREG32(mmTPC0_QM_CP_LDMA_TSIZE_OFFSET + reg_off, 0x10C8);
 +      WREG32(mmTPC0_QM_CP_LDMA_COMMIT_OFFSET + reg_off, 0x10CC);
 +
 +      WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_LO + reg_off, mtr_base_lo);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_HI + reg_off, mtr_base_hi);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_LO + reg_off, so_base_lo);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_HI + reg_off, so_base_hi);
 +
 +      WREG32(mmTPC0_QM_CQ_CFG1 + reg_off, 0x00080008);
 +
 +      WREG32(mmTPC0_QM_GLBL_ERR_ADDR_LO + reg_off, gic_base_lo);
 +      WREG32(mmTPC0_QM_GLBL_ERR_ADDR_HI + reg_off, gic_base_hi);
 +
 +      WREG32(mmTPC0_QM_GLBL_ERR_WDATA + reg_off,
 +                      GOYA_ASYNC_EVENT_ID_TPC0_QM + tpc_id);
 +
 +      WREG32(mmTPC0_QM_GLBL_ERR_CFG + reg_off, QMAN_TPC_ERR_MSG_EN);
 +
 +      WREG32(mmTPC0_QM_GLBL_PROT + reg_off, QMAN_TPC_ERR_PROT);
 +
 +      WREG32(mmTPC0_QM_GLBL_CFG0 + reg_off, QMAN_TPC_ENABLE);
 +}
 +
 +static void goya_init_tpc_cmdq(struct hl_device *hdev, int tpc_id)
 +{
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 gic_base_lo, gic_base_hi;
 +      u32 reg_off = tpc_id * (mmTPC1_CMDQ_CQ_CFG1 - mmTPC0_CMDQ_CQ_CFG1);
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      WREG32(mmTPC0_CMDQ_CP_MSG_BASE0_ADDR_LO + reg_off, mtr_base_lo);
 +      WREG32(mmTPC0_CMDQ_CP_MSG_BASE0_ADDR_HI + reg_off, mtr_base_hi);
 +      WREG32(mmTPC0_CMDQ_CP_MSG_BASE1_ADDR_LO + reg_off, so_base_lo);
 +      WREG32(mmTPC0_CMDQ_CP_MSG_BASE1_ADDR_HI + reg_off, so_base_hi);
 +
 +      WREG32(mmTPC0_CMDQ_CQ_CFG1 + reg_off, 0x00140014);
 +
 +      WREG32(mmTPC0_CMDQ_GLBL_ERR_ADDR_LO + reg_off, gic_base_lo);
 +      WREG32(mmTPC0_CMDQ_GLBL_ERR_ADDR_HI + reg_off, gic_base_hi);
 +
 +      WREG32(mmTPC0_CMDQ_GLBL_ERR_WDATA + reg_off,
 +                      GOYA_ASYNC_EVENT_ID_TPC0_CMDQ + tpc_id);
 +
 +      WREG32(mmTPC0_CMDQ_GLBL_ERR_CFG + reg_off, CMDQ_TPC_ERR_MSG_EN);
 +
 +      WREG32(mmTPC0_CMDQ_GLBL_PROT + reg_off, CMDQ_TPC_ERR_PROT);
 +
 +      WREG32(mmTPC0_CMDQ_GLBL_CFG0 + reg_off, CMDQ_TPC_ENABLE);
 +}
 +
 +void goya_init_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 so_base_lo, so_base_hi;
 +      u32 cfg_off = mmTPC1_CFG_SM_BASE_ADDRESS_LOW -
 +                      mmTPC0_CFG_SM_BASE_ADDRESS_LOW;
 +      int i;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_TPC)
 +              return;
 +
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      for (i = 0 ; i < TPC_MAX_NUM ; i++) {
 +              WREG32(mmTPC0_CFG_SM_BASE_ADDRESS_LOW + i * cfg_off,
 +                              so_base_lo);
 +              WREG32(mmTPC0_CFG_SM_BASE_ADDRESS_HIGH + i * cfg_off,
 +                              so_base_hi);
 +      }
 +
 +      goya_init_tpc_qman(hdev, TPC0_QMAN_BASE_OFFSET, 0);
 +      goya_init_tpc_qman(hdev, TPC1_QMAN_BASE_OFFSET, 1);
 +      goya_init_tpc_qman(hdev, TPC2_QMAN_BASE_OFFSET, 2);
 +      goya_init_tpc_qman(hdev, TPC3_QMAN_BASE_OFFSET, 3);
 +      goya_init_tpc_qman(hdev, TPC4_QMAN_BASE_OFFSET, 4);
 +      goya_init_tpc_qman(hdev, TPC5_QMAN_BASE_OFFSET, 5);
 +      goya_init_tpc_qman(hdev, TPC6_QMAN_BASE_OFFSET, 6);
 +      goya_init_tpc_qman(hdev, TPC7_QMAN_BASE_OFFSET, 7);
 +
 +      for (i = 0 ; i < TPC_MAX_NUM ; i++)
 +              goya_init_tpc_cmdq(hdev, i);
 +
 +      goya->hw_cap_initialized |= HW_CAP_TPC;
 +}
 +
 +/*
 + * goya_disable_internal_queues - Disable internal queues
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static void goya_disable_internal_queues(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MME))
 +              goto disable_tpc;
 +
 +      WREG32(mmMME_QM_GLBL_CFG0, 0);
 +      WREG32(mmMME_CMDQ_GLBL_CFG0, 0);
 +
 +disable_tpc:
 +      if (!(goya->hw_cap_initialized & HW_CAP_TPC))
 +              return;
 +
 +      WREG32(mmTPC0_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC0_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC1_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC1_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC2_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC2_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC3_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC3_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC4_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC4_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC5_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC5_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC6_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC6_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC7_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC7_CMDQ_GLBL_CFG0, 0);
 +}
 +
 +/*
 + * goya_stop_internal_queues - Stop internal queues
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Returns 0 on success
 + *
 + */
 +static int goya_stop_internal_queues(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int rc, retval = 0;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MME))
 +              goto stop_tpc;
 +
 +      /*
 +       * Each queue (QMAN) is a separate H/W logic block, so each QMAN can
 +       * be stopped independently, and a failure to stop one does not
 +       * prevent us from trying to stop the other QMANs
 +       */
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmMME_QM_GLBL_CFG1,
 +                      mmMME_QM_CP_STS,
 +                      mmMME_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop MME QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmMME_CMDQ_GLBL_CFG1,
 +                      mmMME_CMDQ_CP_STS,
 +                      mmMME_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop MME CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +stop_tpc:
 +      if (!(goya->hw_cap_initialized & HW_CAP_TPC))
 +              return retval;
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC0_QM_GLBL_CFG1,
 +                      mmTPC0_QM_CP_STS,
 +                      mmTPC0_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 0 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC0_CMDQ_GLBL_CFG1,
 +                      mmTPC0_CMDQ_CP_STS,
 +                      mmTPC0_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 0 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC1_QM_GLBL_CFG1,
 +                      mmTPC1_QM_CP_STS,
 +                      mmTPC1_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 1 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC1_CMDQ_GLBL_CFG1,
 +                      mmTPC1_CMDQ_CP_STS,
 +                      mmTPC1_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 1 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC2_QM_GLBL_CFG1,
 +                      mmTPC2_QM_CP_STS,
 +                      mmTPC2_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 2 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC2_CMDQ_GLBL_CFG1,
 +                      mmTPC2_CMDQ_CP_STS,
 +                      mmTPC2_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 2 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC3_QM_GLBL_CFG1,
 +                      mmTPC3_QM_CP_STS,
 +                      mmTPC3_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 3 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC3_CMDQ_GLBL_CFG1,
 +                      mmTPC3_CMDQ_CP_STS,
 +                      mmTPC3_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 3 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC4_QM_GLBL_CFG1,
 +                      mmTPC4_QM_CP_STS,
 +                      mmTPC4_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 4 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC4_CMDQ_GLBL_CFG1,
 +                      mmTPC4_CMDQ_CP_STS,
 +                      mmTPC4_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 4 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC5_QM_GLBL_CFG1,
 +                      mmTPC5_QM_CP_STS,
 +                      mmTPC5_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 5 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC5_CMDQ_GLBL_CFG1,
 +                      mmTPC5_CMDQ_CP_STS,
 +                      mmTPC5_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 5 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC6_QM_GLBL_CFG1,
 +                      mmTPC6_QM_CP_STS,
 +                      mmTPC6_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 6 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC6_CMDQ_GLBL_CFG1,
 +                      mmTPC6_CMDQ_CP_STS,
 +                      mmTPC6_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 6 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC7_QM_GLBL_CFG1,
 +                      mmTPC7_QM_CP_STS,
 +                      mmTPC7_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 7 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC7_CMDQ_GLBL_CFG1,
 +                      mmTPC7_CMDQ_CP_STS,
 +                      mmTPC7_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 7 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      return retval;
 +}
 +
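 +/*
 + * goya_dma_stall - Stall all DMA QMAN engines
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */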
 +static void goya_dma_stall(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_DMA))
 +              return;
 +
 +      WREG32(mmDMA_QM_0_GLBL_CFG1, 1 << DMA_QM_0_GLBL_CFG1_DMA_STOP_SHIFT);
 +      WREG32(mmDMA_QM_1_GLBL_CFG1, 1 << DMA_QM_1_GLBL_CFG1_DMA_STOP_SHIFT);
 +      WREG32(mmDMA_QM_2_GLBL_CFG1, 1 << DMA_QM_2_GLBL_CFG1_DMA_STOP_SHIFT);
 +      WREG32(mmDMA_QM_3_GLBL_CFG1, 1 << DMA_QM_3_GLBL_CFG1_DMA_STOP_SHIFT);
 +      WREG32(mmDMA_QM_4_GLBL_CFG1, 1 << DMA_QM_4_GLBL_CFG1_DMA_STOP_SHIFT);
 +}
 +
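 +/*
 + * goya_tpc_stall - Stall all TPC engines
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */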
 +static void goya_tpc_stall(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_TPC))
 +              return;
 +
 +      WREG32(mmTPC0_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC1_CFG_TPC_STALL, 1 << TPC1_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC2_CFG_TPC_STALL, 1 << TPC2_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC3_CFG_TPC_STALL, 1 << TPC3_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC4_CFG_TPC_STALL, 1 << TPC4_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC5_CFG_TPC_STALL, 1 << TPC5_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC6_CFG_TPC_STALL, 1 << TPC6_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC7_CFG_TPC_STALL, 1 << TPC7_CFG_TPC_STALL_V_SHIFT);
 +}
 +
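 +/*
 + * goya_mme_stall - Stall the MME engine
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */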
 +static void goya_mme_stall(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MME))
 +              return;
 +
 +      WREG32(mmMME_STALL, 0xFFFFFFFF);
 +}
 +
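 +/*
 + * goya_enable_msix - Enable MSI-X and request IRQs for the completion queues
 + *                    and the event queue
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Returns 0 on success
 + *
 + */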
 +static int goya_enable_msix(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int cq_cnt = hdev->asic_prop.completion_queues_count;
 +      int rc, i, irq_cnt_init, irq;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MSIX)
 +              return 0;
 +
 +      rc = pci_alloc_irq_vectors(hdev->pdev, GOYA_MSIX_ENTRIES,
 +                              GOYA_MSIX_ENTRIES, PCI_IRQ_MSIX);
 +      if (rc < 0) {
 +              dev_err(hdev->dev,
 +                      "MSI-X: Failed to enable support -- %d/%d\n",
 +                      GOYA_MSIX_ENTRIES, rc);
 +              return rc;
 +      }
 +
 +      for (i = 0, irq_cnt_init = 0 ; i < cq_cnt ; i++, irq_cnt_init++) {
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              rc = request_irq(irq, hl_irq_handler_cq, 0, goya_irq_name[i],
 +                              &hdev->completion_queue[i]);
 +              if (rc) {
 +                      dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +                      goto free_irqs;
 +              }
 +      }
 +
 +      irq = pci_irq_vector(hdev->pdev, GOYA_EVENT_QUEUE_MSIX_IDX);
 +
 +      rc = request_irq(irq, hl_irq_handler_eq, 0,
 +                      goya_irq_name[GOYA_EVENT_QUEUE_MSIX_IDX],
 +                      &hdev->event_queue);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +              goto free_irqs;
 +      }
 +
 +      goya->hw_cap_initialized |= HW_CAP_MSIX;
 +      return 0;
 +
 +free_irqs:
 +      for (i = 0 ; i < irq_cnt_init ; i++)
 +              free_irq(pci_irq_vector(hdev->pdev, i),
 +                      &hdev->completion_queue[i]);
 +
 +      pci_free_irq_vectors(hdev->pdev);
 +      return rc;
 +}
 +
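 +/*
 + * goya_sync_irqs - Wait for all in-flight IRQ handlers to finish
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */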
 +static void goya_sync_irqs(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int i;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MSIX))
 +              return;
 +
 +      /* Wait for all pending IRQs to be finished */
 +      for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
 +              synchronize_irq(pci_irq_vector(hdev->pdev, i));
 +
 +      synchronize_irq(pci_irq_vector(hdev->pdev, GOYA_EVENT_QUEUE_MSIX_IDX));
 +}
 +
 +static void goya_disable_msix(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int i, irq;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MSIX))
 +              return;
 +
 +      goya_sync_irqs(hdev);
 +
 +      irq = pci_irq_vector(hdev->pdev, GOYA_EVENT_QUEUE_MSIX_IDX);
 +      free_irq(irq, &hdev->event_queue);
 +
 +      for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++) {
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              free_irq(irq, &hdev->completion_queue[i]);
 +      }
 +
 +      pci_free_irq_vectors(hdev->pdev);
 +
 +      goya->hw_cap_initialized &= ~HW_CAP_MSIX;
 +}
 +
 +static void goya_enable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
 +
 +      /* Zero the lower/upper parts of the 64-bit counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0xC, 0);
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0x8, 0);
 +
 +      /* Enable the counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 1);
 +}
 +
 +static void goya_disable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
 +}
 +
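 +/*
 + * goya_halt_engines - Stop, stall and then disable all queues and engines
 + *
 + * @hdev: pointer to hl_device structure
 + * @hard_reset: true if this halt is part of a hard reset flow
 + * @fw_reset: true if the reset itself is performed by firmware
 + *
 + */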
 +static void goya_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      u32 wait_timeout_ms;
 +
 +      if (hdev->pldm)
 +              wait_timeout_ms = GOYA_PLDM_RESET_WAIT_MSEC;
 +      else
 +              wait_timeout_ms = GOYA_RESET_WAIT_MSEC;
 +
 +      goya_stop_external_queues(hdev);
 +      goya_stop_internal_queues(hdev);
 +
 +      msleep(wait_timeout_ms);
 +
 +      goya_dma_stall(hdev);
 +      goya_tpc_stall(hdev);
 +      goya_mme_stall(hdev);
 +
 +      msleep(wait_timeout_ms);
 +
 +      goya_disable_external_queues(hdev);
 +      goya_disable_internal_queues(hdev);
 +
 +      goya_disable_timestamp(hdev);
 +
 +      if (hard_reset) {
 +              goya_disable_msix(hdev);
 +              goya_mmu_remove_device_cpu_mappings(hdev);
 +      } else {
 +              goya_sync_irqs(hdev);
 +      }
 +}
 +
 +/*
 + * goya_load_firmware_to_device() - Load LINUX FW code to device.
 + * @hdev: Pointer to hl_device structure.
 + *
 + * Copy LINUX fw code from firmware file to HBM BAR.
 + *
 + * Return: 0 on success, non-zero for failure.
 + */
 +static int goya_load_firmware_to_device(struct hl_device *hdev)
 +{
 +      void __iomem *dst;
 +
 +      dst = hdev->pcie_bar[DDR_BAR_ID] + LINUX_FW_OFFSET;
 +
 +      return hl_fw_load_fw_to_device(hdev, GOYA_LINUX_FW_FILE, dst, 0, 0);
 +}
 +
 +/*
 + * goya_load_boot_fit_to_device() - Load boot fit to device.
 + * @hdev: Pointer to hl_device structure.
 + *
 + * Copy boot fit file to SRAM BAR.
 + *
 + * Return: 0 on success, non-zero for failure.
 + */
 +static int goya_load_boot_fit_to_device(struct hl_device *hdev)
 +{
 +      void __iomem *dst;
 +
 +      dst = hdev->pcie_bar[SRAM_CFG_BAR_ID] + BOOT_FIT_SRAM_OFFSET;
 +
 +      return hl_fw_load_fw_to_device(hdev, GOYA_BOOT_FIT_FILE, dst, 0, 0);
 +}
 +
 +static void goya_init_dynamic_firmware_loader(struct hl_device *hdev)
 +{
 +      struct dynamic_fw_load_mgr *dynamic_loader;
 +      struct cpu_dyn_regs *dyn_regs;
 +
 +      dynamic_loader = &hdev->fw_loader.dynamic_loader;
 +
 +      /*
 +       * Here we set initial values for a few specific dynamic registers,
 +       * because before the first descriptor is read from the FW these
 +       * values have to be hard-coded. In later stages of the protocol they
 +       * are updated automatically by reading the FW descriptor, so the
 +       * data there is always up-to-date
 +       */
 +      dyn_regs = &dynamic_loader->comm_desc.cpu_dyn_regs;
 +      dyn_regs->kmd_msg_to_cpu =
 +                              cpu_to_le32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU);
 +      dyn_regs->cpu_cmd_status_to_host =
 +                              cpu_to_le32(mmCPU_CMD_STATUS_TO_HOST);
 +
 +      dynamic_loader->wait_for_bl_timeout = GOYA_WAIT_FOR_BL_TIMEOUT_USEC;
 +}
 +
 +static void goya_init_static_firmware_loader(struct hl_device *hdev)
 +{
 +      struct static_fw_load_mgr *static_loader;
 +
 +      static_loader = &hdev->fw_loader.static_loader;
 +
 +      static_loader->preboot_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
 +      static_loader->boot_fit_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
 +      static_loader->kmd_msg_to_cpu_reg = mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU;
 +      static_loader->cpu_cmd_status_to_host_reg = mmCPU_CMD_STATUS_TO_HOST;
 +      static_loader->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
 +      static_loader->cpu_boot_dev_status0_reg = mmCPU_BOOT_DEV_STS0;
 +      static_loader->cpu_boot_dev_status1_reg = mmCPU_BOOT_DEV_STS1;
 +      static_loader->boot_err0_reg = mmCPU_BOOT_ERR0;
 +      static_loader->boot_err1_reg = mmCPU_BOOT_ERR1;
 +      static_loader->preboot_version_offset_reg = mmPREBOOT_VER_OFFSET;
 +      static_loader->boot_fit_version_offset_reg = mmUBOOT_VER_OFFSET;
 +      static_loader->sram_offset_mask = ~(lower_32_bits(SRAM_BASE_ADDR));
 +}
 +
 +static void goya_init_firmware_preload_params(struct hl_device *hdev)
 +{
 +      struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load;
 +
 +      pre_fw_load->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
 +      pre_fw_load->sts_boot_dev_sts0_reg = mmCPU_BOOT_DEV_STS0;
 +      pre_fw_load->sts_boot_dev_sts1_reg = mmCPU_BOOT_DEV_STS1;
 +      pre_fw_load->boot_err0_reg = mmCPU_BOOT_ERR0;
 +      pre_fw_load->boot_err1_reg = mmCPU_BOOT_ERR1;
 +      pre_fw_load->wait_for_preboot_timeout = GOYA_BOOT_FIT_REQ_TIMEOUT_USEC;
 +}
 +
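 +/*
 + * goya_init_firmware_loader - Fill the common FW loader fields and initialize
 + *                             either the dynamic or the static loader
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */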
 +static void goya_init_firmware_loader(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct fw_load_mgr *fw_loader = &hdev->fw_loader;
 +
 +      /* fill common fields */
 +      fw_loader->fw_comp_loaded = FW_TYPE_NONE;
 +      fw_loader->boot_fit_img.image_name = GOYA_BOOT_FIT_FILE;
 +      fw_loader->linux_img.image_name = GOYA_LINUX_FW_FILE;
 +      fw_loader->cpu_timeout = GOYA_CPU_TIMEOUT_USEC;
 +      fw_loader->boot_fit_timeout = GOYA_BOOT_FIT_REQ_TIMEOUT_USEC;
 +      fw_loader->skip_bmc = false;
 +      fw_loader->sram_bar_id = SRAM_CFG_BAR_ID;
 +      fw_loader->dram_bar_id = DDR_BAR_ID;
 +
 +      if (prop->dynamic_fw_load)
 +              goya_init_dynamic_firmware_loader(hdev);
 +      else
 +              goya_init_static_firmware_loader(hdev);
 +}
 +
 +static int goya_init_cpu(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int rc;
 +
 +      if (!(hdev->fw_components & FW_TYPE_PREBOOT_CPU))
 +              return 0;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_CPU)
 +              return 0;
 +
 +      /*
 +       * Before pushing u-boot/Linux to the device, we need to set the DDR
 +       * BAR to the base address of the DRAM
 +       */
 +      if (goya_set_ddr_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
 +              dev_err(hdev->dev,
 +                      "failed to map DDR bar to DRAM base address\n");
 +              return -EIO;
 +      }
 +
 +      rc = hl_fw_init_cpu(hdev);
 +
 +      if (rc)
 +              return rc;
 +
 +      goya->hw_cap_initialized |= HW_CAP_CPU;
 +
 +      return 0;
 +}
 +
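 +/*
 + * goya_mmu_update_asid_hop0_addr - Program the hop0 page-table address of an
 + *                                  ASID and wait for the MMU to consume it
 + *
 + * @hdev: pointer to hl_device structure
 + * @asid: ASID whose hop0 address is being set
 + * @phys_addr: physical address of the hop0 page table
 + *
 + * Returns 0 on success
 + *
 + */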
 +static int goya_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid,
 +                                              u64 phys_addr)
 +{
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GOYA_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
 +
 +      WREG32(MMU_HOP0_PA43_12, phys_addr >> MMU_HOP0_PA43_12_SHIFT);
 +      WREG32(MMU_HOP0_PA49_44, phys_addr >> MMU_HOP0_PA49_44_SHIFT);
 +      WREG32(MMU_ASID_BUSY, 0x80000000 | asid);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              MMU_ASID_BUSY,
 +              status,
 +              !(status & 0x80000000),
 +              1000,
 +              timeout_usec);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout during MMU hop0 config of asid %d\n", asid);
 +              return rc;
 +      }
 +
 +      return 0;
 +}
 +
 +int goya_mmu_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +      u64 hop0_addr;
 +      int rc, i;
 +
 +      if (!hdev->mmu_enable)
 +              return 0;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MMU)
 +              return 0;
 +
 +      hdev->dram_default_page_mapping = true;
 +
 +      for (i = 0 ; i < prop->max_asid ; i++) {
 +              hop0_addr = prop->mmu_pgt_addr +
 +                              (i * prop->mmu_hop_table_size);
 +
 +              rc = goya_mmu_update_asid_hop0_addr(hdev, i, hop0_addr);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "failed to set hop0 addr for asid %d\n", i);
 +                      goto err;
 +              }
 +      }
 +
 +      goya->hw_cap_initialized |= HW_CAP_MMU;
 +
 +      /* init MMU cache manage page */
 +      WREG32(mmSTLB_CACHE_INV_BASE_39_8,
 +                              lower_32_bits(MMU_CACHE_MNG_ADDR >> 8));
 +      WREG32(mmSTLB_CACHE_INV_BASE_49_40, MMU_CACHE_MNG_ADDR >> 40);
 +
 +      /* Remove follower feature due to performance bug */
 +      WREG32_AND(mmSTLB_STLB_FEATURE_EN,
 +                      (~STLB_STLB_FEATURE_EN_FOLLOWER_EN_MASK));
 +
 +      hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR | MMU_OP_PHYS_PACK);
 +
 +      WREG32(mmMMU_MMU_ENABLE, 1);
 +      WREG32(mmMMU_SPI_MASK, 0xF);
 +
 +      return 0;
 +
 +err:
 +      return rc;
 +}
 +
 +/*
 + * goya_hw_init - Goya hardware initialization code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Returns 0 on success
 + *
 + */
 +static int goya_hw_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int rc;
 +
 +      /* Perform read from the device to make sure device is up */
 +      RREG32(mmPCIE_DBI_DEVICE_ID_VENDOR_ID_REG);
 +
 +      /*
 +       * Let's mark in the H/W that we have reached this point. We check
 +       * this value in the reset_before_init function to understand whether
 +       * we need to reset the chip before doing H/W init. This register is
 +       * cleared by the H/W upon H/W reset
 +       */
 +      WREG32(mmHW_STATE, HL_DEVICE_HW_STATE_DIRTY);
 +
 +      rc = goya_init_cpu(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to initialize CPU\n");
 +              return rc;
 +      }
 +
 +      goya_tpc_mbist_workaround(hdev);
 +
 +      goya_init_golden_registers(hdev);
 +
 +      /*
 +       * After CPU initialization is finished, change DDR bar mapping inside
 +       * iATU to point to the start address of the MMU page tables
 +       */
 +      if (goya_set_ddr_bar_base(hdev, (MMU_PAGE_TABLES_ADDR &
 +                      ~(prop->dram_pci_bar_size - 0x1ull))) == U64_MAX) {
 +              dev_err(hdev->dev,
 +                      "failed to map DDR bar to MMU page tables\n");
 +              return -EIO;
 +      }
 +
 +      rc = goya_mmu_init(hdev);
 +      if (rc)
 +              return rc;
 +
 +      goya_init_security(hdev);
 +
 +      goya_init_dma_qmans(hdev);
 +
 +      goya_init_mme_qmans(hdev);
 +
 +      goya_init_tpc_qmans(hdev);
 +
 +      goya_enable_timestamp(hdev);
 +
 +      /* MSI-X must be enabled before CPU queues are initialized */
 +      rc = goya_enable_msix(hdev);
 +      if (rc)
 +              goto disable_queues;
 +
 +      /* Perform read from the device to flush all MSI-X configuration */
 +      RREG32(mmPCIE_DBI_DEVICE_ID_VENDOR_ID_REG);
 +
 +      return 0;
 +
 +disable_queues:
 +      goya_disable_internal_queues(hdev);
 +      goya_disable_external_queues(hdev);
 +
 +      return rc;
 +}
 +
 +static void goya_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 reset_timeout_ms, cpu_timeout_ms, status;
 +
 +      if (hdev->pldm) {
 +              reset_timeout_ms = GOYA_PLDM_RESET_TIMEOUT_MSEC;
 +              cpu_timeout_ms = GOYA_PLDM_RESET_WAIT_MSEC;
 +      } else {
 +              reset_timeout_ms = GOYA_RESET_TIMEOUT_MSEC;
 +              cpu_timeout_ms = GOYA_CPU_RESET_WAIT_MSEC;
 +      }
 +
 +      if (hard_reset) {
 +              /* We don't know what state the CPU is in, so make sure it is
 +               * stopped by any means necessary
 +               */
 +              WREG32(mmPSOC_GLOBAL_CONF_UBOOT_MAGIC, KMD_MSG_GOTO_WFE);
 +              WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 +                      GOYA_ASYNC_EVENT_ID_HALT_MACHINE);
 +
 +              msleep(cpu_timeout_ms);
 +
 +              goya_set_ddr_bar_base(hdev, DRAM_PHYS_BASE);
 +              goya_disable_clk_rlx(hdev);
 +              goya_set_pll_refclk(hdev);
 +
 +              WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG, RESET_ALL);
 +              dev_dbg(hdev->dev,
 +                      "Issued HARD reset command, going to wait %dms\n",
 +                      reset_timeout_ms);
 +      } else {
 +              WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG, DMA_MME_TPC_RESET);
 +              dev_dbg(hdev->dev,
 +                      "Issued SOFT reset command, going to wait %dms\n",
 +                      reset_timeout_ms);
 +      }
 +
 +      /*
 +       * After a hard reset we can't poll the BTM_FSM register because the
 +       * PSOC itself is in reset. For either reset type we need to wait
 +       * until the reset is de-asserted
 +       */
 +      msleep(reset_timeout_ms);
 +
 +      status = RREG32(mmPSOC_GLOBAL_CONF_BTM_FSM);
 +      if (status & PSOC_GLOBAL_CONF_BTM_FSM_STATE_MASK)
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for device to reset 0x%x\n",
 +                      status);
 +
 +      if (!hard_reset && goya) {
 +              goya->hw_cap_initialized &= ~(HW_CAP_DMA | HW_CAP_MME |
 +                                              HW_CAP_GOLDEN | HW_CAP_TPC);
 +              WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 +                              GOYA_ASYNC_EVENT_ID_SOFT_RESET);
 +              return;
 +      }
 +
 +      /* Chicken bit to re-initiate boot sequencer flow */
 +      WREG32(mmPSOC_GLOBAL_CONF_BOOT_SEQ_RE_START,
 +              1 << PSOC_GLOBAL_CONF_BOOT_SEQ_RE_START_IND_SHIFT);
 +      /* Move boot manager FSM to pre boot sequencer init state */
 +      WREG32(mmPSOC_GLOBAL_CONF_SW_BTM_FSM,
 +                      0xA << PSOC_GLOBAL_CONF_SW_BTM_FSM_CTRL_SHIFT);
 +
 +      if (goya) {
 +              goya->hw_cap_initialized &= ~(HW_CAP_CPU | HW_CAP_CPU_Q |
 +                              HW_CAP_DDR_0 | HW_CAP_DDR_1 |
 +                              HW_CAP_DMA | HW_CAP_MME |
 +                              HW_CAP_MMU | HW_CAP_TPC_MBIST |
 +                              HW_CAP_GOLDEN | HW_CAP_TPC);
 +
 +              memset(goya->events_stat, 0, sizeof(goya->events_stat));
 +      }
 +}
 +
 +int goya_suspend(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to disable PCI access from CPU\n");
 +
 +      return rc;
 +}
 +
 +int goya_resume(struct hl_device *hdev)
 +{
 +      return goya_init_iatu(hdev);
 +}
 +
 +static int goya_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
 +                      void *cpu_addr, dma_addr_t dma_addr, size_t size)
 +{
 +      int rc;
 +
++      vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++                      VM_DONTCOPY | VM_NORESERVE);
 +
 +      rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr,
 +                              (dma_addr - HOST_PHYS_BASE), size);
 +      if (rc)
 +              dev_err(hdev->dev, "dma_mmap_coherent error %d", rc);
 +
 +      return rc;
 +}
 +
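 +/*
 + * goya_ring_doorbell - Write a new PI value to a queue's doorbell register
 + *
 + * @hdev: pointer to hl_device structure
 + * @hw_queue_id: H/W queue whose doorbell should be rung
 + * @pi: new producer index
 + *
 + * For the CPU queue, also notify the device CPU via the GIC.
 + */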
 +void goya_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
 +{
 +      u32 db_reg_offset, db_value;
 +
 +      switch (hw_queue_id) {
 +      case GOYA_QUEUE_ID_DMA_0:
 +              db_reg_offset = mmDMA_QM_0_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_DMA_1:
 +              db_reg_offset = mmDMA_QM_1_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_DMA_2:
 +              db_reg_offset = mmDMA_QM_2_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_DMA_3:
 +              db_reg_offset = mmDMA_QM_3_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_DMA_4:
 +              db_reg_offset = mmDMA_QM_4_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_CPU_PQ:
 +              db_reg_offset = mmCPU_IF_PF_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_MME:
 +              db_reg_offset = mmMME_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC0:
 +              db_reg_offset = mmTPC0_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC1:
 +              db_reg_offset = mmTPC1_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC2:
 +              db_reg_offset = mmTPC2_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC3:
 +              db_reg_offset = mmTPC3_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC4:
 +              db_reg_offset = mmTPC4_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC5:
 +              db_reg_offset = mmTPC5_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC6:
 +              db_reg_offset = mmTPC6_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC7:
 +              db_reg_offset = mmTPC7_QM_PQ_PI;
 +              break;
 +
 +      default:
 +              /* Should never get here */
 +              dev_err(hdev->dev, "H/W queue %d is invalid. Can't set pi\n",
 +                      hw_queue_id);
 +              return;
 +      }
 +
 +      db_value = pi;
 +
 +      /* ring the doorbell */
 +      WREG32(db_reg_offset, db_value);
 +
 +      if (hw_queue_id == GOYA_QUEUE_ID_CPU_PQ) {
 +              /* make sure device CPU will read latest data from host */
 +              mb();
 +              WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 +                              GOYA_ASYNC_EVENT_ID_PI_UPDATE);
 +      }
 +}
 +
 +void goya_pqe_write(struct hl_device *hdev, __le64 *pqe, struct hl_bd *bd)
 +{
 +      /* The QMANs are in SRAM, so we need to copy to I/O space */
 +      memcpy_toio((void __iomem *) pqe, bd, sizeof(struct hl_bd));
 +}
 +
 +static void *goya_dma_alloc_coherent(struct hl_device *hdev, size_t size,
 +                                      dma_addr_t *dma_handle, gfp_t flags)
 +{
 +      void *kernel_addr = dma_alloc_coherent(&hdev->pdev->dev, size,
 +                                              dma_handle, flags);
 +
 +      /* Shift to the device's base physical address of host memory */
 +      if (kernel_addr)
 +              *dma_handle += HOST_PHYS_BASE;
 +
 +      return kernel_addr;
 +}
 +
 +static void goya_dma_free_coherent(struct hl_device *hdev, size_t size,
 +                                      void *cpu_addr, dma_addr_t dma_handle)
 +{
 +      /* Cancel the device's base physical address of host memory */
 +      dma_addr_t fixed_dma_handle = dma_handle - HOST_PHYS_BASE;
 +
 +      dma_free_coherent(&hdev->pdev->dev, size, cpu_addr, fixed_dma_handle);
 +}
 +
 +int goya_scrub_device_mem(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
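 +/*
 + * goya_get_int_queue_base - Get the base address and length of an internal
 + *                           (MME/TPC) queue that resides in SRAM
 + *
 + * @hdev: pointer to hl_device structure
 + * @queue_id: internal H/W queue ID
 + * @dma_handle: returned device address of the queue
 + * @queue_len: returned length of the queue
 + *
 + * Returns the host-mapped address of the queue, or NULL for an invalid ID
 + *
 + */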
 +void *goya_get_int_queue_base(struct hl_device *hdev, u32 queue_id,
 +                              dma_addr_t *dma_handle, u16 *queue_len)
 +{
 +      void *base;
 +      u32 offset;
 +
 +      *dma_handle = hdev->asic_prop.sram_base_address;
 +
 +      base = (__force void *) hdev->pcie_bar[SRAM_CFG_BAR_ID];
 +
 +      switch (queue_id) {
 +      case GOYA_QUEUE_ID_MME:
 +              offset = MME_QMAN_BASE_OFFSET;
 +              *queue_len = MME_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC0:
 +              offset = TPC0_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC1:
 +              offset = TPC1_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC2:
 +              offset = TPC2_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC3:
 +              offset = TPC3_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC4:
 +              offset = TPC4_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC5:
 +              offset = TPC5_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC6:
 +              offset = TPC6_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC7:
 +              offset = TPC7_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      default:
 +              dev_err(hdev->dev, "Got invalid queue id %d\n", queue_id);
 +              return NULL;
 +      }
 +
 +      base += offset;
 +      *dma_handle += offset;
 +
 +      return base;
 +}
 +
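 +/*
 + * goya_send_job_on_qman0 - Send a driver job on DMA QMAN 0 and wait for its
 + *                          fence packet to be written back
 + *
 + * @hdev: pointer to hl_device structure
 + * @job: job to send, containing the patched CB
 + *
 + * Returns 0 on success
 + *
 + */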
 +static int goya_send_job_on_qman0(struct hl_device *hdev, struct hl_cs_job *job)
 +{
 +      struct packet_msg_prot *fence_pkt;
 +      u32 *fence_ptr;
 +      dma_addr_t fence_dma_addr;
 +      struct hl_cb *cb;
 +      u32 tmp, timeout;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout = GOYA_PLDM_QMAN0_TIMEOUT_USEC;
 +      else
 +              timeout = HL_DEVICE_TIMEOUT_USEC;
 +
 +      if (!hdev->asic_funcs->is_device_idle(hdev, NULL, 0, NULL)) {
 +              dev_err_ratelimited(hdev->dev,
 +                      "Can't send driver job on QMAN0 because the device is not idle\n");
 +              return -EBUSY;
 +      }
 +
 +      fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
 +      if (!fence_ptr) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate fence memory for QMAN0\n");
 +              return -ENOMEM;
 +      }
 +
 +      goya_qman0_set_security(hdev, true);
 +
 +      cb = job->patched_cb;
 +
 +      fence_pkt = cb->kernel_address +
 +                      job->job_cb_size - sizeof(struct packet_msg_prot);
 +
 +      tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_EB_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_MB_SHIFT);
 +      fence_pkt->ctl = cpu_to_le32(tmp);
 +      fence_pkt->value = cpu_to_le32(GOYA_QMAN0_FENCE_VAL);
 +      fence_pkt->addr = cpu_to_le64(fence_dma_addr);
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, GOYA_QUEUE_ID_DMA_0,
 +                                      job->job_cb_size, cb->bus_address);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to send CB on QMAN0, %d\n", rc);
 +              goto free_fence_ptr;
 +      }
 +
 +      rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp,
 +                              (tmp == GOYA_QMAN0_FENCE_VAL), 1000,
 +                              timeout, true);
 +
 +      hl_hw_queue_inc_ci_kernel(hdev, GOYA_QUEUE_ID_DMA_0);
 +
 +      if (rc == -ETIMEDOUT) {
 +              dev_err(hdev->dev, "QMAN0 Job timeout (0x%x)\n", tmp);
 +              goto free_fence_ptr;
 +      }
 +
 +free_fence_ptr:
 +      hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
 +
 +      goya_qman0_set_security(hdev, false);
 +
 +      return rc;
 +}
 +
 +int goya_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
 +                              u32 timeout, u64 *result)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q)) {
 +              if (result)
 +                      *result = 0;
 +              return 0;
 +      }
 +
 +      if (!timeout)
 +              timeout = GOYA_MSG_TO_CPU_TIMEOUT_USEC;
 +
 +      return hl_fw_send_cpu_message(hdev, GOYA_QUEUE_ID_CPU_PQ, msg, len,
 +                                      timeout, result);
 +}
 +
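 +/*
 + * goya_test_queue - Test an external H/W queue by sending a fence packet and
 + *                   polling for its completion value
 + *
 + * @hdev: pointer to hl_device structure
 + * @hw_queue_id: H/W queue to test
 + *
 + * Returns 0 on success
 + *
 + */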
 +int goya_test_queue(struct hl_device *hdev, u32 hw_queue_id)
 +{
 +      struct packet_msg_prot *fence_pkt;
 +      dma_addr_t pkt_dma_addr;
 +      u32 fence_val, tmp;
 +      dma_addr_t fence_dma_addr;
 +      u32 *fence_ptr;
 +      int rc;
 +
 +      fence_val = GOYA_QMAN0_FENCE_VAL;
 +
 +      fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
 +      if (!fence_ptr) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate memory for H/W queue %d testing\n",
 +                      hw_queue_id);
 +              return -ENOMEM;
 +      }
 +
 +      *fence_ptr = 0;
 +
 +      fence_pkt = hl_asic_dma_pool_zalloc(hdev, sizeof(struct packet_msg_prot), GFP_KERNEL,
 +                                              &pkt_dma_addr);
 +      if (!fence_pkt) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate packet for H/W queue %d testing\n",
 +                      hw_queue_id);
 +              rc = -ENOMEM;
 +              goto free_fence_ptr;
 +      }
 +
 +      tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_EB_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_MB_SHIFT);
 +      fence_pkt->ctl = cpu_to_le32(tmp);
 +      fence_pkt->value = cpu_to_le32(fence_val);
 +      fence_pkt->addr = cpu_to_le64(fence_dma_addr);
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id,
 +                                      sizeof(struct packet_msg_prot),
 +                                      pkt_dma_addr);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to send fence packet to H/W queue %d\n",
 +                      hw_queue_id);
 +              goto free_pkt;
 +      }
 +
 +      rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp, (tmp == fence_val),
 +                                      1000, GOYA_TEST_QUEUE_WAIT_USEC, true);
 +
 +      hl_hw_queue_inc_ci_kernel(hdev, hw_queue_id);
 +
 +      if (rc == -ETIMEDOUT) {
 +              dev_err(hdev->dev,
 +                      "H/W queue %d test failed (scratch(0x%08llX) == 0x%08X)\n",
 +                      hw_queue_id, (unsigned long long) fence_dma_addr, tmp);
 +              rc = -EIO;
 +      }
 +
 +free_pkt:
 +      hl_asic_dma_pool_free(hdev, (void *) fence_pkt, pkt_dma_addr);
 +free_fence_ptr:
 +      hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
 +      return rc;
 +}
 +
 +int goya_test_cpu_queue(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      /*
 +       * Check the capability here because send_cpu_message() won't update
 +       * the result value if the capability isn't set
 +       */
 +      if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_test_cpu_queue(hdev);
 +}
 +
 +int goya_test_queues(struct hl_device *hdev)
 +{
 +      int i, rc, ret_val = 0;
 +
 +      for (i = 0 ; i < NUMBER_OF_EXT_HW_QUEUES ; i++) {
 +              rc = goya_test_queue(hdev, i);
 +              if (rc)
 +                      ret_val = -EINVAL;
 +      }
 +
 +      return ret_val;
 +}
 +
 +static void *goya_dma_pool_zalloc(struct hl_device *hdev, size_t size,
 +                                      gfp_t mem_flags, dma_addr_t *dma_handle)
 +{
 +      void *kernel_addr;
 +
 +      if (size > GOYA_DMA_POOL_BLK_SIZE)
 +              return NULL;
 +
 +      kernel_addr =  dma_pool_zalloc(hdev->dma_pool, mem_flags, dma_handle);
 +
 +      /* Shift to the device's base physical address of host memory */
 +      if (kernel_addr)
 +              *dma_handle += HOST_PHYS_BASE;
 +
 +      return kernel_addr;
 +}
 +
 +static void goya_dma_pool_free(struct hl_device *hdev, void *vaddr,
 +                              dma_addr_t dma_addr)
 +{
 +      /* Cancel the device's base physical address of host memory */
 +      dma_addr_t fixed_dma_addr = dma_addr - HOST_PHYS_BASE;
 +
 +      dma_pool_free(hdev->dma_pool, vaddr, fixed_dma_addr);
 +}
 +
 +void *goya_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
 +                                      dma_addr_t *dma_handle)
 +{
 +      void *vaddr;
 +
 +      vaddr = hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
 +      *dma_handle = (*dma_handle) - hdev->cpu_accessible_dma_address +
 +                      VA_CPU_ACCESSIBLE_MEM_ADDR;
 +
 +      return vaddr;
 +}
 +
 +void goya_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
 +                                      void *vaddr)
 +{
 +      hl_fw_cpu_accessible_dma_pool_free(hdev, size, vaddr);
 +}
 +
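 +/*
 + * goya_get_dma_desc_list_size - Compute the total size of the LIN_DMA packets
 + *                               needed to cover an SG table, merging
 + *                               contiguous entries up to the maximum DMA
 + *                               transfer size
 + *
 + * @hdev: pointer to hl_device structure
 + * @sgt: DMA-mapped SG table
 + *
 + */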
 +u32 goya_get_dma_desc_list_size(struct hl_device *hdev, struct sg_table *sgt)
 +{
 +      struct scatterlist *sg, *sg_next_iter;
 +      u32 count, dma_desc_cnt;
 +      u64 len, len_next;
 +      dma_addr_t addr, addr_next;
 +
 +      dma_desc_cnt = 0;
 +
 +      for_each_sgtable_dma_sg(sgt, sg, count) {
 +              len = sg_dma_len(sg);
 +              addr = sg_dma_address(sg);
 +
 +              if (len == 0)
 +                      break;
 +
 +              while ((count + 1) < sgt->nents) {
 +                      sg_next_iter = sg_next(sg);
 +                      len_next = sg_dma_len(sg_next_iter);
 +                      addr_next = sg_dma_address(sg_next_iter);
 +
 +                      if (len_next == 0)
 +                              break;
 +
 +                      if ((addr + len == addr_next) &&
 +                              (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
 +                              len += len_next;
 +                              count++;
 +                              sg = sg_next_iter;
 +                      } else {
 +                              break;
 +                      }
 +              }
 +
 +              dma_desc_cnt++;
 +      }
 +
 +      return dma_desc_cnt * sizeof(struct packet_lin_dma);
 +}
 +
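 +/*
 + * goya_pin_memory_before_cs - Pin and DMA-map the host memory referenced by a
 + *                             user LIN_DMA packet, unless it is already pinned
 + *
 + * @hdev: pointer to hl_device structure
 + * @parser: CS parser of the current job
 + * @user_dma_pkt: user LIN_DMA packet referencing the host memory
 + * @addr: host virtual address to pin
 + * @dir: DMA direction of the transfer
 + *
 + * Returns 0 on success
 + *
 + */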
 +static int goya_pin_memory_before_cs(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt,
 +                              u64 addr, enum dma_data_direction dir)
 +{
 +      struct hl_userptr *userptr;
 +      int rc;
 +
 +      if (hl_userptr_is_pinned(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
 +                      parser->job_userptr_list, &userptr))
 +              goto already_pinned;
 +
 +      userptr = kzalloc(sizeof(*userptr), GFP_KERNEL);
 +      if (!userptr)
 +              return -ENOMEM;
 +
 +      rc = hl_pin_host_memory(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
 +                              userptr);
 +      if (rc)
 +              goto free_userptr;
 +
 +      list_add_tail(&userptr->job_node, parser->job_userptr_list);
 +
 +      rc = hdev->asic_funcs->asic_dma_map_sgtable(hdev, userptr->sgt, dir);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to map sgt with DMA region\n");
 +              goto unpin_memory;
 +      }
 +
 +      userptr->dma_mapped = true;
 +      userptr->dir = dir;
 +
 +already_pinned:
 +      parser->patched_cb_size +=
 +                      goya_get_dma_desc_list_size(hdev, userptr->sgt);
 +
 +      return 0;
 +
 +unpin_memory:
 +      list_del(&userptr->job_node);
 +      hl_unpin_host_memory(hdev, userptr);
 +free_userptr:
 +      kfree(userptr);
 +      return rc;
 +}
 +
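 +/*
 + * goya_validate_dma_pkt_host - Validate a LIN_DMA packet in which one side of
 + *                              the transfer is host memory
 + *
 + * @hdev: pointer to hl_device structure
 + * @parser: CS parser of the current job
 + * @user_dma_pkt: user LIN_DMA packet to validate
 + *
 + * Checks that the device-side address is within the user SRAM/DRAM range and
 + * pins the host memory when needed.
 + *
 + * Returns 0 on success
 + *
 + */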
 +static int goya_validate_dma_pkt_host(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt)
 +{
 +      u64 device_memory_addr, addr;
 +      enum dma_data_direction dir;
 +      enum hl_goya_dma_direction user_dir;
 +      bool sram_addr = true;
 +      bool skip_host_mem_pin = false;
 +      bool user_memset;
 +      u32 ctl;
 +      int rc = 0;
 +
 +      ctl = le32_to_cpu(user_dma_pkt->ctl);
 +
 +      user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 +
 +      user_memset = (ctl & GOYA_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
 +
 +      switch (user_dir) {
 +      case HL_DMA_HOST_TO_DRAM:
 +              dev_dbg(hdev->dev, "DMA direction is HOST --> DRAM\n");
 +              dir = DMA_TO_DEVICE;
 +              sram_addr = false;
 +              addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              if (user_memset)
 +                      skip_host_mem_pin = true;
 +              break;
 +
 +      case HL_DMA_DRAM_TO_HOST:
 +              dev_dbg(hdev->dev, "DMA direction is DRAM --> HOST\n");
 +              dir = DMA_FROM_DEVICE;
 +              sram_addr = false;
 +              addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              break;
 +
 +      case HL_DMA_HOST_TO_SRAM:
 +              dev_dbg(hdev->dev, "DMA direction is HOST --> SRAM\n");
 +              dir = DMA_TO_DEVICE;
 +              addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              if (user_memset)
 +                      skip_host_mem_pin = true;
 +              break;
 +
 +      case HL_DMA_SRAM_TO_HOST:
 +              dev_dbg(hdev->dev, "DMA direction is SRAM --> HOST\n");
 +              dir = DMA_FROM_DEVICE;
 +              addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              break;
 +      default:
 +              dev_err(hdev->dev, "DMA direction %d is unsupported/undefined\n", user_dir);
 +              return -EFAULT;
 +      }
 +
 +      if (sram_addr) {
 +              if (!hl_mem_area_inside_range(device_memory_addr,
 +                              le32_to_cpu(user_dma_pkt->tsize),
 +                              hdev->asic_prop.sram_user_base_address,
 +                              hdev->asic_prop.sram_end_address)) {
 +
 +                      dev_err(hdev->dev,
 +                              "SRAM address 0x%llx + 0x%x is invalid\n",
 +                              device_memory_addr,
 +                              user_dma_pkt->tsize);
 +                      return -EFAULT;
 +              }
 +      } else {
 +              if (!hl_mem_area_inside_range(device_memory_addr,
 +                              le32_to_cpu(user_dma_pkt->tsize),
 +                              hdev->asic_prop.dram_user_base_address,
 +                              hdev->asic_prop.dram_end_address)) {
 +
 +                      dev_err(hdev->dev,
 +                              "DRAM address 0x%llx + 0x%x is invalid\n",
 +                              device_memory_addr,
 +                              user_dma_pkt->tsize);
 +                      return -EFAULT;
 +              }
 +      }
 +
 +      if (skip_host_mem_pin)
 +              parser->patched_cb_size += sizeof(*user_dma_pkt);
 +      else {
 +              if ((dir == DMA_TO_DEVICE) &&
 +                              (parser->hw_queue_id > GOYA_QUEUE_ID_DMA_1)) {
 +                      dev_err(hdev->dev,
 +                              "Can't DMA from host on queue other than 1\n");
 +                      return -EFAULT;
 +              }
 +
 +              rc = goya_pin_memory_before_cs(hdev, parser, user_dma_pkt,
 +                                              addr, dir);
 +      }
 +
 +      return rc;
 +}
 +
 +static int goya_validate_dma_pkt_no_host(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt)
 +{
 +      u64 sram_memory_addr, dram_memory_addr;
 +      enum hl_goya_dma_direction user_dir;
 +      u32 ctl;
 +
 +      ctl = le32_to_cpu(user_dma_pkt->ctl);
 +      user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 +
 +      if (user_dir == HL_DMA_DRAM_TO_SRAM) {
 +              dev_dbg(hdev->dev, "DMA direction is DRAM --> SRAM\n");
 +              dram_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              sram_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +      } else {
 +              dev_dbg(hdev->dev, "DMA direction is SRAM --> DRAM\n");
 +              sram_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              dram_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +      }
 +
 +      if (!hl_mem_area_inside_range(sram_memory_addr,
 +                              le32_to_cpu(user_dma_pkt->tsize),
 +                              hdev->asic_prop.sram_user_base_address,
 +                              hdev->asic_prop.sram_end_address)) {
 +              dev_err(hdev->dev, "SRAM address 0x%llx + 0x%x is invalid\n",
 +                      sram_memory_addr, user_dma_pkt->tsize);
 +              return -EFAULT;
 +      }
 +
 +      if (!hl_mem_area_inside_range(dram_memory_addr,
 +                              le32_to_cpu(user_dma_pkt->tsize),
 +                              hdev->asic_prop.dram_user_base_address,
 +                              hdev->asic_prop.dram_end_address)) {
 +              dev_err(hdev->dev, "DRAM address 0x%llx + 0x%x is invalid\n",
 +                      dram_memory_addr, user_dma_pkt->tsize);
 +              return -EFAULT;
 +      }
 +
 +      parser->patched_cb_size += sizeof(*user_dma_pkt);
 +
 +      return 0;
 +}
 +
 +static int goya_validate_dma_pkt_no_mmu(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt)
 +{
 +      enum hl_goya_dma_direction user_dir;
 +      u32 ctl;
 +      int rc;
 +
 +      dev_dbg(hdev->dev, "DMA packet details:\n");
 +      dev_dbg(hdev->dev, "source == 0x%llx\n",
 +              le64_to_cpu(user_dma_pkt->src_addr));
 +      dev_dbg(hdev->dev, "destination == 0x%llx\n",
 +              le64_to_cpu(user_dma_pkt->dst_addr));
 +      dev_dbg(hdev->dev, "size == %u\n", le32_to_cpu(user_dma_pkt->tsize));
 +
 +      ctl = le32_to_cpu(user_dma_pkt->ctl);
 +      user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 +
 +      /*
 +       * Special handling for DMA with size 0. The H/W has a bug where
 +       * this can cause the QMAN DMA to get stuck, so block it here.
 +       */
 +      if (user_dma_pkt->tsize == 0) {
 +              dev_err(hdev->dev,
 +                      "Got DMA with size 0, might reset the device\n");
 +              return -EINVAL;
 +      }
 +
 +      if ((user_dir == HL_DMA_DRAM_TO_SRAM) || (user_dir == HL_DMA_SRAM_TO_DRAM))
 +              rc = goya_validate_dma_pkt_no_host(hdev, parser, user_dma_pkt);
 +      else
 +              rc = goya_validate_dma_pkt_host(hdev, parser, user_dma_pkt);
 +
 +      return rc;
 +}
 +
 +static int goya_validate_dma_pkt_mmu(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt)
 +{
 +      dev_dbg(hdev->dev, "DMA packet details:\n");
 +      dev_dbg(hdev->dev, "source == 0x%llx\n",
 +              le64_to_cpu(user_dma_pkt->src_addr));
 +      dev_dbg(hdev->dev, "destination == 0x%llx\n",
 +              le64_to_cpu(user_dma_pkt->dst_addr));
 +      dev_dbg(hdev->dev, "size == %u\n", le32_to_cpu(user_dma_pkt->tsize));
 +
 +      /*
 +       * WA for HW-23.
 +       * We can't allow the user to read from host memory using QMANs other
 +       * than 1. PMMU and HPMMU addresses are equal, so check only one of
 +       * them.
 +       */
 +      if (parser->hw_queue_id != GOYA_QUEUE_ID_DMA_1 &&
 +              hl_mem_area_inside_range(le64_to_cpu(user_dma_pkt->src_addr),
 +                              le32_to_cpu(user_dma_pkt->tsize),
 +                              hdev->asic_prop.pmmu.start_addr,
 +                              hdev->asic_prop.pmmu.end_addr)) {
 +              dev_err(hdev->dev,
 +                      "Can't DMA from host on queue other than 1\n");
 +              return -EFAULT;
 +      }
 +
 +      if (user_dma_pkt->tsize == 0) {
 +              dev_err(hdev->dev,
 +                      "Got DMA with size 0, might reset the device\n");
 +              return -EINVAL;
 +      }
 +
 +      parser->patched_cb_size += sizeof(*user_dma_pkt);
 +
 +      return 0;
 +}
 +
 +static int goya_validate_wreg32(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_wreg32 *wreg_pkt)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 sob_start_addr, sob_end_addr;
 +      u16 reg_offset;
 +
 +      reg_offset = le32_to_cpu(wreg_pkt->ctl) &
 +                      GOYA_PKT_WREG32_CTL_REG_OFFSET_MASK;
 +
 +      dev_dbg(hdev->dev, "WREG32 packet details:\n");
 +      dev_dbg(hdev->dev, "reg_offset == 0x%x\n", reg_offset);
 +      dev_dbg(hdev->dev, "value      == 0x%x\n",
 +              le32_to_cpu(wreg_pkt->value));
 +
 +      if (reg_offset != (mmDMA_CH_0_WR_COMP_ADDR_LO & 0x1FFF)) {
 +              dev_err(hdev->dev, "WREG32 packet with illegal address 0x%x\n",
 +                      reg_offset);
 +              return -EPERM;
 +      }
 +
 +      /*
 +       * With the MMU enabled, DMA channels are not secured, so it doesn't
 +       * matter where the WR COMP is written, because the write will go out
 +       * with the non-secured property
 +       */
 +      if (goya->hw_cap_initialized & HW_CAP_MMU)
 +              return 0;
 +
 +      sob_start_addr = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      sob_end_addr = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1023);
 +
 +      if ((le32_to_cpu(wreg_pkt->value) < sob_start_addr) ||
 +                      (le32_to_cpu(wreg_pkt->value) > sob_end_addr)) {
 +
 +              dev_err(hdev->dev, "WREG32 packet with illegal value 0x%x\n",
 +                      wreg_pkt->value);
 +              return -EPERM;
 +      }
 +
 +      return 0;
 +}
 +
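 +/*
 + * goya_validate_cb - Validate all packets in a user CB and compute the size
 + *                    of the patched CB
 + *
 + * @hdev: pointer to hl_device structure
 + * @parser: CS parser of the current job
 + * @is_mmu: true if the device MMU is enabled for this job
 + *
 + * Returns 0 on success
 + *
 + */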
 +static int goya_validate_cb(struct hl_device *hdev,
 +                      struct hl_cs_parser *parser, bool is_mmu)
 +{
 +      u32 cb_parsed_length = 0;
 +      int rc = 0;
 +
 +      parser->patched_cb_size = 0;
 +
 +      /* user_cb_size is more than 0 so the loop will always be executed */
 +      while (cb_parsed_length < parser->user_cb_size) {
 +              enum packet_id pkt_id;
 +              u16 pkt_size;
 +              struct goya_packet *user_pkt;
 +
 +              user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
 +
 +              pkt_id = (enum packet_id) (
 +                              (le64_to_cpu(user_pkt->header) &
 +                              PACKET_HEADER_PACKET_ID_MASK) >>
 +                                      PACKET_HEADER_PACKET_ID_SHIFT);
 +
 +              if (!validate_packet_id(pkt_id)) {
 +                      dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              pkt_size = goya_packet_sizes[pkt_id];
 +              cb_parsed_length += pkt_size;
 +              if (cb_parsed_length > parser->user_cb_size) {
 +                      dev_err(hdev->dev,
 +                              "packet 0x%x is out of CB boundary\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              switch (pkt_id) {
 +              case PACKET_WREG_32:
 +                      /*
 +                       * Although it is validated after the copy in
 +                       * patch_cb(), we need to validate it here as well,
 +                       * because patch_cb() is not called in the MMU path
 +                       * while this function is
 +                       */
 +                      rc = goya_validate_wreg32(hdev,
 +                              parser, (struct packet_wreg32 *) user_pkt);
 +                      parser->patched_cb_size += pkt_size;
 +                      break;
 +
 +              case PACKET_WREG_BULK:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use WREG_BULK\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_MSG_PROT:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use MSG_PROT\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_CP_DMA:
 +                      dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_STOP:
 +                      dev_err(hdev->dev, "User not allowed to use STOP\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_LIN_DMA:
 +                      if (is_mmu)
 +                              rc = goya_validate_dma_pkt_mmu(hdev, parser,
 +                                      (struct packet_lin_dma *) user_pkt);
 +                      else
 +                              rc = goya_validate_dma_pkt_no_mmu(hdev, parser,
 +                                      (struct packet_lin_dma *) user_pkt);
 +                      break;
 +
 +              case PACKET_MSG_LONG:
 +              case PACKET_MSG_SHORT:
 +              case PACKET_FENCE:
 +              case PACKET_NOP:
 +                      parser->patched_cb_size += pkt_size;
 +                      break;
 +
 +              default:
 +                      dev_err(hdev->dev, "Invalid packet header 0x%x\n",
 +                              pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              if (rc)
 +                      break;
 +      }
 +
 +      /*
 +       * The new CB should have space at the end for two MSG_PROT packets:
 +       * 1. A packet that will act as a completion packet
 +       * 2. A packet that will generate MSI-X interrupt
 +       */
 +      parser->patched_cb_size += sizeof(struct packet_msg_prot) * 2;
 +
 +      return rc;
 +}
 +
 +static int goya_patch_dma_packet(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt,
 +                              struct packet_lin_dma *new_dma_pkt,
 +                              u32 *new_dma_pkt_size)
 +{
 +      struct hl_userptr *userptr;
 +      struct scatterlist *sg, *sg_next_iter;
 +      u32 count, dma_desc_cnt;
 +      u64 len, len_next;
 +      dma_addr_t dma_addr, dma_addr_next;
 +      enum hl_goya_dma_direction user_dir;
 +      u64 device_memory_addr, addr;
 +      enum dma_data_direction dir;
 +      struct sg_table *sgt;
 +      bool skip_host_mem_pin = false;
 +      bool user_memset;
 +      u32 user_rdcomp_mask, user_wrcomp_mask, ctl;
 +
 +      ctl = le32_to_cpu(user_dma_pkt->ctl);
 +
 +      user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 +
 +      user_memset = (ctl & GOYA_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
 +
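+      /*
+       * DMA within device memory (or a zero-sized transfer) involves no
+       * host memory, so the user packet is copied unmodified.
+       */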
 +      if ((user_dir == HL_DMA_DRAM_TO_SRAM) || (user_dir == HL_DMA_SRAM_TO_DRAM) ||
 +                      (user_dma_pkt->tsize == 0)) {
 +              memcpy(new_dma_pkt, user_dma_pkt, sizeof(*new_dma_pkt));
 +              *new_dma_pkt_size = sizeof(*new_dma_pkt);
 +              return 0;
 +      }
 +
 +      if ((user_dir == HL_DMA_HOST_TO_DRAM) || (user_dir == HL_DMA_HOST_TO_SRAM)) {
 +              addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              dir = DMA_TO_DEVICE;
 +              if (user_memset)
 +                      skip_host_mem_pin = true;
 +      } else {
 +              addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              dir = DMA_FROM_DEVICE;
 +      }
 +
 +      if ((!skip_host_mem_pin) &&
 +              (hl_userptr_is_pinned(hdev, addr,
 +                      le32_to_cpu(user_dma_pkt->tsize),
 +                      parser->job_userptr_list, &userptr) == false)) {
 +              dev_err(hdev->dev, "Userptr 0x%llx + 0x%x NOT mapped\n",
 +                              addr, user_dma_pkt->tsize);
 +              return -EFAULT;
 +      }
 +
 +      if ((user_memset) && (dir == DMA_TO_DEVICE)) {
 +              memcpy(new_dma_pkt, user_dma_pkt, sizeof(*user_dma_pkt));
 +              *new_dma_pkt_size = sizeof(*user_dma_pkt);
 +              return 0;
 +      }
 +
 +      user_rdcomp_mask = ctl & GOYA_PKT_LIN_DMA_CTL_RDCOMP_MASK;
 +
 +      user_wrcomp_mask = ctl & GOYA_PKT_LIN_DMA_CTL_WRCOMP_MASK;
 +
 +      sgt = userptr->sgt;
 +      dma_desc_cnt = 0;
 +
 +      for_each_sgtable_dma_sg(sgt, sg, count) {
 +              len = sg_dma_len(sg);
 +              dma_addr = sg_dma_address(sg);
 +
 +              if (len == 0)
 +                      break;
 +
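+              /*
+               * Merge physically contiguous SG entries into a single DMA
+               * descriptor, as long as the merged length does not exceed
+               * DMA_MAX_TRANSFER_SIZE.
+               */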
 +              while ((count + 1) < sgt->nents) {
 +                      sg_next_iter = sg_next(sg);
 +                      len_next = sg_dma_len(sg_next_iter);
 +                      dma_addr_next = sg_dma_address(sg_next_iter);
 +
 +                      if (len_next == 0)
 +                              break;
 +
 +                      if ((dma_addr + len == dma_addr_next) &&
 +                              (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
 +                              len += len_next;
 +                              count++;
 +                              sg = sg_next_iter;
 +                      } else {
 +                              break;
 +                      }
 +              }
 +
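+              /*
+               * Drop the engine barrier from all but the first descriptor
+               * and strip the completion bits; the user's rdcomp/wrcomp
+               * bits are restored on the last descriptor below.
+               */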
 +              ctl = le32_to_cpu(user_dma_pkt->ctl);
 +              if (likely(dma_desc_cnt))
 +                      ctl &= ~GOYA_PKT_CTL_EB_MASK;
 +              ctl &= ~(GOYA_PKT_LIN_DMA_CTL_RDCOMP_MASK |
 +                              GOYA_PKT_LIN_DMA_CTL_WRCOMP_MASK);
 +              new_dma_pkt->ctl = cpu_to_le32(ctl);
 +              new_dma_pkt->tsize = cpu_to_le32((u32) len);
 +
 +              if (dir == DMA_TO_DEVICE) {
 +                      new_dma_pkt->src_addr = cpu_to_le64(dma_addr);
 +                      new_dma_pkt->dst_addr = cpu_to_le64(device_memory_addr);
 +              } else {
 +                      new_dma_pkt->src_addr = cpu_to_le64(device_memory_addr);
 +                      new_dma_pkt->dst_addr = cpu_to_le64(dma_addr);
 +              }
 +
 +              if (!user_memset)
 +                      device_memory_addr += len;
 +              dma_desc_cnt++;
 +              new_dma_pkt++;
 +      }
 +
 +      if (!dma_desc_cnt) {
 +              dev_err(hdev->dev,
 +                      "Error of 0 SG entries when patching DMA packet\n");
 +              return -EFAULT;
 +      }
 +
 +      /* Fix the last dma packet - rdcomp/wrcomp must be as user set them */
 +      new_dma_pkt--;
 +      new_dma_pkt->ctl |= cpu_to_le32(user_rdcomp_mask | user_wrcomp_mask);
 +
 +      *new_dma_pkt_size = dma_desc_cnt * sizeof(struct packet_lin_dma);
 +
 +      return 0;
 +}
 +
 +static int goya_patch_cb(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser)
 +{
 +      u32 cb_parsed_length = 0;
 +      u32 cb_patched_cur_length = 0;
 +      int rc = 0;
 +
+      /* user_cb_size is greater than 0, so the loop will always execute */
 +      while (cb_parsed_length < parser->user_cb_size) {
 +              enum packet_id pkt_id;
 +              u16 pkt_size;
 +              u32 new_pkt_size = 0;
 +              struct goya_packet *user_pkt, *kernel_pkt;
 +
 +              user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
 +              kernel_pkt = parser->patched_cb->kernel_address +
 +                                      cb_patched_cur_length;
 +
 +              pkt_id = (enum packet_id) (
 +                              (le64_to_cpu(user_pkt->header) &
 +                              PACKET_HEADER_PACKET_ID_MASK) >>
 +                                      PACKET_HEADER_PACKET_ID_SHIFT);
 +
 +              if (!validate_packet_id(pkt_id)) {
 +                      dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              pkt_size = goya_packet_sizes[pkt_id];
 +              cb_parsed_length += pkt_size;
 +              if (cb_parsed_length > parser->user_cb_size) {
 +                      dev_err(hdev->dev,
 +                              "packet 0x%x is out of CB boundary\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              switch (pkt_id) {
 +              case PACKET_LIN_DMA:
 +                      rc = goya_patch_dma_packet(hdev, parser,
 +                                      (struct packet_lin_dma *) user_pkt,
 +                                      (struct packet_lin_dma *) kernel_pkt,
 +                                      &new_pkt_size);
 +                      cb_patched_cur_length += new_pkt_size;
 +                      break;
 +
 +              case PACKET_WREG_32:
 +                      memcpy(kernel_pkt, user_pkt, pkt_size);
 +                      cb_patched_cur_length += pkt_size;
 +                      rc = goya_validate_wreg32(hdev, parser,
 +                                      (struct packet_wreg32 *) kernel_pkt);
 +                      break;
 +
 +              case PACKET_WREG_BULK:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use WREG_BULK\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_MSG_PROT:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use MSG_PROT\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_CP_DMA:
 +                      dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_STOP:
 +                      dev_err(hdev->dev, "User not allowed to use STOP\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_MSG_LONG:
 +              case PACKET_MSG_SHORT:
 +              case PACKET_FENCE:
 +              case PACKET_NOP:
 +                      memcpy(kernel_pkt, user_pkt, pkt_size);
 +                      cb_patched_cur_length += pkt_size;
 +                      break;
 +
 +              default:
 +                      dev_err(hdev->dev, "Invalid packet header 0x%x\n",
 +                              pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              if (rc)
 +                      break;
 +      }
 +
 +      return rc;
 +}
 +
 +static int goya_parse_cb_mmu(struct hl_device *hdev,
 +              struct hl_cs_parser *parser)
 +{
 +      u64 handle;
 +      u32 patched_cb_size;
 +      struct hl_cb *user_cb;
 +      int rc;
 +
 +      /*
+       * The new CB should have space at the end for two MSG_PROT packets:
 +       * 1. A packet that will act as a completion packet
 +       * 2. A packet that will generate MSI-X interrupt
 +       */
 +      parser->patched_cb_size = parser->user_cb_size +
 +                      sizeof(struct packet_msg_prot) * 2;
 +
 +      rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
 +                              parser->patched_cb_size, false, false,
 +                              &handle);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate patched CB for DMA CS %d\n",
 +                      rc);
 +              return rc;
 +      }
 +
 +      parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
 +      /* hl_cb_get should never fail here */
 +      if (!parser->patched_cb) {
 +              dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
 +              rc = -EFAULT;
 +              goto out;
 +      }
 +
 +      /*
 +       * The check that parser->user_cb_size <= parser->user_cb->size was done
 +       * in validate_queue_index().
 +       */
 +      memcpy(parser->patched_cb->kernel_address,
 +              parser->user_cb->kernel_address,
 +              parser->user_cb_size);
 +
 +      patched_cb_size = parser->patched_cb_size;
 +
 +      /* validate patched CB instead of user CB */
 +      user_cb = parser->user_cb;
 +      parser->user_cb = parser->patched_cb;
 +      rc = goya_validate_cb(hdev, parser, true);
 +      parser->user_cb = user_cb;
 +
 +      if (rc) {
 +              hl_cb_put(parser->patched_cb);
 +              goto out;
 +      }
 +
 +      if (patched_cb_size != parser->patched_cb_size) {
 +              dev_err(hdev->dev, "user CB size mismatch\n");
 +              hl_cb_put(parser->patched_cb);
 +              rc = -EINVAL;
 +              goto out;
 +      }
 +
 +out:
 +      /*
+       * Always call cb destroy here because we still hold the reference
+       * taken by cb_get earlier. After the job completes, cb_put will
+       * release it, but here we want to remove it from the idr.
 +       */
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
 +
 +      return rc;
 +}
 +
 +static int goya_parse_cb_no_mmu(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser)
 +{
 +      u64 handle;
 +      int rc;
 +
 +      rc = goya_validate_cb(hdev, parser, false);
 +
 +      if (rc)
 +              goto free_userptr;
 +
 +      rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
 +                              parser->patched_cb_size, false, false,
 +                              &handle);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate patched CB for DMA CS %d\n", rc);
 +              goto free_userptr;
 +      }
 +
 +      parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
 +      /* hl_cb_get should never fail here */
 +      if (!parser->patched_cb) {
 +              dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
 +              rc = -EFAULT;
 +              goto out;
 +      }
 +
 +      rc = goya_patch_cb(hdev, parser);
 +
 +      if (rc)
 +              hl_cb_put(parser->patched_cb);
 +
 +out:
 +      /*
+       * Always call cb destroy here because we still hold the reference
+       * taken by cb_get earlier. After the job completes, cb_put will
+       * release it, but here we want to remove it from the idr.
 +       */
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
 +
 +free_userptr:
 +      if (rc)
 +              hl_userptr_delete_list(hdev, parser->job_userptr_list);
 +      return rc;
 +}
 +
 +static int goya_parse_cb_no_ext_queue(struct hl_device *hdev,
 +                                      struct hl_cs_parser *parser)
 +{
 +      struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MMU)
 +              return 0;
 +
 +      /* For internal queue jobs, just check if CB address is valid */
 +      if (hl_mem_area_inside_range(
 +                      (u64) (uintptr_t) parser->user_cb,
 +                      parser->user_cb_size,
 +                      asic_prop->sram_user_base_address,
 +                      asic_prop->sram_end_address))
 +              return 0;
 +
 +      if (hl_mem_area_inside_range(
 +                      (u64) (uintptr_t) parser->user_cb,
 +                      parser->user_cb_size,
 +                      asic_prop->dram_user_base_address,
 +                      asic_prop->dram_end_address))
 +              return 0;
 +
 +      dev_err(hdev->dev,
 +              "Internal CB address 0x%px + 0x%x is not in SRAM nor in DRAM\n",
 +              parser->user_cb, parser->user_cb_size);
 +
 +      return -EFAULT;
 +}
 +
 +int goya_cs_parser(struct hl_device *hdev, struct hl_cs_parser *parser)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (parser->queue_type == QUEUE_TYPE_INT)
 +              return goya_parse_cb_no_ext_queue(hdev, parser);
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MMU)
 +              return goya_parse_cb_mmu(hdev, parser);
 +      else
 +              return goya_parse_cb_no_mmu(hdev, parser);
 +}
 +
 +void goya_add_end_of_cb_packets(struct hl_device *hdev, void *kernel_address,
 +                              u32 len, u32 original_len, u64 cq_addr, u32 cq_val,
 +                              u32 msix_vec, bool eb)
 +{
 +      struct packet_msg_prot *cq_pkt;
 +      u32 tmp;
 +
 +      cq_pkt = kernel_address + len - (sizeof(struct packet_msg_prot) * 2);
 +
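+      /* First MSG_PROT packet writes the completion value to the CQ */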
 +      tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_EB_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_MB_SHIFT);
 +      cq_pkt->ctl = cpu_to_le32(tmp);
 +      cq_pkt->value = cpu_to_le32(cq_val);
 +      cq_pkt->addr = cpu_to_le64(cq_addr);
 +
 +      cq_pkt++;
 +
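+      /* Second MSG_PROT packet triggers the MSI-X interrupt via the doorbell */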
 +      tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_MB_SHIFT);
 +      cq_pkt->ctl = cpu_to_le32(tmp);
 +      cq_pkt->value = cpu_to_le32(msix_vec & 0x7FF);
 +      cq_pkt->addr = cpu_to_le64(CFG_BASE + mmPCIE_DBI_MSIX_DOORBELL_OFF);
 +}
 +
 +void goya_update_eq_ci(struct hl_device *hdev, u32 val)
 +{
 +      WREG32(mmCPU_EQ_CI, val);
 +}
 +
 +void goya_restore_phase_topology(struct hl_device *hdev)
 +{
 +
 +}
 +
 +static void goya_clear_sm_regs(struct hl_device *hdev)
 +{
 +      int i, num_of_sob_in_longs, num_of_mon_in_longs;
 +
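+      /* SOB and monitor status registers are 4 bytes apart; clear them all */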
 +      num_of_sob_in_longs =
 +              ((mmSYNC_MNGR_SOB_OBJ_1023 - mmSYNC_MNGR_SOB_OBJ_0) + 4);
 +
 +      num_of_mon_in_longs =
 +              ((mmSYNC_MNGR_MON_STATUS_255 - mmSYNC_MNGR_MON_STATUS_0) + 4);
 +
 +      for (i = 0 ; i < num_of_sob_in_longs ; i += 4)
 +              WREG32(mmSYNC_MNGR_SOB_OBJ_0 + i, 0);
 +
 +      for (i = 0 ; i < num_of_mon_in_longs ; i += 4)
 +              WREG32(mmSYNC_MNGR_MON_STATUS_0 + i, 0);
 +
+      /* Read back one register to flush the WREG writes and prevent a race */
 +      i = RREG32(mmSYNC_MNGR_SOB_OBJ_0);
 +}
 +
 +static int goya_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size, void *blob_addr)
 +{
+      dev_err(hdev->dev, "Reading via DMA is not implemented yet\n");
 +      return -EPERM;
 +}
 +
 +static u64 goya_read_pte(struct hl_device *hdev, u64 addr)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return U64_MAX;
 +
 +      return readq(hdev->pcie_bar[DDR_BAR_ID] +
 +                      (addr - goya->ddr_bar_cur_addr));
 +}
 +
 +static void goya_write_pte(struct hl_device *hdev, u64 addr, u64 val)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return;
 +
 +      writeq(val, hdev->pcie_bar[DDR_BAR_ID] +
 +                      (addr - goya->ddr_bar_cur_addr));
 +}
 +
 +static const char *_goya_get_event_desc(u16 event_type)
 +{
 +      switch (event_type) {
 +      case GOYA_ASYNC_EVENT_ID_PCIE_IF:
 +              return "PCIe_if";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_ECC:
 +              return "TPC%d_ecc";
 +      case GOYA_ASYNC_EVENT_ID_MME_ECC:
 +              return "MME_ecc";
 +      case GOYA_ASYNC_EVENT_ID_MME_ECC_EXT:
 +              return "MME_ecc_ext";
 +      case GOYA_ASYNC_EVENT_ID_MMU_ECC:
 +              return "MMU_ecc";
 +      case GOYA_ASYNC_EVENT_ID_DMA_MACRO:
 +              return "DMA_macro";
 +      case GOYA_ASYNC_EVENT_ID_DMA_ECC:
 +              return "DMA_ecc";
 +      case GOYA_ASYNC_EVENT_ID_CPU_IF_ECC:
 +              return "CPU_if_ecc";
 +      case GOYA_ASYNC_EVENT_ID_PSOC_MEM:
 +              return "PSOC_mem";
 +      case GOYA_ASYNC_EVENT_ID_PSOC_CORESIGHT:
 +              return "PSOC_coresight";
 +      case GOYA_ASYNC_EVENT_ID_SRAM0 ... GOYA_ASYNC_EVENT_ID_SRAM29:
 +              return "SRAM%d";
 +      case GOYA_ASYNC_EVENT_ID_GIC500:
 +              return "GIC500";
 +      case GOYA_ASYNC_EVENT_ID_PLL0 ... GOYA_ASYNC_EVENT_ID_PLL6:
 +              return "PLL%d";
 +      case GOYA_ASYNC_EVENT_ID_AXI_ECC:
 +              return "AXI_ecc";
 +      case GOYA_ASYNC_EVENT_ID_L2_RAM_ECC:
 +              return "L2_ram_ecc";
 +      case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_05_SW_RESET:
 +              return "PSOC_gpio_05_sw_reset";
 +      case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_10_VRHOT_ICRIT:
 +              return "PSOC_gpio_10_vrhot_icrit";
 +      case GOYA_ASYNC_EVENT_ID_PCIE_DEC:
 +              return "PCIe_dec";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
 +              return "TPC%d_dec";
 +      case GOYA_ASYNC_EVENT_ID_MME_WACS:
 +              return "MME_wacs";
 +      case GOYA_ASYNC_EVENT_ID_MME_WACSD:
 +              return "MME_wacsd";
 +      case GOYA_ASYNC_EVENT_ID_CPU_AXI_SPLITTER:
 +              return "CPU_axi_splitter";
 +      case GOYA_ASYNC_EVENT_ID_PSOC_AXI_DEC:
 +              return "PSOC_axi_dec";
 +      case GOYA_ASYNC_EVENT_ID_PSOC:
 +              return "PSOC";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
 +              return "TPC%d_krn_err";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_CMDQ:
 +              return "TPC%d_cq";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_QM ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
 +              return "TPC%d_qm";
 +      case GOYA_ASYNC_EVENT_ID_MME_QM:
 +              return "MME_qm";
 +      case GOYA_ASYNC_EVENT_ID_MME_CMDQ:
 +              return "MME_cq";
 +      case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
 +              return "DMA%d_qm";
 +      case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
 +              return "DMA%d_ch";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU:
 +              return "TPC%d_bmon_spmu";
 +      case GOYA_ASYNC_EVENT_ID_DMA_BM_CH0 ... GOYA_ASYNC_EVENT_ID_DMA_BM_CH4:
 +              return "DMA_bm_ch%d";
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S:
 +              return "POWER_ENV_S";
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E:
 +              return "POWER_ENV_E";
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S:
 +              return "THERMAL_ENV_S";
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E:
 +              return "THERMAL_ENV_E";
 +      case GOYA_ASYNC_EVENT_PKT_QUEUE_OUT_SYNC:
 +              return "QUEUE_OUT_OF_SYNC";
 +      default:
 +              return "N/A";
 +      }
 +}
 +
 +static void goya_get_event_desc(u16 event_type, char *desc, size_t size)
 +{
 +      u8 index;
 +
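+      /*
+       * For per-engine events, derive the engine index from the event ID
+       * using that group's ID stride (1, 3 or 10) and plug it into the
+       * "%d" format string returned by _goya_get_event_desc().
+       */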
 +      switch (event_type) {
 +      case GOYA_ASYNC_EVENT_ID_TPC0_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_ECC:
 +              index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_ECC) / 3;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_SRAM0 ... GOYA_ASYNC_EVENT_ID_SRAM29:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_SRAM0;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_PLL0 ... GOYA_ASYNC_EVENT_ID_PLL6:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_PLL0;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
 +              index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_DEC) / 3;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
 +              index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR) / 10;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_CMDQ:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_TPC0_CMDQ;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_TPC0_QM ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_TPC0_QM;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_DMA0_QM;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_DMA0_CH;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU:
 +              index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU) / 10;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_DMA_BM_CH0 ... GOYA_ASYNC_EVENT_ID_DMA_BM_CH4:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_DMA_BM_CH0;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_PKT_QUEUE_OUT_SYNC:
 +              snprintf(desc, size, _goya_get_event_desc(event_type));
 +              break;
 +      default:
 +              snprintf(desc, size, _goya_get_event_desc(event_type));
 +              break;
 +      }
 +}
 +
 +static void goya_print_razwi_info(struct hl_device *hdev)
 +{
 +      if (RREG32(mmDMA_MACRO_RAZWI_LBW_WT_VLD)) {
 +              dev_err_ratelimited(hdev->dev, "Illegal write to LBW\n");
 +              WREG32(mmDMA_MACRO_RAZWI_LBW_WT_VLD, 0);
 +      }
 +
 +      if (RREG32(mmDMA_MACRO_RAZWI_LBW_RD_VLD)) {
 +              dev_err_ratelimited(hdev->dev, "Illegal read from LBW\n");
 +              WREG32(mmDMA_MACRO_RAZWI_LBW_RD_VLD, 0);
 +      }
 +
 +      if (RREG32(mmDMA_MACRO_RAZWI_HBW_WT_VLD)) {
 +              dev_err_ratelimited(hdev->dev, "Illegal write to HBW\n");
 +              WREG32(mmDMA_MACRO_RAZWI_HBW_WT_VLD, 0);
 +      }
 +
 +      if (RREG32(mmDMA_MACRO_RAZWI_HBW_RD_VLD)) {
 +              dev_err_ratelimited(hdev->dev, "Illegal read from HBW\n");
 +              WREG32(mmDMA_MACRO_RAZWI_HBW_RD_VLD, 0);
 +      }
 +}
 +
 +static void goya_print_mmu_error_info(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u64 addr;
 +      u32 val;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      val = RREG32(mmMMU_PAGE_ERROR_CAPTURE);
 +      if (val & MMU_PAGE_ERROR_CAPTURE_ENTRY_VALID_MASK) {
 +              addr = val & MMU_PAGE_ERROR_CAPTURE_VA_49_32_MASK;
 +              addr <<= 32;
 +              addr |= RREG32(mmMMU_PAGE_ERROR_CAPTURE_VA);
 +
 +              dev_err_ratelimited(hdev->dev, "MMU page fault on va 0x%llx\n",
 +                                      addr);
 +
 +              WREG32(mmMMU_PAGE_ERROR_CAPTURE, 0);
 +      }
 +}
 +
 +static void goya_print_out_of_sync_info(struct hl_device *hdev,
 +                                      struct cpucp_pkt_sync_err *sync_err)
 +{
 +      struct hl_hw_queue *q = &hdev->kernel_queues[GOYA_QUEUE_ID_CPU_PQ];
 +
 +      dev_err(hdev->dev, "Out of sync with FW, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
 +              le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci));
 +}
 +
 +static void goya_print_irq_info(struct hl_device *hdev, u16 event_type,
 +                              bool razwi)
 +{
 +      char desc[20] = "";
 +
 +      goya_get_event_desc(event_type, desc, sizeof(desc));
 +      dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
 +              event_type, desc);
 +
 +      if (razwi) {
 +              goya_print_razwi_info(hdev);
 +              goya_print_mmu_error_info(hdev);
 +      }
 +}
 +
 +static int goya_unmask_irq_arr(struct hl_device *hdev, u32 *irq_arr,
 +              size_t irq_arr_size)
 +{
 +      struct cpucp_unmask_irq_arr_packet *pkt;
 +      size_t total_pkt_size;
 +      u64 result;
 +      int rc;
 +      int irq_num_entries, irq_arr_index;
 +      __le32 *goya_irq_arr;
 +
 +      total_pkt_size = sizeof(struct cpucp_unmask_irq_arr_packet) +
 +                      irq_arr_size;
 +
+      /* data should be aligned to 8 bytes in order for CPU-CP to copy it */
 +      total_pkt_size = (total_pkt_size + 0x7) & ~0x7;
 +
+      /* total_pkt_size is cast to u16 later on */
 +      if (total_pkt_size > USHRT_MAX) {
 +              dev_err(hdev->dev, "too many elements in IRQ array\n");
 +              return -EINVAL;
 +      }
 +
 +      pkt = kzalloc(total_pkt_size, GFP_KERNEL);
 +      if (!pkt)
 +              return -ENOMEM;
 +
 +      irq_num_entries = irq_arr_size / sizeof(irq_arr[0]);
 +      pkt->length = cpu_to_le32(irq_num_entries);
 +
+      /* We must perform any necessary endianness conversion on the irq
 +       * array being passed to the goya hardware
 +       */
 +      for (irq_arr_index = 0, goya_irq_arr = (__le32 *) &pkt->irqs;
 +                      irq_arr_index < irq_num_entries ; irq_arr_index++)
 +              goya_irq_arr[irq_arr_index] =
 +                              cpu_to_le32(irq_arr[irq_arr_index]);
 +
 +      pkt->cpucp_pkt.ctl = cpu_to_le32(CPUCP_PACKET_UNMASK_RAZWI_IRQ_ARRAY <<
 +                                              CPUCP_PKT_CTL_OPCODE_SHIFT);
 +
 +      rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) pkt,
 +                                              total_pkt_size, 0, &result);
 +
 +      if (rc)
 +              dev_err(hdev->dev, "failed to unmask IRQ array\n");
 +
 +      kfree(pkt);
 +
 +      return rc;
 +}
 +
 +static int goya_compute_reset_late_init(struct hl_device *hdev)
 +{
 +      /*
 +       * Unmask all IRQs since some could have been received
 +       * during the soft reset
 +       */
 +      return goya_unmask_irq_arr(hdev, goya_all_events,
 +                                      sizeof(goya_all_events));
 +}
 +
 +static int goya_unmask_irq(struct hl_device *hdev, u16 event_type)
 +{
 +      struct cpucp_packet pkt;
 +      u64 result;
 +      int rc;
 +
 +      memset(&pkt, 0, sizeof(pkt));
 +
 +      pkt.ctl = cpu_to_le32(CPUCP_PACKET_UNMASK_RAZWI_IRQ <<
 +                              CPUCP_PKT_CTL_OPCODE_SHIFT);
 +      pkt.value = cpu_to_le64(event_type);
 +
 +      rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 +                                              0, &result);
 +
 +      if (rc)
 +              dev_err(hdev->dev, "failed to unmask RAZWI IRQ %d", event_type);
 +
 +      return rc;
 +}
 +
 +static void goya_print_clk_change_info(struct hl_device *hdev, u16 event_type)
 +{
 +      ktime_t zero_time = ktime_set(0, 0);
 +
 +      mutex_lock(&hdev->clk_throttling.lock);
 +
 +      switch (event_type) {
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = zero_time;
 +              dev_info_ratelimited(hdev->dev,
 +                      "Clock throttling due to power consumption\n");
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = ktime_get();
 +              dev_info_ratelimited(hdev->dev,
+                      "Power envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
 +              dev_info_ratelimited(hdev->dev,
 +                      "Clock throttling due to overheating\n");
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
 +              dev_info_ratelimited(hdev->dev,
+                      "Thermal envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      default:
 +              dev_err(hdev->dev, "Received invalid clock change event %d\n",
 +                      event_type);
 +              break;
 +      }
 +
 +      mutex_unlock(&hdev->clk_throttling.lock);
 +}
 +
 +void goya_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
 +{
 +      u32 ctl = le32_to_cpu(eq_entry->hdr.ctl);
 +      u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK)
 +                              >> EQ_CTL_EVENT_TYPE_SHIFT);
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (event_type >= GOYA_ASYNC_EVENT_ID_SIZE) {
 +              dev_err(hdev->dev, "Event type %u exceeds maximum of %u",
 +                              event_type, GOYA_ASYNC_EVENT_ID_SIZE - 1);
 +              return;
 +      }
 +
 +      goya->events_stat[event_type]++;
 +      goya->events_stat_aggregate[event_type]++;
 +
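+      /*
+       * Fatal errors (ECC, GIC500, AXI, etc.) escalate to a hard reset when
+       * hard_reset_on_fw_events is set; most other events only print info
+       * and unmask the interrupt in the firmware.
+       */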
 +      switch (event_type) {
 +      case GOYA_ASYNC_EVENT_ID_PCIE_IF:
 +      case GOYA_ASYNC_EVENT_ID_TPC0_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_ECC:
 +      case GOYA_ASYNC_EVENT_ID_MME_ECC:
 +      case GOYA_ASYNC_EVENT_ID_MME_ECC_EXT:
 +      case GOYA_ASYNC_EVENT_ID_MMU_ECC:
 +      case GOYA_ASYNC_EVENT_ID_DMA_MACRO:
 +      case GOYA_ASYNC_EVENT_ID_DMA_ECC:
 +      case GOYA_ASYNC_EVENT_ID_CPU_IF_ECC:
 +      case GOYA_ASYNC_EVENT_ID_PSOC_MEM:
 +      case GOYA_ASYNC_EVENT_ID_PSOC_CORESIGHT:
 +      case GOYA_ASYNC_EVENT_ID_SRAM0 ... GOYA_ASYNC_EVENT_ID_SRAM29:
 +      case GOYA_ASYNC_EVENT_ID_GIC500:
 +      case GOYA_ASYNC_EVENT_ID_PLL0 ... GOYA_ASYNC_EVENT_ID_PLL6:
 +      case GOYA_ASYNC_EVENT_ID_AXI_ECC:
 +      case GOYA_ASYNC_EVENT_ID_L2_RAM_ECC:
 +              goya_print_irq_info(hdev, event_type, false);
 +              if (hdev->hard_reset_on_fw_events)
 +                      hl_device_reset(hdev, (HL_DRV_RESET_HARD |
 +                                              HL_DRV_RESET_FW_FATAL_ERR));
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_05_SW_RESET:
 +              goya_print_irq_info(hdev, event_type, false);
 +              if (hdev->hard_reset_on_fw_events)
 +                      hl_device_reset(hdev, HL_DRV_RESET_HARD);
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_PCIE_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
 +      case GOYA_ASYNC_EVENT_ID_MME_WACS:
 +      case GOYA_ASYNC_EVENT_ID_MME_WACSD:
 +      case GOYA_ASYNC_EVENT_ID_CPU_AXI_SPLITTER:
 +      case GOYA_ASYNC_EVENT_ID_PSOC_AXI_DEC:
 +      case GOYA_ASYNC_EVENT_ID_PSOC:
 +      case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
 +      case GOYA_ASYNC_EVENT_ID_MME_QM:
 +      case GOYA_ASYNC_EVENT_ID_MME_CMDQ:
 +      case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
 +      case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
 +              goya_print_irq_info(hdev, event_type, true);
 +              goya_unmask_irq(hdev, event_type);
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_10_VRHOT_ICRIT:
 +      case GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_DMA_BM_CH0 ... GOYA_ASYNC_EVENT_ID_DMA_BM_CH4:
 +              goya_print_irq_info(hdev, event_type, false);
 +              goya_unmask_irq(hdev, event_type);
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S:
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E:
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S:
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E:
 +              goya_print_clk_change_info(hdev, event_type);
 +              goya_unmask_irq(hdev, event_type);
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_PKT_QUEUE_OUT_SYNC:
 +              goya_print_irq_info(hdev, event_type, false);
 +              goya_print_out_of_sync_info(hdev, &eq_entry->pkt_sync_err);
 +              if (hdev->hard_reset_on_fw_events)
 +                      hl_device_reset(hdev, HL_DRV_RESET_HARD);
 +              else
 +                      hl_fw_unmask_irq(hdev, event_type);
 +              break;
 +
 +      default:
 +              dev_err(hdev->dev, "Received invalid H/W interrupt %d\n",
 +                              event_type);
 +              break;
 +      }
 +}
 +
 +void *goya_get_events_stat(struct hl_device *hdev, bool aggregate, u32 *size)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (aggregate) {
 +              *size = (u32) sizeof(goya->events_stat_aggregate);
 +              return goya->events_stat_aggregate;
 +      }
 +
 +      *size = (u32) sizeof(goya->events_stat);
 +      return goya->events_stat;
 +}
 +
 +static int goya_memset_device_memory(struct hl_device *hdev, u64 addr, u64 size,
 +                              u64 val, bool is_dram)
 +{
 +      struct packet_lin_dma *lin_dma_pkt;
 +      struct hl_cs_job *job;
 +      u32 cb_size, ctl;
 +      struct hl_cb *cb;
 +      int rc, lin_dma_pkts_cnt;
 +
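+      /* The memset is split into LIN_DMA packets of at most 2GB each */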
 +      lin_dma_pkts_cnt = DIV_ROUND_UP_ULL(size, SZ_2G);
 +      cb_size = lin_dma_pkts_cnt * sizeof(struct packet_lin_dma) +
 +                                              sizeof(struct packet_msg_prot);
 +      cb = hl_cb_kernel_create(hdev, cb_size, false);
 +      if (!cb)
 +              return -ENOMEM;
 +
 +      lin_dma_pkt = cb->kernel_address;
 +
 +      do {
 +              memset(lin_dma_pkt, 0, sizeof(*lin_dma_pkt));
 +
 +              ctl = ((PACKET_LIN_DMA << GOYA_PKT_CTL_OPCODE_SHIFT) |
 +                              (1 << GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT) |
 +                              (1 << GOYA_PKT_LIN_DMA_CTL_WO_SHIFT) |
 +                              (1 << GOYA_PKT_CTL_RB_SHIFT) |
 +                              (1 << GOYA_PKT_CTL_MB_SHIFT));
 +              ctl |= (is_dram ? HL_DMA_HOST_TO_DRAM : HL_DMA_HOST_TO_SRAM) <<
 +                              GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 +              lin_dma_pkt->ctl = cpu_to_le32(ctl);
 +
 +              lin_dma_pkt->src_addr = cpu_to_le64(val);
 +              lin_dma_pkt->dst_addr = cpu_to_le64(addr);
 +              if (lin_dma_pkts_cnt > 1)
 +                      lin_dma_pkt->tsize = cpu_to_le32(SZ_2G);
 +              else
 +                      lin_dma_pkt->tsize = cpu_to_le32(size);
 +
 +              size -= SZ_2G;
 +              addr += SZ_2G;
 +              lin_dma_pkt++;
 +      } while (--lin_dma_pkts_cnt);
 +
 +      job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
 +      if (!job) {
 +              dev_err(hdev->dev, "Failed to allocate a new job\n");
 +              rc = -ENOMEM;
 +              goto release_cb;
 +      }
 +
 +      job->id = 0;
 +      job->user_cb = cb;
 +      atomic_inc(&job->user_cb->cs_cnt);
 +      job->user_cb_size = cb_size;
 +      job->hw_queue_id = GOYA_QUEUE_ID_DMA_0;
 +      job->patched_cb = job->user_cb;
 +      job->job_cb_size = job->user_cb_size;
 +
 +      hl_debugfs_add_job(hdev, job);
 +
 +      rc = goya_send_job_on_qman0(hdev, job);
 +
 +      hl_debugfs_remove_job(hdev, job);
 +      kfree(job);
 +      atomic_dec(&cb->cs_cnt);
 +
 +release_cb:
 +      hl_cb_put(cb);
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
 +
 +      return rc;
 +}
 +
 +int goya_context_switch(struct hl_device *hdev, u32 asid)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 addr = prop->sram_base_address, sob_addr;
 +      u32 size = hdev->pldm ? 0x10000 : prop->sram_size;
 +      u64 val = 0x7777777777777777ull;
 +      int rc, dma_id;
 +      u32 channel_off = mmDMA_CH_1_WR_COMP_ADDR_LO -
 +                                      mmDMA_CH_0_WR_COMP_ADDR_LO;
 +
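+      /* Overwrite the SRAM with a fixed pattern (only a small chunk on pldm) */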
 +      rc = goya_memset_device_memory(hdev, addr, size, val, false);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to clear SRAM in context switch\n");
 +              return rc;
 +      }
 +
 +      /* we need to reset registers that the user is allowed to change */
 +      sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
 +      WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO, lower_32_bits(sob_addr));
 +
 +      for (dma_id = 1 ; dma_id < NUMBER_OF_EXT_HW_QUEUES ; dma_id++) {
 +              sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1000 +
 +                                                      (dma_id - 1) * 4;
 +              WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO + channel_off * dma_id,
 +                                              lower_32_bits(sob_addr));
 +      }
 +
 +      WREG32(mmTPC_PLL_CLK_RLX_0, 0x200020);
 +
 +      goya_clear_sm_regs(hdev);
 +
 +      return 0;
 +}
 +
 +static int goya_mmu_clear_pgt_range(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +      u64 addr = prop->mmu_pgt_addr;
 +      u32 size = prop->mmu_pgt_size + MMU_DRAM_DEFAULT_PAGE_SIZE +
 +                      MMU_CACHE_MNG_SIZE;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return 0;
 +
 +      return goya_memset_device_memory(hdev, addr, size, 0, true);
 +}
 +
 +static int goya_mmu_set_dram_default_page(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u64 addr = hdev->asic_prop.mmu_dram_default_page_addr;
 +      u32 size = MMU_DRAM_DEFAULT_PAGE_SIZE;
 +      u64 val = 0x9999999999999999ull;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return 0;
 +
 +      return goya_memset_device_memory(hdev, addr, size, val, true);
 +}
 +
 +static int goya_mmu_add_mappings_for_device_cpu(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +      s64 off, cpu_off;
 +      int rc;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return 0;
 +
 +      for (off = 0 ; off < CPU_FW_IMAGE_SIZE ; off += PAGE_SIZE_2MB) {
 +              rc = hl_mmu_map_page(hdev->kernel_ctx,
 +                      prop->dram_base_address + off,
 +                      prop->dram_base_address + off, PAGE_SIZE_2MB,
 +                      (off + PAGE_SIZE_2MB) == CPU_FW_IMAGE_SIZE);
 +              if (rc) {
 +                      dev_err(hdev->dev, "Map failed for address 0x%llx\n",
 +                              prop->dram_base_address + off);
 +                      goto unmap;
 +              }
 +      }
 +
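+      /*
+       * Map the CPU accessible region with a single 2MB page when its DMA
+       * address is 2MB aligned, otherwise fall back to 4KB pages.
+       */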
 +      if (!(hdev->cpu_accessible_dma_address & (PAGE_SIZE_2MB - 1))) {
 +              rc = hl_mmu_map_page(hdev->kernel_ctx,
 +                      VA_CPU_ACCESSIBLE_MEM_ADDR,
 +                      hdev->cpu_accessible_dma_address,
 +                      PAGE_SIZE_2MB, true);
 +
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "Map failed for CPU accessible memory\n");
 +                      off -= PAGE_SIZE_2MB;
 +                      goto unmap;
 +              }
 +      } else {
 +              for (cpu_off = 0 ; cpu_off < SZ_2M ; cpu_off += PAGE_SIZE_4KB) {
 +                      rc = hl_mmu_map_page(hdev->kernel_ctx,
 +                              VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
 +                              hdev->cpu_accessible_dma_address + cpu_off,
 +                              PAGE_SIZE_4KB, true);
 +                      if (rc) {
 +                              dev_err(hdev->dev,
 +                                      "Map failed for CPU accessible memory\n");
 +                              cpu_off -= PAGE_SIZE_4KB;
 +                              goto unmap_cpu;
 +                      }
 +              }
 +      }
 +
 +      goya_mmu_prepare_reg(hdev, mmCPU_IF_ARUSER_OVR, HL_KERNEL_ASID_ID);
 +      goya_mmu_prepare_reg(hdev, mmCPU_IF_AWUSER_OVR, HL_KERNEL_ASID_ID);
 +      WREG32(mmCPU_IF_ARUSER_OVR_EN, 0x7FF);
 +      WREG32(mmCPU_IF_AWUSER_OVR_EN, 0x7FF);
 +
 +      /* Make sure configuration is flushed to device */
 +      RREG32(mmCPU_IF_AWUSER_OVR_EN);
 +
 +      goya->device_cpu_mmu_mappings_done = true;
 +
 +      return 0;
 +
 +unmap_cpu:
 +      for (; cpu_off >= 0 ; cpu_off -= PAGE_SIZE_4KB)
 +              if (hl_mmu_unmap_page(hdev->kernel_ctx,
 +                              VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
 +                              PAGE_SIZE_4KB, true))
 +                      dev_warn_ratelimited(hdev->dev,
 +                              "failed to unmap address 0x%llx\n",
 +                              VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off);
 +unmap:
 +      for (; off >= 0 ; off -= PAGE_SIZE_2MB)
 +              if (hl_mmu_unmap_page(hdev->kernel_ctx,
 +                              prop->dram_base_address + off, PAGE_SIZE_2MB,
 +                              true))
 +                      dev_warn_ratelimited(hdev->dev,
 +                              "failed to unmap address 0x%llx\n",
 +                              prop->dram_base_address + off);
 +
 +      return rc;
 +}
 +
 +void goya_mmu_remove_device_cpu_mappings(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 off, cpu_off;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      if (!goya->device_cpu_mmu_mappings_done)
 +              return;
 +
 +      WREG32(mmCPU_IF_ARUSER_OVR_EN, 0);
 +      WREG32(mmCPU_IF_AWUSER_OVR_EN, 0);
 +
 +      if (!(hdev->cpu_accessible_dma_address & (PAGE_SIZE_2MB - 1))) {
 +              if (hl_mmu_unmap_page(hdev->kernel_ctx,
 +                              VA_CPU_ACCESSIBLE_MEM_ADDR,
 +                              PAGE_SIZE_2MB, true))
 +                      dev_warn(hdev->dev,
 +                              "Failed to unmap CPU accessible memory\n");
 +      } else {
 +              for (cpu_off = 0 ; cpu_off < SZ_2M ; cpu_off += PAGE_SIZE_4KB)
 +                      if (hl_mmu_unmap_page(hdev->kernel_ctx,
 +                                      VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
 +                                      PAGE_SIZE_4KB,
 +                                      (cpu_off + PAGE_SIZE_4KB) >= SZ_2M))
 +                              dev_warn_ratelimited(hdev->dev,
 +                                      "failed to unmap address 0x%llx\n",
 +                                      VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off);
 +      }
 +
 +      for (off = 0 ; off < CPU_FW_IMAGE_SIZE ; off += PAGE_SIZE_2MB)
 +              if (hl_mmu_unmap_page(hdev->kernel_ctx,
 +                              prop->dram_base_address + off, PAGE_SIZE_2MB,
 +                              (off + PAGE_SIZE_2MB) >= CPU_FW_IMAGE_SIZE))
 +                      dev_warn_ratelimited(hdev->dev,
 +                                      "Failed to unmap address 0x%llx\n",
 +                                      prop->dram_base_address + off);
 +
 +      goya->device_cpu_mmu_mappings_done = false;
 +}
 +
 +static void goya_mmu_prepare(struct hl_device *hdev, u32 asid)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int i;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      if (asid & ~MME_QM_GLBL_SECURE_PROPS_ASID_MASK) {
 +              dev_crit(hdev->dev, "asid %u is too big\n", asid);
 +              return;
 +      }
 +
 +      /* zero the MMBP and ASID bits and then set the ASID */
 +      for (i = 0 ; i < GOYA_MMU_REGS_NUM ; i++)
 +              goya_mmu_prepare_reg(hdev, goya_mmu_regs[i], asid);
 +}
 +
 +static int goya_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard,
 +                                      u32 flags)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU) ||
 +              hdev->reset_info.hard_reset_pending)
 +              return 0;
 +
+      /* No need for an L1-only invalidation in Goya */
 +      if (!is_hard)
 +              return 0;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GOYA_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
 +
 +      /* L0 & L1 invalidation */
 +      WREG32(mmSTLB_INV_ALL_START, 1);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmSTLB_INV_ALL_START,
 +              status,
 +              !status,
 +              1000,
 +              timeout_usec);
 +
 +      return rc;
 +}
 +
 +static int goya_mmu_invalidate_cache_range(struct hl_device *hdev,
 +                                              bool is_hard, u32 flags,
 +                                              u32 asid, u64 va, u64 size)
 +{
+      /*
+       * Treat this as an invalidate-all, since Goya has no range
+       * invalidation.
+       */
 +      return hl_mmu_invalidate_cache(hdev, is_hard, flags);
 +}
 +
 +int goya_send_heartbeat(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_send_heartbeat(hdev);
 +}
 +
 +int goya_cpucp_info_get(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 dram_size;
 +      int rc;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      rc = hl_fw_cpucp_handshake(hdev, mmCPU_BOOT_DEV_STS0,
 +                                      mmCPU_BOOT_DEV_STS1, mmCPU_BOOT_ERR0,
 +                                      mmCPU_BOOT_ERR1);
 +      if (rc)
 +              return rc;
 +
 +      dram_size = le64_to_cpu(prop->cpucp_info.dram_size);
 +      if (dram_size) {
 +              if ((!is_power_of_2(dram_size)) ||
 +                              (dram_size < DRAM_PHYS_DEFAULT_SIZE)) {
 +                      dev_err(hdev->dev,
 +                              "F/W reported invalid DRAM size %llu. Trying to use default size\n",
 +                              dram_size);
 +                      dram_size = DRAM_PHYS_DEFAULT_SIZE;
 +              }
 +
 +              prop->dram_size = dram_size;
 +              prop->dram_end_address = prop->dram_base_address + dram_size;
 +      }
 +
 +      if (!strlen(prop->cpucp_info.card_name))
 +              strncpy(prop->cpucp_info.card_name, GOYA_DEFAULT_CARD_NAME,
 +                              CARD_NAME_MAX_LEN);
 +
 +      return 0;
 +}
 +
 +static bool goya_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
 +                              struct engines_data *e)
 +{
 +      const char *fmt = "%-5d%-9s%#-14x%#-16x%#x\n";
 +      const char *dma_fmt = "%-5d%-9s%#-14x%#x\n";
 +      unsigned long *mask = (unsigned long *)mask_arr;
 +      u32 qm_glbl_sts0, cmdq_glbl_sts0, dma_core_sts0, tpc_cfg_sts,
 +              mme_arch_sts;
 +      bool is_idle = true, is_eng_idle;
 +      u64 offset;
 +      int i;
 +
 +      if (e)
 +              hl_engine_data_sprintf(e, "\nDMA  is_idle  QM_GLBL_STS0  DMA_CORE_STS0\n"
 +                                      "---  -------  ------------  -------------\n");
 +
 +      offset = mmDMA_QM_1_GLBL_STS0 - mmDMA_QM_0_GLBL_STS0;
 +
 +      for (i = 0 ; i < DMA_MAX_NUM ; i++) {
 +              qm_glbl_sts0 = RREG32(mmDMA_QM_0_GLBL_STS0 + i * offset);
 +              dma_core_sts0 = RREG32(mmDMA_CH_0_STS0 + i * offset);
 +              is_eng_idle = IS_DMA_QM_IDLE(qm_glbl_sts0) &&
 +                              IS_DMA_IDLE(dma_core_sts0);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(GOYA_ENGINE_ID_DMA_0 + i, mask);
 +              if (e)
 +                      hl_engine_data_sprintf(e, dma_fmt, i, is_eng_idle ? "Y" : "N",
 +                                      qm_glbl_sts0, dma_core_sts0);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nTPC  is_idle  QM_GLBL_STS0  CMDQ_GLBL_STS0  CFG_STATUS\n"
 +                      "---  -------  ------------  --------------  ----------\n");
 +
 +      offset = mmTPC1_QM_GLBL_STS0 - mmTPC0_QM_GLBL_STS0;
 +
 +      for (i = 0 ; i < TPC_MAX_NUM ; i++) {
 +              qm_glbl_sts0 = RREG32(mmTPC0_QM_GLBL_STS0 + i * offset);
 +              cmdq_glbl_sts0 = RREG32(mmTPC0_CMDQ_GLBL_STS0 + i * offset);
 +              tpc_cfg_sts = RREG32(mmTPC0_CFG_STATUS + i * offset);
 +              is_eng_idle = IS_TPC_QM_IDLE(qm_glbl_sts0) &&
 +                              IS_TPC_CMDQ_IDLE(cmdq_glbl_sts0) &&
 +                              IS_TPC_IDLE(tpc_cfg_sts);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(GOYA_ENGINE_ID_TPC_0 + i, mask);
 +              if (e)
 +                      hl_engine_data_sprintf(e, fmt, i, is_eng_idle ? "Y" : "N",
 +                              qm_glbl_sts0, cmdq_glbl_sts0, tpc_cfg_sts);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nMME  is_idle  QM_GLBL_STS0  CMDQ_GLBL_STS0  ARCH_STATUS\n"
 +                      "---  -------  ------------  --------------  -----------\n");
 +
 +      qm_glbl_sts0 = RREG32(mmMME_QM_GLBL_STS0);
 +      cmdq_glbl_sts0 = RREG32(mmMME_CMDQ_GLBL_STS0);
 +      mme_arch_sts = RREG32(mmMME_ARCH_STATUS);
 +      is_eng_idle = IS_MME_QM_IDLE(qm_glbl_sts0) &&
 +                      IS_MME_CMDQ_IDLE(cmdq_glbl_sts0) &&
 +                      IS_MME_IDLE(mme_arch_sts);
 +      is_idle &= is_eng_idle;
 +
 +      if (mask && !is_eng_idle)
 +              set_bit(GOYA_ENGINE_ID_MME_0, mask);
 +      if (e) {
 +              hl_engine_data_sprintf(e, fmt, 0, is_eng_idle ? "Y" : "N", qm_glbl_sts0,
 +                              cmdq_glbl_sts0, mme_arch_sts);
 +              hl_engine_data_sprintf(e, "\n");
 +      }
 +
 +      return is_idle;
 +}
 +
 +static void goya_hw_queues_lock(struct hl_device *hdev)
 +      __acquires(&goya->hw_queues_lock)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      spin_lock(&goya->hw_queues_lock);
 +}
 +
 +static void goya_hw_queues_unlock(struct hl_device *hdev)
 +      __releases(&goya->hw_queues_lock)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      spin_unlock(&goya->hw_queues_lock);
 +}
 +
 +static u32 goya_get_pci_id(struct hl_device *hdev)
 +{
 +      return hdev->pdev->device;
 +}
 +
 +static int goya_get_eeprom_data(struct hl_device *hdev, void *data,
 +                              size_t max_size)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_get_eeprom_data(hdev, data, max_size);
 +}
 +
 +static void goya_cpu_init_scrambler_dram(struct hl_device *hdev)
 +{
 +
 +}
 +
 +static int goya_ctx_init(struct hl_ctx *ctx)
 +{
 +      if (ctx->asid != HL_KERNEL_ASID_ID)
 +              goya_mmu_prepare(ctx->hdev, ctx->asid);
 +
 +      return 0;
 +}
 +
 +static int goya_pre_schedule_cs(struct hl_cs *cs)
 +{
 +      return 0;
 +}
 +
 +u32 goya_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
 +{
 +      return cq_idx;
 +}
 +
 +static u32 goya_get_signal_cb_size(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static u32 goya_get_wait_cb_size(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static u32 goya_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
 +                              u32 size, bool eb)
 +{
 +      return 0;
 +}
 +
 +static u32 goya_gen_wait_cb(struct hl_device *hdev,
 +              struct hl_gen_wait_properties *prop)
 +{
 +      return 0;
 +}
 +
 +static void goya_reset_sob(struct hl_device *hdev, void *data)
 +{
 +
 +}
 +
 +static void goya_reset_sob_group(struct hl_device *hdev, u16 sob_group)
 +{
 +
 +}
 +
 +u64 goya_get_device_time(struct hl_device *hdev)
 +{
 +      u64 device_time = ((u64) RREG32(mmPSOC_TIMESTAMP_CNTCVU)) << 32;
 +
 +      return device_time | RREG32(mmPSOC_TIMESTAMP_CNTCVL);
 +}
 +
 +static int goya_collective_wait_init_cs(struct hl_cs *cs)
 +{
 +      return 0;
 +}
 +
 +static int goya_collective_wait_create_jobs(struct hl_device *hdev,
 +              struct hl_ctx *ctx, struct hl_cs *cs, u32 wait_queue_id,
 +              u32 collective_engine_id, u32 encaps_signal_offset)
 +{
 +      return -EINVAL;
 +}
 +
 +static void goya_ctx_fini(struct hl_ctx *ctx)
 +{
 +
 +}
 +
 +static int goya_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
 +                      u32 *block_size, u32 *block_id)
 +{
 +      return -EPERM;
 +}
 +
 +static int goya_block_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
 +                              u32 block_id, u32 block_size)
 +{
 +      return -EPERM;
 +}
 +
 +static void goya_enable_events_from_fw(struct hl_device *hdev)
 +{
 +      WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 +                      GOYA_ASYNC_EVENT_ID_INTS_REGISTER);
 +}
 +
 +static int goya_ack_mmu_page_fault_or_access_error(struct hl_device *hdev, u64 mmu_cap_mask)
 +{
 +      return -EINVAL;
 +}
 +
 +static int goya_map_pll_idx_to_fw_idx(u32 pll_idx)
 +{
 +      switch (pll_idx) {
 +      case HL_GOYA_CPU_PLL: return CPU_PLL;
 +      case HL_GOYA_PCI_PLL: return PCI_PLL;
 +      case HL_GOYA_MME_PLL: return MME_PLL;
 +      case HL_GOYA_TPC_PLL: return TPC_PLL;
 +      case HL_GOYA_IC_PLL: return IC_PLL;
 +      case HL_GOYA_MC_PLL: return MC_PLL;
 +      case HL_GOYA_EMMC_PLL: return EMMC_PLL;
 +      default: return -EINVAL;
 +      }
 +}
 +
 +static int goya_gen_sync_to_engine_map(struct hl_device *hdev,
 +                              struct hl_sync_to_engine_map *map)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +static int goya_monitor_valid(struct hl_mon_state_dump *mon)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +static int goya_print_single_monitor(char **buf, size_t *size, size_t *offset,
 +                              struct hl_device *hdev,
 +                              struct hl_mon_state_dump *mon)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +
 +static int goya_print_fences_single_engine(
 +      struct hl_device *hdev, u64 base_offset, u64 status_base_offset,
 +      enum hl_sync_engine_type engine_type, u32 engine_id, char **buf,
 +      size_t *size, size_t *offset)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +
 +static struct hl_state_dump_specs_funcs goya_state_dump_funcs = {
 +      .monitor_valid = goya_monitor_valid,
 +      .print_single_monitor = goya_print_single_monitor,
 +      .gen_sync_to_engine_map = goya_gen_sync_to_engine_map,
 +      .print_fences_single_engine = goya_print_fences_single_engine,
 +};
 +
 +static void goya_state_dump_init(struct hl_device *hdev)
 +{
 +      /* Not implemented */
 +      hdev->state_dump_specs.props = goya_state_dump_specs_props;
 +      hdev->state_dump_specs.funcs = goya_state_dump_funcs;
 +}
 +
 +static u32 goya_get_sob_addr(struct hl_device *hdev, u32 sob_id)
 +{
 +      return 0;
 +}
 +
 +static u32 *goya_get_stream_master_qid_arr(void)
 +{
 +      return NULL;
 +}
 +
 +static int goya_get_monitor_dump(struct hl_device *hdev, void *data)
 +{
 +      return -EOPNOTSUPP;
 +}
 +
 +static void goya_check_if_razwi_happened(struct hl_device *hdev)
 +{
 +}
 +
 +static int goya_scrub_device_dram(struct hl_device *hdev, u64 val)
 +{
 +      return -EOPNOTSUPP;
 +}
 +
 +static int goya_set_dram_properties(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static int goya_set_binning_masks(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static int goya_send_device_activity(struct hl_device *hdev, bool open)
 +{
 +      return 0;
 +}
 +
 +static const struct hl_asic_funcs goya_funcs = {
 +      .early_init = goya_early_init,
 +      .early_fini = goya_early_fini,
 +      .late_init = goya_late_init,
 +      .late_fini = goya_late_fini,
 +      .sw_init = goya_sw_init,
 +      .sw_fini = goya_sw_fini,
 +      .hw_init = goya_hw_init,
 +      .hw_fini = goya_hw_fini,
 +      .halt_engines = goya_halt_engines,
 +      .suspend = goya_suspend,
 +      .resume = goya_resume,
 +      .mmap = goya_mmap,
 +      .ring_doorbell = goya_ring_doorbell,
 +      .pqe_write = goya_pqe_write,
 +      .asic_dma_alloc_coherent = goya_dma_alloc_coherent,
 +      .asic_dma_free_coherent = goya_dma_free_coherent,
 +      .scrub_device_mem = goya_scrub_device_mem,
 +      .scrub_device_dram = goya_scrub_device_dram,
 +      .get_int_queue_base = goya_get_int_queue_base,
 +      .test_queues = goya_test_queues,
 +      .asic_dma_pool_zalloc = goya_dma_pool_zalloc,
 +      .asic_dma_pool_free = goya_dma_pool_free,
 +      .cpu_accessible_dma_pool_alloc = goya_cpu_accessible_dma_pool_alloc,
 +      .cpu_accessible_dma_pool_free = goya_cpu_accessible_dma_pool_free,
 +      .hl_dma_unmap_sgtable = hl_dma_unmap_sgtable,
 +      .cs_parser = goya_cs_parser,
 +      .asic_dma_map_sgtable = hl_dma_map_sgtable,
 +      .add_end_of_cb_packets = goya_add_end_of_cb_packets,
 +      .update_eq_ci = goya_update_eq_ci,
 +      .context_switch = goya_context_switch,
 +      .restore_phase_topology = goya_restore_phase_topology,
 +      .debugfs_read_dma = goya_debugfs_read_dma,
 +      .add_device_attr = goya_add_device_attr,
 +      .handle_eqe = goya_handle_eqe,
 +      .get_events_stat = goya_get_events_stat,
 +      .read_pte = goya_read_pte,
 +      .write_pte = goya_write_pte,
 +      .mmu_invalidate_cache = goya_mmu_invalidate_cache,
 +      .mmu_invalidate_cache_range = goya_mmu_invalidate_cache_range,
 +      .mmu_prefetch_cache_range = NULL,
 +      .send_heartbeat = goya_send_heartbeat,
 +      .debug_coresight = goya_debug_coresight,
 +      .is_device_idle = goya_is_device_idle,
 +      .compute_reset_late_init = goya_compute_reset_late_init,
 +      .hw_queues_lock = goya_hw_queues_lock,
 +      .hw_queues_unlock = goya_hw_queues_unlock,
 +      .get_pci_id = goya_get_pci_id,
 +      .get_eeprom_data = goya_get_eeprom_data,
 +      .get_monitor_dump = goya_get_monitor_dump,
 +      .send_cpu_message = goya_send_cpu_message,
 +      .pci_bars_map = goya_pci_bars_map,
 +      .init_iatu = goya_init_iatu,
 +      .rreg = hl_rreg,
 +      .wreg = hl_wreg,
 +      .halt_coresight = goya_halt_coresight,
 +      .ctx_init = goya_ctx_init,
 +      .ctx_fini = goya_ctx_fini,
 +      .pre_schedule_cs = goya_pre_schedule_cs,
 +      .get_queue_id_for_cq = goya_get_queue_id_for_cq,
 +      .load_firmware_to_device = goya_load_firmware_to_device,
 +      .load_boot_fit_to_device = goya_load_boot_fit_to_device,
 +      .get_signal_cb_size = goya_get_signal_cb_size,
 +      .get_wait_cb_size = goya_get_wait_cb_size,
 +      .gen_signal_cb = goya_gen_signal_cb,
 +      .gen_wait_cb = goya_gen_wait_cb,
 +      .reset_sob = goya_reset_sob,
 +      .reset_sob_group = goya_reset_sob_group,
 +      .get_device_time = goya_get_device_time,
 +      .pb_print_security_errors = NULL,
 +      .collective_wait_init_cs = goya_collective_wait_init_cs,
 +      .collective_wait_create_jobs = goya_collective_wait_create_jobs,
 +      .get_dec_base_addr = NULL,
 +      .scramble_addr = hl_mmu_scramble_addr,
 +      .descramble_addr = hl_mmu_descramble_addr,
 +      .ack_protection_bits_errors = goya_ack_protection_bits_errors,
 +      .get_hw_block_id = goya_get_hw_block_id,
 +      .hw_block_mmap = goya_block_mmap,
 +      .enable_events_from_fw = goya_enable_events_from_fw,
 +      .ack_mmu_errors = goya_ack_mmu_page_fault_or_access_error,
 +      .map_pll_idx_to_fw_idx = goya_map_pll_idx_to_fw_idx,
 +      .init_firmware_preload_params = goya_init_firmware_preload_params,
 +      .init_firmware_loader = goya_init_firmware_loader,
 +      .init_cpu_scrambler_dram = goya_cpu_init_scrambler_dram,
 +      .state_dump_init = goya_state_dump_init,
 +      .get_sob_addr = &goya_get_sob_addr,
 +      .set_pci_memory_regions = goya_set_pci_memory_regions,
 +      .get_stream_master_qid_arr = goya_get_stream_master_qid_arr,
 +      .check_if_razwi_happened = goya_check_if_razwi_happened,
 +      .mmu_get_real_page_size = hl_mmu_get_real_page_size,
 +      .access_dev_mem = hl_access_dev_mem,
 +      .set_dram_bar_base = goya_set_ddr_bar_base,
 +      .send_device_activity = goya_send_device_activity,
 +      .set_dram_properties = goya_set_dram_properties,
 +      .set_binning_masks = goya_set_binning_masks,
 +};
 +
 +/*
 + * goya_set_asic_funcs - set Goya function pointers
 + *
 + * @hdev: pointer to hl_device structure
 + */
 +void goya_set_asic_funcs(struct hl_device *hdev)
 +{
 +      hdev->asic_funcs = &goya_funcs;
 +}
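
For orientation, a minimal sketch of how a core-layer helper can dispatch through the ops table installed above. This is illustrative only and not part of the merge: the helper name example_check_idle() is hypothetical, while the ->hw_queues_lock, ->is_device_idle and ->hw_queues_unlock entries are exactly the ones wired into goya_funcs by goya_set_asic_funcs().

static bool example_check_idle(struct hl_device *hdev)
{
        u64 mask[2] = {};       /* engine bitmap filled in by the callback */
        bool idle;

        /* Serialize against queue submission, then run the ASIC-specific
         * idle check; passing NULL skips the textual engines_data report.
         */
        hdev->asic_funcs->hw_queues_lock(hdev);
        idle = hdev->asic_funcs->is_device_idle(hdev, mask, ARRAY_SIZE(mask), NULL);
        hdev->asic_funcs->hw_queues_unlock(hdev);

        return idle;
}
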
index 01d47d3bad5bbb517158e6a37d6181996f4acec3,0000000000000000000000000000000000000000..52b339aefadcae0dd01f5d0d5d1a20aec44c1412
mode 100644,000000..100644
--- /dev/null
@@@ -1,749 -1,0 +1,749 @@@
-       vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND;
 +// SPDX-License-Identifier: GPL-2.0-only
 +/*
 + * Copyright (C) 2020-2023 Intel Corporation
 + */
 +
 +#include <linux/dma-buf.h>
 +#include <linux/highmem.h>
 +#include <linux/module.h>
 +#include <linux/set_memory.h>
 +#include <linux/xarray.h>
 +
 +#include <drm/drm_cache.h>
 +#include <drm/drm_debugfs.h>
 +#include <drm/drm_file.h>
 +#include <drm/drm_utils.h>
 +
 +#include "ivpu_drv.h"
 +#include "ivpu_gem.h"
 +#include "ivpu_hw.h"
 +#include "ivpu_mmu.h"
 +#include "ivpu_mmu_context.h"
 +
 +MODULE_IMPORT_NS(DMA_BUF);
 +
 +static const struct drm_gem_object_funcs ivpu_gem_funcs;
 +
 +static struct lock_class_key prime_bo_lock_class_key;
 +
 +static int __must_check prime_alloc_pages_locked(struct ivpu_bo *bo)
 +{
 +      /* Pages are managed by the underlying dma-buf */
 +      return 0;
 +}
 +
 +static void prime_free_pages_locked(struct ivpu_bo *bo)
 +{
 +      /* Pages are managed by the underlying dma-buf */
 +}
 +
 +static int prime_map_pages_locked(struct ivpu_bo *bo)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      struct sg_table *sgt;
 +
 +      sgt = dma_buf_map_attachment_unlocked(bo->base.import_attach, DMA_BIDIRECTIONAL);
 +      if (IS_ERR(sgt)) {
 +              ivpu_err(vdev, "Failed to map attachment: %ld\n", PTR_ERR(sgt));
 +              return PTR_ERR(sgt);
 +      }
 +
 +      bo->sgt = sgt;
 +      return 0;
 +}
 +
 +static void prime_unmap_pages_locked(struct ivpu_bo *bo)
 +{
 +      dma_buf_unmap_attachment_unlocked(bo->base.import_attach, bo->sgt, DMA_BIDIRECTIONAL);
 +      bo->sgt = NULL;
 +}
 +
 +static const struct ivpu_bo_ops prime_ops = {
 +      .type = IVPU_BO_TYPE_PRIME,
 +      .name = "prime",
 +      .alloc_pages = prime_alloc_pages_locked,
 +      .free_pages = prime_free_pages_locked,
 +      .map_pages = prime_map_pages_locked,
 +      .unmap_pages = prime_unmap_pages_locked,
 +};
 +
 +static int __must_check shmem_alloc_pages_locked(struct ivpu_bo *bo)
 +{
 +      int npages = bo->base.size >> PAGE_SHIFT;
 +      struct page **pages;
 +
 +      pages = drm_gem_get_pages(&bo->base);
 +      if (IS_ERR(pages))
 +              return PTR_ERR(pages);
 +
 +      if (bo->flags & DRM_IVPU_BO_WC)
 +              set_pages_array_wc(pages, npages);
 +      else if (bo->flags & DRM_IVPU_BO_UNCACHED)
 +              set_pages_array_uc(pages, npages);
 +
 +      bo->pages = pages;
 +      return 0;
 +}
 +
 +static void shmem_free_pages_locked(struct ivpu_bo *bo)
 +{
 +      if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
 +              set_pages_array_wb(bo->pages, bo->base.size >> PAGE_SHIFT);
 +
 +      drm_gem_put_pages(&bo->base, bo->pages, true, false);
 +      bo->pages = NULL;
 +}
 +
 +static int ivpu_bo_map_pages_locked(struct ivpu_bo *bo)
 +{
 +      int npages = bo->base.size >> PAGE_SHIFT;
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      struct sg_table *sgt;
 +      int ret;
 +
 +      sgt = drm_prime_pages_to_sg(&vdev->drm, bo->pages, npages);
 +      if (IS_ERR(sgt)) {
 +              ivpu_err(vdev, "Failed to allocate sgtable\n");
 +              return PTR_ERR(sgt);
 +      }
 +
 +      ret = dma_map_sgtable(vdev->drm.dev, sgt, DMA_BIDIRECTIONAL, 0);
 +      if (ret) {
 +              ivpu_err(vdev, "Failed to map BO in IOMMU: %d\n", ret);
 +              goto err_free_sgt;
 +      }
 +
 +      bo->sgt = sgt;
 +      return 0;
 +
 +err_free_sgt:
 +      kfree(sgt);
 +      return ret;
 +}
 +
 +static void ivpu_bo_unmap_pages_locked(struct ivpu_bo *bo)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +
 +      dma_unmap_sgtable(vdev->drm.dev, bo->sgt, DMA_BIDIRECTIONAL, 0);
 +      sg_free_table(bo->sgt);
 +      kfree(bo->sgt);
 +      bo->sgt = NULL;
 +}
 +
 +static const struct ivpu_bo_ops shmem_ops = {
 +      .type = IVPU_BO_TYPE_SHMEM,
 +      .name = "shmem",
 +      .alloc_pages = shmem_alloc_pages_locked,
 +      .free_pages = shmem_free_pages_locked,
 +      .map_pages = ivpu_bo_map_pages_locked,
 +      .unmap_pages = ivpu_bo_unmap_pages_locked,
 +};
 +
 +static int __must_check internal_alloc_pages_locked(struct ivpu_bo *bo)
 +{
 +      unsigned int i, npages = bo->base.size >> PAGE_SHIFT;
 +      struct page **pages;
 +      int ret;
 +
 +      pages = kvmalloc_array(npages, sizeof(*bo->pages), GFP_KERNEL);
 +      if (!pages)
 +              return -ENOMEM;
 +
 +      for (i = 0; i < npages; i++) {
 +              pages[i] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
 +              if (!pages[i]) {
 +                      ret = -ENOMEM;
 +                      goto err_free_pages;
 +              }
 +              cond_resched();
 +      }
 +
 +      bo->pages = pages;
 +      return 0;
 +
 +err_free_pages:
 +      while (i--)
 +              put_page(pages[i]);
 +      kvfree(pages);
 +      return ret;
 +}
 +
 +static void internal_free_pages_locked(struct ivpu_bo *bo)
 +{
 +      unsigned int i, npages = bo->base.size >> PAGE_SHIFT;
 +
 +      for (i = 0; i < npages; i++)
 +              put_page(bo->pages[i]);
 +
 +      kvfree(bo->pages);
 +      bo->pages = NULL;
 +}
 +
 +static const struct ivpu_bo_ops internal_ops = {
 +      .type = IVPU_BO_TYPE_INTERNAL,
 +      .name = "internal",
 +      .alloc_pages = internal_alloc_pages_locked,
 +      .free_pages = internal_free_pages_locked,
 +      .map_pages = ivpu_bo_map_pages_locked,
 +      .unmap_pages = ivpu_bo_unmap_pages_locked,
 +};
 +
 +static int __must_check ivpu_bo_alloc_and_map_pages_locked(struct ivpu_bo *bo)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      int ret;
 +
 +      lockdep_assert_held(&bo->lock);
 +      drm_WARN_ON(&vdev->drm, bo->sgt);
 +
 +      ret = bo->ops->alloc_pages(bo);
 +      if (ret) {
 +              ivpu_err(vdev, "Failed to allocate pages for BO: %d", ret);
 +              return ret;
 +      }
 +
 +      ret = bo->ops->map_pages(bo);
 +      if (ret) {
 +              ivpu_err(vdev, "Failed to map pages for BO: %d", ret);
 +              goto err_free_pages;
 +      }
 +      return ret;
 +
 +err_free_pages:
 +      bo->ops->free_pages(bo);
 +      return ret;
 +}
 +
 +static void ivpu_bo_unmap_and_free_pages(struct ivpu_bo *bo)
 +{
 +      mutex_lock(&bo->lock);
 +
 +      WARN_ON(!bo->sgt);
 +      bo->ops->unmap_pages(bo);
 +      WARN_ON(bo->sgt);
 +      bo->ops->free_pages(bo);
 +      WARN_ON(bo->pages);
 +
 +      mutex_unlock(&bo->lock);
 +}
 +
 +/*
 + * ivpu_bo_pin() - pin the backing physical pages and map them to VPU.
 + *
 + * This function pins physical memory pages, then maps the physical pages
 + * to IOMMU address space and finally updates the VPU MMU page tables
 + * to allow the VPU to translate VPU address to IOMMU address.
 + */
 +int __must_check ivpu_bo_pin(struct ivpu_bo *bo)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      int ret = 0;
 +
 +      mutex_lock(&bo->lock);
 +
 +      if (!bo->vpu_addr) {
 +              ivpu_err(vdev, "vpu_addr not set for BO ctx_id: %d handle: %d\n",
 +                       bo->ctx->id, bo->handle);
 +              ret = -EINVAL;
 +              goto unlock;
 +      }
 +
 +      if (!bo->sgt) {
 +              ret = ivpu_bo_alloc_and_map_pages_locked(bo);
 +              if (ret)
 +                      goto unlock;
 +      }
 +
 +      if (!bo->mmu_mapped) {
 +              ret = ivpu_mmu_context_map_sgt(vdev, bo->ctx, bo->vpu_addr, bo->sgt,
 +                                             ivpu_bo_is_snooped(bo));
 +              if (ret) {
 +                      ivpu_err(vdev, "Failed to map BO in MMU: %d\n", ret);
 +                      goto unlock;
 +              }
 +              bo->mmu_mapped = true;
 +      }
 +
 +unlock:
 +      mutex_unlock(&bo->lock);
 +
 +      return ret;
 +}
 +
 +static int
 +ivpu_bo_alloc_vpu_addr(struct ivpu_bo *bo, struct ivpu_mmu_context *ctx,
 +                     const struct ivpu_addr_range *range)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      int ret;
 +
 +      if (!range) {
 +              if (bo->flags & DRM_IVPU_BO_HIGH_MEM)
 +                      range = &vdev->hw->ranges.user_high;
 +              else
 +                      range = &vdev->hw->ranges.user_low;
 +      }
 +
 +      mutex_lock(&ctx->lock);
 +      ret = ivpu_mmu_context_insert_node_locked(ctx, range, bo->base.size, &bo->mm_node);
 +      if (!ret) {
 +              bo->ctx = ctx;
 +              bo->vpu_addr = bo->mm_node.start;
 +              list_add_tail(&bo->ctx_node, &ctx->bo_list);
 +      }
 +      mutex_unlock(&ctx->lock);
 +
 +      return ret;
 +}
 +
 +static void ivpu_bo_free_vpu_addr(struct ivpu_bo *bo)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      struct ivpu_mmu_context *ctx = bo->ctx;
 +
 +      ivpu_dbg(vdev, BO, "remove from ctx: ctx %d vpu_addr 0x%llx allocated %d mmu_mapped %d\n",
 +               ctx->id, bo->vpu_addr, (bool)bo->sgt, bo->mmu_mapped);
 +
 +      mutex_lock(&bo->lock);
 +
 +      if (bo->mmu_mapped) {
 +              drm_WARN_ON(&vdev->drm, !bo->sgt);
 +              ivpu_mmu_context_unmap_sgt(vdev, ctx, bo->vpu_addr, bo->sgt);
 +              bo->mmu_mapped = false;
 +      }
 +
 +      mutex_lock(&ctx->lock);
 +      list_del(&bo->ctx_node);
 +      bo->vpu_addr = 0;
 +      bo->ctx = NULL;
 +      ivpu_mmu_context_remove_node_locked(ctx, &bo->mm_node);
 +      mutex_unlock(&ctx->lock);
 +
 +      mutex_unlock(&bo->lock);
 +}
 +
 +void ivpu_bo_remove_all_bos_from_context(struct ivpu_mmu_context *ctx)
 +{
 +      struct ivpu_bo *bo, *tmp;
 +
 +      list_for_each_entry_safe(bo, tmp, &ctx->bo_list, ctx_node)
 +              ivpu_bo_free_vpu_addr(bo);
 +}
 +
 +static struct ivpu_bo *
 +ivpu_bo_alloc(struct ivpu_device *vdev, struct ivpu_mmu_context *mmu_context,
 +            u64 size, u32 flags, const struct ivpu_bo_ops *ops,
 +            const struct ivpu_addr_range *range, u64 user_ptr)
 +{
 +      struct ivpu_bo *bo;
 +      int ret = 0;
 +
 +      if (drm_WARN_ON(&vdev->drm, size == 0 || !PAGE_ALIGNED(size)))
 +              return ERR_PTR(-EINVAL);
 +
 +      switch (flags & DRM_IVPU_BO_CACHE_MASK) {
 +      case DRM_IVPU_BO_CACHED:
 +      case DRM_IVPU_BO_UNCACHED:
 +      case DRM_IVPU_BO_WC:
 +              break;
 +      default:
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      bo = kzalloc(sizeof(*bo), GFP_KERNEL);
 +      if (!bo)
 +              return ERR_PTR(-ENOMEM);
 +
 +      mutex_init(&bo->lock);
 +      bo->base.funcs = &ivpu_gem_funcs;
 +      bo->flags = flags;
 +      bo->ops = ops;
 +      bo->user_ptr = user_ptr;
 +
 +      if (ops->type == IVPU_BO_TYPE_SHMEM)
 +              ret = drm_gem_object_init(&vdev->drm, &bo->base, size);
 +      else
 +              drm_gem_private_object_init(&vdev->drm, &bo->base, size);
 +
 +      if (ret) {
 +              ivpu_err(vdev, "Failed to initialize drm object\n");
 +              goto err_free;
 +      }
 +
 +      if (flags & DRM_IVPU_BO_MAPPABLE) {
 +              ret = drm_gem_create_mmap_offset(&bo->base);
 +              if (ret) {
 +                      ivpu_err(vdev, "Failed to allocate mmap offset\n");
 +                      goto err_release;
 +              }
 +      }
 +
 +      if (mmu_context) {
 +              ret = ivpu_bo_alloc_vpu_addr(bo, mmu_context, range);
 +              if (ret) {
 +                      ivpu_err(vdev, "Failed to add BO to context: %d\n", ret);
 +                      goto err_release;
 +              }
 +      }
 +
 +      return bo;
 +
 +err_release:
 +      drm_gem_object_release(&bo->base);
 +err_free:
 +      kfree(bo);
 +      return ERR_PTR(ret);
 +}
 +
 +static void ivpu_bo_free(struct drm_gem_object *obj)
 +{
 +      struct ivpu_bo *bo = to_ivpu_bo(obj);
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +
 +      if (bo->ctx)
 +              ivpu_dbg(vdev, BO, "free: ctx %d vpu_addr 0x%llx allocated %d mmu_mapped %d\n",
 +                       bo->ctx->id, bo->vpu_addr, (bool)bo->sgt, bo->mmu_mapped);
 +      else
 +              ivpu_dbg(vdev, BO, "free: ctx (released) allocated %d mmu_mapped %d\n",
 +                       (bool)bo->sgt, bo->mmu_mapped);
 +
 +      drm_WARN_ON(&vdev->drm, !dma_resv_test_signaled(obj->resv, DMA_RESV_USAGE_READ));
 +
 +      vunmap(bo->kvaddr);
 +
 +      if (bo->ctx)
 +              ivpu_bo_free_vpu_addr(bo);
 +
 +      if (bo->sgt)
 +              ivpu_bo_unmap_and_free_pages(bo);
 +
 +      if (bo->base.import_attach)
 +              drm_prime_gem_destroy(&bo->base, bo->sgt);
 +
 +      drm_gem_object_release(&bo->base);
 +
 +      mutex_destroy(&bo->lock);
 +      kfree(bo);
 +}
 +
 +static int ivpu_bo_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
 +{
 +      struct ivpu_bo *bo = to_ivpu_bo(obj);
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +
 +      ivpu_dbg(vdev, BO, "mmap: ctx %u handle %u vpu_addr 0x%llx size %zu type %s",
 +               bo->ctx->id, bo->handle, bo->vpu_addr, bo->base.size, bo->ops->name);
 +
 +      if (obj->import_attach) {
 +              /* Drop the reference drm_gem_mmap_obj() acquired. */
 +              drm_gem_object_put(obj);
 +              vma->vm_private_data = NULL;
 +              return dma_buf_mmap(obj->dma_buf, vma, 0);
 +      }
 +
++      vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND);
 +      vma->vm_page_prot = ivpu_bo_pgprot(bo, vm_get_page_prot(vma->vm_flags));
 +
 +      return 0;
 +}
 +
 +static struct sg_table *ivpu_bo_get_sg_table(struct drm_gem_object *obj)
 +{
 +      struct ivpu_bo *bo = to_ivpu_bo(obj);
 +      loff_t npages = obj->size >> PAGE_SHIFT;
 +      int ret = 0;
 +
 +      mutex_lock(&bo->lock);
 +
 +      if (!bo->sgt)
 +              ret = ivpu_bo_alloc_and_map_pages_locked(bo);
 +
 +      mutex_unlock(&bo->lock);
 +
 +      if (ret)
 +              return ERR_PTR(ret);
 +
 +      return drm_prime_pages_to_sg(obj->dev, bo->pages, npages);
 +}
 +
 +static vm_fault_t ivpu_vm_fault(struct vm_fault *vmf)
 +{
 +      struct vm_area_struct *vma = vmf->vma;
 +      struct drm_gem_object *obj = vma->vm_private_data;
 +      struct ivpu_bo *bo = to_ivpu_bo(obj);
 +      loff_t npages = obj->size >> PAGE_SHIFT;
 +      pgoff_t page_offset;
 +      struct page *page;
 +      vm_fault_t ret;
 +      int err;
 +
 +      mutex_lock(&bo->lock);
 +
 +      if (!bo->sgt) {
 +              err = ivpu_bo_alloc_and_map_pages_locked(bo);
 +              if (err) {
 +                      ret = vmf_error(err);
 +                      goto unlock;
 +              }
 +      }
 +
 +      /* We don't use vmf->pgoff since that has the fake offset */
 +      page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
 +      if (page_offset >= npages) {
 +              ret = VM_FAULT_SIGBUS;
 +      } else {
 +              page = bo->pages[page_offset];
 +              ret = vmf_insert_pfn(vma, vmf->address, page_to_pfn(page));
 +      }
 +
 +unlock:
 +      mutex_unlock(&bo->lock);
 +
 +      return ret;
 +}
 +
 +static const struct vm_operations_struct ivpu_vm_ops = {
 +      .fault = ivpu_vm_fault,
 +      .open = drm_gem_vm_open,
 +      .close = drm_gem_vm_close,
 +};
 +
 +static const struct drm_gem_object_funcs ivpu_gem_funcs = {
 +      .free = ivpu_bo_free,
 +      .mmap = ivpu_bo_mmap,
 +      .vm_ops = &ivpu_vm_ops,
 +      .get_sg_table = ivpu_bo_get_sg_table,
 +};
 +
 +int
 +ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 +{
 +      struct ivpu_file_priv *file_priv = file->driver_priv;
 +      struct ivpu_device *vdev = file_priv->vdev;
 +      struct drm_ivpu_bo_create *args = data;
 +      u64 size = PAGE_ALIGN(args->size);
 +      struct ivpu_bo *bo;
 +      int ret;
 +
 +      if (args->flags & ~DRM_IVPU_BO_FLAGS)
 +              return -EINVAL;
 +
 +      if (size == 0)
 +              return -EINVAL;
 +
 +      bo = ivpu_bo_alloc(vdev, &file_priv->ctx, size, args->flags, &shmem_ops, NULL, 0);
 +      if (IS_ERR(bo)) {
 +              ivpu_err(vdev, "Failed to create BO: %pe (ctx %u size %llu flags 0x%x)",
 +                       bo, file_priv->ctx.id, args->size, args->flags);
 +              return PTR_ERR(bo);
 +      }
 +
 +      ret = drm_gem_handle_create(file, &bo->base, &bo->handle);
 +      if (!ret) {
 +              args->vpu_addr = bo->vpu_addr;
 +              args->handle = bo->handle;
 +      }
 +
 +      drm_gem_object_put(&bo->base);
 +
 +      ivpu_dbg(vdev, BO, "alloc shmem: ctx %u vpu_addr 0x%llx size %zu flags 0x%x\n",
 +               file_priv->ctx.id, bo->vpu_addr, bo->base.size, bo->flags);
 +
 +      return ret;
 +}
 +
 +struct ivpu_bo *
 +ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 flags)
 +{
 +      const struct ivpu_addr_range *range;
 +      struct ivpu_addr_range fixed_range;
 +      struct ivpu_bo *bo;
 +      pgprot_t prot;
 +      int ret;
 +
 +      drm_WARN_ON(&vdev->drm, !PAGE_ALIGNED(vpu_addr));
 +      drm_WARN_ON(&vdev->drm, !PAGE_ALIGNED(size));
 +
 +      if (vpu_addr) {
 +              fixed_range.start = vpu_addr;
 +              fixed_range.end = vpu_addr + size;
 +              range = &fixed_range;
 +      } else {
 +              range = &vdev->hw->ranges.global_low;
 +      }
 +
 +      bo = ivpu_bo_alloc(vdev, &vdev->gctx, size, flags, &internal_ops, range, 0);
 +      if (IS_ERR(bo)) {
 +              ivpu_err(vdev, "Failed to create BO: %pe (vpu_addr 0x%llx size %llu flags 0x%x)",
 +                       bo, vpu_addr, size, flags);
 +              return NULL;
 +      }
 +
 +      ret = ivpu_bo_pin(bo);
 +      if (ret)
 +              goto err_put;
 +
 +      if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
 +              drm_clflush_pages(bo->pages, bo->base.size >> PAGE_SHIFT);
 +
 +      prot = ivpu_bo_pgprot(bo, PAGE_KERNEL);
 +      bo->kvaddr = vmap(bo->pages, bo->base.size >> PAGE_SHIFT, VM_MAP, prot);
 +      if (!bo->kvaddr) {
 +              ivpu_err(vdev, "Failed to map BO into kernel virtual memory\n");
 +              goto err_put;
 +      }
 +
 +      ivpu_dbg(vdev, BO, "alloc internal: ctx 0 vpu_addr 0x%llx size %zu flags 0x%x\n",
 +               bo->vpu_addr, bo->base.size, flags);
 +
 +      return bo;
 +
 +err_put:
 +      drm_gem_object_put(&bo->base);
 +      return NULL;
 +}
 +
 +void ivpu_bo_free_internal(struct ivpu_bo *bo)
 +{
 +      drm_gem_object_put(&bo->base);
 +}
 +
 +struct drm_gem_object *ivpu_gem_prime_import(struct drm_device *dev, struct dma_buf *buf)
 +{
 +      struct ivpu_device *vdev = to_ivpu_device(dev);
 +      struct dma_buf_attachment *attach;
 +      struct ivpu_bo *bo;
 +
 +      attach = dma_buf_attach(buf, dev->dev);
 +      if (IS_ERR(attach))
 +              return ERR_CAST(attach);
 +
 +      get_dma_buf(buf);
 +
 +      bo = ivpu_bo_alloc(vdev, NULL, buf->size, DRM_IVPU_BO_MAPPABLE, &prime_ops, NULL, 0);
 +      if (IS_ERR(bo)) {
 +              ivpu_err(vdev, "Failed to import BO: %pe (size %lu)", bo, buf->size);
 +              goto err_detach;
 +      }
 +
 +      lockdep_set_class(&bo->lock, &prime_bo_lock_class_key);
 +
 +      bo->base.import_attach = attach;
 +
 +      return &bo->base;
 +
 +err_detach:
 +      dma_buf_detach(buf, attach);
 +      dma_buf_put(buf);
 +      return ERR_CAST(bo);
 +}
 +
 +int ivpu_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 +{
 +      struct ivpu_file_priv *file_priv = file->driver_priv;
 +      struct ivpu_device *vdev = to_ivpu_device(dev);
 +      struct drm_ivpu_bo_info *args = data;
 +      struct drm_gem_object *obj;
 +      struct ivpu_bo *bo;
 +      int ret = 0;
 +
 +      obj = drm_gem_object_lookup(file, args->handle);
 +      if (!obj)
 +              return -ENOENT;
 +
 +      bo = to_ivpu_bo(obj);
 +
 +      mutex_lock(&bo->lock);
 +
 +      if (!bo->ctx) {
 +              ret = ivpu_bo_alloc_vpu_addr(bo, &file_priv->ctx, NULL);
 +              if (ret) {
 +                      ivpu_err(vdev, "Failed to allocate vpu_addr: %d\n", ret);
 +                      goto unlock;
 +              }
 +      }
 +
 +      args->flags = bo->flags;
 +      args->mmap_offset = drm_vma_node_offset_addr(&obj->vma_node);
 +      args->vpu_addr = bo->vpu_addr;
 +      args->size = obj->size;
 +unlock:
 +      mutex_unlock(&bo->lock);
 +      drm_gem_object_put(obj);
 +      return ret;
 +}
 +
 +int ivpu_bo_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 +{
 +      struct drm_ivpu_bo_wait *args = data;
 +      struct drm_gem_object *obj;
 +      unsigned long timeout;
 +      long ret;
 +
 +      timeout = drm_timeout_abs_to_jiffies(args->timeout_ns);
 +
 +      obj = drm_gem_object_lookup(file, args->handle);
 +      if (!obj)
 +              return -EINVAL;
 +
 +      ret = dma_resv_wait_timeout(obj->resv, DMA_RESV_USAGE_READ, true, timeout);
 +      if (ret == 0) {
 +              ret = -ETIMEDOUT;
 +      } else if (ret > 0) {
 +              ret = 0;
 +              args->job_status = to_ivpu_bo(obj)->job_status;
 +      }
 +
 +      drm_gem_object_put(obj);
 +
 +      return ret;
 +}
 +
 +static void ivpu_bo_print_info(struct ivpu_bo *bo, struct drm_printer *p)
 +{
 +      unsigned long dma_refcount = 0;
 +
 +      if (bo->base.dma_buf && bo->base.dma_buf->file)
 +              dma_refcount = atomic_long_read(&bo->base.dma_buf->file->f_count);
 +
 +      drm_printf(p, "%5u %6d %16llx %10lu %10u %12lu %14s\n",
 +                 bo->ctx->id, bo->handle, bo->vpu_addr, bo->base.size,
 +                 kref_read(&bo->base.refcount), dma_refcount, bo->ops->name);
 +}
 +
 +void ivpu_bo_list(struct drm_device *dev, struct drm_printer *p)
 +{
 +      struct ivpu_device *vdev = to_ivpu_device(dev);
 +      struct ivpu_file_priv *file_priv;
 +      unsigned long ctx_id;
 +      struct ivpu_bo *bo;
 +
 +      drm_printf(p, "%5s %6s %16s %10s %10s %12s %14s\n",
 +                 "ctx", "handle", "vpu_addr", "size", "refcount", "dma_refcount", "type");
 +
 +      mutex_lock(&vdev->gctx.lock);
 +      list_for_each_entry(bo, &vdev->gctx.bo_list, ctx_node)
 +              ivpu_bo_print_info(bo, p);
 +      mutex_unlock(&vdev->gctx.lock);
 +
 +      xa_for_each(&vdev->context_xa, ctx_id, file_priv) {
 +              file_priv = ivpu_file_priv_get_by_ctx_id(vdev, ctx_id);
 +              if (!file_priv)
 +                      continue;
 +
 +              mutex_lock(&file_priv->ctx.lock);
 +              list_for_each_entry(bo, &file_priv->ctx.bo_list, ctx_node)
 +                      ivpu_bo_print_info(bo, p);
 +              mutex_unlock(&file_priv->ctx.lock);
 +
 +              ivpu_file_priv_put(&file_priv);
 +      }
 +}
 +
 +void ivpu_bo_list_print(struct drm_device *dev)
 +{
 +      struct drm_printer p = drm_info_printer(dev->dev);
 +
 +      ivpu_bo_list(dev, &p);
 +}
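
ivpu_bo_pin() above documents the pin, IOMMU-map, VPU-MMU-map sequence, and ivpu_bo_alloc_internal() combines it with a kernel vmap. Below is a minimal usage sketch of the internal-BO lifecycle those helpers provide; it is not part of this merge, and the function name example_use_internal_bo() as well as the SZ_1M / DRM_IVPU_BO_WC arguments are illustrative placeholders.

static int example_use_internal_bo(struct ivpu_device *vdev)
{
        struct ivpu_bo *bo;

        /* Allocates pages, pins them into the VPU MMU (global context,
         * since vpu_addr is 0) and maps them into kernel virtual memory;
         * returns NULL on failure.
         */
        bo = ivpu_bo_alloc_internal(vdev, 0, SZ_1M, DRM_IVPU_BO_WC);
        if (!bo)
                return -ENOMEM;

        memset(bo->kvaddr, 0, bo->base.size);   /* CPU access through the vmap */
        /* bo->vpu_addr is the address handed to the VPU/firmware side. */

        ivpu_bo_free_internal(bo);              /* drops the GEM reference */
        return 0;
}
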
index a8a77a1efe1e369c77eef227639de9ef3179cb86,37dce184eb56c6bffbeaa7272bd686f330ec5931..34177f1bd97dc09a4ccdd55f2ea1972f431d7e45
@@@ -417,8 -397,8 +403,9 @@@ static int brd_alloc(int i
  
        /* Tell the block layer that this is not a rotational device */
        blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
+       blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, disk->queue);
        blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, disk->queue);
 +      blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);
        err = add_disk(disk);
        if (err)
                goto out_cleanup_disk;
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
Simple merge
diff --cc fs/afs/write.c
Simple merge
Simple merge
diff --cc fs/buffer.c
index 623e77d6ef770b1ce8e6db156f78d4bbc5f021af,7e42d67bcaadf7178d9a8d6789af0d402ced2473..9e1e2add541e07a593bcc743d73eb05ed51bfbd9
@@@ -353,24 -319,15 +353,24 @@@ static void decrypt_bh(struct work_stru
   */
  static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
  {
-       struct inode *inode = bh->b_page->mapping->host;
 -      /* Decrypt if needed */
 -      if (uptodate &&
 -          fscrypt_inode_uses_fs_layer_crypto(bh->b_folio->mapping->host)) {
 -              struct decrypt_bh_ctx *ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);
++      struct inode *inode = bh->b_folio->mapping->host;
 +      bool decrypt = fscrypt_inode_uses_fs_layer_crypto(inode);
 +      bool verify = need_fsverity(bh);
 +
 +      /* Decrypt (with fscrypt) and/or verify (with fsverity) if needed. */
 +      if (uptodate && (decrypt || verify)) {
 +              struct postprocess_bh_ctx *ctx =
 +                      kmalloc(sizeof(*ctx), GFP_ATOMIC);
  
                if (ctx) {
 -                      INIT_WORK(&ctx->work, decrypt_bh);
                        ctx->bh = bh;
 -                      fscrypt_enqueue_decrypt_work(&ctx->work);
 +                      if (decrypt) {
 +                              INIT_WORK(&ctx->work, decrypt_bh);
 +                              fscrypt_enqueue_decrypt_work(&ctx->work);
 +                      } else {
 +                              INIT_WORK(&ctx->work, verify_bh);
 +                              fsverity_enqueue_verify_work(&ctx->work);
 +                      }
                        return;
                }
                uptodate = 0;
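
The buffer.c hunk above generalizes the old decrypt-only path: a small context struct is allocated with GFP_ATOMIC, the appropriate handler (fscrypt decrypt or fsverity verify) is attached with INIT_WORK(), and the heavy work runs later in process context so the I/O completion path never sleeps. A generic sketch of that defer-to-workqueue shape follows; the example_* names are hypothetical and schedule_work() stands in for the subsystem-specific enqueue helpers used in the real code.

struct example_ctx {
        struct work_struct work;
        void *payload;
};

static void example_handler(struct work_struct *work)
{
        struct example_ctx *ctx = container_of(work, struct example_ctx, work);

        /* ... post-process ctx->payload in process context ... */
        kfree(ctx);
}

static bool example_defer(void *payload)
{
        /* GFP_ATOMIC: this may be called from bio/interrupt completion. */
        struct example_ctx *ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);

        if (!ctx)
                return false;   /* caller handles the failure synchronously */

        ctx->payload = payload;
        INIT_WORK(&ctx->work, example_handler);
        schedule_work(&ctx->work);
        return true;
}
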
diff --cc fs/ceph/addr.c
Simple merge
diff --cc fs/cifs/file.c
index 0e602173ac76c8abf04e492632ceb36660cf302b,162fab5a4583d88ca21c66b4abde0888dd94c31f..5365a329908884712935fd3d5551f6bfc4be1f4c
@@@ -2607,372 -2521,336 +2607,386 @@@ static int cifs_partialpagewrite(struc
        return rc;
  }
  
 -static struct cifs_writedata *
 -wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
 -                        pgoff_t end, pgoff_t *index,
 -                        unsigned int *found_pages)
 +/*
 + * Extend the region to be written back to include subsequent contiguously
 + * dirty pages if possible, but don't sleep while doing so.
 + */
 +static void cifs_extend_writeback(struct address_space *mapping,
 +                                long *_count,
 +                                loff_t start,
 +                                int max_pages,
 +                                size_t max_len,
 +                                unsigned int *_len)
  {
 -      struct cifs_writedata *wdata;
 -      struct folio_batch fbatch;
 -      unsigned int i, idx, p, nr;
 -      wdata = cifs_writedata_alloc((unsigned int)tofind,
 -                                   cifs_writev_complete);
 -      if (!wdata)
 -              return NULL;
 -
 -      folio_batch_init(&fbatch);
 -      *found_pages = 0;
 -
 -again:
 -      nr = filemap_get_folios_tag(mapping, index, end,
 -                              PAGECACHE_TAG_DIRTY, &fbatch);
 -      if (!nr)
 -              goto out; /* No dirty pages left in the range */
 -
 -      for (i = 0; i < nr; i++) {
 -              struct folio *folio = fbatch.folios[i];
 -
 -              idx = 0;
 -              p = folio_nr_pages(folio);
 -add_more:
 -              wdata->pages[*found_pages] = folio_page(folio, idx);
 -              folio_get(folio);
 -              if (++*found_pages == tofind) {
 -                      folio_batch_release(&fbatch);
 -                      goto out;
 -              }
 -              if (++idx < p)
 -                      goto add_more;
 -      }
 -      folio_batch_release(&fbatch);
 -      goto again;
 -out:
 -      return wdata;
 -}
 +      struct folio_batch batch;
 +      struct folio *folio;
 +      unsigned int psize, nr_pages;
 +      size_t len = *_len;
 +      pgoff_t index = (start + len) / PAGE_SIZE;
 +      bool stop = true;
 +      unsigned int i;
 +      XA_STATE(xas, &mapping->i_pages, index);
  
 -static unsigned int
 -wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
 -                  struct address_space *mapping,
 -                  struct writeback_control *wbc,
 -                  pgoff_t end, pgoff_t *index, pgoff_t *next, bool *done)
 -{
 -      unsigned int nr_pages = 0, i;
 -      struct page *page;
 +      folio_batch_init(&batch);
  
 -      for (i = 0; i < found_pages; i++) {
 -              page = wdata->pages[i];
 -              /*
 -               * At this point we hold neither the i_pages lock nor the
 -               * page lock: the page may be truncated or invalidated
 -               * (changing page->mapping to NULL), or even swizzled
 -               * back from swapper_space to tmpfs file mapping
 +      do {
 +              /* Firstly, we gather up a batch of contiguous dirty pages
 +               * under the RCU read lock - but we can't clear the dirty flags
 +               * there if any of those pages are mapped.
                 */
 +              rcu_read_lock();
  
 -              if (nr_pages == 0)
 -                      lock_page(page);
 -              else if (!trylock_page(page))
 -                      break;
 -
 -              if (unlikely(page->mapping != mapping)) {
 -                      unlock_page(page);
 -                      break;
 -              }
 +              xas_for_each(&xas, folio, ULONG_MAX) {
 +                      stop = true;
 +                      if (xas_retry(&xas, folio))
 +                              continue;
 +                      if (xa_is_value(folio))
 +                              break;
 +                      if (folio_index(folio) != index)
 +                              break;
 +                      if (!folio_try_get_rcu(folio)) {
 +                              xas_reset(&xas);
 +                              continue;
 +                      }
 +                      nr_pages = folio_nr_pages(folio);
 +                      if (nr_pages > max_pages)
 +                              break;
  
 -              if (!wbc->range_cyclic && page->index > end) {
 -                      *done = true;
 -                      unlock_page(page);
 -                      break;
 -              }
 +                      /* Has the page moved or been split? */
 +                      if (unlikely(folio != xas_reload(&xas))) {
 +                              folio_put(folio);
 +                              break;
 +                      }
  
 -              if (*next && (page->index != *next)) {
 -                      /* Not next consecutive page */
 -                      unlock_page(page);
 -                      break;
 -              }
 +                      if (!folio_trylock(folio)) {
 +                              folio_put(folio);
 +                              break;
 +                      }
 +                      if (!folio_test_dirty(folio) || folio_test_writeback(folio)) {
 +                              folio_unlock(folio);
 +                              folio_put(folio);
 +                              break;
 +                      }
  
 -              if (wbc->sync_mode != WB_SYNC_NONE)
 -                      wait_on_page_writeback(page);
 +                      max_pages -= nr_pages;
 +                      psize = folio_size(folio);
 +                      len += psize;
 +                      stop = false;
 +                      if (max_pages <= 0 || len >= max_len || *_count <= 0)
 +                              stop = true;
  
 -              if (PageWriteback(page) ||
 -                              !clear_page_dirty_for_io(page)) {
 -                      unlock_page(page);
 -                      break;
 +                      index += nr_pages;
 +                      if (!folio_batch_add(&batch, folio))
 +                              break;
 +                      if (stop)
 +                              break;
                }
  
 -              /*
 -               * This actually clears the dirty bit in the radix tree.
 -               * See cifs_writepage() for more commentary.
 +              if (!stop)
 +                      xas_pause(&xas);
 +              rcu_read_unlock();
 +
 +              /* Now, if we obtained any pages, we can shift them to being
 +               * writable and mark them for caching.
                 */
 -              set_page_writeback(page);
 -              if (page_offset(page) >= i_size_read(mapping->host)) {
 -                      *done = true;
 -                      unlock_page(page);
 -                      end_page_writeback(page);
 +              if (!folio_batch_count(&batch))
                        break;
 -              }
  
 -              wdata->pages[i] = page;
 -              *next = page->index + 1;
 -              ++nr_pages;
 -      }
 +              for (i = 0; i < folio_batch_count(&batch); i++) {
 +                      folio = batch.folios[i];
 +                      /* The folio should be locked, dirty and not undergoing
 +                       * writeback from the loop above.
 +                       */
 +                      if (!folio_clear_dirty_for_io(folio))
 +                              WARN_ON(1);
 +                      if (folio_start_writeback(folio))
 +                              WARN_ON(1);
  
 -      /* reset index to refind any pages skipped */
 -      if (nr_pages == 0)
 -              *index = wdata->pages[0]->index + 1;
 +                      *_count -= folio_nr_pages(folio);
 +                      folio_unlock(folio);
 +              }
  
 -      /* put any pages we aren't going to use */
 -      for (i = nr_pages; i < found_pages; i++) {
 -              put_page(wdata->pages[i]);
 -              wdata->pages[i] = NULL;
 -      }
 +              folio_batch_release(&batch);
 +              cond_resched();
 +      } while (!stop);
  
 -      return nr_pages;
 +      *_len = len;
  }
  
 -static int
 -wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages,
 -               struct address_space *mapping, struct writeback_control *wbc)
 +/*
 + * Write back the locked page and any subsequent non-locked dirty pages.
 + */
 +static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
 +                                               struct writeback_control *wbc,
 +                                               struct folio *folio,
 +                                               loff_t start, loff_t end)
  {
 +      struct inode *inode = mapping->host;
 +      struct TCP_Server_Info *server;
 +      struct cifs_writedata *wdata;
 +      struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
 +      struct cifs_credits credits_on_stack;
 +      struct cifs_credits *credits = &credits_on_stack;
 +      struct cifsFileInfo *cfile = NULL;
 +      unsigned int xid, wsize, len;
 +      loff_t i_size = i_size_read(inode);
 +      size_t max_len;
 +      long count = wbc->nr_to_write;
        int rc;
  
 -      wdata->sync_mode = wbc->sync_mode;
 -      wdata->nr_pages = nr_pages;
 -      wdata->offset = page_offset(wdata->pages[0]);
 -      wdata->pagesz = PAGE_SIZE;
 -      wdata->tailsz = min(i_size_read(mapping->host) -
 -                      page_offset(wdata->pages[nr_pages - 1]),
 -                      (loff_t)PAGE_SIZE);
 -      wdata->bytes = ((nr_pages - 1) * PAGE_SIZE) + wdata->tailsz;
 -      wdata->pid = wdata->cfile->pid;
 -
 -      rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes);
 -      if (rc)
 -              return rc;
 -
 -      if (wdata->cfile->invalidHandle)
 -              rc = -EAGAIN;
 -      else
 -              rc = wdata->server->ops->async_writev(wdata,
 -                                                    cifs_writedata_release);
 +      /* The folio should be locked, dirty and not undergoing writeback. */
 +      if (folio_start_writeback(folio))
 +              WARN_ON(1);
  
 -      return rc;
 -}
 +      count -= folio_nr_pages(folio);
 +      len = folio_size(folio);
  
 -static int
 -cifs_writepage_locked(struct page *page, struct writeback_control *wbc);
 +      xid = get_xid();
 +      server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses);
  
 -static int cifs_write_one_page(struct folio *folio,
 -              struct writeback_control *wbc, void *data)
 -{
 -      struct address_space *mapping = data;
 -      int ret;
 +      rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
 +      if (rc) {
 +              cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", rc);
 +              goto err_xid;
 +      }
  
 -      ret = cifs_writepage_locked(&folio->page, wbc);
 -      folio_unlock(folio);
 -      mapping_set_error(mapping, ret);
 -      return ret;
 -}
 +      rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
 +                                         &wsize, credits);
 +      if (rc != 0)
 +              goto err_close;
  
 -static int cifs_writepages(struct address_space *mapping,
 -                         struct writeback_control *wbc)
 -{
 -      struct inode *inode = mapping->host;
 -      struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
 -      struct TCP_Server_Info *server;
 -      bool done = false, scanned = false, range_whole = false;
 -      pgoff_t end, index;
 -      struct cifs_writedata *wdata;
 -      struct cifsFileInfo *cfile = NULL;
 -      int rc = 0;
 -      int saved_rc = 0;
 -      unsigned int xid;
 +      wdata = cifs_writedata_alloc(cifs_writev_complete);
 +      if (!wdata) {
 +              rc = -ENOMEM;
 +              goto err_uncredit;
 +      }
  
 -      /*
 -       * If wsize is smaller than the page cache size, default to writing
 -       * one page at a time.
 +      wdata->sync_mode = wbc->sync_mode;
 +      wdata->offset = folio_pos(folio);
 +      wdata->pid = cfile->pid;
 +      wdata->credits = credits_on_stack;
 +      wdata->cfile = cfile;
 +      wdata->server = server;
 +      cfile = NULL;
 +
 +      /* Find all consecutive lockable dirty pages, stopping when we find a
 +       * page that is not immediately lockable, is not dirty or is missing,
 +       * or we reach the end of the range.
         */
 -      if (cifs_sb->ctx->wsize < PAGE_SIZE)
 -              return write_cache_pages(mapping, wbc, cifs_write_one_page,
 -                              mapping);
 +      if (start < i_size) {
 +              /* Trim the write to the EOF; the extra data is ignored.  Also
 +               * put an upper limit on the size of a single storedata op.
 +               */
 +              max_len = wsize;
 +              max_len = min_t(unsigned long long, max_len, end - start + 1);
 +              max_len = min_t(unsigned long long, max_len, i_size - start);
  
 -      xid = get_xid();
 -      if (wbc->range_cyclic) {
 -              index = mapping->writeback_index; /* Start from prev offset */
 -              end = -1;
 -      } else {
 -              index = wbc->range_start >> PAGE_SHIFT;
 -              end = wbc->range_end >> PAGE_SHIFT;
 -              if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
 -                      range_whole = true;
 -              scanned = true;
 +              if (len < max_len) {
 +                      int max_pages = INT_MAX;
 +
 +#ifdef CONFIG_CIFS_SMB_DIRECT
 +                      if (server->smbd_conn)
 +                              max_pages = server->smbd_conn->max_frmr_depth;
 +#endif
 +                      max_pages -= folio_nr_pages(folio);
 +
 +                      if (max_pages > 0)
 +                              cifs_extend_writeback(mapping, &count, start,
 +                                                    max_pages, max_len, &len);
 +              }
 +              len = min_t(loff_t, len, max_len);
        }
 -      server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses);
  
 -retry:
 -      while (!done && index <= end) {
 -              unsigned int i, nr_pages, found_pages, wsize;
 -              pgoff_t next = 0, tofind, saved_index = index;
 -              struct cifs_credits credits_on_stack;
 -              struct cifs_credits *credits = &credits_on_stack;
 -              int get_file_rc = 0;
 +      wdata->bytes = len;
  
 -              if (cfile)
 -                      cifsFileInfo_put(cfile);
 +      /* We now have a contiguous set of dirty pages, each with writeback
 +       * set; the first page is still locked at this point, but all the rest
 +       * have been unlocked.
 +       */
 +      folio_unlock(folio);
  
 -              rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
 +      if (start < i_size) {
 +              iov_iter_xarray(&wdata->iter, ITER_SOURCE, &mapping->i_pages,
 +                              start, len);
  
 -              /* in case of an error store it to return later */
 +              rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes);
                if (rc)
 -                      get_file_rc = rc;
 +                      goto err_wdata;
  
 -              rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
 -                                                 &wsize, credits);
 -              if (rc != 0) {
 -                      done = true;
 -                      break;
 +              if (wdata->cfile->invalidHandle)
 +                      rc = -EAGAIN;
 +              else
 +                      rc = wdata->server->ops->async_writev(wdata,
 +                                                            cifs_writedata_release);
 +              if (rc >= 0) {
 +                      kref_put(&wdata->refcount, cifs_writedata_release);
 +                      goto err_close;
                }
 +      } else {
 +              /* The dirty region was entirely beyond the EOF. */
 +              cifs_pages_written_back(inode, start, len);
 +              rc = 0;
 +      }
  
 -              tofind = min((wsize / PAGE_SIZE) - 1, end - index) + 1;
 +err_wdata:
 +      kref_put(&wdata->refcount, cifs_writedata_release);
 +err_uncredit:
 +      add_credits_and_wake_if(server, credits, 0);
 +err_close:
 +      if (cfile)
 +              cifsFileInfo_put(cfile);
 +err_xid:
 +      free_xid(xid);
 +      if (rc == 0) {
 +              wbc->nr_to_write = count;
 +      } else if (is_retryable_error(rc)) {
 +              cifs_pages_write_redirty(inode, start, len);
 +      } else {
 +              cifs_pages_write_failed(inode, start, len);
 +              mapping_set_error(mapping, rc);
 +      }
 +      /* Indication to update ctime and mtime as close is deferred */
 +      set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags);
 +      return rc;
 +}
  
 -              wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index,
 -                                                &found_pages);
 -              if (!wdata) {
 -                      rc = -ENOMEM;
 -                      done = true;
 -                      add_credits_and_wake_if(server, credits, 0);
 -                      break;
 -              }
 +/*
 + * write a region of pages back to the server
 + */
 +static int cifs_writepages_region(struct address_space *mapping,
 +                                struct writeback_control *wbc,
 +                                loff_t start, loff_t end, loff_t *_next)
 +{
-       struct folio *folio;
-       struct page *head_page;
-       ssize_t ret;
-       int n, skips = 0;
++      struct folio_batch fbatch;
++      int skips = 0;
  
 -              if (found_pages == 0) {
 -                      kref_put(&wdata->refcount, cifs_writedata_release);
 -                      add_credits_and_wake_if(server, credits, 0);
++      folio_batch_init(&fbatch);
 +      do {
++              int nr;
 +              pgoff_t index = start / PAGE_SIZE;
 +
-               n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE,
-                                            PAGECACHE_TAG_DIRTY, 1, &head_page);
-               if (!n)
++              nr = filemap_get_folios_tag(mapping, &index, end / PAGE_SIZE,
++                                          PAGECACHE_TAG_DIRTY, &fbatch);
++              if (!nr)
                        break;
 -              }
  
-               folio = page_folio(head_page);
-               start = folio_pos(folio); /* May regress with THPs */
 -              nr_pages = wdata_prepare_pages(wdata, found_pages, mapping, wbc,
 -                                             end, &index, &next, &done);
++              for (int i = 0; i < nr; i++) {
++                      ssize_t ret;
++                      struct folio *folio = fbatch.folios[i];
  
-               /* At this point we hold neither the i_pages lock nor the
-                * page lock: the page may be truncated or invalidated
-                * (changing page->mapping to NULL), or even swizzled
-                * back from swapper_space to tmpfs file mapping
-                */
-               if (wbc->sync_mode != WB_SYNC_NONE) {
-                       ret = folio_lock_killable(folio);
-                       if (ret < 0) {
-                               folio_put(folio);
-                               return ret;
 -              /* nothing to write? */
 -              if (nr_pages == 0) {
 -                      kref_put(&wdata->refcount, cifs_writedata_release);
 -                      add_credits_and_wake_if(server, credits, 0);
 -                      continue;
 -              }
++redo_folio:
++                      start = folio_pos(folio); /* May regress with THPs */
 -              wdata->credits = credits_on_stack;
 -              wdata->cfile = cfile;
 -              wdata->server = server;
 -              cfile = NULL;
++                      /* At this point we hold neither the i_pages lock nor the
++                       * page lock: the page may be truncated or invalidated
++                       * (changing page->mapping to NULL), or even swizzled
++                       * back from swapper_space to tmpfs file mapping
++                       */
++                      if (wbc->sync_mode != WB_SYNC_NONE) {
++                              ret = folio_lock_killable(folio);
++                              if (ret < 0)
++                                      goto write_error;
++                      } else {
++                              if (!folio_trylock(folio))
++                                      goto skip_write;
 +                      }
-               } else {
-                       if (!folio_trylock(folio)) {
-                               folio_put(folio);
-                               return 0;
 -              if (!wdata->cfile) {
 -                      cifs_dbg(VFS, "No writable handle in writepages rc=%d\n",
 -                               get_file_rc);
 -                      if (is_retryable_error(get_file_rc))
 -                              rc = get_file_rc;
 -                      else
 -                              rc = -EBADF;
 -              } else
 -                      rc = wdata_send_pages(wdata, nr_pages, mapping, wbc);
++                      if (folio_mapping(folio) != mapping ||
++                          !folio_test_dirty(folio)) {
++                              folio_unlock(folio);
++                              goto skip_write;
 +                      }
-               }
  
-               if (folio_mapping(folio) != mapping ||
-                   !folio_test_dirty(folio)) {
-                       start += folio_size(folio);
-                       folio_unlock(folio);
-                       folio_put(folio);
-                       continue;
-               }
 -              for (i = 0; i < nr_pages; ++i)
 -                      unlock_page(wdata->pages[i]);
++                      if (folio_test_writeback(folio) ||
++                          folio_test_fscache(folio)) {
++                              folio_unlock(folio);
++                              if (wbc->sync_mode == WB_SYNC_NONE)
++                                      goto skip_write;
  
-               if (folio_test_writeback(folio) ||
-                   folio_test_fscache(folio)) {
-                       folio_unlock(folio);
-                       if (wbc->sync_mode != WB_SYNC_NONE) {
 -              /* send failure -- clean up the mess */
 -              if (rc != 0) {
 -                      add_credits_and_wake_if(server, &wdata->credits, 0);
 -                      for (i = 0; i < nr_pages; ++i) {
 -                              if (is_retryable_error(rc))
 -                                      redirty_page_for_writepage(wbc,
 -                                                         wdata->pages[i]);
 -                              else
 -                                      SetPageError(wdata->pages[i]);
 -                              end_page_writeback(wdata->pages[i]);
 -                              put_page(wdata->pages[i]);
 +                              folio_wait_writeback(folio);
 +#ifdef CONFIG_CIFS_FSCACHE
 +                              folio_wait_fscache(folio);
 +#endif
-                       } else {
-                               start += folio_size(folio);
++                              goto redo_folio;
                        }
-                       folio_put(folio);
-                       if (wbc->sync_mode == WB_SYNC_NONE) {
-                               if (skips >= 5 || need_resched())
-                                       break;
-                               skips++;
-                       }
-                       continue;
 -                      if (!is_retryable_error(rc))
 -                              mapping_set_error(mapping, rc);
--              }
 -              kref_put(&wdata->refcount, cifs_writedata_release);
  
-               if (!folio_clear_dirty_for_io(folio))
-                       /* We hold the page lock - it should've been dirty. */
-                       WARN_ON(1);
 -              if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) {
 -                      index = saved_index;
 -                      continue;
 -              }
++                      if (!folio_clear_dirty_for_io(folio))
++                              /* We hold the page lock - it should've been dirty. */
++                              WARN_ON(1);
  
-               ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
-               folio_put(folio);
-               if (ret < 0)
 -              /* Return immediately if we received a signal during writing */
 -              if (is_interrupt_error(rc)) {
 -                      done = true;
 -                      break;
 -              }
++                      ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
++                      if (ret < 0)
++                              goto write_error;
 -              if (rc != 0 && saved_rc == 0)
 -                      saved_rc = rc;
++                      start += ret;
++                      continue;
 -              wbc->nr_to_write -= nr_pages;
 -              if (wbc->nr_to_write <= 0)
 -                      done = true;
++write_error:
++                      folio_batch_release(&fbatch);
++                      *_next = start;
 +                      return ret;
  
-               start += ret;
 -              index = next;
 -      }
++skip_write:
++                      /*
++                       * Too many skipped writes, or need to reschedule?
++                       * Treat it as a write error without an error code.
++                       */
++                      if (skips >= 5 || need_resched()) {
++                              ret = 0;
++                              goto write_error;
++                      }
 -      if (!scanned && !done) {
 -              /*
 -               * We hit the last page and there is more work to be done: wrap
 -               * back to the start of the file
 -               */
 -              scanned = true;
 -              index = 0;
 -              goto retry;
 -      }
++                      /* Otherwise, just skip that folio and go on to the next */
++                      skips++;
++                      start += folio_size(folio);
++                      continue;
++              }
 -      if (saved_rc != 0)
 -              rc = saved_rc;
++              folio_batch_release(&fbatch);           
 +              cond_resched();
 +      } while (wbc->nr_to_write > 0);
  
 -      if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
 -              mapping->writeback_index = index;
 +      *_next = start;
 +      return 0;
 +}
  
 -      if (cfile)
 -              cifsFileInfo_put(cfile);
 -      free_xid(xid);
 -      /* Indication to update ctime and mtime as close is deferred */
 -      set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags);
 -      return rc;
 +/*
 + * Write some of the pending data back to the server
 + */
 +static int cifs_writepages(struct address_space *mapping,
 +                         struct writeback_control *wbc)
 +{
 +      loff_t start, next;
 +      int ret;
 +
 +      /* We have to be careful as we can end up racing with setattr()
 +       * truncating the pagecache since the caller doesn't take a lock here
 +       * to prevent it.
 +       */
 +
 +      if (wbc->range_cyclic) {
 +              start = mapping->writeback_index * PAGE_SIZE;
 +              ret = cifs_writepages_region(mapping, wbc, start, LLONG_MAX, &next);
 +              if (ret == 0) {
 +                      mapping->writeback_index = next / PAGE_SIZE;
 +                      if (start > 0 && wbc->nr_to_write > 0) {
 +                              ret = cifs_writepages_region(mapping, wbc, 0,
 +                                                           start, &next);
 +                              if (ret == 0)
 +                                      mapping->writeback_index =
 +                                              next / PAGE_SIZE;
 +                      }
 +              }
 +      } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
 +              ret = cifs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next);
 +              if (wbc->nr_to_write > 0 && ret == 0)
 +                      mapping->writeback_index = next / PAGE_SIZE;
 +      } else {
 +              ret = cifs_writepages_region(mapping, wbc,
 +                                           wbc->range_start, wbc->range_end, &next);
 +      }
 +
 +      return ret;
  }
  
  static int
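
To summarise the restructured CIFS path above: the old wdata page-array
machinery is gone, cifs_writepages_region() walks the dirty pages of a byte
range with a tagged folio batch, and cifs_writepages() drives that walk once
or twice depending on whether writeback is cyclic.  Below is a minimal,
illustrative sketch of the same walk, assuming a hypothetical
write_one_folio() in place of cifs_write_back_from_locked_folio(); the
pagecache helpers themselves are the in-tree APIs.

/*
 * Illustrative sketch only -- not CIFS code.  write_one_folio() is a
 * hypothetical stand-in for cifs_write_back_from_locked_folio().
 */
#include <linux/pagemap.h>
#include <linux/pagevec.h>
#include <linux/sched.h>
#include <linux/writeback.h>

static int write_one_folio(struct folio *folio, struct writeback_control *wbc)
{
        /* Hypothetical transport: mark writeback, "send", then complete. */
        folio_start_writeback(folio);
        folio_unlock(folio);
        folio_end_writeback(folio);
        return 0;
}

static int walk_dirty_folios(struct address_space *mapping,
                             struct writeback_control *wbc,
                             loff_t start, loff_t end)
{
        struct folio_batch fbatch;
        int error = 0;

        folio_batch_init(&fbatch);
        do {
                pgoff_t index = start / PAGE_SIZE;
                unsigned int nr, i;

                nr = filemap_get_folios_tag(mapping, &index, end / PAGE_SIZE,
                                            PAGECACHE_TAG_DIRTY, &fbatch);
                if (!nr)
                        break;

                for (i = 0; i < nr; i++) {
                        struct folio *folio = fbatch.folios[i];

                        /* Whatever happens, do not revisit this folio. */
                        start = folio_pos(folio) + folio_size(folio);

                        folio_lock(folio);
                        if (folio_mapping(folio) != mapping ||
                            folio_test_writeback(folio) ||
                            !folio_clear_dirty_for_io(folio)) {
                                folio_unlock(folio);
                                continue;
                        }
                        error = write_one_folio(folio, wbc);
                        if (error)
                                break;
                        wbc->nr_to_write -= folio_nr_pages(folio);
                }
                folio_batch_release(&fbatch);
                cond_resched();
        } while (!error && wbc->nr_to_write > 0);

        return error;
}

Unlike the real code, the sketch simply skips folios that are already under
writeback rather than waiting for them under WB_SYNC_ALL, and it has no
fscache handling or skip-count back-off.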
diff --cc fs/coredump.c
Simple merge
diff --cc fs/erofs/data.c
Simple merge
diff --cc fs/exec.c
Simple merge
diff --cc fs/ext4/inode.c
Simple merge
diff --cc fs/ext4/super.c
Simple merge
diff --cc fs/f2fs/data.c
Simple merge
diff --cc fs/fuse/file.c
Simple merge
diff --cc fs/gfs2/aops.c
Simple merge
diff --cc fs/gfs2/glops.c
Simple merge
diff --cc fs/gfs2/log.c
index 61323deb80bc7b70d91a161898fff3ea6a31e2a1,1fcc829f02ab291186edda465c6f811226302fd3..d750d1128bed7c433e075f1e9f33deea0098ed0d
@@@ -80,15 -80,6 +80,15 @@@ void gfs2_remove_from_ail(struct gfs2_b
        brelse(bd->bd_bh);
  }
  
- static int __gfs2_writepage(struct page *page, struct writeback_control *wbc,
++static int __gfs2_writepage(struct folio *folio, struct writeback_control *wbc,
 +                     void *data)
 +{
 +      struct address_space *mapping = data;
-       int ret = mapping->a_ops->writepage(page, wbc);
++      int ret = mapping->a_ops->writepage(&folio->page, wbc);
 +      mapping_set_error(mapping, ret);
 +      return ret;
 +}
 +
  /**
   * gfs2_ail1_start_one - Start I/O on a transaction
   * @sdp: The superblock
Simple merge
Simple merge
diff --cc fs/mpage.c
Simple merge
diff --cc fs/nfs/write.c
index b508c985eb14abdc7d41e793f1e37b1f8beafd8b,9d6432cb3f4465021855221aac55d94ee3bb0ff4..f4cca8f00c0c20f6906e4e2e08ea19fc1a692e00
@@@ -688,15 -689,14 +688,14 @@@ int nfs_writepage(struct page *page, st
        return ret;
  }
  
- static int nfs_writepages_callback(struct page *page,
+ static int nfs_writepages_callback(struct folio *folio,
 -              struct writeback_control *wbc, void *data)
 +                                 struct writeback_control *wbc, void *data)
  {
-       struct folio *folio = page_folio(page);
        int ret;
  
 -      ret = nfs_do_writepage(&folio->page, wbc, data);
 +      ret = nfs_do_writepage(folio, wbc, data);
        if (ret != AOP_WRITEPAGE_ACTIVATE)
-               unlock_page(page);
+               folio_unlock(folio);
        return ret;
  }
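
The gfs2 and NFS hunks above perform the same mechanical conversion:
write_cache_pages() callbacks now receive a struct folio rather than a
struct page, so the page_folio() translation and unlock_page() call drop
out.  A minimal sketch of a conforming callback follows; my_write_folio()
is a hypothetical stand-in for nfs_do_writepage() or an ->writepage call.

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/writeback.h>

/* Hypothetical per-folio writer; a real one would queue I/O for the folio. */
static int my_write_folio(struct folio *folio, struct writeback_control *wbc,
                          void *data)
{
        folio_start_writeback(folio);
        folio_end_writeback(folio);
        return 0;
}

/* Matches the folio-based writepage_t prototype used by the hunks above. */
static int my_writepages_callback(struct folio *folio,
                                  struct writeback_control *wbc, void *data)
{
        int ret = my_write_folio(folio, wbc, data);

        if (ret != AOP_WRITEPAGE_ACTIVATE)
                folio_unlock(folio);
        return ret;
}

static int my_writepages(struct address_space *mapping,
                         struct writeback_control *wbc)
{
        return write_cache_pages(mapping, wbc, my_writepages_callback, mapping);
}

write_cache_pages() hands the callback a locked folio whose dirty bit has
already been cleared for I/O; the callback must unlock it unless it returns
AOP_WRITEPAGE_ACTIVATE.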
  
Simple merge
Simple merge
Simple merge
Simple merge
diff --cc fs/udf/inode.c
index 3b2adf4cbc5793b2e85825c10962e51cccb4f907,34e416327dd4ee2e4f3e399a0b50eabdd7ff44e1..f7a9607c2b9578ce459dee3ebc87360ea834f327
@@@ -185,45 -182,10 +185,46 @@@ static void udf_write_failed(struct add
        }
  }
  
- static int udf_adinicb_writepage(struct page *page,
++static int udf_adinicb_writepage(struct folio *folio,
 +                               struct writeback_control *wbc, void *data)
 +{
++      struct page *page = &folio->page;
 +      struct inode *inode = page->mapping->host;
 +      struct udf_inode_info *iinfo = UDF_I(inode);
 +
 +      BUG_ON(!PageLocked(page));
 +      memcpy_to_page(page, 0, iinfo->i_data + iinfo->i_lenEAttr,
 +                     i_size_read(inode));
 +      unlock_page(page);
 +      mark_inode_dirty(inode);
 +
 +      return 0;
 +}
 +
  static int udf_writepages(struct address_space *mapping,
 -                      struct writeback_control *wbc)
 +                        struct writeback_control *wbc)
  {
 -      return mpage_writepages(mapping, wbc, udf_get_block);
 +      struct inode *inode = mapping->host;
 +      struct udf_inode_info *iinfo = UDF_I(inode);
 +
 +      if (iinfo->i_alloc_type != ICBTAG_FLAG_AD_IN_ICB)
 +              return mpage_writepages(mapping, wbc, udf_get_block_wb);
 +      return write_cache_pages(mapping, wbc, udf_adinicb_writepage, NULL);
 +}
 +
 +static void udf_adinicb_readpage(struct page *page)
 +{
 +      struct inode *inode = page->mapping->host;
 +      char *kaddr;
 +      struct udf_inode_info *iinfo = UDF_I(inode);
 +      loff_t isize = i_size_read(inode);
 +
 +      kaddr = kmap_local_page(page);
 +      memcpy(kaddr, iinfo->i_data + iinfo->i_lenEAttr, isize);
 +      memset(kaddr + isize, 0, PAGE_SIZE - isize);
 +      flush_dcache_page(page);
 +      SetPageUptodate(page);
 +      kunmap_local(kaddr);
  }
  
  static int udf_read_folio(struct file *file, struct folio *folio)
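
The in-ICB branch above serves file data straight out of the inode's i_data
area instead of mapping disk blocks.  A short sketch of that inline-data
pattern follows; my_inline_data() and my_inline_len() are hypothetical
accessors standing in for the UDF_I(inode) fields.

#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/string.h>

/* Hypothetical accessors for an inode's inline data area. */
static const char *my_inline_data(struct inode *inode);
static size_t my_inline_len(struct inode *inode);

/* Copy inline data into a page and zero the tail, as udf_adinicb_readpage(). */
static void fill_page_from_inline(struct inode *inode, struct page *page)
{
        size_t len = my_inline_len(inode);
        char *kaddr = kmap_local_page(page);

        memcpy(kaddr, my_inline_data(inode), len);
        memset(kaddr + len, 0, PAGE_SIZE - len);
        flush_dcache_page(page);
        kunmap_local(kaddr);
        SetPageUptodate(page);
}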
Simple merge
Simple merge
Simple merge
Simple merge
diff --cc include/linux/memcontrol.h
index 1e38e99998c79bb4eca5191fe6197ae165e79adb,5567319027d1806f3a261109993a9cca9fad9ff3..b6eda2ab205dc7133472ba56a4f658cfaec5ff4f
@@@ -1754,17 -1776,11 +1776,17 @@@ struct obj_cgroup *get_obj_cgroup_from_
  int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
  void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);
  
- extern struct static_key_false memcg_kmem_enabled_key;
 +extern struct static_key_false memcg_bpf_enabled_key;
 +static inline bool memcg_bpf_enabled(void)
 +{
 +      return static_branch_likely(&memcg_bpf_enabled_key);
 +}
 +
+ extern struct static_key_false memcg_kmem_online_key;
  
- static inline bool memcg_kmem_enabled(void)
+ static inline bool memcg_kmem_online(void)
  {
-       return static_branch_likely(&memcg_kmem_enabled_key);
+       return static_branch_likely(&memcg_kmem_online_key);
  }
  
  static inline int memcg_kmem_charge_page(struct page *page, gfp_t gfp,
@@@ -1838,12 -1854,7 +1860,12 @@@ static inline struct obj_cgroup *get_ob
        return NULL;
  }
  
- static inline bool memcg_kmem_enabled(void)
 +static inline bool memcg_bpf_enabled(void)
 +{
 +      return false;
 +}
 +
+ static inline bool memcg_kmem_online(void)
  {
        return false;
  }
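
The memcontrol.h hunk above renames the kmem static key and its accessor
from *_enabled to *_online and adds a separate memcg_bpf_enabled() test.
Callers check the accessor before doing any accounting work; a minimal
consumer sketch, where my_account_object() is hypothetical:

#include <linux/memcontrol.h>

/* Hypothetical slow-path accounting helper, not a kernel function. */
static void my_account_object(void *obj, size_t size);

static inline void my_maybe_account(void *obj, size_t size)
{
        /*
         * memcg_kmem_online() (formerly memcg_kmem_enabled()) is backed by
         * a static key, so this test patches down to a no-op branch when
         * kmem accounting is not in use.
         */
        if (memcg_kmem_online())
                my_account_object(obj, size);
}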
Simple merge
diff --cc include/linux/mm_types.h
index af8119776ab18a57932378d12ae6285ab7284ac1,56753d0f096d06591c5c2ff29b05b72fb212c7b3..0722859c36478d92f06455c6841144a7bfe5d207
@@@ -645,20 -598,9 +598,20 @@@ struct mm_struct 
                 * &struct mm_struct is freed.
                 */
                atomic_t mm_count;
 -
 +#ifdef CONFIG_SCHED_MM_CID
 +              /**
 +               * @cid_lock: Protect cid bitmap updates vs lookups.
 +               *
 +               * Prevent situations where updates to the cid bitmap happen
 +               * concurrently with lookups. Those can lead to situations
 +               * where a lookup cannot find a free bit simply because it was
 +               * unlucky enough to load, non-atomically, bitmap words as they
 +               * were being concurrently updated by the updaters.
 +               */
 +              raw_spinlock_t cid_lock;
 +#endif
  #ifdef CONFIG_MMU
-               atomic_long_t pgtables_bytes;   /* PTE page table pages */
+               atomic_long_t pgtables_bytes;   /* size of all page tables */
  #endif
                int map_count;                  /* number of VMAs */
  
@@@ -915,41 -857,9 +868,39 @@@ struct vma_iterator 
  static inline void vma_iter_init(struct vma_iterator *vmi,
                struct mm_struct *mm, unsigned long addr)
  {
-       vmi->mas.tree = &mm->mm_mt;
-       vmi->mas.index = addr;
-       vmi->mas.node = MAS_START;
+       mas_init(&vmi->mas, &mm->mm_mt, addr);
  }
  
 +#ifdef CONFIG_SCHED_MM_CID
 +/* Accessor for struct mm_struct's cidmask. */
 +static inline cpumask_t *mm_cidmask(struct mm_struct *mm)
 +{
 +      unsigned long cid_bitmap = (unsigned long)mm;
 +
 +      cid_bitmap += offsetof(struct mm_struct, cpu_bitmap);
 +      /* Skip cpu_bitmap */
 +      cid_bitmap += cpumask_size();
 +      return (struct cpumask *)cid_bitmap;
 +}
 +
 +static inline void mm_init_cid(struct mm_struct *mm)
 +{
 +      raw_spin_lock_init(&mm->cid_lock);
 +      cpumask_clear(mm_cidmask(mm));
 +}
 +
 +static inline unsigned int mm_cid_size(void)
 +{
 +      return cpumask_size();
 +}
 +#else /* CONFIG_SCHED_MM_CID */
 +static inline void mm_init_cid(struct mm_struct *mm) { }
 +static inline unsigned int mm_cid_size(void)
 +{
 +      return 0;
 +}
 +#endif /* CONFIG_SCHED_MM_CID */
 +
  struct mmu_gather;
  extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm);
  extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm);
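
mm_cidmask() in the hunk above derives the concurrency-id bitmap's address
from the mm pointer alone, relying on that bitmap being allocated
immediately after mm_struct's trailing cpu_bitmap[]; mm_cid_size() only
tells the allocator how many extra bytes to reserve.  A small userspace
illustration of the same trailing-bitmap layout trick follows; struct demo
and its helpers are made up, while the kernel computes the offset with
offsetof() plus cpumask_size() exactly as shown in the hunk.

#include <stddef.h>
#include <stdlib.h>

/* Made-up structure with a flexible trailing bitmap, like mm_struct. */
struct demo {
        int map_count;
        unsigned long cpu_bitmap[];     /* sized at allocation time */
};

/* Bytes reserved for the first (cpu) bitmap; stands in for cpumask_size(). */
static size_t demo_cpumask_bytes(void)
{
        return sizeof(unsigned long);
}

/* The second bitmap lives right after the first, as mm_cidmask() assumes. */
static unsigned long *demo_extra_mask(struct demo *d)
{
        char *p = (char *)d;

        p += offsetof(struct demo, cpu_bitmap);   /* start of first bitmap */
        p += demo_cpumask_bytes();                /* skip it */
        return (unsigned long *)p;
}

int main(void)
{
        size_t size = sizeof(struct demo) + 2 * demo_cpumask_bytes();
        struct demo *d = calloc(1, size);

        if (!d)
                return 1;
        demo_extra_mask(d)[0] = ~0UL;   /* touch the second bitmap */
        free(d);
        return 0;
}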
Simple merge
diff --cc init/main.c
Simple merge
Simple merge
Simple merge
Simple merge
diff --cc kernel/fork.c
Simple merge
Simple merge
Simple merge
diff --cc kernel/sys.c
Simple merge
Simple merge
diff --cc mm/compaction.c
Simple merge
diff --cc mm/filemap.c
Simple merge
diff --cc mm/huge_memory.c
index 1b791b26d72d7aa678512a40488beab342c2648b,1343a7d88299be850afdc1f022607d675bf92d9b..4fc43859e59a31932a657cd2fac2b511c00e812b
@@@ -3272,8 -3274,10 +3274,8 @@@ void remove_migration_pmd(struct page_v
        pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
        if (pmd_swp_soft_dirty(*pvmw->pmd))
                pmde = pmd_mksoft_dirty(pmde);
 -      if (is_writable_migration_entry(entry))
 -              pmde = maybe_pmd_mkwrite(pmde, vma);
        if (pmd_swp_uffd_wp(*pvmw->pmd))
-               pmde = pmd_wrprotect(pmd_mkuffd_wp(pmde));
+               pmde = pmd_mkuffd_wp(pmde);
        if (!is_migration_entry_young(entry))
                pmde = pmd_mkold(pmde);
        /* NOTE: this may contain setting soft-dirty on some archs */
diff --cc mm/internal.h
Simple merge
Simple merge
diff --cc mm/khugepaged.c
Simple merge
diff --cc mm/madvise.c
Simple merge
diff --cc mm/memcontrol.c
index 49f40730e7117da7747940942016b97c7ff1e101,25f2465d5a37b55d9588e91f3f70e41785261ecd..5abffe6f8389e27a705068e028dee875c91efa91
@@@ -348,11 -345,8 +348,11 @@@ static void memcg_reparent_objcgs(struc
   * conditional to this static branch, we'll have to allow modules that does
   * kmem_cache_alloc and the such to see this symbol as well
   */
- DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key);
- EXPORT_SYMBOL(memcg_kmem_enabled_key);
+ DEFINE_STATIC_KEY_FALSE(memcg_kmem_online_key);
+ EXPORT_SYMBOL(memcg_kmem_online_key);
 +
 +DEFINE_STATIC_KEY_FALSE(memcg_bpf_enabled_key);
 +EXPORT_SYMBOL(memcg_bpf_enabled_key);
  #endif
  
  /**
diff --cc mm/migrate.c
Simple merge
diff --cc mm/page_alloc.c
Simple merge
diff --cc mm/page_io.c
Simple merge
diff --cc mm/secretmem.c
Simple merge
diff --cc mm/shmem.c
index 41f82c5a5e28a1ffdb0054a52e46e6cf16d800b6,577b3838c6b9eb6622466d5e25105dc4dfcd431f..448f393d8ab2b1bd5f22eec603195977716a8727
@@@ -1066,9 -1065,9 +1065,9 @@@ static int shmem_getattr(struct mnt_idm
        stat->attributes_mask |= (STATX_ATTR_APPEND |
                        STATX_ATTR_IMMUTABLE |
                        STATX_ATTR_NODUMP);
 -      generic_fillattr(&init_user_ns, inode, stat);
 +      generic_fillattr(idmap, inode, stat);
  
-       if (shmem_is_huge(NULL, inode, 0, false))
+       if (shmem_is_huge(inode, 0, false, NULL, 0))
                stat->blksize = HPAGE_PMD_SIZE;
  
        if (request_mask & STATX_BTIME) {
diff --cc mm/slab.c
Simple merge
diff --cc mm/slub.c
Simple merge
diff --cc mm/swap.c
Simple merge
diff --cc mm/swapfile.c
Simple merge
diff --cc mm/vmalloc.c
Simple merge
diff --cc net/ipv4/tcp.c
Simple merge
Simple merge
Simple merge
diff --cc tools/testing/selftests/mm/Makefile
index 0000000000000000000000000000000000000000,d90cdc06aa59e398b5d3833e6cedf20fda64788d..c31d952cff68fd3681951a9544eeb6c55eca7116
mode 000000,100644..100644
--- /dev/null
@@@ -1,0 -1,185 +1,185 @@@
 -CFLAGS = -Wall -I $(top_srcdir) -I $(top_srcdir)/usr/include $(EXTRA_CFLAGS) $(KHDR_INCLUDES)
+ # SPDX-License-Identifier: GPL-2.0
+ # Makefile for mm selftests
+ LOCAL_HDRS += $(selfdir)/mm/local_config.h $(top_srcdir)/mm/gup_test.h
+ include local_config.mk
+ ifeq ($(CROSS_COMPILE),)
+ uname_M := $(shell uname -m 2>/dev/null || echo not)
+ else
+ uname_M := $(shell echo $(CROSS_COMPILE) | grep -o '^[a-z0-9]\+')
+ endif
+ MACHINE ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/')
+ # Without this, failed build products remain, with up-to-date timestamps,
+ # thus tricking Make (and you!) into believing that All Is Well, in subsequent
+ # make invocations:
+ .DELETE_ON_ERROR:
+ # Avoid accidental wrong builds, due to built-in rules working just a little
+ # bit too well--but not quite as well as required for our situation here.
+ #
+ # In other words, "make userfaultfd" is supposed to fail to build at all,
+ # because this Makefile only supports either "make" (all), or "make /full/path".
+ # However,  the built-in rules, if not suppressed, will pick up CFLAGS and the
+ # initial LDLIBS (but not the target-specific LDLIBS, because those are only
+ # set for the full path target!). This causes it to get pretty far into building
+ # things despite using incorrect values such as an *occasionally* incomplete
+ # LDLIBS.
+ MAKEFLAGS += --no-builtin-rules
++CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES)
+ LDLIBS = -lrt -lpthread
+ TEST_GEN_FILES = cow
+ TEST_GEN_FILES += compaction_test
+ TEST_GEN_FILES += gup_test
+ TEST_GEN_FILES += hmm-tests
+ TEST_GEN_FILES += hugetlb-madvise
+ TEST_GEN_FILES += hugepage-mmap
+ TEST_GEN_FILES += hugepage-mremap
+ TEST_GEN_FILES += hugepage-shm
+ TEST_GEN_FILES += hugepage-vmemmap
+ TEST_GEN_FILES += khugepaged
+ TEST_GEN_PROGS = madv_populate
+ TEST_GEN_FILES += map_fixed_noreplace
+ TEST_GEN_FILES += map_hugetlb
+ TEST_GEN_FILES += map_populate
+ TEST_GEN_FILES += memfd_secret
+ TEST_GEN_FILES += migration
+ TEST_GEN_FILES += mlock-random-test
+ TEST_GEN_FILES += mlock2-tests
+ TEST_GEN_FILES += mrelease_test
+ TEST_GEN_FILES += mremap_dontunmap
+ TEST_GEN_FILES += mremap_test
+ TEST_GEN_FILES += on-fault-limit
+ TEST_GEN_FILES += thuge-gen
+ TEST_GEN_FILES += transhuge-stress
+ TEST_GEN_FILES += userfaultfd
+ TEST_GEN_PROGS += soft-dirty
+ TEST_GEN_PROGS += split_huge_page_test
+ TEST_GEN_FILES += ksm_tests
+ TEST_GEN_PROGS += ksm_functional_tests
+ TEST_GEN_PROGS += mdwe_test
+ ifeq ($(MACHINE),x86_64)
+ CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
+ CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_64bit_program.c)
+ CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_program.c -no-pie)
+ VMTARGETS := protection_keys
+ BINARIES_32 := $(VMTARGETS:%=%_32)
+ BINARIES_64 := $(VMTARGETS:%=%_64)
+ ifeq ($(CAN_BUILD_WITH_NOPIE),1)
+ CFLAGS += -no-pie
+ endif
+ ifeq ($(CAN_BUILD_I386),1)
+ TEST_GEN_FILES += $(BINARIES_32)
+ endif
+ ifeq ($(CAN_BUILD_X86_64),1)
+ TEST_GEN_FILES += $(BINARIES_64)
+ endif
+ else
+ ifneq (,$(findstring $(MACHINE),ppc64))
+ TEST_GEN_FILES += protection_keys
+ endif
+ endif
+ ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64))
+ TEST_GEN_FILES += va_128TBswitch
+ TEST_GEN_FILES += virtual_address_range
+ TEST_GEN_FILES += write_to_hugetlbfs
+ endif
+ TEST_PROGS := run_vmtests.sh
+ TEST_FILES := test_vmalloc.sh
+ TEST_FILES += test_hmm.sh
+ TEST_FILES += va_128TBswitch.sh
+ include ../lib.mk
+ $(OUTPUT)/cow: vm_util.c
+ $(OUTPUT)/khugepaged: vm_util.c
+ $(OUTPUT)/ksm_functional_tests: vm_util.c
+ $(OUTPUT)/madv_populate: vm_util.c
+ $(OUTPUT)/soft-dirty: vm_util.c
+ $(OUTPUT)/split_huge_page_test: vm_util.c
+ $(OUTPUT)/userfaultfd: vm_util.c
+ ifeq ($(MACHINE),x86_64)
+ BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
+ BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64))
+ define gen-target-rule-32
+ $(1) $(1)_32: $(OUTPUT)/$(1)_32
+ .PHONY: $(1) $(1)_32
+ endef
+ define gen-target-rule-64
+ $(1) $(1)_64: $(OUTPUT)/$(1)_64
+ .PHONY: $(1) $(1)_64
+ endef
+ ifeq ($(CAN_BUILD_I386),1)
+ $(BINARIES_32): CFLAGS += -m32 -mxsave
+ $(BINARIES_32): LDLIBS += -lrt -ldl -lm
+ $(BINARIES_32): $(OUTPUT)/%_32: %.c
+       $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+ $(foreach t,$(VMTARGETS),$(eval $(call gen-target-rule-32,$(t))))
+ endif
+ ifeq ($(CAN_BUILD_X86_64),1)
+ $(BINARIES_64): CFLAGS += -m64 -mxsave
+ $(BINARIES_64): LDLIBS += -lrt -ldl
+ $(BINARIES_64): $(OUTPUT)/%_64: %.c
+       $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+ $(foreach t,$(VMTARGETS),$(eval $(call gen-target-rule-64,$(t))))
+ endif
+ # x86_64 users should be encouraged to install 32-bit libraries
+ ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01)
+ all: warn_32bit_failure
+ warn_32bit_failure:
+       @echo "Warning: you seem to have a broken 32-bit build" 2>&1;           \
+       echo  "environment. This will reduce test coverage of 64-bit" 2>&1;     \
+       echo  "kernels. If you are using a Debian-like distribution," 2>&1;     \
+       echo  "try:"; 2>&1;                                                     \
+       echo  "";                                                               \
+       echo  "  apt-get install gcc-multilib libc6-i386 libc6-dev-i386";       \
+       echo  "";                                                               \
+       echo  "If you are using a Fedora-like distribution, try:";              \
+       echo  "";                                                               \
+       echo  "  yum install glibc-devel.*i686";                                \
+       exit 0;
+ endif
+ endif
+ # cow_EXTRA_LIBS may get set in local_config.mk, or it may be left empty.
+ $(OUTPUT)/cow: LDLIBS += $(COW_EXTRA_LIBS)
+ $(OUTPUT)/mlock-random-test $(OUTPUT)/memfd_secret: LDLIBS += -lcap
+ $(OUTPUT)/ksm_tests: LDLIBS += -lnuma
+ $(OUTPUT)/migration: LDLIBS += -lnuma
+ local_config.mk local_config.h: check_config.sh
+       /bin/sh ./check_config.sh $(CC)
+ EXTRA_CLEAN += local_config.mk local_config.h
+ ifeq ($(COW_EXTRA_LIBS),)
+ all: warn_missing_liburing
+ warn_missing_liburing:
+       @echo ; \
+       echo "Warning: missing liburing support. Some COW tests will be skipped." ; \
+       echo
+ endif