platform/kernel/linux-rpi.git
17 months agoMerge tag 'drm-intel-next-2023-01-27' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Mon, 30 Jan 2023 03:35:56 +0000 (13:35 +1000)]
Merge tag 'drm-intel-next-2023-01-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

drm/i915 feature pull #2 v6.3:

Features and functionality:
- Enable HF-EEODB by switching HDMI, DP and LVDS to use struct drm_edid (Jani)
- Start using unversioned DMC firmware paths for new platforms (Gustavo)

Refactoring and cleanups:
- ELD refactor: Stop using hardware buffer, precompute ELD, and wire up ELD in
  the state checker (Ville)
- Use generics for debugfs device parameters (Jani)
- DSB refactoring and fixes (Ville)
- Header refactoring, add new intel_display_limits.h (Jani)
- Split out GMCH code to a new file (Jani)
- Split out vblank code to a new file (Jani)
- i915_drv.h and struct drm_i915_private cleanups (Jani)
- Simplify FBC and DRRS debug attributes (Deepak R Varma)
- Remove some single-use macros (Rodrigo)

Fixes:
- Fix scaler limits for display versions 12 and 13 (Luca)
- Fix plane source size check for zero height (Drew Davenport)
- Implement PSR2 selective fetch workaround (Jouni)
- Expand a PSR workaound to more platforms and pipes (Jouni)
- Expand an HDMI infoframe workaround to all MTL steppings (Jouni)
- Enable PIPEDMC whenever its corresponding pipe is enabled (Imre)

Merges:
- Backmerge drm-next (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87tu0c44gv.fsf@intel.com
17 months agoMerge tag 'drm-habanalabs-next-2023-01-26' of https://git.kernel.org/pub/scm/linux...
Dave Airlie [Mon, 30 Jan 2023 02:43:10 +0000 (12:43 +1000)]
Merge tag 'drm-habanalabs-next-2023-01-26' of https://git./linux/kernel/git/ogabbay/linux into drm-next

This tag contains habanalabs driver and accel changes for v6.3:

- Moved the driver to the accel subsystem. Currently only the files were
  moved (including the uapi file which was also renamed). This doesn't
  include registering to the accel subsystem. This will probably be only
  in the next kernel version.

- In case of decoder error (axi error) in Gaudi2, we can now find the exact
  IP that initiated the erroneous transaction and print the details for
  better debug.

- Add more trace events. We now can trace mmio transactions and communication
  with the preboot firmware.

- Add to Gaudi2 support for abrupt reset that is done by the firmware. This
  was support so far only for Gaudi1.

- Add uAPI to flush memory transactions (to the device memory). This is
  needed by the communications library in case of doing p2p with a host NIC
  which access our HBM directly through the PCI BAR.

- Add uAPI to pass-through a request from user-space to firmware and get the
  result back to user-space. This will allow the driver code to avoid the
  need to add new packet (in the communication channel with the firmware) for
  every new request type.

- Remove the option to export dma-buf by memory allocation handle in our uAPI.
  This was planned for Gaudi2 but was never used. Instead, we will do export
  by memory address (same as Gaudi1). In addition, we added the option to
  specify an offset to the address. This is needed in Gaudi2 because there
  the user allocates the entire HBM in one allocation, but would like to
  export only small part of it.

- Multiple bug fixes, refactors and small optimizations.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Oded Gabbay <ogabbay@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230126213317.GA1520525@ogabbay-vm-u20.habana-labs.com
17 months agoMerge tag 'drm-misc-next-2023-01-26' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Mon, 30 Jan 2023 01:26:08 +0000 (11:26 +1000)]
Merge tag 'drm-misc-next-2023-01-26' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for v6.3:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

 * fbdev-helper: Streamline code in generic fbdev and its helpers

 * TTM: Fixes plus their reverts

Driver Changes:

 * accel/ivpu: Typo fixes

 * i915: TTM-related fixes

 * nouveau: Remove unused return value from disable helper

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Y9I2nOzHxTxPeTjg@linux-uq9g
17 months agodrm/i915/mtl: Apply Wa_14013475917 for all MTL steppings
Jouni Högander [Tue, 24 Jan 2023 10:26:36 +0000 (12:26 +0200)]
drm/i915/mtl: Apply Wa_14013475917 for all MTL steppings

Wa_14013475917 has to be applied for all MTL steppings.

Bspec: 66624

Cc: Mika Kahola <mika.kahola@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230124102636.2567292-3-jouni.hogander@intel.com
17 months agodrm/i915/psr: Implement Wa_14014971492
Jouni Högander [Tue, 24 Jan 2023 10:26:35 +0000 (12:26 +0200)]
drm/i915/psr: Implement Wa_14014971492

Implement Wa_14014971492 and apply it for affected platforms.

Bspec: 52890, 54369, 55378, 66624

v2: Adjust platforms where applied

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230124102636.2567292-2-jouni.hogander@intel.com
17 months agodrm/i915/panel: move panel fixed EDID to struct intel_panel
Jani Nikula [Wed, 25 Jan 2023 11:10:52 +0000 (13:10 +0200)]
drm/i915/panel: move panel fixed EDID to struct intel_panel

It's a bit confusing to have two cached EDIDs in struct intel_connector
with slightly different purposes. Make the distinction a bit clearer by
moving the EDID cached for eDP and LVDS panels at connector init time to
struct intel_panel, and name it fixed_edid. That's what it is, a fixed
EDID for the panels.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/328350ef918638928a8286cdbab3107c8258332d.1674643465.git.jani.nikula@intel.com
17 months agodrm/i915/opregion: convert intel_opregion_get_edid() to struct drm_edid
Jani Nikula [Wed, 25 Jan 2023 11:10:51 +0000 (13:10 +0200)]
drm/i915/opregion: convert intel_opregion_get_edid() to struct drm_edid

Simplify validation and use by converting to drm_edid.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6abb01f1e97d54a3c11bec24377f035df412b492.1674643465.git.jani.nikula@intel.com
17 months agodrm/i915/bios: convert intel_bios_init_panel() to drm_edid
Jani Nikula [Wed, 25 Jan 2023 11:10:50 +0000 (13:10 +0200)]
drm/i915/bios: convert intel_bios_init_panel() to drm_edid

Try to use struct drm_edid where possible, even if having to fall back
to looking into struct edid down low via drm_edid_raw().

v2: Rebase

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/897807d62f74f690a173ecd405e25c6ccdd63b98.1674643465.git.jani.nikula@intel.com
17 months agodrm/i915/edid: convert DP, HDMI and LVDS to drm_edid
Jani Nikula [Wed, 25 Jan 2023 11:10:49 +0000 (13:10 +0200)]
drm/i915/edid: convert DP, HDMI and LVDS to drm_edid

Convert all the connectors that use cached connector edid and
detect_edid to drm_edid.

Since drm_get_edid() calls drm_connector_update_edid_property() while
drm_edid_read*() do not, we need to call drm_edid_connector_update()
separately, in part due to the EDID caching behaviour in HDMI and
DP. Especially DP depends on the details parsed from EDID. (The big
behavioural change conflating EDID reading with parsing and property
update was done in commit 5186421cbfe2 ("drm: Introduce epoch counter to
drm_connector"))

v6: Rebase on drm_edid_connector_add_modes()

v5: Fix potential uninitialized var use (kernel test robot <lkp@intel.com>)

v4: Call drm_edid_connector_update() after reading HDMI/DP EDID

v3: Don't leak vga switcheroo EDID in LVDS init (Ville)

v2: Don't leak opregion fallback EDID (Ville)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/eabb4de932841b38b34cc2818ea9fbf7c10224fd.1674643465.git.jani.nikula@intel.com
17 months agohabanalabs: Fix list of /sys/class/habanalabs/hl<n>/status
Bagas Sanjaya [Fri, 20 Jan 2023 12:35:33 +0000 (19:35 +0700)]
habanalabs: Fix list of /sys/class/habanalabs/hl<n>/status

Stephen Rothwell reported htmldocs warnings when merging accel tree:

Documentation/ABI/testing/sysfs-driver-habanalabs:201: ERROR: Unexpected indentation.
Documentation/ABI/testing/sysfs-driver-habanalabs:201: WARNING: Block quote ends without a blank line; unexpected unindent.
Documentation/ABI/testing/sysfs-driver-habanalabs:201: ERROR: Unexpected indentation.
Documentation/ABI/testing/sysfs-driver-habanalabs:201: WARNING: Block quote ends without a blank line; unexpected unindent.

Fix these by fixing alignment of list of card status returned by
/sys/class/habanalabs/hl<n>/status.

Link: https://lore.kernel.org/linux-next/20230120130634.61c3e857@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agoDocumentation: accel: escape wildcard in special file path
Bagas Sanjaya [Fri, 20 Jan 2023 12:35:32 +0000 (19:35 +0700)]
Documentation: accel: escape wildcard in special file path

Stephen Rothwell reported htmldocs warning then merging accel tree:

Documentation/accel/introduction.rst:72: WARNING: Inline emphasis start-string without end-string.

Sphinx confuses the file wildcards with inline emphasis (italics), hence
the warning.

Fix the warning by escaping wildcards.

Link: https://lore.kernel.org/linux-next/20230120132116.21de1104@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agodocs: accel: Fix debugfs path
Jeffrey Hugo [Thu, 19 Jan 2023 16:26:08 +0000 (09:26 -0700)]
docs: accel: Fix debugfs path

The device specific directory in debugfs does not have "accel".  For
example, the documentation says device 0 should have a debugfs entry as
/sys/kernel/debug/accel/accel0/ but in reality the entry is
/sys/kernel/debug/accel/0/

Fix the documentation to match the implementation.

Fixes: 8c5577a5ccc6 ("doc: add documentation for accel subsystem")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: find decode error root cause
Koby Elbaz [Sun, 15 Jan 2023 10:38:53 +0000 (12:38 +0200)]
habanalabs/gaudi2: find decode error root cause

When a decode error happens, we often don't know the exact root
cause (the erroneous address that was accessed) and the exact engine
that created the erroneous transaction.

To find out, we need to go over all the relevant register blocks
in the ASIC. Once we find the relevant engine, we print its details
and the offending address.

This helps tremendously when debugging an error that was created
by running a user workload.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: unsecure tpc kernel_config registers
Ofir Bitton [Wed, 18 Jan 2023 07:36:06 +0000 (09:36 +0200)]
habanalabs/gaudi2: unsecure tpc kernel_config registers

This is required in order to allow the kernel to control relevant
configuration space via load and store instructions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: clear in_compute_reset when escalating to hard reset
Tomer Tayar [Tue, 17 Jan 2023 17:45:24 +0000 (19:45 +0200)]
habanalabs: clear in_compute_reset when escalating to hard reset

If resetting device upon release while the release watchdog work is
scheduled, the compute reset is replaced with hard reset.
In this case, need to clear the in_compute_reset indication in the
device reset information structure.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: run error handling if scrub_device_mem fails after reset
Tomer Tayar [Tue, 17 Jan 2023 13:50:41 +0000 (15:50 +0200)]
habanalabs: run error handling if scrub_device_mem fails after reset

If device memory scrubbing from hl_device_reset() fails, we return with
an error code but not perform error handling code.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: enhance info printed on FW load errors
Moti Haimovski [Tue, 3 Jan 2023 08:28:24 +0000 (10:28 +0200)]
habanalabs: enhance info printed on FW load errors

This commit enhances the following error messages to also provide the
type of error occurred, this in order to ease debugging of errors
detected during firmware-load.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: optimize command submission completion timestamp
Ofir Bitton [Tue, 10 Jan 2023 09:41:39 +0000 (11:41 +0200)]
habanalabs: optimize command submission completion timestamp

Completion timestamp is taken during the actual command submission
release. As the release happens in a work queue, the timestamp taken
is not accurate. Hence, we will take the timestamp in the interrupt
handler itself while propagating it to the release function.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: refactor user interrupt type
Ofir Bitton [Mon, 16 Jan 2023 18:20:22 +0000 (20:20 +0200)]
habanalabs: refactor user interrupt type

In order to support more user interrupt types in the future, we
enumerate the user interrupt type instead of using a boolean.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: fix emda range registers razwi handling
Dani Liberman [Mon, 16 Jan 2023 10:00:05 +0000 (12:00 +0200)]
habanalabs/gaudi2: fix emda range registers razwi handling

Handling edma razwi is different than all other engines since edma
uses sft routers. For hbw transactions sft router contain separate
interface for each edma and for lbw there is common interface for
both edma engines of the same dcore.

To handle the razwi correctly we need to:
1. Simplify the calculation of the sft router address.
2. Add razwi handling for edma qm errors, since edma qman doesn't
   reports axi error response.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: block soft-reset on an unusable device
Koby Elbaz [Wed, 11 Jan 2023 13:43:21 +0000 (15:43 +0200)]
habanalabs: block soft-reset on an unusable device

A device with status malfunction indicates that it can't be used.
In such a case we do not support certain reset types, e.g.,
all kinds of soft-resets (compute reset, inference soft-reset),
and reset upon device release.

A hard-reset is the only way that an unusable device can change its
status. All other reset procedures can't put the device in a reset
procedure, which might ultimately cause the device to change its
status, unintentionally, to become operational again.

Such a scenario has recently occurred, when a user requested
a hard-reset while another heavy user workload was ongoing (reset
request is queued).
Since the workload couldn't finish within reset's timeout limits, the
reset has failed and set a device status malfunction.
Eventually, when the user released the FD, an unsuccessful soft-reset
occurred, hence followed by an additional hard-reset that changed the
ASICs status back to be operational.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: print page fault axi transaction id
Dani Liberman [Tue, 10 Jan 2023 13:48:36 +0000 (15:48 +0200)]
habanalabs/gaudi2: print page fault axi transaction id

AXI transaction id holds information about the initiator which caused
the page fault. In the future it will be translated automatically by
driver to an initiator name.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: update device status sysfs documentation
Ofir Bitton [Tue, 10 Jan 2023 19:43:28 +0000 (21:43 +0200)]
habanalabs: update device status sysfs documentation

As device status was changed recently, we must update the
documentation as well.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agoaccel: Add .mmap to DRM_ACCEL_FOPS
Jeffrey Hugo [Tue, 17 Jan 2023 17:45:58 +0000 (10:45 -0700)]
accel: Add .mmap to DRM_ACCEL_FOPS

In reviewing the ivpu driver, DEFINE_DRM_ACCEL_FOPS could have been used
if DRM_ACCEL_FOPS defined .mmap to be drm_gem_mmap.  Lets add that since
accel drivers are a variant of drm drivers, modern drm drivers are
expected to use GEM, and mmap() is a common operation that is expected
to be heavily used in accel drivers thus the common accel driver should
be able to just use DEFINE_DRM_ACCEL_FOPS() for convenience.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agoMAINTAINERS/ACCEL: Add include/drm/drm_accel.h to the accel entry
Jeffrey Hugo [Tue, 17 Jan 2023 18:04:30 +0000 (11:04 -0700)]
MAINTAINERS/ACCEL: Add include/drm/drm_accel.h to the accel entry

get_maintainer.pl does not suggest Oded Gabbay, the DRM COMPUTE
ACCELERATORS DRIVERS AND FRAMEWORK maintainer for changes that touch
the Accel Subsystem header - drm_accel.h.  This is because that file is
missing from the Accel Subsystem entry.  Fix this.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabe/gaudi2: add cfg base when displaying razwi addresses
Dani Liberman [Wed, 11 Jan 2023 08:50:09 +0000 (10:50 +0200)]
habanalabe/gaudi2: add cfg base when displaying razwi addresses

Captured addresses of low b/w razwi information contains only the
offset from the cfg base. To make it more user readable, add the cfg
base to it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: read mmio razwi information
Dani Liberman [Thu, 5 Jan 2023 15:12:28 +0000 (17:12 +0200)]
habanalabs/gaudi2: read mmio razwi information

In gaudi2 there night be different routers for low b/w and high b/w
transactions. But in the code that collects razwi information, we used
the same router for high b/w and low b/w.

Fixed it by reading the information also from low b/w routers.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: fix bug in timestamps registration code
farah kassabri [Tue, 10 Jan 2023 10:29:55 +0000 (12:29 +0200)]
habanalabs: fix bug in timestamps registration code

Protect re-using the same timestamp buffer record before actually
adding it to the to interrupt wait list.
Mark ts buff offset as in use in the spinlock protection area of the
interrupt wait list to avoid getting in the re-use section in
ts_buff_get_kernel_ts_record before adding the node to the list.
this scenario might happen when multiple threads are racing on
same offset and one thread could set data in the ts buff in
ts_buff_get_kernel_ts_record then the other thread takes over
and get to ts_buff_get_kernel_ts_record and we will try
to re-use the same ts buff offset then we will try to
delete a non existing node from the list.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: bugs fixes in timestamps buff alloc
farah kassabri [Sun, 8 Jan 2023 15:33:44 +0000 (17:33 +0200)]
habanalabs: bugs fixes in timestamps buff alloc

use argument instead of fixed GFP value for allocation
in Timestamps buffers alloc function.
change data type of size to size_t.

Fixes: 9158bf69e74f ("habanalabs: Timestamps buffers registration")
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: check pad and reserved fields in ioctls
farah kassabri [Tue, 3 Jan 2023 12:23:55 +0000 (14:23 +0200)]
habanalabs: check pad and reserved fields in ioctls

Make sure all reserved/pad fields in uapi input structures
are set to 0.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: remove unnecessary (void*) conversions
XU pengfei [Tue, 10 Jan 2023 10:35:13 +0000 (18:35 +0800)]
habanalabs: remove unnecessary (void*) conversions

data is a void * type and does not require a cast.

Signed-off-by: XU pengfei <xupengfei@nfschina.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: Replace zero-length arrays with flexible-array members
Gustavo A. R. Silva [Tue, 10 Jan 2023 01:39:47 +0000 (19:39 -0600)]
habanalabs: Replace zero-length arrays with flexible-array members

Zero-length arrays are deprecated[1] and we are moving towards
adopting C99 flexible-array members instead. So, replace zero-length
arrays in a couple of structures with flex-array members.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [2].

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Link: https://github.com/KSPP/linux/issues/78
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: extend fatal messages to contain PCI info
Moti Haimovski [Thu, 29 Dec 2022 10:44:09 +0000 (12:44 +0200)]
habanalabs: extend fatal messages to contain PCI info

This commit attaches the PCI device address to driver fatal messages
in order to ease debugging in multi-device setups.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: remove use of razwi info received from f/w
Dani Liberman [Tue, 3 Jan 2023 22:05:03 +0000 (00:05 +0200)]
habanalabs/gaudi2: remove use of razwi info received from f/w

Because f/w does not update razwi info when sending events, remove the
use of it.
The driver is responsible to check if razwi happened and to
collect razwi data.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: trace LBW reads/writes
Ohad Sharabi [Wed, 30 Nov 2022 12:41:49 +0000 (14:41 +0200)]
habanalabs: trace LBW reads/writes

Add traces to LBW reads/writes.
This may be handy when debugging configuration failure or events when
tracking configuration flow.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: define events to trace PCI LBW access
Ohad Sharabi [Wed, 30 Nov 2022 12:02:00 +0000 (14:02 +0200)]
habanalabs: define events to trace PCI LBW access

There are cases where it may be useful to dump the whole LBW configs.
Yet, doing so while spamming the kernel log will probably shade other
important messages since the LBW access is done in sheer volume.
To answer this we add trace events for those too.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: fix log for sob value overflow/underflow
Carmit Carmel [Wed, 4 Jan 2023 09:13:01 +0000 (11:13 +0200)]
habanalabs/gaudi2: fix log for sob value overflow/underflow

The value in SM_SEI_CAUSE includes the SOB index and not the SOB group
index.
Remove usage of log_mask in sm_sei_cause structure as it was never
used.

Signed-off-by: Carmit Carmel <ccarmel@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: add set engines masks ASIC function
Ohad Sharabi [Mon, 2 Jan 2023 14:44:28 +0000 (16:44 +0200)]
habanalabs: add set engines masks ASIC function

This function shall be used whenever components enable/binning masks
should be updated.

Usage is in one of the below cases:
- update user (or default) component masks
- update when getting the masks from FW (either CPUCP or COMMS)

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: protect access to dynamic mem 'user_mappings'
Koby Elbaz [Fri, 23 Dec 2022 13:02:05 +0000 (15:02 +0200)]
habanalabs: protect access to dynamic mem 'user_mappings'

When HL_INFO_USER_MAPPINGS IOCTL is called, we copy_to_user from
a dynamically allocated memory - 'user_mappings'.
Since freeing/allocating it happens in runtime (upon a page fault),
it not unlikely to access it even before being initially allocated
(i.e., accessing a NULL pointer).

The solution is to simply mark the spot when the err info has been
collected, and that way to know whether err info (either page fault
or RAZWI) is available to be read.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: remove redundant memset
Tom Rix [Sat, 7 Jan 2023 18:48:27 +0000 (13:48 -0500)]
habanalabs: remove redundant memset

From reviewing the code, the line
  memset(kdata, 0, usize);
is not needed because kdata is either zeroed by
  kdata = kzalloc(asize, GFP_KERNEL);
when allocated at runtime or by
  char stack_kdata[128] = {0};
at compile time.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: refactor razwi/page-fault information structures
Koby Elbaz [Sun, 25 Dec 2022 10:43:04 +0000 (12:43 +0200)]
habanalabs: refactor razwi/page-fault information structures

This refactor makes the code clearer and the new variables' names
better describe their roles.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: avoid reconfiguring the same PB registers
Koby Elbaz [Wed, 21 Dec 2022 15:49:42 +0000 (17:49 +0200)]
habanalabs/gaudi2: avoid reconfiguring the same PB registers

It appears that, within the sync manager security configuration,
we reconfigure PB registers over and over without any need to do that.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi: allow device acquire while in debug mode
Ofir Bitton [Sun, 25 Dec 2022 14:27:24 +0000 (16:27 +0200)]
habanalabs/gaudi: allow device acquire while in debug mode

During device acquire, the driver is using a QMAN for clearing some
registers. In order to avoid internal races, the driver verifies
the device is idle before submitting the register clear job.

This check introduces an issue, as debug mode will cause the device
to be non-idle which will lead to device acquire failure.

In order to overcome this issue we can entirely remove the idle
check as the driver is using the QMAN only when there is no active
context.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: move some prints to debug level
Oded Gabbay [Thu, 22 Dec 2022 10:28:54 +0000 (12:28 +0200)]
habanalabs: move some prints to debug level

When entering an IOCTL, the driver prints a message in case device is
not operational. This message should be printed in debug level as
it can spam the kernel log and it is not an error.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: update f/w files
Oded Gabbay [Wed, 21 Dec 2022 10:51:13 +0000 (12:51 +0200)]
habanalabs: update f/w files

Update common firmware files with the latest version.
There is no functional change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: update f/w files
Oded Gabbay [Wed, 21 Dec 2022 10:18:55 +0000 (12:18 +0200)]
habanalabs/gaudi2: update f/w files

Update gaudi2 firmware files with the latest version.
There is no functional change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: update asic register files
Oded Gabbay [Wed, 21 Dec 2022 09:55:54 +0000 (11:55 +0200)]
habanalabs/gaudi2: update asic register files

Update some register files with the latest h/w auto-generated files.
There is no functional change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: verify that kernel CB is destroyed only once
Tomer Tayar [Tue, 6 Dec 2022 17:54:10 +0000 (19:54 +0200)]
habanalabs: verify that kernel CB is destroyed only once

Remove the distinction between user CB and kernel CB, and verify for
both that they are not destroyed more than once.

As kernel CB might be taken from the pre-allocated CB pool, so we need
to clear the handle destroyed indication when returning a CB to the
pool.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: add uapi to flush inbound HBM transactions
Ohad Sharabi [Sun, 18 Dec 2022 07:42:34 +0000 (09:42 +0200)]
habanalabs: add uapi to flush inbound HBM transactions

When doing p2p with a NIC device, the NIC needs to make sure all the
writes to the HBM (through the PCI bar of the Gaudi device) were
flushed.

It can be done by either the NIC or the host reading through the PCI
bar.

To support the host side, we supply a simple uapi to perform this flush
through the driver, because the user can't create such a transaction
by itself (the PCI bar isn't exposed to normal users).

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: move driver to accel subsystem
Oded Gabbay [Mon, 26 Dec 2022 21:05:00 +0000 (23:05 +0200)]
habanalabs: move driver to accel subsystem

Now that we have a subsystem for compute accelerators, move the
habanalabs driver to it.

This patch only moves the files and fixes the Makefiles. Future
patches will change the existing code to register to the accel
subsystem and expose the accel device char files instead of the
habanalabs device char files.

Update the MAINTAINERS file to reflect this change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/uapi: move uapi file to drm
Oded Gabbay [Tue, 20 Dec 2022 12:12:19 +0000 (14:12 +0200)]
habanalabs/uapi: move uapi file to drm

Move the habanalabs.h uapi file from include/uapi/misc to
include/uapi/drm, and rename it to habanalabs_accel.h.

This is required before moving the actual driver to the accel
subsystem.

Update MAINTAINERS file accordingly.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: fix dma-buf release handling if dma_buf_fd() fails
Tomer Tayar [Thu, 15 Dec 2022 14:36:53 +0000 (16:36 +0200)]
habanalabs: fix dma-buf release handling if dma_buf_fd() fails

The dma-buf private object is freed if a call to dma_buf_fd() fails,
and because a file was already associated with the dma-buf in
dma_buf_export(), the release op will be called and will use this
object.

Mark the 'priv' field as NULL in this case, and avoid accessing it from
the release op.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: dump event description even if no cause
Ofir Bitton [Wed, 14 Dec 2022 14:52:05 +0000 (16:52 +0200)]
habanalabs/gaudi2: dump event description even if no cause

In order to have the no-cause error print be more informative,
we add the event description in addition to the event id.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: pass-through request from user to f/w
farah kassabri [Wed, 16 Nov 2022 13:40:30 +0000 (15:40 +0200)]
habanalabs: pass-through request from user to f/w

Add a uAPI, as part of the INFO IOCTL, to allow users to send
requests directly to f/w, according to a pre-defined set of opcodes
that the f/w exposes.

The f/w will put the result in a kernel-allocated buffer, which the
driver will then copy to the user-supplied buffer.

This will allow f/w tools to communicate directly with the f/w
without the need to add a new uAPI to the driver for each new type
of request.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: support receiving ascii message from preboot f/w
Tal Cohen [Thu, 1 Dec 2022 14:37:30 +0000 (16:37 +0200)]
habanalabs: support receiving ascii message from preboot f/w

An Ascii message that is sent from preboot towards the driver
will indicate the specific error that occurred on the f/w.
This commit supports that message and parse the ascii string
in order to print it into the kernel log

The commit also changes the way the descriptor struct is declared.
While its size increased (it now above 1024 bytes), it will be
allocated by using kmalloc instead of stack declaration.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: fix asic-specific functions documentation
Ohad Sharabi [Thu, 8 Dec 2022 13:19:10 +0000 (15:19 +0200)]
habanalabs: fix asic-specific functions documentation

- Add missing documentation of set DRAM props
- fix typo

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: fix wrong variable type used for vzalloc
farah kassabri [Mon, 28 Nov 2022 11:11:44 +0000 (13:11 +0200)]
habanalabs: fix wrong variable type used for vzalloc

vzalloc expects void* and not void __iomem*.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: wait for preboot ready if HW state is dirty
Ohad Sharabi [Wed, 30 Nov 2022 12:26:10 +0000 (14:26 +0200)]
habanalabs/gaudi2: wait for preboot ready if HW state is dirty

Instead of waiting for BTM indication we should wait for preboot ready.
Consider the below scenario:
    1. FW update is being triggered
           - setting the dirty bit
    2. hard reset will be triggered due to the dirty bit
    3. FW initiates the reset:
           - dirty bit cleared
           - BTM indication cleared
           - preboot ready indication cleared
    4. during hard reset:
           - BTM indication will be set
           - BIST test performed and another reset triggered
    5. only after this reset the preboot will set the preboot ready

When polling on BTM indication alone we can lose sync with FW while
trying to communicate with FW that is during reset.
To overcome this we will always wait to preboot ready indication.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: put fences in case of unexpected wait status
Tomer Tayar [Sun, 4 Dec 2022 21:23:47 +0000 (23:23 +0200)]
habanalabs: put fences in case of unexpected wait status

Need to put fences even if an unexpected status value is received while
waiting for a fence.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: fix handling of wait CS for interrupting signals
Tomer Tayar [Sun, 4 Dec 2022 20:09:08 +0000 (22:09 +0200)]
habanalabs: fix handling of wait CS for interrupting signals

The -ERESTARTSYS return value is not handled correctly when a signal is
received while waiting for CS completion.
This can lead to bad output values to user when waiting for a single CS
completion, and more severe, it can cause a non-stopping loop when
waiting to multi-CS completion and until a CS timeout.

Fix the handling and exit the waiting if this return value is received.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: fix dmabuf to export only required size
Ohad Sharabi [Thu, 10 Nov 2022 11:43:02 +0000 (13:43 +0200)]
habanalabs: fix dmabuf to export only required size

This patch fixes a bug that was found in the dmabuf flow.
Bug description as found on Gaudi2 device:
1. User allocates 4MB of device memory
    - Note that although the allocation size was 4MB the HMMU allocated
      a full page of 768MB to back the request.
    - The user gets a memory handle that points to a single page (768MB)
    - Mapping the handle, the user gets virtual address to the start of
      the page.
2. User exports the buffer
3. User registers the exported buffer in the importer. This flow has
   a callback to the exporter which in turn converts the phys_page_pack
   to an SG list for the importer. This SG list is of single entry of
   size 768MB. However, the size that was passed to the importer was
   only 4MB.

The solution for this is to make sure the importer gets exposure only
to the exported size.

This will be done by fixing the SG created by the exporter to be of
the total size of the actual exported memory requested by the user.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: modify export dmabuf API
Ohad Sharabi [Mon, 14 Nov 2022 10:16:37 +0000 (12:16 +0200)]
habanalabs: modify export dmabuf API

A previous commit deprecated the option to export from handle, leaving
the code with no support for devices with virtual memory.

This commit modifies the export API in a way that unifies the uAPI to
user address for both cases (i.e. with and without MMU support) and add
the actual support for devices with virtual memory.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: helper function to validate export params
Ohad Sharabi [Tue, 29 Nov 2022 07:13:34 +0000 (09:13 +0200)]
habanalabs: helper function to validate export params

Validate export parameters in a dedicated function instead of in the
main export flow.
This will be useful later when support to export dmabuf for devices
with virtual memory will be added.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: remove support to export dmabuf from handle
Ohad Sharabi [Tue, 29 Nov 2022 12:02:07 +0000 (14:02 +0200)]
habanalabs: remove support to export dmabuf from handle

The API to the user which allows exporting DMA buffer from handle is
deprecated here. It was never used as it is relevant only for Gaudi2,
and the user stack has yet to add support for dmabuf in Gaudi2.

Looking forward, a modified API to export DMA buffer for ASICs that
supports virtual memory will be added.

Until the new API will be ready- exporting DMA buffer will not be
supported for ASICs with virtual memory.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: set log level for descriptor validation to debug
farah kassabri [Tue, 29 Nov 2022 13:37:55 +0000 (15:37 +0200)]
habanalabs: set log level for descriptor validation to debug

This warning doesn't have real consequences, and therefore can be
printed in debug level.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: trace COMMS protocol
Ohad Sharabi [Wed, 30 Nov 2022 09:31:39 +0000 (11:31 +0200)]
habanalabs: trace COMMS protocol

Call COMMS tracepoints from within the dynamic CPU FW load.
This can help debug failures or delays in the dynamic FW load flow.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: define traces for COMMS protocol
Ohad Sharabi [Wed, 30 Nov 2022 09:16:51 +0000 (11:16 +0200)]
habanalabs: define traces for COMMS protocol

As the COMMS protocol is being used more widely in our driver,
an available debug tool for the handshake will be handy.

This commit defines tracepoints to various key points of the COMMS
protocol.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: support abrupt device reset event
Ofir Bitton [Wed, 30 Nov 2022 12:35:32 +0000 (14:35 +0200)]
habanalabs/gaudi2: support abrupt device reset event

In certain scenarios, firmware might encounter a fatal event for
which a device reset is required. Hence, a proper notification
is needed for driver to be aware and initiate a reset sequence.

In secured environments the reset will be performed by firmware
without an explicit request from the driver.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: skip device idle check in hpriv_release if in reset
Tomer Tayar [Wed, 30 Nov 2022 10:07:06 +0000 (12:07 +0200)]
habanalabs: skip device idle check in hpriv_release if in reset

When user context is released and hpriv_release() is called, there is a
device idle status check, to understand if user has left the device not
idle and then a reset is required.

However, if the user process is killed because of device hard reset,
the device at this point would always be not idle, because the device
engines were already forcefully halted.

Modify hpriv_release() to skip the idle check if reset is in progress.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: adjacent timestamps should be more accurate
Tamir Gilad-Raz [Sun, 6 Nov 2022 09:22:16 +0000 (11:22 +0200)]
habanalabs: adjacent timestamps should be more accurate

timestamp events that expire on the same interrupt will get the same
timestamp value

Signed-off-by: Tamir Gilad-Raz <tgiladraz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: remove duplicated event prints
Ofir Bitton [Wed, 16 Nov 2022 15:27:26 +0000 (17:27 +0200)]
habanalabs/gaudi2: remove duplicated event prints

In order to reduce error log, we try to minimize the dumped rows
while keeping all relevant error info. In addition we completely
remove clock throttling debug logs.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: count interrupt causes
Ofir Bitton [Wed, 23 Nov 2022 09:03:17 +0000 (11:03 +0200)]
habanalabs/gaudi2: count interrupt causes

During event handling we extract interrupt cause and count it.
In case we could not find any cause we should add proper error.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: update DRAM props according to preboot data
Ohad Sharabi [Sun, 27 Nov 2022 10:46:23 +0000 (12:46 +0200)]
habanalabs: update DRAM props according to preboot data

If the f/w reports the binning masks at the preboot stage, the driver
must align its DRAM properties according to the new information.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: fix double assignment in MMU V1
Marco Pagani [Tue, 29 Nov 2022 11:52:17 +0000 (12:52 +0100)]
habanalabs: fix double assignment in MMU V1

Removing double assignment of the hop2_pte_addr
variable in dram_default_mapping_fini().

Dead store reported by clang-analyzer.

Signed-off-by: Marco Pagani <marpagan@redhat.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: make set_dram_properties an ASIC function
Ohad Sharabi [Sun, 27 Nov 2022 10:38:49 +0000 (12:38 +0200)]
habanalabs: make set_dram_properties an ASIC function

As ASICs are evolving, we will need to update the DRAM properties at
various points because we may get different information from the f/w
at different points of the initialization.

This ASIC function is a foundation for this capability.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: use dev_dbg() when hl_mmap_mem_buf_get() fails
Tomer Tayar [Thu, 24 Nov 2022 09:12:38 +0000 (11:12 +0200)]
habanalabs: use dev_dbg() when hl_mmap_mem_buf_get() fails

As hl_mmap_mem_buf_get() is called also from IOCTLs which can have a
bad handle from user, modify the print for "no match to handle" to use
dev_dbg().
Calls to this function which are not dependent on user, already have an
error print when the function fails.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: don't allow user to destroy CB handle more than once
Tomer Tayar [Wed, 23 Nov 2022 13:09:43 +0000 (15:09 +0200)]
habanalabs: don't allow user to destroy CB handle more than once

The refcount of a CB buffer is initialized when user allocates a CB,
and is decreased when he destroys the CB handle.

If this refcount is increased also from kernel and user sends more than
one destroy requests for the handle, the buffer will be released/freed
and later be accessed when the refcount is put from kernel side.

To avoid it, prevent user from destroying the handle more than once.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: don't notify user about clk throttling due to power
Ofir Bitton [Thu, 24 Nov 2022 09:01:44 +0000 (11:01 +0200)]
habanalabs: don't notify user about clk throttling due to power

As clock throttling due to high power consumption can happen very
frequently and there is no real reason to notify the user about it,
we skip this notification in all asics.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: abort waiting user threads upon error
Tomer Tayar [Tue, 8 Nov 2022 12:34:43 +0000 (14:34 +0200)]
habanalabs: abort waiting user threads upon error

User should close the FD when being notified about an error, after
which a device reset takes place.

However, if the user has pending threads that wait for completions,
the device release won't be called and eventually the watchdog timeout
will expire, leading to hard reset and killing the user process.

To avoid it, abort such waiting threads right after the error
notification, and block following waiting operations.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: remove releasing of user threads from device release
Tomer Tayar [Sun, 6 Nov 2022 18:29:18 +0000 (20:29 +0200)]
habanalabs: remove releasing of user threads from device release

The device file is not in use when hl_device_release() is called,
and there aren't any user threads that use IOCTLs to wait for
interrupts. Therefore there is no need to release them at this point.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs: read binning info from preboot
farah kassabri [Sun, 13 Nov 2022 15:44:17 +0000 (17:44 +0200)]
habanalabs: read binning info from preboot

Sometimes we need the binning info at a very early state of the
driver initialization. Therefore, support was added in preboot to
provide the binning info as part of the f/w descriptor and the driver
can now use that.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agohabanalabs/gaudi2: fix BMON 3rd address range
tal albo [Wed, 16 Nov 2022 20:54:24 +0000 (22:54 +0200)]
habanalabs/gaudi2: fix BMON 3rd address range

Fix programming incorrect value of address range

Signed-off-by: tal albo <talbo@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
17 months agodrm/fbdev-generic: Rename struct fb_info 'fbi' to 'info'
Thomas Zimmermann [Wed, 25 Jan 2023 20:04:15 +0000 (21:04 +0100)]
drm/fbdev-generic: Rename struct fb_info 'fbi' to 'info'

The generic fbdev emulation names variables of type struct fb_info
both 'fbi' and 'info'. The latter seems to be more common in fbdev
code, so name fbi accordingly.

Also replace the duplicate variable in drm_fbdev_fb_destroy().

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125200415.14123-11-tzimmermann@suse.de
17 months agodrm/fbdev-generic: Inline clean-up helpers into drm_fbdev_fb_destroy()
Thomas Zimmermann [Wed, 25 Jan 2023 20:04:14 +0000 (21:04 +0100)]
drm/fbdev-generic: Inline clean-up helpers into drm_fbdev_fb_destroy()

The fbdev framebuffer cleanup in drm_fbdev_fb_destroy() calls
drm_fbdev_release() and drm_fbdev_cleanup(). Inline both into the
caller. No functional changes.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125200415.14123-10-tzimmermann@suse.de
17 months agodrm/fbdev-generic: Minimize client unregistering
Thomas Zimmermann [Wed, 25 Jan 2023 20:04:13 +0000 (21:04 +0100)]
drm/fbdev-generic: Minimize client unregistering

For uninitialized framebuffers, only release the DRM client and
free the fbdev memory. Do not attempt to clean up the framebuffer.

DRM fbdev clients have a two-step initialization: first create
the DRM client; then create the framebuffer device on the first
successful hotplug event. In cases where the client never creates
the framebuffer, only the client state needs to be released. We
can detect which case it is, full or client-only cleanup, by
looking at the presence of fb_helper's info field.

v3:
* fix typo in commit message (Javier)
* release client before unpreparing fbdev
v2:
* remove test for (fbi != NULL) in drm_fbdev_cleanup() (Sam)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125200415.14123-9-tzimmermann@suse.de
17 months agodrm/fbdev-generic: Minimize hotplug error handling
Thomas Zimmermann [Wed, 25 Jan 2023 20:04:12 +0000 (21:04 +0100)]
drm/fbdev-generic: Minimize hotplug error handling

Call drm_fb_helper_fini() in the generic-fbdev hotplug helper
to revert the effects of drm_fb_helper_init(). No full cleanup
is required.

v3:
* fix error in commit message (Javier)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125200415.14123-8-tzimmermann@suse.de
17 months agodrm/fb-helper: Initialize fb-helper's preferred BPP in prepare function
Thomas Zimmermann [Wed, 25 Jan 2023 20:04:11 +0000 (21:04 +0100)]
drm/fb-helper: Initialize fb-helper's preferred BPP in prepare function

Initialize the fb-helper's preferred_bpp field early from within
drm_fb_helper_prepare(); instead of the later client hot-plugging
callback. This simplifies the generic fbdev setup function.

No real changes, but all drivers' fbdev code has to be adapted.

v3:
* build with CONFIG_DRM_FBDEV_EMULATION unset (kernel test bot)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125200415.14123-7-tzimmermann@suse.de
17 months agodrm/fb-helper: Remove preferred_bpp parameter from fbdev internals
Thomas Zimmermann [Wed, 25 Jan 2023 20:04:10 +0000 (21:04 +0100)]
drm/fb-helper: Remove preferred_bpp parameter from fbdev internals

Store the console's preferred BPP value in struct drm_fb_helper
and remove the respective function parameters from the internal
fbdev code.

The BPP value is only required as a fallback and will now always
be available in the fb-helper instance.

No functional changes.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125200415.14123-6-tzimmermann@suse.de
17 months agodrm/fbdev-generic: Initialize fb-helper structure in generic setup
Thomas Zimmermann [Wed, 25 Jan 2023 20:04:09 +0000 (21:04 +0100)]
drm/fbdev-generic: Initialize fb-helper structure in generic setup

Initialize the fb-helper structure immediately after its allocation
in drm_fbdev_generic_setup(). That will make it easier to fill it with
driver-specific values, such as the preferred BPP.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125200415.14123-5-tzimmermann@suse.de
17 months agodrm/fb-helper: Introduce drm_fb_helper_unprepare()
Thomas Zimmermann [Wed, 25 Jan 2023 20:04:08 +0000 (21:04 +0100)]
drm/fb-helper: Introduce drm_fb_helper_unprepare()

Move the fb-helper clean-up code into drm_fb_helper_unprepare(). No
functional changes.

v2:
* declare as static inline (kernel test robot)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125200415.14123-4-tzimmermann@suse.de
17 months agodrm/client: Add hotplug_failed flag
Thomas Zimmermann [Wed, 25 Jan 2023 20:04:07 +0000 (21:04 +0100)]
drm/client: Add hotplug_failed flag

Signal failed hotplugging with a flag in struct drm_client_dev. If set,
the client helpers will not further try to set up the fbdev display.

This used to be signalled with a combination of cleared pointers in
struct drm_fb_helper, which prevents us from initializing these pointers
early after allocation.

The change also harmonizes behavior among DRM clients. Additional DRM
clients will now handle failed hotplugging like fbdev does.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125200415.14123-3-tzimmermann@suse.de
17 months agodrm/client: Test for connectors before sending hotplug event
Thomas Zimmermann [Wed, 25 Jan 2023 20:04:06 +0000 (21:04 +0100)]
drm/client: Test for connectors before sending hotplug event

Test for connectors in the client code and remove a similar test
from the generic fbdev emulation. Do nothing if the test fails.
Not having connectors indicates a driver bug.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125200415.14123-2-tzimmermann@suse.de
17 months agodrm/nouveau/devinit: Convert function disable() to be void
Deepak R Varma [Wed, 25 Jan 2023 15:07:14 +0000 (20:37 +0530)]
drm/nouveau/devinit: Convert function disable() to be void

The current design of callback function disable() of struct
nvkm_devinit_func is defined to return a u64 value. In its implementation
in the driver modules, the function always returns a fixed value 0. Hence
the design and implementation of this function should be enhanced to return
void instead of a fixed value. This change also eliminates untouched
return variables.

The change is identified using the returnvar.cocci Coccinelle semantic
patch script.

Signed-off-by: Deepak R Varma <drv@mailo.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y9FFoooIXjlr+UP1@ubun2204.myguest.virtualbox.org
17 months agoMerge drm/drm-next into drm-misc-next
Thomas Zimmermann [Wed, 25 Jan 2023 20:12:51 +0000 (21:12 +0100)]
Merge drm/drm-next into drm-misc-next

Backmerging to sync with other DRM trees.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
17 months agodrm/ttm: revert "stop allocating dummy resources during BO creation"
Christian König [Wed, 25 Jan 2023 15:49:02 +0000 (16:49 +0100)]
drm/ttm: revert "stop allocating dummy resources during BO creation"

This reverts commit 00984ad39599bb2a1e6ec5d4e9c75a749f7f45c9.

It seems to still breka i915.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125155023.105584-1-christian.koenig@amd.com
17 months agodrm/ttm: revert "stop allocating a dummy resource for pipelined gutting"
Christian König [Wed, 25 Jan 2023 15:48:36 +0000 (16:48 +0100)]
drm/ttm: revert "stop allocating a dummy resource for pipelined gutting"

This reverts commit 4110872b8115aab2adb3a52149c144d8465440de.

This still seems to break i915.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125155023.105584-1-christian.koenig@amd.com
17 months agodrm/ttm: revert "prevent moving of pinned BOs"
Christian König [Wed, 25 Jan 2023 16:14:37 +0000 (17:14 +0100)]
drm/ttm: revert "prevent moving of pinned BOs"

This reverts commit b49323aa35d502b0d9a7950327f30a1a52eae534.

This still seems to break i915.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125155023.105584-1-christian.koenig@amd.com
17 months agodrm/i915/params: use generics for parameter debugfs file creation
Jani Nikula [Wed, 18 Jan 2023 15:18:00 +0000 (17:18 +0200)]
drm/i915/params: use generics for parameter debugfs file creation

Replace the __builtin_strcmp() if ladder with generics.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230118151800.3669913-4-jani.nikula@intel.com
17 months agodrm/i915/params: use generics for parameter free
Jani Nikula [Wed, 18 Jan 2023 15:17:59 +0000 (17:17 +0200)]
drm/i915/params: use generics for parameter free

Replace the __builtin_strcmp() if ladder with generics.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230118151800.3669913-3-jani.nikula@intel.com
17 months agodrm/i915/params: use generics for parameter dup
Jani Nikula [Wed, 18 Jan 2023 15:17:58 +0000 (17:17 +0200)]
drm/i915/params: use generics for parameter dup

Replace the __builtin_strcmp() if ladder with generics.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230118151800.3669913-2-jani.nikula@intel.com