platform/kernel/linux-starfive.git
19 months ago drm/i915/psr: clean up PSR debugfs sink status error handling
Jani Nikula [Fri, 17 Mar 2023 13:41:44 +0000 (15:41 +0200)]
drm/i915/psr: clean up PSR debugfs sink status error handling

Handle errors first and return early, and reduce indentation on the
happy day code path.

Cc: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230317134144.223936-3-jani.nikula@intel.com
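
The "handle errors first and return early" shape described in this commit can be illustrated with a minimal userspace C sketch. The names read_sink_status() and format_sink_status() and the status table are hypothetical stand-ins, not the i915 code; only the control flow is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a sink register read; returns <0 on failure. */
static int read_sink_status(int fail, unsigned int raw, unsigned int *val)
{
	if (fail)
		return -5; /* mimics -EIO */
	*val = raw;
	return 0;
}

/* Errors are handled first, so the success path stays at one indent level. */
static int format_sink_status(int fail, unsigned int raw, char *buf, size_t len)
{
	static const char * const names[] = { "inactive", "active", "error" };
	unsigned int val;
	int ret;

	assert(buf && len > 0);

	ret = read_sink_status(fail, raw, &val);
	if (ret < 0)
		return ret;

	if (val >= sizeof(names) / sizeof(names[0])) {
		snprintf(buf, len, "unknown");
		return 0;
	}

	snprintf(buf, len, "%s", names[val]);
	return 0;
}
```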
19 months ago drm/i915/psr: switch PSR debugfs to struct intel_connector
Jani Nikula [Fri, 17 Mar 2023 13:41:43 +0000 (15:41 +0200)]
drm/i915/psr: switch PSR debugfs to struct intel_connector

Prefer struct intel_connector over struct drm_connector.

Cc: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230317134144.223936-2-jani.nikula@intel.com
19 months ago drm/i915/psr: move PSR debugfs to intel_psr.c
Jani Nikula [Fri, 17 Mar 2023 13:41:42 +0000 (15:41 +0200)]
drm/i915/psr: move PSR debugfs to intel_psr.c

Move the debugfs next to the implementation.

Cc: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230317134144.223936-1-jani.nikula@intel.com
19 months ago gpu: drm: bridge: sii9234: remove unused bridge_to_sii9234 function
Tom Rix [Sat, 18 Mar 2023 00:23:21 +0000 (20:23 -0400)]
gpu: drm: bridge: sii9234: remove unused bridge_to_sii9234 function

clang with W=1 reports
drivers/gpu/drm/bridge/sii9234.c:870:31: error:
  unused function 'bridge_to_sii9234' [-Werror,-Wunused-function]
static inline struct sii9234 *bridge_to_sii9234(struct drm_bridge *bridge)
                              ^
This static function is not used, so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Robert Foss <rfoss@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230318002321.1675181-1-trix@redhat.com
19 months ago Merge tag 'amd-drm-next-6.4-2023-03-17' of https://gitlab.freedesktop.org/agd5f/linux...
Dave Airlie [Mon, 20 Mar 2023 06:44:36 +0000 (16:44 +1000)]
Merge tag 'amd-drm-next-6.4-2023-03-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.4-2023-03-17:

amdgpu:
- Misc code cleanups
- Documentation fixes
- Make kobj structures const
- Add thermal throttling adjustments for supported APUs
- UMC RAS fixes
- Display reset fixes
- DCN 3.2 fixes
- Freesync fixes
- DC code reorg
- Generalize dmabuf import to work with KFD
- DC DML fixes
- SRIOV fixes
- UVD code cleanups
- IH 4.4.2 updates
- HDP 4.4.2 updates
- SDMA 4.4.2 updates
- PSP 13.0.6 updates
- Add capped/uncapped workload handling for supported APUs
- DCN 3.1.4 updates
- Re-org DC Kconfig
- USB4 fixes
- Reorg DC plane and stream handling
- Register vga_switcheroo for apple-gmux
- SMU 13.0.6 updates
- Fix error checking in read_mm_registers functions for affected families
- VCN 4.0.4 fix
- Drop redundant pci_enable_pcie_error_reporting() call
- RDNA2 SMU OD suspend/resume fix
- Expose additional memory stats via fdinfo
- RAS fixes
- Misc display fixes
- DP MST fixes
- IOMMU regression fix for KFD

amdkfd:
- Make kobj structures const
- Support for exporting buffers via dmabuf
- Multi-VMA page migration fixes
- NBIO fixes
- Misc code cleanups
- Fix possible double free
- Fix possible UAF

radeon:
- iMac fix

UAPI:
- KFD dmabuf export support.  Required for importing KFD buffers into GEM contexts and for RDMA P2P support.
  Proposed user mode changes: https://github.com/fxkamd/ROCT-Thunk-Interface/commits/fxkamd/dmabuf

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230317164416.138340-1-alexander.deucher@amd.com
20 months ago drm: fix typo in margin connector properties docs
Simon Ser [Sun, 5 Mar 2023 10:35:10 +0000 (10:35 +0000)]
drm: fix typo in margin connector properties docs

This was pointed out by Ville and Pekka in their replies, but I
forgot to apply the change properly before pushing. Sorry for
the noise!

Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 409f07d353b3 ("drm: document connector margin properties")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Maxime Ripard <maxime@cerno.tech>
Cc: Dave Stevenson <dave.stevenson@raspberrypi.com>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230305103503.42619-1-contact@emersion.fr
20 months ago drm/i915: Extract intel_crtc_scanline_offset()
Ville Syrjälä [Fri, 10 Mar 2023 23:58:28 +0000 (01:58 +0200)]
drm/i915: Extract intel_crtc_scanline_offset()

Pull the scanline_offset calculation into its own function. Might
have further use for this later with DSB scanline waits.

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230310235828.17439-4-ville.syrjala@linux.intel.com
20 months ago drm/i915: Relocate intel_crtc_update_active_timings()
Ville Syrjälä [Fri, 10 Mar 2023 23:58:27 +0000 (01:58 +0200)]
drm/i915: Relocate intel_crtc_update_active_timings()

Move intel_crtc_update_active_timings() into intel_vblank.c
where it more properly belongs.

Also do the s/dev_priv/i915/ modernization rename while at it.

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230310235828.17439-3-ville.syrjala@linux.intel.com
20 months ago drm/i915: Add belts and suspenders locking for seamless M/N changes
Ville Syrjälä [Fri, 10 Mar 2023 23:58:26 +0000 (01:58 +0200)]
drm/i915: Add belts and suspenders locking for seamless M/N changes

Add some (probably overkill) locking to protect the vblank
timestamping constants updates during seamless M/N fastsets.

As everything should be naturally aligned I think the individual
pieces should probably end up updating atomically enough. So this
is only really meant to guarantee everyone sees a consistent whole.

All the drm_vblank.c usage is covered by vblank_time_lock,
and uncore.lock will take care of __intel_get_crtc_scanline()
that can also be called from outside the core vblank functionality.

Currently only crtc_clock and framedur_ns can change, but in
the future we might also fastset across e.g. vtotal/vblank_end
changes, so let's just grab the locks across the whole thing.

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230310235828.17439-2-ville.syrjala@linux.intel.com
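
The idea of guaranteeing that "everyone sees a consistent whole" can be sketched in userspace C as follows. This is only an illustration under stated assumptions: struct timing_consts and the timing_*() helpers are hypothetical, and a trivial C11 atomic_flag spinlock stands in for the kernel locks named in the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical pair of values that readers must see as a consistent whole. */
struct timing_consts {
	atomic_flag lock;     /* trivial spinlock standing in for a kernel lock */
	uint32_t crtc_clock;  /* kHz */
	uint64_t framedur_ns; /* derived from crtc_clock */
};

static void timing_lock(struct timing_consts *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void timing_unlock(struct timing_consts *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Writer: update both fields while holding the lock. */
static void timing_update(struct timing_consts *t, uint32_t clock_khz,
			  uint64_t framedur_ns)
{
	timing_lock(t);
	t->crtc_clock = clock_khz;
	t->framedur_ns = framedur_ns;
	timing_unlock(t);
}

/* Reader: snapshot both fields under the same lock. */
static void timing_read(struct timing_consts *t, uint32_t *clock_khz,
			uint64_t *framedur_ns)
{
	assert(clock_khz && framedur_ns);
	timing_lock(t);
	*clock_khz = t->crtc_clock;
	*framedur_ns = t->framedur_ns;
	timing_unlock(t);
}
```

Each field on its own may well update atomically; the lock only ensures a reader never pairs a new crtc_clock with an old framedur_ns.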
20 months ago drm/i915: Update vblank timestamping stuff on seamless M/N change
Ville Syrjälä [Fri, 10 Mar 2023 23:58:25 +0000 (01:58 +0200)]
drm/i915: Update vblank timestamping stuff on seamless M/N change

When we change the M/N values seamlessly during a fastset we should
also update the vblank timestamping stuff to make sure the vblank
timestamp corrections/guesstimations come out exact.

Note that only crtc_clock and framedur_ns can actually end up
changing here during fastsets. Everything else we touch can
only change during full modesets.

Technically we should try to do this exactly at the start of
vblank, but that would require some kind of double buffering
scheme. Let's skip that for now and just update things right
after the commit has been submitted to the hardware. This
means the information will be properly up to date when the
vblank irq handler goes to work. Only if someone ends up
querying some vblanky stuff in between the commit and start
of vblank may we see a slight discrepancy.

Also this same problem really exists for the DRRS downclocking
stuff. But as that is supposed to be more or less transparent
to the user, and it only drops to low gear after a long delay
(1 sec currently) we probably don't have to worry about it.
Any time something is actively submitting updates DRRS will
remain in high gear and so the timestamping constants will
match the hardware state.

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com>
Fixes: e6f29923c048 ("drm/i915: Allow M/N change during fastset on bdw+")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230310235828.17439-1-ville.syrjala@linux.intel.com
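
To see why a changed crtc_clock also changes framedur_ns, consider the usual relation between pixel clock and frame size. The helper below is a hypothetical sketch of that arithmetic, not the kernel's code; the truncating integer division is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Frame duration in nanoseconds for a mode with the given total size and
 * pixel clock in kHz:
 *   pixels_per_frame / (clock_khz * 1000) seconds
 *   = pixels_per_frame * 1000000 / clock_khz nanoseconds
 */
static uint64_t framedur_ns(uint32_t clock_khz, uint32_t htotal, uint32_t vtotal)
{
	assert(clock_khz != 0);
	return ((uint64_t)htotal * vtotal * 1000000ULL) / clock_khz;
}
```

For a 2200x1125 total frame, 148500 kHz gives roughly 16.67 ms per frame and 74250 kHz gives roughly 33.33 ms, so any timestamp correction based on the old value would be off after the clock changes.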
20 months ago drm/format-helper: Use drm_format_info_min_pitch() in tests helper
Javier Martinez Canillas [Thu, 16 Mar 2023 22:34:04 +0000 (23:34 +0100)]
drm/format-helper: Use drm_format_info_min_pitch() in tests helper

There's a nice helper to calculate the destination pitch that already takes
into account sub-byte pixel formats. Use that instead of open coding it.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230316223404.102806-1-javierm@redhat.com
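
The reason sub-byte pixel formats need care is that the row size in bytes must be rounded up. The following is a simplified sketch of that calculation with hypothetical helper names; it is not the signature or implementation of drm_format_info_min_pitch().

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division. */
static uint32_t div_round_up(uint32_t n, uint32_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/* Minimum bytes per row for `width` pixels of `bits_per_pixel` bits each. */
static uint32_t min_pitch_bytes(uint32_t width, uint32_t bits_per_pixel)
{
	return div_round_up(width * bits_per_pixel, 8);
}
```

With 1 bit per pixel, a 10 pixel wide row needs 2 bytes, which a naive `width * bytes_per_pixel` calculation cannot express.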
20 months ago drm/arm/hdlcd: Use devm_platform_ioremap_resource()
Yang Li [Tue, 14 Mar 2023 08:02:30 +0000 (16:02 +0800)]
drm/arm/hdlcd: Use devm_platform_ioremap_resource()

According to commit 7945f929f1a7 ("drivers: provide
devm_platform_ioremap_resource()"), convert platform_get_resource(),
devm_ioremap_resource() to a single call to
devm_platform_ioremap_resource(), as this is exactly what this function
does.

Since 'struct platform_device *pdev = to_platform_device(drm->dev)',
'drm->dev' is equivalent to '&pdev->dev'.

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314080231.20212-1-yang.lee@linux.alibaba.com
20 months ago drm/arm/malidp: Use devm_platform_get_and_ioremap_resource()
Yang Li [Tue, 14 Mar 2023 08:02:31 +0000 (16:02 +0800)]
drm/arm/malidp: Use devm_platform_get_and_ioremap_resource()

According to commit 890cc39a8799 ("drivers: provide
devm_platform_get_and_ioremap_resource()"), convert
platform_get_resource(), devm_ioremap_resource() to a single
call to devm_platform_get_and_ioremap_resource(), as this is exactly
what this function does.

Since 'struct platform_device *pdev = to_platform_device(dev)',
'&pdev->dev' is equivalent to 'dev'.

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314080231.20212-2-yang.lee@linux.alibaba.com
20 months ago drm/i915: Extract intel_sprite_uapi.c
Ville Syrjälä [Tue, 14 Mar 2023 13:02:55 +0000 (15:02 +0200)]
drm/i915: Extract intel_sprite_uapi.c

Move the sprite colorkey ioctl handler to its own file
so that intel_sprite.c becomes all about the low level
details of pre-skl sprite planes.

And drop a bunch of unnecessary includes while at it.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-10-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
20 months ago drm/i915: Relocate intel_plane_check_src_coordinates()
Ville Syrjälä [Tue, 14 Mar 2023 13:02:54 +0000 (15:02 +0200)]
drm/i915: Relocate intel_plane_check_src_coordinates()

Move intel_plane_check_src_coordinates() from the pre-skl sprite
plane specific code to a more suitable place for common plane code.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-9-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
20 months ago drm/i915: Clean up skl+ plane alpha bits
Ville Syrjälä [Tue, 14 Mar 2023 13:02:53 +0000 (15:02 +0200)]
drm/i915: Clean up skl+ plane alpha bits

Convert a few more skl+ plane registers to REG_BIT() & co.
Somehow these were missed during the earlier cleanup.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-8-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
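
For readers unfamiliar with the REG_BIT() style of register definitions, the sketch below shows the general idea with simplified stand-ins. The SKETCH_* macros, field_prep() and the EXAMPLE_* register layout are hypothetical and invented for illustration; the real i915 macros add compile-time checks that are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for bit and mask helpers (32-bit registers). */
#define SKETCH_BIT(n)          ((uint32_t)1 << (n))
#define SKETCH_GENMASK(hi, lo) \
	((uint32_t)((UINT32_MAX >> (31 - (hi))) & (UINT32_MAX << (lo))))

/* Shift of the lowest set bit in a non-zero mask. */
static unsigned int mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Place `val` into the field described by `mask`. */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	return (val << mask_shift(mask)) & mask;
}

/* Hypothetical register layout, for illustration only. */
#define EXAMPLE_ALPHA_ENABLE    SKETCH_BIT(31)
#define EXAMPLE_ALPHA_MODE_MASK SKETCH_GENMASK(5, 4)
```

Compared with open-coded shifts such as `(2 << 4)`, naming the mask once keeps the field position and width in a single place.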
20 months ago drm/i915: Define vlv/chv sprite plane SURFLIVE registers
Ville Syrjälä [Tue, 14 Mar 2023 13:02:52 +0000 (15:02 +0200)]
drm/i915: Define vlv/chv sprite plane SURFLIVE registers

Might as well complete the SURFLIVE register definitions
for all platforms/plane types. We are only missing the
VLV/CHV sprite planes.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-7-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
20 months ago drm/i915: Define skl+ universal plane SURFLIVE registers
Ville Syrjälä [Tue, 14 Mar 2023 13:02:51 +0000 (15:02 +0200)]
drm/i915: Define skl+ universal plane SURFLIVE registers

Add the definitions for the skl+ universal plane SURFLIVE
registers. Despite not being used for anything real
these came in surprisingly handy during some DSB debugging
recently, so having the defines around can be useful.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-6-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
20 months ago drm/i915: Program VLV/CHV PIPE_MSA_MISC register
Ville Syrjälä [Tue, 14 Mar 2023 13:02:50 +0000 (15:02 +0200)]
drm/i915: Program VLV/CHV PIPE_MSA_MISC register

VLV/CHV have an extra register to configure some stereo3d
signalling details via DP MSA. Make sure we reset that
register to zero (since we don't do any stereo3d stuff).

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-5-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
20 months ago drm/i915: Define more pipe timestamp registers
Ville Syrjälä [Tue, 14 Mar 2023 13:02:49 +0000 (15:02 +0200)]
drm/i915: Define more pipe timestamp registers

Add definitions for various pipe timestamp registers:
- frame timestamp (last start of vblank) (g4x+), already had this defined
- flip timestamp (when SURF was last written) (g4x+)
- flipdone timestamp (when last flipdone was signalled) (tgl+)

Note that on pre-tgl the flip related timestamps are only updated
for primary plane flips, but on tgl+ we can select which plane
updates them (via PIPE_MISC2). Let's define those related bits
as well.

Curiously VLV/CHV do not have the frame/flip timestamp registers,
despite all the other related registers being inherited from g4x.
This means we can get rid of the pipe_offsets[] usage for these,
and thus the implicit dev_priv is gone as well.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-4-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
20 months ago drm/i915: s/PIPEMISC/PIPE_MISC/
Ville Syrjälä [Tue, 14 Mar 2023 13:02:48 +0000 (15:02 +0200)]
drm/i915: s/PIPEMISC/PIPE_MISC/

This PIPEMISC vs. PIPE_MISC inconsistency is ugly. Unify
the naming (PIPE_MISC is also what bspec has always called it).

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-3-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
20 months ago drm/i915: Stop using pipe_offsets[] for PIPE_MISC*
Ville Syrjälä [Tue, 14 Mar 2023 13:02:47 +0000 (15:02 +0200)]
drm/i915: Stop using pipe_offsets[] for PIPE_MISC*

The PIPE_MISC registers don't exist on pre-bdw hardware,
so there is no point in using pipe_offsets[] for them.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314130255.23273-2-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
20 months ago drm/ttm/ttm_bo: Provide a missing 'bulk' description and correct misnaming of 'placement'
Lee Jones [Fri, 17 Mar 2023 08:16:46 +0000 (08:16 +0000)]
drm/ttm/ttm_bo: Provide a missing 'bulk' description and correct misnaming of 'placement'

'bulk' description taken from another in the same file.

Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/ttm/ttm_bo.c:98: warning: Function parameter or member 'bulk' not described in 'ttm_bo_set_bulk_move'
 drivers/gpu/drm/ttm/ttm_bo.c:768: warning: Function parameter or member 'placement' not described in 'ttm_bo_mem_space'
 drivers/gpu/drm/ttm/ttm_bo.c:768: warning: Excess function parameter 'proposed_placement' description in 'ttm_bo_mem_space'

Signed-off-by: Lee Jones <lee@kernel.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230317081718.2650744-6-lee@kernel.org
20 months ago drm/format-helper: Make "destination_pitch" test usable for mono
Arthur Grillo [Sat, 11 Mar 2023 12:51:41 +0000 (09:51 -0300)]
drm/format-helper: Make "destination_pitch" test usable for mono

This test case uses an arbitrary pitch size, different from the default
one, to test whether the conversion methods honor it.

Change the "destination_pitch" colors so that the expected monochrome
result is no longer all zeros, since an all-zero result makes the
arbitrary pitch test useless.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230311125141.564801-3-arthurgrillo@riseup.net
20 months ago drm/format-helper: Add Kunit tests for drm_fb_xrgb8888_to_mono()
Arthur Grillo [Sat, 11 Mar 2023 12:51:40 +0000 (09:51 -0300)]
drm/format-helper: Add Kunit tests for drm_fb_xrgb8888_to_mono()

Extend the existing test cases to test the conversion from XRGB8888 to
monochromatic.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230311125141.564801-2-arthurgrillo@riseup.net
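
As a rough picture of what an XRGB8888 to monochrome conversion has to do, here is a simplified userspace sketch. The luma weights, the threshold of 128 and the least-significant-bit-first packing are assumptions of this example, and the function names are hypothetical; the kernel helper under test may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Approximate luma (0..255) of one XRGB8888 pixel; weights are assumed. */
static uint8_t xrgb8888_to_gray(uint32_t px)
{
	uint32_t r = (px >> 16) & 0xff;
	uint32_t g = (px >> 8) & 0xff;
	uint32_t b = px & 0xff;

	return (uint8_t)((3 * r + 6 * g + b) / 10);
}

/*
 * Pack one row into 1 bit per pixel, least significant bit first.
 * `dst` must hold (pixels + 7) / 8 bytes.
 */
static void xrgb8888_to_mono_line(uint8_t *dst, const uint32_t *src,
				  size_t pixels)
{
	size_t x;

	assert(dst && src);
	for (x = 0; x < pixels; x++) {
		if (x % 8 == 0)
			dst[x / 8] = 0; /* start each output byte from zero */
		if (xrgb8888_to_gray(src[x]) >= 128)
			dst[x / 8] |= (uint8_t)(1u << (x % 8));
	}
}
```

A test built on such a conversion needs input colors on both sides of the threshold, otherwise every expected output byte is zero and errors in layout go unnoticed.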
20 months ago drm/i915/display: ignore link training failures in CI
Vinod Govindapillai [Wed, 15 Feb 2023 08:38:32 +0000 (10:38 +0200)]
drm/i915/display: ignore link training failures in CI

If the ignore long HPD flag is set, ignore the link training
failures as well. Because of spurious HPDs, some unexpected link
training failures are happening while executing IGT test cases.
Ignore the link training failures for the time being if the long
HPDs are also ignored in environments like CI.

Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230215083832.287519-3-vinod.govindapillai@intel.com
20 months ago drm/i915/display: ignore long HPDs based on a flag
Vinod Govindapillai [Wed, 15 Feb 2023 08:38:31 +0000 (10:38 +0200)]
drm/i915/display: ignore long HPDs based on a flag

Some panels generate long HPD events even while connected to
the port. This causes some unexpected CI execution issues. A
new flag is added to track whether such spurious long HPDs can be
ignored; they are not processed further if the flag is set.
A debugfs entry is added to control the ignore long HPD flag.

v2: Address patch styling comments (Jani Nikula)

v3: Ignoring the HPD moved to hotplug handler and now applies
    to all types of outputs (Imre Deak)

v4: use debugfs_create_bool and squash patches (Jani Nikula)

Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230215083832.287519-2-vinod.govindapillai@intel.com
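
The behaviour described here, dropping long HPD pulses in the hotplug handler when a flag is set, might look roughly like the sketch below. All names (struct hotplug_state, handle_hpd(), the counters) are hypothetical and this is not the i915 implementation; in the driver the flag would be toggled through the debugfs entry mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

enum hpd_pulse { HPD_SHORT, HPD_LONG };

/* Hypothetical hotplug state with the user-controlled flag. */
struct hotplug_state {
	bool ignore_long_hpd;
	unsigned int long_handled;
	unsigned int short_handled;
};

/* Returns true if the pulse was processed, false if it was dropped. */
static bool handle_hpd(struct hotplug_state *hp, enum hpd_pulse pulse)
{
	assert(hp);

	if (pulse == HPD_LONG && hp->ignore_long_hpd)
		return false; /* spurious long HPD: skip further processing */

	if (pulse == HPD_LONG)
		hp->long_handled++;
	else
		hp->short_handled++;
	return true;
}
```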
20 months ago drm/nouveau/nvfw/acr: set wpr_generic_header_dump storage-class-specifier to static
Tom Rix [Thu, 2 Mar 2023 12:48:19 +0000 (07:48 -0500)]
drm/nouveau/nvfw/acr: set wpr_generic_header_dump storage-class-specifier to static

gcc with W=1 reports
drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: error: no previous
  prototype for ‘wpr_generic_header_dump’ [-Werror=missing-prototypes]
   49 | wpr_generic_header_dump(struct nvkm_subdev *subdev,
      | ^~~~~~~~~~~~~~~~~~~~~~~

wpr_generic_header_dump is only used in acr.c, so it should be static

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230302124819.686469-1-trix@redhat.com
20 months ago drm/nouveau/fifo: set nvkm_engn_cgrp_get storage-class-specifier to static
Tom Rix [Tue, 28 Feb 2023 22:15:33 +0000 (17:15 -0500)]
drm/nouveau/fifo: set nvkm_engn_cgrp_get storage-class-specifier to static

smatch reports
drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:33:18:
  warning: symbol 'nvkm_engn_cgrp_get' was not declared. Should it be static?

nvkm_engn_cgrp_get is only used in runl.c, so it should be static

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230228221533.3240520-1-trix@redhat.com
20 months ago drm/nouveau/fifo: set gf100_fifo_nonstall_block_dump storage-class-specifier to static
Tom Rix [Fri, 3 Mar 2023 13:27:31 +0000 (08:27 -0500)]
drm/nouveau/fifo: set gf100_fifo_nonstall_block_dump storage-class-specifier to static

gcc with W=1 reports
drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: error:
  no previous prototype for ‘gf100_fifo_nonstall_block’ [-Werror=missing-prototypes]
  451 | gf100_fifo_nonstall_block(struct nvkm_event *event, int type, int index)
      | ^~~~~~~~~~~~~~~~~~~~~~~~~

gf100_fifo_nonstall_block is only used in gf100.c, so it should be static

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230303132731.1919329-1-trix@redhat.com
20 months ago drm/i915/opregion: Fix CONFIG_ACPI=n builds adding missing intel_opregion_cleanup...
Imre Deak [Tue, 14 Mar 2023 09:27:28 +0000 (11:27 +0200)]
drm/i915/opregion: Fix CONFIG_ACPI=n builds adding missing intel_opregion_cleanup() prototype

Add the missing intel_opregion_cleanup() prototype fixing CONFIG_ACPI=n
builds.

Fixes: 3e226e4a2180 ("drm/i915/opregion: Cleanup opregion after errors during driver loading")
Cc: Jani Nikula <jani.nikula@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202303141610.6L1VO7Gw-lkp@intel.com/
Signed-off-by: Imre Deak <imre.deak@intel.com>
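
Build breaks of this kind usually come from a header that declares a function only when an option is enabled. A common remedy, sketched below with hypothetical names (CONFIG_EXAMPLE_ACPI, example_opregion_cleanup(), struct device_ctx), is to pair the real prototype with an empty inline stub for the disabled case; this illustrates the pattern and is not the i915 header itself.

```c
#include <assert.h>

struct device_ctx {
	int opregion_active;
};

/* Hypothetical config switch; leave it undefined to build the stub variant. */
#ifdef CONFIG_EXAMPLE_ACPI
void example_opregion_cleanup(struct device_ctx *ctx);
#else
/* Empty stub keeps callers compiling when the feature is configured out. */
static inline void example_opregion_cleanup(struct device_ctx *ctx)
{
	(void)ctx;
}
#endif

/* The caller needs no #ifdef of its own. */
static int example_driver_remove(struct device_ctx *ctx)
{
	assert(ctx);
	example_opregion_cleanup(ctx);
	return 0;
}
```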
20 months ago drm/amdgpu: Don't resume IOMMU after incomplete init
Felix Kuehling [Tue, 14 Mar 2023 00:03:08 +0000 (20:03 -0400)]
drm/amdgpu: Don't resume IOMMU after incomplete init

Check kfd->init_complete in kgd2kfd_iommu_resume, consistent with other
kgd2kfd calls. This should fix IOMMU errors on resume from suspend when
KFD IOMMU initialization failed.

Reported-by: Matt Fagnani <matt.fagnani@bell.net>
Link: https://lore.kernel.org/r/4a3b225c-2ffd-e758-4de1-447375e34cad@bell.net/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217170
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2454
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info>
Cc: stable@vger.kernel.org
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Matt Fagnani <matt.fagnani@bell.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amd: fix compilation issue with legacy gcc
bobzhou [Wed, 15 Mar 2023 07:23:48 +0000 (15:23 +0800)]
drm/amd: fix compilation issue with legacy gcc

Fix the following compilation error with legacy gcc:

error: ‘for’ loop initial declarations are only allowed in C99 mode

Signed-off-by: bobzhou <bob.zhou@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
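
The quoted error is what older gcc versions report for a declaration inside the for statement, such as `for (int i = 0; ...)`, when not compiling in C99 mode. A form that such compilers accept hoists the declaration to the top of the block, as in this small sketch (sum_array() is a hypothetical example, not the code touched by the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Sums an array with the loop counter declared before the for statement. */
static int sum_array(const int *vals, size_t count)
{
	size_t i; /* declared at the top of the block, not in the for init */
	int sum = 0;

	assert(vals || count == 0);
	for (i = 0; i < count; i++)
		sum += vals[i];
	return sum;
}
```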
20 months ago drm/amdgpu: Retire pcie_gen3_enable function
Hawking Zhang [Mon, 6 Mar 2023 11:34:34 +0000 (19:34 +0800)]
drm/amdgpu: Retire pcie_gen3_enable function

Not needed since VI. Drop the function so
we don't duplicate code when introducing new ASICs.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: Move to common helper to query soc rev_id
Hawking Zhang [Mon, 6 Mar 2023 07:59:27 +0000 (15:59 +0800)]
drm/amdgpu: Move to common helper to query soc rev_id

Replace the soc15, nv and soc21 get_rev_id callbacks with a common
helper so we don't need to duplicate code when introducing
new ASICs.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: Move to common indirect reg access helper
Hawking Zhang [Mon, 6 Mar 2023 07:48:48 +0000 (15:48 +0800)]
drm/amdgpu: Move to common indirect reg access helper

Replace the soc15, nv and soc21 specific callbacks with a common
one, so we don't need to duplicate code when introducing
new ASICs.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: drop ras check at asic level for new blocks
Hawking Zhang [Sat, 4 Mar 2023 12:22:23 +0000 (20:22 +0800)]
drm/amdgpu: drop ras check at asic level for new blocks

amdgpu_ras_register_ras_block should always be invoked
by ras_sw_init, where the driver needs to check RAS caps
at the IP level instead of the ASIC level.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: Rework pcie_bif ras sw_init
Hawking Zhang [Mon, 13 Mar 2023 06:18:34 +0000 (14:18 +0800)]
drm/amdgpu: Rework pcie_bif ras sw_init

The pcie_bif RAS blocks need to be initialized as early
as possible to handle fatal errors detected in the hw_init
phase. Also align the pcie_bif ras sw_init with other
RAS blocks.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: Rework xgmi_wafl_pcs ras sw_init
Hawking Zhang [Sat, 4 Mar 2023 11:54:14 +0000 (19:54 +0800)]
drm/amdgpu: Rework xgmi_wafl_pcs ras sw_init

To align with other IP blocks.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: Rework mca ras sw_init
Hawking Zhang [Wed, 15 Mar 2023 00:59:04 +0000 (08:59 +0800)]
drm/amdgpu: Rework mca ras sw_init

To align with other IP blocks

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdkfd: Fixed kfd_process cleanup on module exit.
David Belanger [Tue, 28 Feb 2023 19:11:24 +0000 (14:11 -0500)]
drm/amdkfd: Fixed kfd_process cleanup on module exit.

Handle the case when the module is unloaded (kfd_exit) before a process space
(mm_struct) is released.

v2: Fixed potential race conditions by removing all kfd_process from
the process table first, then working on releasing the resources.

v3: Fixed loop element access / synchronization.  Fixed extra empty lines.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
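
The v2 note describes a two-step cleanup: first remove every entry from the shared table, then release the resources. A minimal userspace sketch of that ordering is given below. The types and helpers (struct proc_entry, struct proc_table, table_cleanup()) are hypothetical, a C11 atomic_flag spinlock stands in for the real table lock, and setting `released` is a placeholder for the actual teardown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct proc_entry {
	struct proc_entry *next;
	int released;
};

struct proc_table {
	atomic_flag lock; /* stand-in for the table lock */
	struct proc_entry *head;
};

static void table_lock(struct proc_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		; /* spin */
}

static void table_unlock(struct proc_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/*
 * Step 1: unpublish every entry while holding the lock.
 * Step 2: release the detached entries without holding it.
 * Returns the number of entries released.
 */
static unsigned int table_cleanup(struct proc_table *t)
{
	struct proc_entry *list, *next;
	unsigned int count = 0;

	assert(t);
	table_lock(t);
	list = t->head;
	t->head = NULL;
	table_unlock(t);

	for (; list; list = next) {
		next = list->next;  /* read the link before releasing the entry */
		list->next = NULL;
		list->released = 1; /* placeholder for freeing resources */
		count++;
	}
	return count;
}
```

Because nothing can find the entries once the table is empty, the slower release work no longer races with lookups.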
20 months ago drm/amd/display: 3.2.227
Aric Cyr [Mon, 6 Mar 2023 01:48:26 +0000 (20:48 -0500)]
drm/amd/display: 3.2.227

This version brings along the following:
- FW Release 0.0.158.0
- Fixes to HDCP, DP MST and more
- Improvements on USB4 links and more
- Code re-architecture on link.h

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amd/display: [FW Promotion] Release 0.0.158.0
Anthony Koo [Sun, 5 Mar 2023 06:35:03 +0000 (01:35 -0500)]
drm/amd/display: [FW Promotion] Release 0.0.158.0

[Why & How]
Add boot control bit to control dispclk and dppclk deep sleep

Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amd/display: fix assert condition
Samson Tam [Fri, 3 Mar 2023 22:30:25 +0000 (17:30 -0500)]
drm/amd/display: fix assert condition

[Why & How]
Reversed assert condition when checking that phy_pix_clk[] is not 0

Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amd/display: Clearly states if long or short HPD event in dmesg logs
Stylon Wang [Wed, 1 Mar 2023 15:56:51 +0000 (23:56 +0800)]
drm/amd/display: Clearly states if long or short HPD event in dmesg logs

[Why]
The log "DMUB HPD callback" is crucial to identify when DP tunneling
has been established and the driver is notified of this event from DMUB.
The same log is shared by long and short hotplug events, and we need to
check the trailing DC debug log to distinguish between the two, making
debugging of DPIA related issues a bit more troublesome.

[How]
Clearly state in dmesg logs whether this is a long or short hotplug
event.

Reviewed-by: Hamza Mahfooz <Hamza.Mahfooz@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Stylon Wang <stylon.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amd/display: Make DCN32 functions available to future DCNs
Wesley Chalmers [Mon, 27 Feb 2023 18:21:17 +0000 (13:21 -0500)]
drm/amd/display: Make DCN32 functions available to future DCNs

[Why & How]
Make DCN32 functions available for more DCNs.

Reviewed-by: Chris Park <Chris.Park@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months ago drm/amdgpu: Init MMVM_CONTEXTS_DISABLE in gmc11 golden setting under SRIOV
Yifan Zha [Mon, 6 Mar 2023 06:54:05 +0000 (14:54 +0800)]
drm/amdgpu: Init MMVM_CONTEXTS_DISABLE in gmc11 golden setting under SRIOV

[Why]
If the mmhub VM contexts are disabled (MMVM_CONTEXTS_DISABLE set to 0xffff),
driver loading fails on the VF because the fence fallback timer expires on all rings.
FLR cannot reset MMVM_CONTEXTS_DISABLE,
so this VF cannot be recovered unless a whole GPU reset is triggered.

[How]
Under SRIOV, init MMVM_CONTEXTS_DISABLE in gmc11 golden register setting.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Horace Chen <Horace.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months agodrm/amd/display: reallocate DET for dual displays with high pixel rate ratio
Samson Tam [Tue, 28 Feb 2023 19:33:00 +0000 (14:33 -0500)]
drm/amd/display: reallocate DET for dual displays with high pixel rate ratio

[Why]
For dual displays where pixel rate is much higher on one display,
we may get underflow when DET is evenly allocated.

[How]
Allocate fewer DET segments for the lower pixel rate display and
more DET segments for the higher pixel rate display
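
The uneven split described above can be sketched in plain C. This is only an illustration: the helper name, the proportional rounding rule and the "at least one segment each" clamp are assumptions, not the policy the DCN32 code actually implements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a driver function): split total_segs DET
 * segments between two displays in proportion to their pixel rates,
 * while leaving each display at least one segment.
 */
static void split_det_segments(uint32_t total_segs,
			       uint64_t pix_rate_a, uint64_t pix_rate_b,
			       uint32_t *segs_a, uint32_t *segs_b)
{
	uint64_t sum = pix_rate_a + pix_rate_b;
	uint32_t a;

	assert(total_segs >= 2 && sum > 0);

	/* Proportional share for display A, rounded down. */
	a = (uint32_t)((total_segs * pix_rate_a) / sum);

	/* Clamp so that neither display ends up with zero segments. */
	if (a < 1)
		a = 1;
	if (a > total_segs - 1)
		a = total_segs - 1;

	*segs_a = a;
	*segs_b = total_segs - a;
}
```

With 18 segments and pixel rates of 600000 and 150000, this sketch gives 14 segments to the faster display and 4 to the slower one; equal rates give an even 9/9 split.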

Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months agodrm/amd/display: disconnect MPCC only on OTG change
Ayush Gupta [Thu, 2 Mar 2023 14:58:05 +0000 (09:58 -0500)]
drm/amd/display: disconnect MPCC only on OTG change

[Why]
Frame drops are observed while playing VP9 and AV1 10-bit
video at 8k resolution using VSR when the playback controls
appear or disappear.

[How]
Now ODM 2 to 1 is disabled for 5k or greater resolutions on VSR.

Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Ayush Gupta <ayugupta@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months agodrm/amd/display: Fix DP MST sinks removal issue
Cruise Hung [Thu, 2 Mar 2023 02:33:51 +0000 (10:33 +0800)]
drm/amd/display: Fix DP MST sinks removal issue

[Why]
In USB4 DP tunneling, it is possible for the path to become unavailable
while the CM tears down the path slightly late.
In this case, the HPD is high but reading any DPCD register fails.
That causes the link connection type to be set to SST,
and not all sinks behind the MST branch are removed.

[How]
Restore the link connection type if reading the DPCD register fails.

Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months agodrm/amd/display: hpd rx irq not working with eDP interface
Robin Chen [Fri, 17 Feb 2023 12:47:57 +0000 (20:47 +0800)]
drm/amd/display: hpd rx irq not working with eDP interface

[Why]
This fixes a defect in commit ab144f0b4ad6
("drm/amd/display: Allow individual control of eDP hotplug support").

[How]
Revise the default eDP hotplug setting and use an enum to get rid
of the magic numbers for the different options.

Fixes: ab144f0b4ad6 ("drm/amd/display: Allow individual control of eDP hotplug support")
Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Robin Chen <robin.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20 months agodrm/i915: Don't send idle pattern after DP2.0 link training
Ville Syrjälä [Wed, 8 Mar 2023 21:26:27 +0000 (23:26 +0200)]
drm/i915: Don't send idle pattern after DP2.0 link training

Bspec calls for us to select pattern 2 after link training for
DP 2.0. Let's do that... by doing nothing because we will
be transmitting pattern 2 at the end of the link training
already.

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230308212627.7601-2-ville.syrjala@linux.intel.com
20 months agodrm/i915: Don't switch to TPS1 when disabling DP_TP_CTL
Ville Syrjälä [Wed, 8 Mar 2023 21:26:26 +0000 (23:26 +0200)]
drm/i915: Don't switch to TPS1 when disabling DP_TP_CTL

AFAICS Bspec has never asked us to switch to TPS1 when *disabling*
DP_TP_CTL. Let's stop doing that in case it confuses something.
We do have to switch before we *enable* DP_TP_CTL, but that
is already being handled correctly.

v2: Do the same for FDI
v3: Rebase

Reviewed-by: Imre Deak <imre.deak@intel.com> #v1
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230308212627.7601-1-ville.syrjala@linux.intel.com
20 months agodrm/vmwgfx: Fix src/dst_pitch confusion
Zack Rusin [Tue, 14 Mar 2023 21:14:45 +0000 (17:14 -0400)]
drm/vmwgfx: Fix src/dst_pitch confusion

The src/dst_pitch got mixed up during the rework of the function; make
sure the offsets refer to the correct one.

Spotted by clang:
Clang warns (or errors with CONFIG_WERROR):

  drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c:509:29: error: variable 'dst_pitch' is uninitialized when used here [-Werror,-Wuninitialized]
          src_offset = ddirty->top * dst_pitch + ddirty->left * stdu->cpp;
                                     ^~~~~~~~~
  drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c:492:26: note: initialize the variable 'dst_pitch' to silence this warning
          s32 src_pitch, dst_pitch;
                                  ^
                                   = 0
  1 error generated.
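
For reference, the offset arithmetic quoted in the warning can be sketched as a small standalone function. The function name is made up for this example; the point is only that an offset into a buffer has to be computed with that buffer's own pitch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte offset of pixel (left, top) in a linear buffer.
 * pitch is the number of bytes per row of that same buffer,
 * cpp is the number of bytes per pixel.
 */
static uint32_t pixel_offset(uint32_t top, uint32_t left,
			     uint32_t pitch, uint32_t cpp)
{
	/* The pixel must lie inside a row of the buffer. */
	assert(left * cpp <= pitch);

	return top * pitch + left * cpp;
}
```

The same (left, top) position maps to different byte offsets when the source and destination pitches differ, which is why mixing up the two pitches produces wrong offsets even once the variable is initialized.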

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Dave Airlie <airlied@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1811
Fixes: 39985eea5a6d ("drm/vmwgfx: Abstract placement selection")
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314211445.1363828-1-zack@kde.org
20 months agodrm: Track clients by tgid and not tid
Tvrtko Ursulin [Tue, 14 Mar 2023 14:18:55 +0000 (14:18 +0000)]
drm: Track clients by tgid and not tid

Thread group id (aka pid from userspace point of view) is a more
interesting thing to show as an owner of a DRM fd, so track and show that
instead of the thread id.

In the next patch we will make the owner updated post file descriptor
handover, which will also be tgid based to avoid ping-pong when multiple
threads access the fd.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314141904.1210824-2-tvrtko.ursulin@linux.intel.com
20 months agoaccel/habanalabs: Drop redundant pci_enable_pcie_error_reporting()
Bjorn Helgaas [Tue, 7 Mar 2023 20:27:29 +0000 (14:27 -0600)]
accel/habanalabs: Drop redundant pci_enable_pcie_error_reporting()

pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since
commit f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"),
the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver.  Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this only controls ERR_* Messages from the device.  An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: postpone mem_mgr IDR destruction to hpriv_release()
Tomer Tayar [Wed, 1 Mar 2023 15:45:58 +0000 (17:45 +0200)]
accel/habanalabs: postpone mem_mgr IDR destruction to hpriv_release()

The memory manager IDR is currently destroyed when user releases the
file descriptor.
However, at this point the user context might be still held, and memory
buffers might be still in use.
Later on, calls to release those buffers will fail due to not finding
their handles in the IDR, leading to a memory leak.
To avoid this leak, split the IDR destruction from the memory manager
fini, and postpone it to hpriv_release() when there is no user context
and no buffers are used.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: move soft-reset wait to soft-reset execute
Dafna Hirschfeld [Mon, 20 Feb 2023 05:54:44 +0000 (07:54 +0200)]
accel/habanalabs: move soft-reset wait to soft-reset execute

We plan to do soft-reset either by mmio or by using cpucp packet
depending on the FW version. We don't want to check FW version in two
different places for that (execute soft-reset and wait to soft-reset)
so move the waiting to gaudi2_execute_soft_reset. This also makes sense
because the cpucp also does the waiting.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: add uapi to stall/resume engine
Koby Elbaz [Wed, 15 Feb 2023 15:51:14 +0000 (17:51 +0200)]
accel/habanalabs: add uapi to stall/resume engine

The user might want to stall/resume engines to perform power testing
for various scenarios. Because our current
HL_CS_FLAGS_ENGINE_CORE_COMMAND command only handles the engines' cores,
we need to add another opcode for handling entire engine and not just
its core.

The user supplies an array, where each entry holds the engine's ID and
the command to send to the engine. The size of the array is limited
by the number of engines in the ASIC (only Gaudi2 is currently
supported).
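
A rough sketch of the kind of array validation described above, in plain C. All names, the command values and the engine count are hypothetical and do not reflect the actual habanalabs uapi definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch; the real limit is ASIC-specific. */
#define SKETCH_NUM_ENGINES 8u

/* Hypothetical command values. */
enum sketch_engine_cmd {
	SKETCH_ENGINE_STALL = 1,
	SKETCH_ENGINE_RESUME = 2,
};

/* Hypothetical entry: one engine ID and the command to send to it. */
struct sketch_engine_entry {
	uint32_t engine_id;
	uint32_t command;
};

/* Returns 0 if the user-supplied array is acceptable, -1 otherwise. */
static int validate_engine_cmds(const struct sketch_engine_entry *arr,
				size_t n)
{
	size_t i;

	/* The array size is bounded by the number of engines. */
	if (!arr || n == 0 || n > SKETCH_NUM_ENGINES)
		return -1;

	for (i = 0; i < n; i++) {
		if (arr[i].engine_id >= SKETCH_NUM_ENGINES)
			return -1;
		if (arr[i].command != SKETCH_ENGINE_STALL &&
		    arr[i].command != SKETCH_ENGINE_RESUME)
			return -1;
	}

	return 0;
}
```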

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: use scnprintf() in print_device_in_use_info()
Tomer Tayar [Fri, 17 Feb 2023 10:56:48 +0000 (12:56 +0200)]
accel/habanalabs: use scnprintf() in print_device_in_use_info()

compose_device_in_use_info() was added to handle the snprintf() return
value in a single place.
However, the buffer size in print_device_in_use_info() is set such that
it would be enough for the max possible print, so
compose_device_in_use_info() is not really needed.
Moreover, scnprintf() can be used instead of snprintf() to avoid
checking whether the return value is larger than the given size.
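
The difference in return values can be illustrated in userspace. The wrapper below is a stand-in written for this example (the name scnprintf_sketch is made up); it mimics the documented behaviour of the kernel's scnprintf() on top of the standard vsnprintf().

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace stand-in for scnprintf(): returns the number of characters
 * actually stored in buf, not counting the trailing '\0', so the result
 * is never larger than size - 1.
 */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(fmt != NULL);

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	if ((size_t)n < size)
		return n;

	/* Output was truncated: report only what fits in the buffer. */
	return size ? (int)(size - 1) : 0;
}
```

For an 8-byte buffer and a 10-character string, snprintf() returns 10 (the length that would have been written), while the sketch returns 7, so the result can be added to a write offset without an extra bounds check.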

Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: unify err log of hw-fini failure in dirty state
Dafna Hirschfeld [Wed, 1 Mar 2023 08:59:10 +0000 (10:59 +0200)]
accel/habanalabs: unify err log of hw-fini failure in dirty state

Print a more informative message when failing in dirty state.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: use a mutex rather than a spinlock
Koby Elbaz [Thu, 23 Feb 2023 16:17:02 +0000 (18:17 +0200)]
accel/habanalabs: use a mutex rather than a spinlock

There are two reasons why a mutex is better here:
1. There is a relatively long critical section, where in
certain scenarios (e.g., multiple VM allocations) taking a spinlock
might cause noticeable performance degradation.
2. It removes the incorrect usage of a mutex under
spin_lock (where preemption is disabled).

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: allow getting HL_INFO_DRAM_USAGE during soft-reset
Dafna Hirschfeld [Mon, 27 Feb 2023 06:22:54 +0000 (08:22 +0200)]
accel/habanalabs: allow getting HL_INFO_DRAM_USAGE during soft-reset

We can allow userspace to query the dram usage during soft-reset.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: fix register address on PDMA/EDMA idle check
Koby Elbaz [Tue, 21 Feb 2023 12:21:39 +0000 (14:21 +0200)]
accel/habanalabs: fix register address on PDMA/EDMA idle check

The PDMA/EDMA is_idle routines didn't check the correct CORE register
in order to get the accurate idle state.
Moreover, it's better to make the is_idle routine more robust by adding
additional checks (IS_HALTED) before announcing that the core is idle.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: remove a useless is_idle TPC flag
Koby Elbaz [Sun, 26 Feb 2023 06:22:45 +0000 (08:22 +0200)]
accel/habanalabs: remove a useless is_idle TPC flag

It appears that the flag
DCORE0_TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK has no actual use when
it comes to querying TPC idleness, since this flag's corresponding bit
turns off after stalling the engine and turns back on after resuming
it.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: fix few misspelled words in the code
farah kassabri [Thu, 23 Feb 2023 08:22:23 +0000 (10:22 +0200)]
accel/habanalabs: fix few misspelled words in the code

Run spell checker on the code and fix accordingly.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: verify return code after scrubbing ARCs DCCMs
Koby Elbaz [Thu, 23 Feb 2023 08:43:14 +0000 (10:43 +0200)]
accel/habanalabs: verify return code after scrubbing ARCs DCCMs

In case the KDMA fails scrubbing the DCCMs (following a soft-reset
upon device release), the driver will only print failure until reset
flow ends, rather than escalating it into a hard-reset.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: use notifications and graceful reset for decoder
Tomer Tayar [Mon, 6 Feb 2023 12:33:00 +0000 (14:33 +0200)]
accel/habanalabs: use notifications and graceful reset for decoder

Add notifications to user in case of decoder abnormal interrupts, and
use the graceful reset mechanism if reset is required.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: assert return value of hw_fini
Dafna Hirschfeld [Wed, 15 Feb 2023 10:15:57 +0000 (12:15 +0200)]
accel/habanalabs: assert return value of hw_fini

Since hw_fini returns an error code to indicate failure, we should
check its return value. Currently it can fail only upon soft-reset
from hl_device_reset. A later patch will add an hw_fini failure in case of
a polling timeout in hard-reset.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: break is_idle function into per-engine sub-routines
Koby Elbaz [Sun, 19 Feb 2023 11:30:49 +0000 (13:30 +0200)]
accel/habanalabs: break is_idle function into per-engine sub-routines

is_idle() was too long, so break it up for readability.
In addition, we can now use the new sub-routines from other places.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: add device id to all threads names
Sagiv Ozeri [Sun, 19 Feb 2023 22:34:41 +0000 (00:34 +0200)]
accel/habanalabs: add device id to all threads names

Compute driver thread names will start with hlX-*, where X is the
device id.
This will help distinguish them from the NIC thread names.

Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: add helper function to get vm hash node
Tomer Tayar [Thu, 16 Feb 2023 17:54:32 +0000 (19:54 +0200)]
accel/habanalabs: add helper function to get vm hash node

Add a helper function to search the vm hash for a node with a given
virtual address.
As opposed to the current code, this function explicitly returns NULL
when no node is found, instead of relying on the loop cursor object's
value.
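
The idea can be sketched with a plain singly linked list. The node type and function name are invented for this example; the driver uses the kernel hashtable helpers and its own node structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type; the driver's real structure differs. */
struct vm_node {
	uint64_t vaddr;
	struct vm_node *next;
};

/*
 * Return the node mapped at vaddr, or NULL when there is none.
 * The "not found" result is an explicit return value rather than
 * something inferred from the loop cursor after the loop ends.
 */
static struct vm_node *find_vm_node(struct vm_node *head, uint64_t vaddr)
{
	struct vm_node *cur;

	for (cur = head; cur; cur = cur->next)
		if (cur->vaddr == vaddr)
			return cur;

	return NULL;
}
```

Callers then only need a single NULL check on the returned pointer.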

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: remove unneeded irq_handler variable
Tomer Tayar [Thu, 16 Feb 2023 16:16:56 +0000 (18:16 +0200)]
accel/habanalabs: remove unneeded irq_handler variable

'irq_handler' in gaudi2_enable_msix() is just assigned a function
name and then used when calling request_threaded_irq().
Remove the variable and use the function name directly as an argument.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: change hw_fini to return int to indicate error
Dafna Hirschfeld [Wed, 8 Feb 2023 14:14:48 +0000 (16:14 +0200)]
accel/habanalabs: change hw_fini to return int to indicate error

We will later use a cpucp packet for soft reset, which might fail,
so we should be able to propagate the failure.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: improve readability of engines idle mask print
Tomer Tayar [Sun, 12 Feb 2023 13:27:37 +0000 (15:27 +0200)]
accel/habanalabs: improve readability of engines idle mask print

Remove leading zeroes when printing the idle mask to make it clearer.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: organize hl_device structure comment
Sagiv Ozeri [Wed, 1 Feb 2023 17:30:33 +0000 (19:30 +0200)]
accel/habanalabs: organize hl_device structure comment

Make the comments align with the order of the fields in the structure

Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: set hl_capture_*_err storage-class-specifier to static
Tom Rix [Mon, 13 Feb 2023 14:48:14 +0000 (06:48 -0800)]
accel/habanalabs: set hl_capture_*_err storage-class-specifier to static

smatch reports
drivers/accel/habanalabs/common/device.c:2619:6: warning:
  symbol 'hl_capture_hw_err' was not declared. Should it be static?
drivers/accel/habanalabs/common/device.c:2641:6: warning:
  symbol 'hl_capture_fw_err' was not declared. Should it be static?

both are only used in device.c, so they should be static

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: change unused extern decl of hdev to forward decl of hl_device
Tom Rix [Wed, 8 Feb 2023 15:54:50 +0000 (07:54 -0800)]
accel/habanalabs: change unused extern decl of hdev to forward decl of hl_device

Building with clang W=2 has several similar warnings
drivers/accel/habanalabs/common/decoder.c:46:51: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
static void
dec_error_intr_work(struct hl_device *hdev, u32 base_addr, u32 core_id)
                                                  ^
drivers/accel/habanalabs/common/security.h:13:26: note: previous declaration is here
extern struct hl_device *hdev;
                         ^

There is no global definition of hdev, so the extern is not needed.
Searched with
grep -r '^struct' . | grep hl_dev

Change to a forward decl to resolve these issues
drivers/accel/habanalabs/common/mmu/../security.h:133:40: error: ‘struct hl_device’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
  133 |         bool (*skip_block_hook)(struct hl_device *hdev,
      |                                        ^~~~~~~~~
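
A minimal standalone sketch of the forward-declaration pattern. Only struct hl_device and skip_block_hook come from the messages above; the hook table name, the structure field and the callback are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Forward declaration: introduces the type name at file scope without
 * declaring any global variable, so parameters named hdev shadow nothing.
 */
struct hl_device;

/* Hypothetical hook table that only needs a pointer to the type. */
struct block_hooks {
	bool (*skip_block_hook)(struct hl_device *hdev, uint32_t block_idx);
};

/* The full definition can come later; this field is invented. */
struct hl_device {
	uint32_t first_protected_block;
};

/* Hypothetical callback used to exercise the hook. */
static bool skip_protected(struct hl_device *hdev, uint32_t block_idx)
{
	return block_idx >= hdev->first_protected_block;
}
```

Without the forward declaration, the struct hl_device named in the parameter list would be scoped to that prototype alone, which is what the GCC error above reports.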

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
20 months agoaccel/habanalabs: don't trace cpu accessible dma alloc/free
Dafna Hirschfeld [Wed, 1 Feb 2023 16:46:10 +0000 (18:46 +0200)]
accel/habanalabs: don't trace cpu accessible dma alloc/free

The cpu accessible dma allocations use the gen_pool api which actually
does not allocate new memory from the system but manages memory already
allocated before. Tracing this together with real dma
allocation/free causes confusing logs, such as a '0' dma address and
a cpu address appearing twice.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: in hl_device_reset small refactor for readabilty
Dafna Hirschfeld [Wed, 8 Feb 2023 13:46:00 +0000 (15:46 +0200)]
accel/habanalabs: in hl_device_reset small refactor for readabilty

In the out_err flow, combine the two cases of soft-reset since
they have mostly common code. In addition, unlock reset_info.lock
after touching the reset count.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: in hl_device_reset remove 'hard_instead_of_soft'
Dafna Hirschfeld [Wed, 8 Feb 2023 13:17:43 +0000 (15:17 +0200)]
accel/habanalabs: in hl_device_reset remove 'hard_instead_of_soft'

Because this field is only used for debug print,
we can do more precise debug directly instead.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: rename security function parameters
Koby Elbaz [Thu, 9 Feb 2023 12:45:54 +0000 (14:45 +0200)]
accel/habanalabs: rename security function parameters

To match their description above the function

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: tiny refactor of hl_device_reset for readability
Dafna Hirschfeld [Wed, 8 Feb 2023 13:11:58 +0000 (15:11 +0200)]
accel/habanalabs: tiny refactor of hl_device_reset for readability

Align assignment of reset_upon_device_release to the convention used
in this function.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: remove hl_irq_handler_default()
Tomer Tayar [Tue, 7 Feb 2023 18:24:11 +0000 (20:24 +0200)]
accel/habanalabs: remove hl_irq_handler_default()

hl_irq_handler_default() is not used and can be removed.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: fix print in hl_irq_handler_eq()
Tomer Tayar [Mon, 6 Feb 2023 07:09:14 +0000 (09:09 +0200)]
accel/habanalabs: fix print in hl_irq_handler_eq()

"eq_base[eq->ci].hdr.ctl" is used directly in a print without a
le32_to_cpu() conversion.
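
What the conversion does can be shown with a portable userspace equivalent. The function below is written for this example and is not a kernel helper; it decodes four bytes stored in little-endian order into a host-order value on any CPU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable stand-in for le32_to_cpu() applied to a value held as four
 * little-endian bytes: byte 0 is the least significant.
 */
static uint32_t le32_bytes_to_cpu(const uint8_t b[4])
{
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}
```

On a little-endian host the raw field and the converted value happen to be identical, which is why a missing conversion like this one only shows up as a wrong print on big-endian systems.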

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: add support for TPC assert
Ofir Bitton [Mon, 16 Jan 2023 17:56:23 +0000 (19:56 +0200)]
accel/habanalabs: add support for TPC assert

In order to allow TPC engines to raise an assert, we must expose
the relevant MSIX interrupt to the user so that they can configure the engine
correctly. In addition, we implement the corresponding interrupt
handler that will notify the user upon such an event.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: capture interrupt timestamp in handler
Ofir Bitton [Wed, 1 Feb 2023 17:35:54 +0000 (19:35 +0200)]
accel/habanalabs: capture interrupt timestamp in handler

For the interrupt timestamp to be more accurate, we should
capture it during the interrupt handling rather than in threaded
irq context.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: change user interrupt to threaded IRQ
Tal Cohen [Wed, 25 Jan 2023 18:29:15 +0000 (20:29 +0200)]
accel/habanalabs: change user interrupt to threaded IRQ

We prefer not to handle the user interrupt job inside the interrupt
context. Instead, use threaded IRQ to handle the user interrupts.
This avoids disabling interrupts when the user process
registers for a new event and avoids long handling inside an
interrupt.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: modify events reset policy
Ohad Sharabi [Mon, 23 Jan 2023 08:08:34 +0000 (10:08 +0200)]
accel/habanalabs: modify events reset policy

The policy file of the events reset has been modified.
This change is reflected in the autogenerated file.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: get reset type indication from irq_map
Ohad Sharabi [Sun, 13 Nov 2022 08:49:05 +0000 (10:49 +0200)]
accel/habanalabs: get reset type indication from irq_map

When getting an event, add the ability to deduce the reset type from
the IRQ map table instead of using hard reset regardless.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: enable graceful reset mechanism for compute-reset
Tomer Tayar [Wed, 25 Jan 2023 09:38:56 +0000 (11:38 +0200)]
accel/habanalabs: enable graceful reset mechanism for compute-reset

The graceful reset mechanism is currently enabled only for reset
requests that will end up with hard-reset.
In the future, reset requests due to errors in some device engines are
going to be modified to request compute-reset, as the much longer
hard-reset is not really needed there.
To allow it, enable graceful reset also for compute-reset; a reset
after the user releases the device won't be escalated to hard-reset in those
cases.
If watchdog expires and user didn't release the device, hard-reset will
be initiated in any case.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: disable PCI when escalating compute to hard-reset
Koby Elbaz [Sun, 29 Jan 2023 11:35:58 +0000 (13:35 +0200)]
accel/habanalabs: disable PCI when escalating compute to hard-reset

In case a compute reset has failed or a request for a hard reset has
just arrived, then we escalate current reset procedure from compute
to hard-reset.
In such a case, the FW should be aware of the updated error cause,
and if LKD is the one who performs the reset (rather than the FW),
then we ask the FW to disable PCI access.

We would also like to have relevant debug info and therefore
we print the currently escalating reset type.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: minimize error prints when mem map fails
Moti Haimovski [Wed, 25 Jan 2023 18:16:19 +0000 (20:16 +0200)]
accel/habanalabs: minimize error prints when mem map fails

This commit minimizes the "chain of errors" displayed when memory
mapping fails.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: unsecure CFG_TPC_ID register
Koby Elbaz [Tue, 24 Jan 2023 12:13:20 +0000 (14:13 +0200)]
accel/habanalabs: unsecure CFG_TPC_ID register

Required to let the TPC compiler know which offset of the index
space it works on.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: expose engine core int reg address
Ofir Bitton [Sun, 22 Jan 2023 12:06:15 +0000 (14:06 +0200)]
accel/habanalabs: expose engine core int reg address

In order for engine cores to raise interrupts towards FW, they need
to know which register the event data should be written to.
Hence, we forward the relevant scratchpad register received during
dynamic regs handshake with FW.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months agoaccel/habanalabs: add critical-event bit in notifier
Moti Haimovski [Tue, 10 Jan 2023 15:35:31 +0000 (17:35 +0200)]
accel/habanalabs: add critical-event bit in notifier

Enhance the existing user notifications by adding HW and FW critical
event bits, to be used when a HW or FW event occurs that requires
both SW abort and hard-resetting the chip.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months ago accel/habanalabs: enforce release order of compute device and dma-buf
Tomer Tayar [Sun, 22 Jan 2023 10:17:02 +0000 (12:17 +0200)]
accel/habanalabs: enforce release order of compute device and dma-buf

When a user closes the compute device file descriptor without closing a
dma-buf file descriptor, the device is considered to be in use,
leading to a hard reset and to killing the user process, to ensure the
release of the dma-buf.
The same thing happens if the user first releases the compute device
file and only then the dma-buf.

The cost of this is the duration of the hard reset, during which the
device cannot be reacquired.
Moreover, this behavior constrains a user process to release the
dma-buf before the compute device file.

To avoid killing the user process and to remove this constraint, enforce
the correct order of release operations inside the driver, by
incrementing the device file refcount for any dma-buf until it is
released.
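
The refcount idea above can be sketched in plain user-space C. This is an
illustration under stated assumptions, not the driver's code: struct
dev_file, struct exported_buf and all helper names are hypothetical, and
a plain int stands in for the kernel's reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the compute device file. */
struct dev_file {
    int refcount;   /* one for the open FD, plus one per exported dma-buf */
    bool released;  /* set once the last reference is dropped */
};

/* Hypothetical stand-in for an exported dma-buf. */
struct exported_buf {
    struct dev_file *owner; /* device file pinned while the buffer lives */
};

static void dev_file_open(struct dev_file *f)
{
    f->refcount = 1;
    f->released = false;
}

static void dev_file_get(struct dev_file *f)
{
    f->refcount++;
}

static void dev_file_put(struct dev_file *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0)
        f->released = true; /* real teardown would happen here */
}

/* Exporting a buffer takes a reference on the device file. */
static void buf_export(struct exported_buf *b, struct dev_file *f)
{
    dev_file_get(f);
    b->owner = f;
}

/* Releasing the buffer drops that reference. */
static void buf_release(struct exported_buf *b)
{
    struct dev_file *f = b->owner;

    b->owner = NULL;
    dev_file_put(f);
}
```

With this arrangement, closing the device FD first only drops one
reference; the teardown runs when the last dma-buf is released, whatever
order the user chose.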

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months ago accel/habanalabs: add info when FD released while device still in use
Tomer Tayar [Wed, 18 Jan 2023 15:35:17 +0000 (17:35 +0200)]
accel/habanalabs: add info when FD released while device still in use

When a user closes the device file descriptor, the driver checks whether
the device is still in use and prints a message if it is.
To make this message more informative, also print the reason why the
device is considered to be in use.
The reasons currently checked are an active CS and an exported dma-buf.
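
A hedged sketch of how such a reason string might be assembled. The
helper name format_in_use_reasons(), its parameters and the exact wording
of the output are assumptions made for illustration; the driver's actual
message format is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a comma-separated list of the reasons the
 * device still counts as in use into buf (at most len bytes, NUL included).
 */
static void format_in_use_reasons(bool active_cs, bool exported_dmabuf,
                                  char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    if (active_cs)
        snprintf(buf, len, "active CS");

    if (exported_dmabuf) {
        size_t used = strlen(buf);

        /* Add a separator only when a reason was already written. */
        snprintf(buf + used, len - used, "%sexported dma-buf",
                 used ? ", " : "");
    }

    if (buf[0] == '\0')
        snprintf(buf, len, "unknown");
}
```

Collecting the reasons into one buffer keeps the report to a single log
line, even when both conditions hold at once.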

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months ago accel/habanalabs: fix address decode RAZWI handling
Dani Liberman [Tue, 17 Jan 2023 11:17:15 +0000 (13:17 +0200)]
accel/habanalabs: fix address decode RAZWI handling

The PSOC RAZWI handling code did not take into account a single router
that supports several initiators with different XY coordinates. Also, it
ignored the XY_HI coordinate. This caused 2 problems:
1. The RAZWI handling ignored some initiators.
2. When getting a PSOC RAZWI from some routers, there were many
   possible engines which could have caused the RAZWI.

Fix the above issues by handling PSOC RAZWI with both low and high
XY coordinates. This way the driver supports all initiators, and in
the worst case there are no more than 2 possible engines for a RAZWI.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
20 months ago accel/habanalabs: use memhash_node_export_put() in hl_release_dmabuf()
Tomer Tayar [Wed, 18 Jan 2023 14:00:55 +0000 (16:00 +0200)]
accel/habanalabs: use memhash_node_export_put() in hl_release_dmabuf()

The mutex lock/unlock and counter decrement in hl_release_dmabuf() are
already implemented by the memhash_node_export_put() helper function,
so call the helper instead of duplicating them.
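
The shape of this cleanup can be sketched in user-space C with a pthread
mutex. struct export_node, node_export_put() and release_buf() are
hypothetical stand-ins for the driver's node type and functions, whose
real definitions are not shown in this log.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical node carrying an export counter guarded by a mutex. */
struct export_node {
    pthread_mutex_t lock;
    int export_cnt;
};

/* Helper: the one place that locks, decrements and unlocks. */
static void node_export_put(struct export_node *n)
{
    pthread_mutex_lock(&n->lock);
    assert(n->export_cnt > 0);
    n->export_cnt--;
    pthread_mutex_unlock(&n->lock);
}

/*
 * Release path: instead of open-coding lock / decrement / unlock,
 * it calls the helper, so the locking rule lives in one function.
 */
static void release_buf(struct export_node *n)
{
    node_export_put(n);
}
```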

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>