Evan Quan [Mon, 13 Dec 2021 03:37:56 +0000 (11:37 +0800)]
drm/amdgpu: move smu_debug_mask to a more proper place
As the smu_context will be invisible from outside(of power). Also,
the smu_debug_mask can be shared around all power code instead of
some specific framework(swSMU) only.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Skvortsov [Mon, 13 Dec 2021 21:38:20 +0000 (21:38 +0000)]
drm/amdgpu: SRIOV flr_work should use down_write
Host initiated VF FLR may fail if someone else is
already holding a read_lock. Change from down_write_trylock
to down_write to guarantee the reset goes through.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aric Cyr [Fri, 10 Dec 2021 23:04:08 +0000 (15:04 -0800)]
drm/amd/display: 3.2.166
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Martin Leung [Fri, 10 Dec 2021 23:04:07 +0000 (15:04 -0800)]
drm/amd/display: implement dc_mode_memclk
why:
Need interface to lower clocks when in dc (power save)
mode. Must be able to work with p_state unsupported cases
Can cause flicker when OS notifies us of dc state change
how:
added dal3 interface for KMD
added pathway to query smu for this softmax
added blank before clock change to override underflow
added logic to change clk based on pstatesupport and softmax
added logic in prepare/optimize_bw to conform while changing
clocks
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Martin Leung <Martin.Leung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Eric Bernstein [Fri, 10 Dec 2021 23:04:06 +0000 (15:04 -0800)]
drm/amd/display: ODM + MPO window on only one half of ODM
[Why]
For ODM + MPO window on one half of ODM, only 3 pipes should
be allocated and scaling parameters adjusted to handle this case
[How]
Fix pipe allocation when MPO viewport is only on one side of ODM
split, and modify scaling paramters.
Added diags test cases for ODM + windows MPO, where MPO window is
on right half, left half, and both halves or ODM.
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Eric Bernstein <eric.bernstein@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Fri, 10 Dec 2021 23:04:05 +0000 (15:04 -0800)]
drm/amd/display: Reset DMCUB before HW init
[Why]
If the firmware wasn't reset by PSP or HW and is currently running
then the firmware will hang or perform underfined behavior when we
modify its firmware state underneath it.
[How]
Reset DMCUB before setting up cache windows and performing HW init.
Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Anthony Koo [Fri, 10 Dec 2021 23:04:04 +0000 (15:04 -0800)]
drm/amd/display: [FW Promotion] Release 0.0.97
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Strauss [Fri, 10 Dec 2021 23:04:03 +0000 (15:04 -0800)]
drm/amd/display: Force det buf size to 192KB with 3+ streams and upscaling
[WHY]
This workaround resolves underflow caused by incorrect DST_Y_PREFETCH.
Overriding to 192KB DET buf size until the DST_Y_PREFETCH calc is fixed.
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mikita Lipski [Fri, 10 Dec 2021 23:04:02 +0000 (15:04 -0800)]
drm/amd/display: parse and check PSR SU caps
[why]
Adding a function to read PSR capabilities
and ALPM capabilities.
Also adding a helper function to validate if
the sink and the driver support PSR SU.
[how]
- isolated all PSR and ALPM reading calls to a separate funciton
- set all required PSR caps
- added a helper function to check if PSR SU is supported by sink
and the driver
Reviewed-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Solomon Chiu [Fri, 10 Dec 2021 23:04:01 +0000 (15:04 -0800)]
drm/amd/display: Add src/ext ID info for dummy service
[Why]
Current error log of dummy irq service doesn't have
src/ext ID info in the log.
[How]
Add src/ext ID in ack/set of dummy irq service.
Reviewed-by: Wayne Lin <Wayne.Lin@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wayne Lin [Fri, 10 Dec 2021 23:04:00 +0000 (15:04 -0800)]
drm/amd/display: Add debugfs entry for ILR
[Why & How]
In order to know the intermediate link rates supported by the eDP
panel and test to select the optimized link rate to save power,
create a new debugfs entry "ilr_setting" for
setting ILR.
Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Fri, 10 Dec 2021 23:03:59 +0000 (15:03 -0800)]
drm/amd/display: Set exit_optimized_pwr_state for DCN31
[Why]
SMU now respects the PHY refclk disable request from driver.
This causes a hang during hotplug when PHY refclk was disabled
because it's not being re-enabled and the transmitter control
starts on dc_link_detect.
[How]
We normally would re-enable the clk with exit_optimized_pwr_state
but this is only set on DCN21 and DCN301. Set it for dcn31 as well.
This fixes DMCUB timeouts in the PHY.
Fixes: 64b1d0e8d500 ("drm/amd/display: Add DCN3.1 HWSEQ")
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
chiminghao [Thu, 9 Dec 2021 01:58:23 +0000 (01:58 +0000)]
drm:amdgpu:remove unneeded variable
return value form directly instead of
taking this in another redundant variable.
Reported-by: Zeal Robot <zealci@zte.com.cm>
Signed-off-by: chiminghao <chi.minghao@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yann Dirson [Fri, 10 Dec 2021 18:20:28 +0000 (19:20 +0100)]
Documentation/gpu: split amdgpu/index for readability
This starts to make the formated index much more manageable to the reader.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Yann Dirson <ydirson@free.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Mon, 13 Dec 2021 15:05:09 +0000 (09:05 -0600)]
drivers/amd/pm: drop statement to print FW version for smu_v13
Update smu_v13 to match smu_v12 and smu_v11 behavior where this is
fetched from debugfs rather than in kernel logs on every boot.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 9 Dec 2021 18:13:53 +0000 (12:13 -0600)]
drm/amd/pm: fix reading SMU FW version from amdgpu_firmware_info on YC
This value does not get cached into adev->pm.fw_version during
startup for smu13 like it does for other SMU like smu12.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 17:46:51 +0000 (14:46 -0300)]
drm/amd/display: fix function scopes
This turns previously global functions into static, thus removing
compile-time warnings such as:
warning: no previous prototype for 'get_highest_allowed_voltage_level'
[-Wmissing-prototypes]
742 | unsigned int get_highest_allowed_voltage_level(uint32_t chip_family, uint32_t hw_internal_rev, uint32_t pci_revision_id)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warning: no previous prototype for 'rv1_vbios_smu_send_msg_with_param'
[-Wmissing-prototypes]
102 | int rv1_vbios_smu_send_msg_with_param(struct clk_mgr_internal *clk_mgr, unsigned int msg_id, unsigned int param)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Changes since v1:
- As suggested by Rodrigo Siqueira:
1. Rewrite function signatures to make them more readable.
2. Get rid of unused functions in order to remove 'defined but not
used' warnings.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michel Dänzer [Thu, 9 Dec 2021 16:46:44 +0000 (17:46 +0100)]
drm/amd/display: Reduce stack size for dml31 UseMinimumDCFCLK
Use the struct display_mode_lib pointer instead of passing lots of large
arrays as parameters by value.
Addresses this warning (resulting in failure to build a RHEL debug kernel
with Werror enabled):
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c: In function ‘UseMinimumDCFCLK’:
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:7478:1: warning: the frame size of 2128 bytes is larger than 2048 bytes [-Wframe-larger-than=]
NOTE: AFAICT this function previously had no observable effect, since it
only modified parameters passed by value and doesn't return anything.
Now it may modify some values in struct display_mode_lib passed in by
reference.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michel Dänzer [Thu, 9 Dec 2021 16:46:43 +0000 (17:46 +0100)]
drm/amd/display: Reduce stack size for dml31_ModeSupportAndSystemConfigurationFull
Move code using the Pipe struct to a new helper function.
Works around[0] this warning (resulting in failure to build a RHEL debug
kernel with Werror enabled):
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c: In function ‘dml31_ModeSupportAndSystemConfigurationFull’:
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:5740:1: warning: the frame size of 2144 bytes is larger than 2048 bytes [-Wframe-larger-than=]
The culprit seems to be the Pipe struct, so pull the relevant block out
into its own sub-function. (This is porting
commit
a62427ef9b55 ("drm/amd/display: Reduce stack size for dml21_ModeSupportAndSystemConfigurationFull")
from dml31 to dml21)
[0] AFAICT this doesn't actually reduce the total amount of stack which
can be used, just moves some of it from
dml31_ModeSupportAndSystemConfigurationFull to the new helper function,
so the former happens to no longer exceed the limit for a single
function.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Thu, 9 Dec 2021 15:47:22 +0000 (12:47 -0300)]
drm/amdgpu: re-format file header comments
Fix the warning below:
warning: Cannot understand * \file amdgpu_ioc32.c
on line 2 - I thought it was a doc line
Changes since v1:
- As suggested by Alexander Deucher:
1. Reduce diff to minimum as this DOC section doesn't provide much
value.
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Thu, 9 Dec 2021 15:47:21 +0000 (12:47 -0300)]
drm/amdgpu: remove unnecessary variables
This fixes the warnings below, and also drops the display_count
variable, as it's unused.
In function 'svm_range_map_to_gpu':
warning: variable 'bo_va' set but not used [-Wunused-but-set-variable]
1172 | struct amdgpu_bo_va bo_va;
| ^~~~~
...
In function 'dcn201_update_clocks':
warning: variable 'enter_display_off' set but not used [-Wunused-but-set-variable]
132 | bool enter_display_off = false;
| ^~~~~~~~~~~~~~~~~
Changes since v1:
- As suggested by Rodrigo Siqueira:
1. Drop display_count variable.
- As suggested by Felix Kuehling:
1. Remove block surrounding amdgpu_xgmi_same_hive.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Thu, 9 Dec 2021 15:47:19 +0000 (12:47 -0300)]
drm/amdgpu: fix amdgpu_ras_mca_query_error_status scope
This commit fixes the compile-time warning below:
warning: no previous prototype for ‘amdgpu_ras_mca_query_error_status’
[-Wmissing-prototypes]
Changes since v1:
- As suggested by Alexander Deucher:
1. Make function static instead of adding prototype.
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 10 Dec 2021 15:49:52 +0000 (09:49 -0600)]
drm/amd: move variable to local scope
`edp_stream` is only used when backend is enabled on eDP, don't
declare the variable outside that scope.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 10 Dec 2021 15:45:23 +0000 (09:45 -0600)]
drm/amd: add some extra checks that is_dig_enabled is defined
There are a few places that this isn't checked that could potentially
be a NULL pointer access.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 7 Dec 2021 02:17:14 +0000 (21:17 -0500)]
drm/amdgpu: Reduce SG bo memory usage for mGPUs
For userptr bo, if adev is not in IOMMU isolation mode, RAM direct map
to GPU, multiple GPUs use same system memory dma mapping address, they
can share the original mem->bo in attachment to reduce dma address array
memory usage.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 7 Dec 2021 02:10:52 +0000 (21:10 -0500)]
drm/amdgpu: Detect if amdgpu in IOMMU direct map mode
If host and amdgpu IOMMU is not enabled or IOMMU is pass through mode,
set adev->ram_is_direct_mapped flag which will be used to optimize
memory usage for multi GPU mappings.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Thu, 25 Nov 2021 15:38:30 +0000 (10:38 -0500)]
Documentation/gpu: Add amdgpu and dc glossary
In the DC driver, we have multiple acronyms that are not obvious most of
the time; the same idea is valid for amdgpu. This commit introduces a DC
and amdgpu glossary in order to make it easier to navigate through our
driver.
Changes since V3:
- Yann: Add new acronyms to amdgpu glossary
- Daniel: Add link between dc and amdgpu glossary
Changes since V2:
- Add MMHUB
Changes since V1:
- Yann: Divide glossary based on driver context.
- Alex: Make terms more consistent and update CPLIB
- Add new acronyms to the glossary
Reviewed-by: Yann Dirson <ydirson@free.fr>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Thu, 25 Nov 2021 15:38:29 +0000 (10:38 -0500)]
Documentation/gpu: Add basic overview of DC pipeline
This commit describes how DCN works by providing high-level diagrams
with an explanation of each component. In particular, it details the
Global Sync signals.
Change since V2:
- Add a comment about MMHUBBUB.
Reviewed-by: Yann Dirson <ydirson@free.fr>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Thu, 25 Nov 2021 15:38:28 +0000 (10:38 -0500)]
Documentation/gpu: How to collect DTN log
Introduce how to collect DTN log from debugfs.
Reviewed-by: Yann Dirson <ydirson@free.fr>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Thu, 25 Nov 2021 15:38:27 +0000 (10:38 -0500)]
Documentation/gpu: Document pipe split visual confirmation
Display core provides a feature that makes it easy for users to debug
Pipe Split. This commit introduces how to use such a debug option.
Reviewed-by: Yann Dirson <ydirson@free.fr>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Thu, 25 Nov 2021 15:38:26 +0000 (10:38 -0500)]
Documentation/gpu: Document amdgpu_dm_visual_confirm debugfs entry
Display core provides a feature that makes it easy for users to debug
Multiple planes by enabling a visual notification at the bottom of each
plane. This commit introduces how to use such a feature.
Reviewed-by: Yann Dirson <ydirson@free.fr>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Thu, 25 Nov 2021 15:38:25 +0000 (10:38 -0500)]
Documentation/gpu: Reorganize DC documentation
Display core documentation is not well organized, and it is hard to find
information due to the lack of sections. This commit reorganizes the
documentation layout, and it is preparation work for future changes.
Changes since V1:
- Christian: Group amdgpu documentation together.
- Daniel: Drop redundant amdgpu prefix.
- Jani: Create index pages.
- Yann: Mirror display folder in the documentation.
Reviewed-by: Yann Dirson <ydirson@free.fr>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lang Yu [Tue, 30 Nov 2021 03:06:40 +0000 (11:06 +0800)]
drm/amdgpu: add support for SMU debug option
SMU firmware expects the driver maintains error context
and doesn't interact with SMU any more when SMU errors
occurred. That will aid in debugging SMU firmware issues.
Add SMU debug option support for this request, it can be
enabled or disabled via amdgpu_smu_debug debugfs file.
Use a 32-bit mask to indicate corresponding debug modes.
Currently, only one mode(HALT_ON_ERROR) is supported.
When enabled, it brings hardware to a kind of halt state
so that no one can touch it any more in the envent of SMU
errors.
The dirver interacts with SMU via sending messages. And
threre are three ways to sending messages to SMU in current
implementation. Handle them respectively as following:
1, smu_cmn_send_smc_msg_with_param() for normal timeout cases
Halt on any error.
2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response()
for longer timeout cases
Halt on errors apart from ETIME. Otherwise this way won't work.
Let the user handle ETIME error in such a case.
3, smu_cmn_send_msg_without_waiting() for no waiting cases
Halt on errors apart from ETIME. Otherwise second way won't work.
== Command Guide ==
1, enable HALT_ON_ERROR mode
# echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
2, disable HALT_ON_ERROR mode
# echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
v5:
- Use bit mask to allow more debug features.(Evan)
- Use WRAN() instead of BUG().(Evan)
v4:
- Set to halt state instead of a simple hang.(Christian)
v3:
- Use debugfs_create_bool().(Christian)
- Put variable into smu_context struct.
- Don't resend command when timeout.
v2:
- Resend command when timeout.(Lijo)
- Use debugfs file instead of module parameter.
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lang Yu [Thu, 9 Dec 2021 07:10:32 +0000 (15:10 +0800)]
drm/amdgpu: introduce a kind of halt state for amdgpu device
It is useful to maintain error context when debugging
SW/FW issues. Introduce amdgpu_device_halt() for this
purpose. It will bring hardware to a kind of halt state,
so that no one can touch it any more.
Compare to a simple hang, the system will keep stable
at least for SSH access. Then it should be trivial to
inspect the hardware state and see what's going on.
v2:
- Set adev->no_hw_access earlier to avoid potential crashes.(Christian)
Suggested-by: Christian Koenig <christian.koenig@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.co>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Thu, 25 Nov 2021 07:41:37 +0000 (15:41 +0800)]
drm/amdgpu: check df_funcs and its callback pointers
in case they are not avaiable in early phase
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Sat, 4 Dec 2021 11:22:12 +0000 (19:22 +0800)]
drm/amdgpu: don't override default ECO_BITs setting
Leave this bit as hardware default setting
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Le Ma [Sat, 4 Dec 2021 10:59:08 +0000 (18:59 +0800)]
drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORE
should count on GC IP base address
Signed-off-by: Le Ma <le.ma@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Mon, 22 Nov 2021 13:32:48 +0000 (21:32 +0800)]
drm/amdgpu: read and authenticate ip discovery binary
read and authenticate ip discovery binary getting from
vram first, if it is not valid, read and authenticate
the one getting from file
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Mon, 22 Nov 2021 13:19:06 +0000 (21:19 +0800)]
drm/amdgpu: add helper to verify ip discovery binary signature
To be used to check ip discovery binary signature
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Mon, 22 Nov 2021 12:12:59 +0000 (20:12 +0800)]
drm/amdgpu: rename discovery_read_binary helper
add _from_vram in the funciton name to diffrentiate
the one used to read from file
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Tue, 23 Nov 2021 07:25:03 +0000 (15:25 +0800)]
drm/amdgpu: add helper to load ip_discovery binary from file
To be used when ip_discovery binary is not carried by vbios
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leslie Shi [Wed, 8 Dec 2021 07:47:41 +0000 (15:47 +0800)]
drm/amdgpu: fix incorrect VCN revision in SRIOV
Guest OS will setup VCN instance 1 which is disabled as an enabled instance and
execute initialization work on it, but this causes VCN ib ring test failure
on the disabled VCN instance during modprobe:
amdgpu 0000:00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 1
amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_dec_0 (-110).
amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc_0.0 (-110).
[drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).
v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to
vcn_config
v3: modify VCN's revision in SR-IOV and bare-metal
Fixes: baf3f8f3740625 ("drm/amdgpu: handle SRIOV VCN revision parsing")
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leslie Shi [Mon, 6 Dec 2021 08:44:54 +0000 (16:44 +0800)]
drm/amdgpu: add modifiers in amdgpu_vkms_plane_init()
Fix following warning in SRIOV during modprobe:
amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier
WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 10 Dec 2021 03:01:22 +0000 (22:01 -0500)]
drm/amdgpu: disable default navi2x co-op kernel support
This patch reverts the following:
commit
48733b224fa7ba ("drm/amdkfd: add Navi2x to GWS init conditions")
Disable GWS usage in default settings for now due to FW bugs.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Graham Sider [Thu, 9 Dec 2021 18:26:05 +0000 (13:26 -0500)]
drm/amdkfd: add Navi2x to GWS init conditions
Initalize GWS on Navi2x with mec2_fw_version >= 0x42.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-and-tested-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lang Yu [Fri, 3 Dec 2021 06:30:16 +0000 (14:30 +0800)]
drm/amdgpu: only hw fini SMU fisrt for ASICs need that
We found some headaches on ASICs don't need that,
so remove that for them.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Kevin Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lang Yu [Fri, 3 Dec 2021 06:20:48 +0000 (14:20 +0800)]
drm/amdgpu: remove power on/off SDMA in SMU hw_init/fini()
Currently, we don't find some neccesities to power on/off
SDMA in SMU hw_init/fini(). It makes more sense in SDMA
hw_init/fini().
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Kevin Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 03:20:35 +0000 (22:20 -0500)]
drm/amdkfd: Make KFD support on Hawaii experimental
Hawaii support is mostly untested these days. ROCm user mode also
depends on custom firmware for AQL packet processing, that was never
pushed upstream due to quality regressions in graphics driver testing.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 02:04:06 +0000 (21:04 -0500)]
drm/amdkfd: Don't split unchanged SVM ranges
If an existing SVM range overlaps an svm_range_set_attr call, we would
normally split it in order to update only the overlapping part.
However, if the attributes of the existing range would not be changed
splitting it is unnecessary.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 02:02:42 +0000 (21:02 -0500)]
drm/amdkfd: Fix svm_range_is_same_attrs
The existing function doesn't compare the access bitmaps and flags.
This can result in failure to update those attributes in existing
ranges when all other attributes remained unchanged.
Because the access and flags attributes modify only some bits in the
respective bitmaps, we cannot compare them directly. Instead we need to
check whether applying the attributes to a particular range would
change the bitmaps.
A PREFETCH_LOC attribute must always trigger a migration, even if the
attribute value remains unchanged. E.g. if some pages were migrated due
to a CPU page fault, a prefetch must still be executed to migrate pages
back to VRAM.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 01:23:14 +0000 (20:23 -0500)]
drm/amdkfd: Fix error handling in svm_range_add
Add null-pointer check after the last svm_range_new call. This was
originally reported by Zhou Qingyang <zhou1615@umn.edu> based on a
static analyzer.
To avoid duplicating the unwinding code from svm_range_handle_overlap,
I merged the two functions into one.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Zhou Qingyang <zhou1615@umn.edu>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 8 Dec 2021 19:55:15 +0000 (14:55 -0500)]
drm/amdgpu: Handle fault with same timestamp
Remove not unique timestamp WARNING as same timestamp interrupt happens
on some chips,
Drain fault need to wait for the processed_timestamp to be truly greater
than the checkpoint or the ring to be empty to be sure no stale faults
are handled.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1818
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 01:25:27 +0000 (22:25 -0300)]
drm/amdgpu: fix location of prototype for amdgpu_kms_compat_ioctl
This fixes the warning below by changing the prototype to a location
that's actually included by the .c files that call
amdgpu_kms_compat_ioctl:
warning: no previous prototype for ‘amdgpu_kms_compat_ioctl’
[-Wmissing-prototypes]
37 | long amdgpu_kms_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
| ^~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 01:25:26 +0000 (22:25 -0300)]
drm/amd: append missing includes
This fixes warnings caused by global functions lacking prototypes:, such as:
warning: no previous prototype for 'dcn303_hw_sequencer_construct'
[-Wmissing-prototypes]
12 | void dcn303_hw_sequencer_construct(struct dc *dc)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
warning: no previous prototype for ‘amdgpu_has_atpx’
[-Wmissing-prototypes]
76 | bool amdgpu_has_atpx(void) {
| ^~~~~~~~~~~~~~~
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 01:25:24 +0000 (22:25 -0300)]
drm/amdkfd: fix function scopes
This turns previously global functions into static, thus removing
compile-time warnings such as:
warning: no previous prototype for 'pm_set_resources_vi' [-Wmissing-prototypes]
113 | int pm_set_resources_vi(struct packet_manager *pm, uint32_t *buffer,
| ^~~~~~~~~~~~~~~~~~~
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 01:25:23 +0000 (22:25 -0300)]
drm/amdgpu: fix function scopes
This turns previously global functions into static, thus removing
compile-time warnings such as:
warning: no previous prototype for 'amdgpu_vkms_output_init' [-Wmissing-prototypes]
399 | int amdgpu_vkms_output_init(struct drm_device *dev,
| ^~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 01:25:21 +0000 (22:25 -0300)]
drm/amd: fix improper docstring syntax
This fixes various warnings relating to erroneous docstring syntax, of
which some are listed below:
warning: Function parameter or member 'adev' not described in
'amdgpu_atomfirmware_ras_rom_addr'
...
warning: expecting prototype for amdgpu_atpx_validate_functions().
Prototype was for amdgpu_atpx_validate() instead
...
warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new'
...
warning: Cannot understand * @kfd_get_cu_occupancy - Collect number of
waves in-flight on this device
...
warning: This comment starts with '/**', but isn't a kernel-doc
comment. Refer Documentation/doc-guide/kernel-doc.rst
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Isabella Basso [Wed, 8 Dec 2021 01:25:20 +0000 (22:25 -0300)]
drm/amd: Mark IP_BASE definition as __maybe_unused
Silences 166 compile-time warnings like:
warning: 'UVD0_BASE' defined but not used [-Wunused-const-variable=]
129 | static const struct IP_BASE UVD0_BASE ={ { { { 0x00007800, 0x00007E00, 0, 0, 0 } },
| ^~~~~~~~~
warning: 'UMC0_BASE' defined but not used [-Wunused-const-variable=]
123 | static const struct IP_BASE UMC0_BASE ={ { { { 0x00014000, 0, 0, 0, 0 } },
| ^~~~~~~~~
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhigang Luo [Mon, 6 Dec 2021 21:40:24 +0000 (16:40 -0500)]
drm/amdgpu: extended waiting SRIOV VF reset completion timeout to 10s
For the ASIC has big FB, it need more time to clear FB during reset.
This change extended SRIOV VF waiting reset completion timeout from 5s
to 10s.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Acked-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhigang Luo [Mon, 6 Dec 2021 21:21:03 +0000 (16:21 -0500)]
drm/amdgpu: recover XGMI topology for SRIOV VF after reset
For SRIOV VF, the XGMI topology was not recovered after reset. This
change added code to SRIOV VF reset function to update XGMI topology
for SRIOV VF after reset.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhigang Luo [Mon, 6 Dec 2021 19:52:00 +0000 (14:52 -0500)]
drm/amdgpu: added PSP XGMI initialization for SRIOV VF during recover
For SRIOV VF, XGMI was not initialized in PSP during recover. This change
added PSP XGMI initialization for SRIOV VF during recover.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhigang Luo [Fri, 26 Nov 2021 17:16:45 +0000 (12:16 -0500)]
drm/amdgpu: skip reset other device in the same hive if it's SRIOV VF
On SRIOV, host driver can support FLR(function level reset) on individual VF
within the hive which might bring the individual device back to normal without
the necessary to execute the hive reset. If the FLR failed , host driver will
trigger the hive reset, each guest VF will get reset notification before the
real hive reset been executed. The VF device can handle the reset request
individually in it's reset work handler.
This change updated gpu recover sequence to skip reset other device in
the same hive for SRIOV VF.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Tue, 7 Dec 2021 17:14:40 +0000 (12:14 -0500)]
drm/amd/display: Add feature flags to disable LTTPR
[Why]
Allow for disabling non transparent mode of LTTPR for running tests.
[How]
Add a feature flag and set them during init sequence. The flags are
already being used in DC.
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Wed, 8 Dec 2021 06:01:09 +0000 (14:01 +0800)]
drm/amdgpu: enable RAS poison flag when GPU is connected to CPU
The RAS poison mode is enabled by default on the platform.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fangzhi Zuo [Tue, 7 Dec 2021 18:47:51 +0000 (13:47 -0500)]
drm/amd/display: Add Debugfs Entry to Force in SST Sequence
It is to force SST sequence on MST capable receivers.
v2: squash in compilation fix when CONFIG_DRM_AMD_DC_DCN is not set
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Claudio Suarez [Sun, 17 Oct 2021 11:35:00 +0000 (13:35 +0200)]
drm/amdgpu: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi
Once EDID is parsed, the monitor HDMI support information is available
through drm_display_info.is_hdmi. The amdgpu driver still calls
drm_detect_hdmi_monitor() to retrieve the same information, which
is less efficient. Change to drm_display_info.is_hdmi
This is a TODO task in Documentation/gpu/todo.rst
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Claudio Suarez <cssk@net-c.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Claudio Suarez [Sun, 17 Oct 2021 11:34:59 +0000 (13:34 +0200)]
drm/amdgpu: use drm_edid_get_monitor_name() instead of duplicating the code
Use drm_edid_get_monitor_name() instead of duplicating the code that
parses the EDID in dm_helpers_parse_edid_caps()
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Claudio Suarez <cssk@net-c.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Claudio Suarez [Sun, 17 Oct 2021 11:34:58 +0000 (13:34 +0200)]
drm/amdgpu: update drm_display_info correctly when the edid is read
drm_display_info is updated by drm_get_edid() or
drm_connector_update_edid_property(). In the amdgpu driver it is almost
always updated when the edid is read in amdgpu_connector_get_edid(),
but not always. Change amdgpu_connector_get_edid() and
amdgpu_connector_free_edid() to keep drm_display_info updated.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Claudio Suarez <cssk@net-c.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Tue, 7 Dec 2021 14:46:39 +0000 (09:46 -0500)]
drm/amd/display: Fix out of bounds access on DNC31 stream encoder regs
[Why]
During dcn31_stream_encoder_create, if PHYC/D get remapped to F/G on B0
then we'll index 5 or 6 into a array of length 5 - leading to an
access violation on some configs during device creation.
[How]
Software won't be touching PHYF/PHYG directly, so just extend the
array to cover all possible engine IDs.
Even if it does by try to access one of these registers by accident
the offset will be 0 and we'll get a warning during the access.
Fixes: 2fe9a0e1173f ("drm/amd/display: Fix DCN3 B0 DP Alt Mapping")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Tue, 7 Dec 2021 06:28:58 +0000 (14:28 +0800)]
drm/amdgpu: skip umc ras error count harvest
remove in recovery stat check, skip umc ras err cnt
harvest in amdgpu_ras_log_on_err_counter
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Thu, 2 Dec 2021 07:47:56 +0000 (15:47 +0800)]
drm/amdgpu: free vkms_output after use
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Wed, 24 Nov 2021 07:53:17 +0000 (15:53 +0800)]
drm/amdgpu: drop the critial WARN_ON in amdgpu_vkms
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aric Cyr [Sat, 27 Nov 2021 02:20:09 +0000 (21:20 -0500)]
drm/amd/display: Reduce stack usage
Reduce stack usage by moving an unnecessary structure copy to a pointer.
Reviewed-by: Joshua Aberback <Joshua.Aberback@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Sun, 28 Nov 2021 17:10:14 +0000 (12:10 -0500)]
drm/amd/display: Query DMCUB for dp alt status
[Why]
To avoid hanging RDPCSPIPE when INTERCEPTB isn't set.
DMCUB owns control of that bit so DMCUB should manage returning the
information driver needs for link encoder control.
[How]
Add a new DMCUB command to return dp alt disable and dp4 information.
Reviewed-by: Hansen Dsouza <hansen.dsouza@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Anthony Koo [Sun, 28 Nov 2021 02:12:20 +0000 (21:12 -0500)]
drm/amd/display: [FW Promotion] Release 0.0.96
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wenjing Liu [Fri, 26 Nov 2021 19:18:59 +0000 (14:18 -0500)]
drm/amd/display: add a debug option to force dp2 lt fallback method
[why]
A debug option is needed to temporarily force dp2 new link training
fallback method for debugging purpose.
Reviewed-by: George Shen <George.Shen@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Oliver Logush [Wed, 24 Nov 2021 22:24:05 +0000 (17:24 -0500)]
drm/amd/display: Rename a struct field to describe a cea component better
[why]
Need to fix the code so it does not use reserved keywords
[how]
Change the total_length member of the cea struct
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Oliver Logush <oliver.logush@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Meenakshikumar Somasundaram [Thu, 25 Nov 2021 23:31:26 +0000 (18:31 -0500)]
drm/amd/display: Adding dpia debug bits for hpd delay
[Why]
Need to have dpia debug bits for configuring hpd delay.
[How]
Added hpd_delay_in_ms variable in dpia_debug_options.
Reviewed-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: meenakshikumar somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jude Shih [Tue, 23 Nov 2021 05:53:00 +0000 (13:53 +0800)]
drm/amd/display: Move link_enc init logic to DC
[Why]
We shouldn't be accessing res_pool funcs from DM level,
therefore, we should create API and let the flow
be done in DC level.
[How]
We create new interface dp_get_link_enc to access and get the correct link_enc
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Jude Shih <shenshih@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wayne Lin [Fri, 26 Nov 2021 07:19:57 +0000 (15:19 +0800)]
drm/amd/display: Fix bug in debugfs crc_win_update entry
[Why]
crc_rd_wrk shouldn't be null in crc_win_update_set(). Current programming
logic is inconsistent in crc_win_update_set().
[How]
Initially, return if crc_rd_wrk is NULL. Later on, we can use member of
crc_rd_wrk safely.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 9a65df193108 ("drm/amd/display: Use PSP TA to read out crc")
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mikita Lipski [Mon, 15 Nov 2021 21:07:38 +0000 (16:07 -0500)]
drm/amd/display: prevent reading unitialized links
[why/how]
The function can be called on boot or after suspend when
links are not initialized, to prevent it guard it with
NULL pointer check
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jarif Aftab [Wed, 24 Nov 2021 18:44:52 +0000 (13:44 -0500)]
drm/amd/display: Added Check For dc->res_pool
[WHY]
-To ensure dc->res_pool has been initialized
[HOW]
-Check if dc->res_pool is true in
the if statement
Reviewed-by: Martin Leung <Martin.Leung@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Jarif Aftab <jaraftab@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wyatt Wood [Wed, 24 Nov 2021 17:50:20 +0000 (12:50 -0500)]
drm/amd/display: Prevent PSR disable/reenable in HPD IRQ
[Why]
When HPD IRQ occurs, it triggers a PSR disable and reenable
directly through dc layer.
Since it does not pass through the power layer, the layer
that tracks whether PSR is enabled or disabled and which
masks are set, this layer is now out of sync with the real
PSR state in FW.
Theoretically PSR can be enabled during hw programming
sequences or any other situation where we must disable PSR.
[How]
Check if PSR is enabled before doing PSR disable/reenable.
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Wyatt Wood <wyatt.wood@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Tue, 23 Nov 2021 16:56:38 +0000 (11:56 -0500)]
drm/amd/display: Fix DPIA outbox timeout after S3/S4/reset
[Why]
The HW interrupt gets disabled after S3/S4/reset so we don't receive
notifications for HPD or AUX from DMUB - leading to timeout and
black screen with (or without) DPIA links connected.
[How]
Re-enable the interrupt after S3/S4/reset like we do for the other
DC interrupts.
Guard both instances of the outbox interrupt enable or we'll hang
during restore on ASIC that don't support it.
Fixes: 524a0ba6fab955 ("drm/amd/display: Fix DPIA outbox timeout after GPU reset")
Reviewed-by: Jude Shih <Jude.Shih@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
George Shen [Sat, 20 Nov 2021 23:35:15 +0000 (18:35 -0500)]
drm/amd/display: Add W/A for PHY tests with certain LTTPR
[Why]
Certain LTTPR require output VS/PE to be explicitly
set during PHY test automation.
[How]
Add vendor-specific sequence to set LTTPR
output VS/PE.
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
George Shen [Sat, 20 Nov 2021 01:03:20 +0000 (20:03 -0500)]
drm/amd/display: Apply LTTPR workarounds to non-transparent mode
[Why]
Some of the vendor-specific workarounds added for transparent mode
also need to be applied to non-transparent mode in order to succeed
link training consistently.
[How]
Remove transparent mode check for the required workarounds.
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Fri, 3 Dec 2021 05:08:41 +0000 (13:08 +0800)]
drm/amdgpu: only skip get ecc info for aldebaran
skip get ecc info for aldebarn through check ip version
do not affect other asic type
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
chen gong [Thu, 2 Dec 2021 08:47:54 +0000 (16:47 +0800)]
drm/amdkfd: Correct the value of the no_atomic_fw_version variable
145:
navi10 IP_VERSION(10, 1, 10)
navi12 IP_VERSION(10, 1, 2)
navi14 IP_VERSION(10, 1, 1)
92:
sienna_cichlid IP_VERSION(10, 3, 0)
navy_flounder IP_VERSION(10, 3, 2)
vangogh IP_VERSION(10, 3, 1)
dimgrey_cavefish IP_VERSION(10, 3, 4)
beige_goby IP_VERSION(10, 3, 5)
yellow_carp IP_VERSION(10, 3, 3)
Signed-off-by: chen gong <curry.gong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Graham Sider <Graham.Sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhou Qingyang [Wed, 1 Dec 2021 15:13:10 +0000 (23:13 +0800)]
drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()
In radeon_driver_open_kms(), radeon_vm_bo_add() is assigned to
vm->ib_bo_va and passes and used in radeon_vm_bo_set_addr(). In
radeon_vm_bo_set_addr(), there is a dereference of vm->ib_bo_va,
which could lead to a NULL pointer dereference on failure of
radeon_vm_bo_add().
Fix this bug by adding a check of vm->ib_bo_va.
This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.
Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.
Builds with CONFIG_DRM_RADEON=m show no new warnings,
and our static analyzer no longer warns about this code.
Fixes: cc9e67e3d700 ("drm/radeon: fix VM IB handling")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Vlad Zahorodnii [Thu, 2 Dec 2021 12:52:15 +0000 (14:52 +0200)]
drm/amd/display: Use oriented source size when checking cursor scaling
dm_check_crtc_cursor() doesn't take into account plane transforms when
calculating plane scaling, this can result in false positives.
For example, if there's an output with resolution 3840x2160 and the
output is rotated 90 degrees, CRTC_W and CRTC_H will be 3840 and 2160,
respectively, but SRC_W and SRC_H will be 2160 and 3840, respectively.
Since the cursor plane usually has a square buffer attached to it, the
dm_check_crtc_cursor() will think that there's a scale factor mismatch
even though there isn't really.
This fixes an issue where kwin fails to use hardware plane transforms.
Changes since version 1:
- s/orientated/oriented/g
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Vlad Zahorodnii <vlad.zahorodnii@kde.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhou Qingyang [Thu, 2 Dec 2021 16:17:36 +0000 (00:17 +0800)]
drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode()
In amdgpu_connector_lcd_native_mode(), the return value of
drm_mode_duplicate() is assigned to mode, and there is a dereference
of it in amdgpu_connector_lcd_native_mode(), which will lead to a NULL
pointer dereference on failure of drm_mode_duplicate().
Fix this bug add a check of mode.
This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.
Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.
Builds with CONFIG_DRM_AMDGPU=m show no new warnings, and
our static analyzer no longer warns about this code.
Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 30 Nov 2021 22:04:57 +0000 (17:04 -0500)]
drm/amdgpu: handle SRIOV VCN revision parsing
For SR-IOV, the IP discovery revision number encodes
additional information. Handle that case here.
v2: drop additional IP versions
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Thu, 2 Dec 2021 05:06:05 +0000 (13:06 +0800)]
drm/amdgpu: skip query ecc info in gpu recovery
this is a workaround due to get ecc info failed during gpu recovery
[ 700.236122] amdgpu 0000:09:00.0: amdgpu: Failed to export SMU ecc table!
[ 700.236128] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[ 704.331171] amdgpu: qcm fence wait loop timeout expired
[ 704.331194] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
[ 704.332445] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[ 704.332448] amdgpu 0000:09:00.0: amdgpu: Bailing on TDR for s_job:
ffffffffffffffff, as another already in progress
[ 704.332456] amdgpu: Pasid 0x8000 destroy queue 0 failed, ret -62
[ 710.360924] amdgpu 0000:09:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000013 SMN_C2PMSG_82:0x00000007
[ 710.360964] amdgpu 0000:09:00.0: amdgpu: Failed to disable smu features.
[ 710.361002] amdgpu 0000:09:00.0: amdgpu: Fail to disable dpm features!
[ 710.361014] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yann Dirson [Mon, 29 Nov 2021 20:08:57 +0000 (21:08 +0100)]
drm/amdgpu: update fw_load_type module parameter doc to match code
amdgpu_ucode_get_load_type() does not interpret this parameter as
documented. It is ignored for many ASIC types (which presumably
only support one load_type), and when not ignored it is only used
to force direct loading instead of PSP loading. SMU loading is
only available for ASICs for which the parameter is ignored.
Signed-off-by: Yann Dirson <ydirson@free.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Mon, 29 Nov 2021 20:21:50 +0000 (15:21 -0500)]
drm/amdkfd: err_pin_bo path leaks kfd_bo_list
Refactor userptr and pin_bo path to make it less confusing, move
err_pin_bo label up to remove mem from process_info kfd_bo_list.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Mon, 29 Nov 2021 17:33:05 +0000 (12:33 -0500)]
drm/amdkfd: process_info lock not needed for svm
process_info->lock is used to protect kfd_bo_list, vm_list_head, n_vms
and userptr valid/inval list, svm_range_restore_work and
svm_range_set_attr don't access those, so do not need to take
process_info lock. This will avoid potential circular locking issue.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Graham Sider [Thu, 18 Nov 2021 21:09:58 +0000 (16:09 -0500)]
drm/amdkfd: remove hardcoded device_info structs
With device_info initialization being handled in kfd_device_info_init,
these structs may be removed. Also add comments to help matching IP
versions to asic names.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Graham Sider [Wed, 17 Nov 2021 22:32:37 +0000 (17:32 -0500)]
drm/amdkfd: add kfd_device_info_init function
Initializes kfd->device_info given either asic_type (enum) if GFX
version is less than GFX9, or GC IP version if greater. Also takes in vf
and the target compiler gfx version. Uses SDMA version to determine
num_sdma_queues_per_engine.
Convert device_info to a non-pointer member of kfd, change references
accordingly.
Change unsupported asic condition to only probe f2g, move device_info
initialization post-switch.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Graham Sider [Thu, 11 Nov 2021 16:28:45 +0000 (11:28 -0500)]
drm/amdkfd: replace asic_name with amdgpu_asic_name
device_info->asic_name and amdgpu_asic_name[adev->asic_type] both
provide asic name strings, with the only difference being casing.
Remove asic_name from device_info and replace sysfs entry with lowercase
amdgpu_asic_name[]. Ensures string is null-terminated so that this
doesn't break if dev->node_props.name ever gets set anywhere else.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
shaoyunl [Tue, 30 Nov 2021 02:29:05 +0000 (21:29 -0500)]
drm/amdgpu: adjust the kfd reset sequence in reset sriov function
This change revert previous commits:
9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")
271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function")
This change moves the amdgpu_amdkfd_pre_reset to an earlier place
in amdgpu_device_reset_sriov, presumably to address the sequence issue
that the first patch was originally meant to fix.
Some register access(GRBM_GFX_CNTL) only be allowed on full access
mode. Move kfd_pre_reset and kfd_post_reset back inside reset_sriov
function.
Fixes: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")
Fixes: 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function")
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>