platform/kernel/linux-rpi.git
5 years agodrm/amdgpu: fix null pointer deref in firmware header printing
Xiaojie Yuan [Thu, 5 Sep 2019 08:50:22 +0000 (16:50 +0800)]
drm/amdgpu: fix null pointer deref in firmware header printing

v2: declare as (struct common_firmware_header *) type because
    struct xxx_firmware_header inherits from it

When CE's ucode_id(8) is used to get sdma_hdr, we will be accessing an
unallocated amdgpu_firmware_info instance.

This issue appears on rhel7.7 with gcc 4.8.5. Newer compilers might have
optimized out such 'defined but not referenced' variable.

[ 1120.798564] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
[ 1120.806703] IP: [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
[ 1120.813693] PGD 80000002603ff067 PUD 271b8d067 PMD 0
[ 1120.818931] Oops: 0000 [#1] SMP
[ 1120.822245] Modules linked in: amdgpu(OE+) amdkcl(OE) amd_iommu_v2 amdttm(OE) amd_sched(OE) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun bridge stp llc devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dm_mirror dm_region_hash dm_log dm_mod intel_pmc_core intel_powerclamp coretemp intel_rapl joydev kvm_intel eeepc_wmi asus_wmi kvm sparse_keymap iTCO_wdt irqbypass rfkill crc32_pclmul snd_hda_codec_realtek mxm_wmi ghash_clmulni_intel intel_wmi_thunderbolt iTCO_vendor_support snd_hda_codec_generic snd_hda_codec_hdmi aesni_intel lrw gf128mul glue_helper ablk_helper sg cryptd pcspkr snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd pinctrl_sunrisepoint pinctrl_intel soundcore acpi_pad mei_me wmi mei i2c_i801 pcc_cpufreq ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic i915 i2c_algo_bit iosf_mbi drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm ptp libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw pps_core drm_panel_orientation_quirks video i2c_hid
[ 1120.954136] CPU: 4 PID: 2426 Comm: modprobe Tainted: G           OE  ------------   3.10.0-1062.el7.x86_64 #1
[ 1120.964390] Hardware name: System manufacturer System Product Name/Z170-A, BIOS 1302 11/09/2015
[ 1120.973321] task: ffff991ef1e3c1c0 ti: ffff991ee625c000 task.ti: ffff991ee625c000
[ 1120.981020] RIP: 0010:[<ffffffffc0e3c9b3>]  [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
[ 1120.990483] RSP: 0018:ffff991ee625f950  EFLAGS: 00010202
[ 1120.995935] RAX: 0000000000000002 RBX: ffff991edf6b2d38 RCX: ffff991edf6a0000
[ 1121.003391] RDX: 0000000000000000 RSI: ffff991f01d13898 RDI: ffffffffc110afb3
[ 1121.010706] RBP: ffff991ee625f9b0 R08: 0000000000000000 R09: 0000000000000000
[ 1121.018029] R10: 00000000000004c4 R11: ffff991ee625f64e R12: ffff991edf6b3220
[ 1121.025353] R13: ffff991edf6a0000 R14: 0000000000000008 R15: ffff991edf6b2d30
[ 1121.032666] FS:  00007f97b0c0b740(0000) GS:ffff991f01d00000(0000) knlGS:0000000000000000
[ 1121.041000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1121.046880] CR2: 000000000000000a CR3: 000000025e604000 CR4: 00000000003607e0
[ 1121.054239] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1121.061631] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1121.068938] Call Trace:
[ 1121.071494]  [<ffffffffc0e3dba8>] psp_hw_init+0x218/0x270 [amdgpu]
[ 1121.077886]  [<ffffffffc0da3188>] amdgpu_device_fw_loading+0xe8/0x160 [amdgpu]
[ 1121.085296]  [<ffffffffc0e3b34c>] ? vega10_ih_irq_init+0x4bc/0x730 [amdgpu]
[ 1121.092534]  [<ffffffffc0da5c75>] amdgpu_device_init+0x1495/0x1c90 [amdgpu]
[ 1121.099675]  [<ffffffffc0da9cab>] amdgpu_driver_load_kms+0x8b/0x2f0 [amdgpu]
[ 1121.106888]  [<ffffffffc01b25cf>] drm_dev_register+0x12f/0x1d0 [drm]
[ 1121.113419]  [<ffffffffa4dcdfd8>] ? pci_enable_device_flags+0xe8/0x140
[ 1121.120183]  [<ffffffffc0da260a>] amdgpu_pci_probe+0xca/0x170 [amdgpu]
[ 1121.126919]  [<ffffffffa4dcf97a>] local_pci_probe+0x4a/0xb0
[ 1121.132622]  [<ffffffffa4dd10c9>] pci_device_probe+0x109/0x160
[ 1121.138607]  [<ffffffffa4eb4205>] driver_probe_device+0xc5/0x3e0
[ 1121.144766]  [<ffffffffa4eb4603>] __driver_attach+0x93/0xa0
[ 1121.150507]  [<ffffffffa4eb4570>] ? __device_attach+0x50/0x50
[ 1121.156422]  [<ffffffffa4eb1da5>] bus_for_each_dev+0x75/0xc0
[ 1121.162213]  [<ffffffffa4eb3b7e>] driver_attach+0x1e/0x20
[ 1121.167771]  [<ffffffffa4eb3620>] bus_add_driver+0x200/0x2d0
[ 1121.173590]  [<ffffffffa4eb4c94>] driver_register+0x64/0xf0
[ 1121.179345]  [<ffffffffa4dd0905>] __pci_register_driver+0xa5/0xc0
[ 1121.185593]  [<ffffffffc099f000>] ? 0xffffffffc099efff
[ 1121.190914]  [<ffffffffc099f0a4>] amdgpu_init+0xa4/0xb0 [amdgpu]
[ 1121.197101]  [<ffffffffa4a0210a>] do_one_initcall+0xba/0x240
[ 1121.202901]  [<ffffffffa4b1c90a>] load_module+0x271a/0x2bb0
[ 1121.208598]  [<ffffffffa4dad740>] ? ddebug_proc_write+0x100/0x100
[ 1121.214894]  [<ffffffffa4b1ce8f>] SyS_init_module+0xef/0x140
[ 1121.220698]  [<ffffffffa518bede>] system_call_fastpath+0x25/0x2a
[ 1121.226870] Code: b4 01 60 a2 00 00 31 c0 e8 83 60 33 e4 41 8b 47 08 48 8b 4d d0 48 c7 c7 b3 af 10 c1 48 69 c0 68 07 00 00 48 8b 84 01 60 a2 00 00 <48> 8b 70 08 31 c0 48 89 75 c8 e8 56 60 33 e4 48 8b 4d d0 48 c7
[ 1121.247422] RIP  [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
[ 1121.254432]  RSP <ffff991ee625f950>
[ 1121.258017] CR2: 000000000000000a
[ 1121.261427] ---[ end trace e98b35387ede75bd ]---

Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Fixes: c5fb912653dae3f878 ("drm/amdgpu: add firmware header printing for psp fw loading (v2)")
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: enable renoir while device probes
Huang Rui [Mon, 2 Sep 2019 14:35:37 +0000 (22:35 +0800)]
drm/amdkfd: enable renoir while device probes

This patch is to add asic flag to enable device probe during kfd init.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: disable gfxoff while use no H/W scheduling policy
Huang Rui [Mon, 2 Sep 2019 15:34:32 +0000 (23:34 +0800)]
drm/amdgpu: disable gfxoff while use no H/W scheduling policy

While gfxoff is enabled, the mmVM_XXX registers will be 0xfffffff while the GFX
is in "off" state. KFD queue creattion doesn't use ring based method, so it will
trigger a VM fault.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: add renoir kfd topology
Huang Rui [Mon, 2 Sep 2019 15:25:29 +0000 (23:25 +0800)]
drm/amdkfd: add renoir kfd topology

This patch adds renoir kfd topology which is the same with Raven.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: add package manager for renoir
Huang Rui [Mon, 2 Sep 2019 15:24:29 +0000 (23:24 +0800)]
drm/amdkfd: add package manager for renoir

Renoir use GFX v9, so adds v9 package manager.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: init kernel queue for renoir
Huang Rui [Mon, 2 Sep 2019 15:19:38 +0000 (23:19 +0800)]
drm/amdkfd: init kernel queue for renoir

Renoir is GFX v9, so init v9 kernel queue.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: init kfd apertures v9 for renoir
Huang Rui [Mon, 2 Sep 2019 15:15:46 +0000 (23:15 +0800)]
drm/amdkfd: init kfd apertures v9 for renoir

Renoir is GMC v9, so init v9 kfd apertures.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: add renoir type for the workaround of iommu v2 (v2)
Huang Rui [Mon, 2 Sep 2019 15:13:26 +0000 (23:13 +0800)]
drm/amdkfd: add renoir type for the workaround of iommu v2 (v2)

Renoir is the same with Raven, will enable iommu event in future.

v2: fix the checking (Thong)

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: enable kfd device queue manager v9 for renoir
Huang Rui [Mon, 2 Sep 2019 15:10:52 +0000 (23:10 +0800)]
drm/amdkfd: enable kfd device queue manager v9 for renoir

Renoir is GFX9, so enable v9 devcie queue manager.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: add renoir kfd device info (v2)
Huang Rui [Mon, 2 Sep 2019 15:06:58 +0000 (23:06 +0800)]
drm/amdkfd: add renoir kfd device info (v2)

This patch inits renoir kfd device info, so we treat renoir as "dgpu"
(bypass iommu v2). Will enable needs_iommu_device till renoir iommu is ready.

v2: rebase and align the drm-next

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: add renoir cache info for CRAT (v2)
Huang Rui [Mon, 2 Sep 2019 14:59:01 +0000 (22:59 +0800)]
drm/amdkfd: add renoir cache info for CRAT (v2)

Renoir's cache info should be the same with raven and carrizo's.

v2: fix missed "break"

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: Support Navi14 in KFD
Yong Zhao [Tue, 13 Aug 2019 21:13:27 +0000 (17:13 -0400)]
drm/amdkfd: Support Navi14 in KFD

Initial support of Navi14 in KFD. The device IDs will be added later.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Disable retry faults in VMID0
Felix Kuehling [Wed, 4 Sep 2019 23:26:16 +0000 (19:26 -0400)]
drm/amdgpu: Disable retry faults in VMID0

There is no point retrying page faults in VMID0. Those faults are
always fatal.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-and-Tested-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Add a kernel parameter for specifying the asic type
Yong Zhao [Fri, 30 Aug 2019 22:09:10 +0000 (18:09 -0400)]
drm/amdgpu: Add a kernel parameter for specifying the asic type

As more and more new asics start to reuse the old device IDs before
launch, there is a need to quickly override the existing asic type
corresponding to the reused device ID through a kernel parameter. With
this, engineers no longer need to rely on local hack patches,
facilitating cooperation across teams.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/irq: check if nbio funcs exist
Alex Deucher [Sun, 1 Sep 2019 17:31:42 +0000 (12:31 -0500)]
drm/amdgpu/irq: check if nbio funcs exist

We need to check if the nbios funcs exist before
checking the individual pointers.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
5 years agodrm/amd/display: replace FIXME with TODO
Qingqing Zhuo [Thu, 22 Aug 2019 19:28:26 +0000 (15:28 -0400)]
drm/amd/display: replace FIXME with TODO

Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: verify stream link before link test
Jing Zhou [Thu, 22 Aug 2019 06:26:33 +0000 (14:26 +0800)]
drm/amd/display: verify stream link before link test

[Why]
DP1.2 LL CTS test failure.

[How]
The failure is caused by not verify stream link is equal
to link, only check stream and link is not null.

Signed-off-by: Jing Zhou <Jing.Zhou@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: dce11.x /dce12 update formula input
Charlene Liu [Wed, 21 Aug 2019 00:33:46 +0000 (20:33 -0400)]
drm/amd/display: dce11.x /dce12 update formula input

[Description]
1. OUTSTANDING_REQUEST_LIMIT update from 0xFF to 0x1F (HW doc update)
2. using memory type to convert UMC's MCLK to Yclk.

Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Isolate DSC module from driver dependencies
Bayan Zabihiyan [Mon, 19 Aug 2019 19:18:43 +0000 (15:18 -0400)]
drm/amd/display: Isolate DSC module from driver dependencies

[Why]
Edid Utility wishes to include DSC module from driver instead
of doing it's own logic which will need to be updated every time
someone modifies the driver logic.

[How]
Modify some functions such that we dont need to pass the entire
DC structure as parameter.
-Remove DC inclusion from module.
-Filter out problematic types and inclusions

Signed-off-by: Bayan Zabihiyan <bayan.zabihiyan@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: OTC underflow fix
Jaehyun Chung [Mon, 19 Aug 2019 20:45:05 +0000 (16:45 -0400)]
drm/amd/display: OTC underflow fix

[Why] Underflow occurs on some display setups(repro'd on 3x4K HDR) on boot,
mode set, and hot-plugs with. Underflow occurs because mem clk
is not set high after disabling pstate switching. This behaviour occurs
because some calculations assumed displays were synchronized.

[How] Add a condition to check if timing sync is disabled so that
synchronized vblank can be set to false.

Signed-off-by: Jaehyun Chung <jaehyun.chung@amd.com>
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: remove hw access from dc_destroy
Jun Lei [Thu, 15 Aug 2019 19:22:34 +0000 (15:22 -0400)]
drm/amd/display: remove hw access from dc_destroy

[why]
dc_destroy should only clean up SW, this is because GPUs may be
removed before driver unload, leading to HW to be unavailable.

[how]
remove GPIO close as part of GPIO destroy, this is unnecessary because
GPIO is not shared, and GPIOs are generally closed after being opened

Add tracking to HW access during destructor to make future issues
easier to pinpoint, and block access to prevent hangs.

Signed-off-by: Jun Lei <Jun.Lei@amd.com>
Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Reuse dcn2 registers
Vitaly Prosyak [Fri, 9 Aug 2019 17:36:08 +0000 (12:36 -0500)]
drm/amd/display: Reuse dcn2 registers

[Why & How]
Use dcn2 blender, shaper, 3dlut registers

Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Acked-by: Vitaly Prosyak <Vitaly.Prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: remove temporary transition code
Dmytro Laktyushkin [Wed, 3 Jul 2019 20:37:54 +0000 (16:37 -0400)]
drm/amd/display: remove temporary transition code

Remove code used to allow compilation error free
interface change.

Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: update navi to use new surface programming behaviour
Dmytro Laktyushkin [Fri, 13 Sep 2019 23:00:28 +0000 (18:00 -0500)]
drm/amd/display: update navi to use new surface programming behaviour

New behaviour will track global updates and update any hw that isn't
related to current stream being updated.

This should fix any issues caused by pipe split pipes being taken
by other streams.

Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Add missing surface address registers
Ilya Bakoulin [Fri, 16 Aug 2019 20:33:28 +0000 (16:33 -0400)]
drm/amd/display: Add missing surface address registers

[Why]
- Need to add missing surface address register definitions.
- RGBE+A does not work in a stereo configuration because
  surface addresses are no programmed correctly.

[How]
Added the register definitions and surface address programming.

Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: 3.2.49
Anthony Koo [Mon, 19 Aug 2019 13:22:24 +0000 (09:22 -0400)]
drm/amd/display: 3.2.49

Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: config to override DSC start slice height
Nikola Cornij [Fri, 16 Aug 2019 18:26:56 +0000 (14:26 -0400)]
drm/amd/display: config to override DSC start slice height

[why]
It's sometimes useful to have this option when debugging

[how]
Add a config flag. If the flag is not set, use driver default policy.
If the flag is set, use the value from the flag as the starting DSC slice
height.

Signed-off-by: Nikola Cornij <nikola.cornij@amd.com>
Reviewed-by: Martin Leung <Martin.Leung@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Add back support for DSC 4:2:2 Simple
Nikola Cornij [Thu, 15 Aug 2019 15:13:54 +0000 (11:13 -0400)]
drm/amd/display: Add back support for DSC 4:2:2 Simple

[why]
The requirement has been clarified and only DSC 4:2:2 Native has to
be disabled.

Signed-off-by: Nikola Cornij <nikola.cornij@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Don't allocate payloads if link lost
Alvin Lee [Fri, 2 Aug 2019 17:42:49 +0000 (13:42 -0400)]
drm/amd/display: Don't allocate payloads if link lost

We should not allocate payloads if the link is lost until the link is retrained.
Some displays require this.

Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Subsample mode suboptimal for YCbCr4:2:2
Krunoslav Kovac [Fri, 9 Aug 2019 21:16:18 +0000 (17:16 -0400)]
drm/amd/display: Subsample mode suboptimal for YCbCr4:2:2

[Why&How]
Driver defaults to 1-tap subsample mode for 4:2:2.
DCE11.2 added 3-tap. The policy is:
DCE8-DCE11 - change to 2-tap, it's better than 1-tap.
DCE11.2+ - use 3-tap

Note that 4:2:0 was added in DCE11.2 and already uses 3-tap always.
Note 2 is that DCE not covered on Linux, only DCN+

Signed-off-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: refine i2c over aux
Lewis Huang [Wed, 7 Aug 2019 10:05:49 +0000 (18:05 +0800)]
drm/amd/display: refine i2c over aux

[Why]
When user mode use i2c over aux through ADL or DDI, the function
dal_ddc_service_query_ddc_data will be called. There are two issues.

1. When read/write length > 16byte, current always return false because
the DEFAULT_AUX_MAX_DATA_SIZE != length.
2. When usermode only need to read data through i2c, driver will write
mot = true at the same address and cause i2c sink confused. Therefore
the following read command will get garbage.

[How]
1. Add function dal_dcc_submit_aux_command to handle length > 16 byte.
2. Check read size and write size when query ddc data.

Signed-off-by: Lewis Huang <Lewis.Huang@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Fix DML tests
Ilya Bakoulin [Wed, 7 Aug 2019 17:02:44 +0000 (13:02 -0400)]
drm/amd/display: Fix DML tests

[Why]
DML diags tests are failing because the struct contents get
clobbered by a memcpy.

[How]
Remove the memcpy call.

Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: Fix a building error when KFD_SUPPORT_IOMMU_V2 is turned off
Yong Zhao [Thu, 5 Sep 2019 14:49:56 +0000 (10:49 -0400)]
drm/amdkfd: Fix a building error when KFD_SUPPORT_IOMMU_V2 is turned off

The issue was accidentally introduced recently.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: move the call of ras recovery_init and bad page reserve to proper place
Tao Zhou [Fri, 30 Aug 2019 11:50:39 +0000 (19:50 +0800)]
drm/amdgpu: move the call of ras recovery_init and bad page reserve to proper place

ras recovery_init should be called after ttm init,
bad page reserve should be put in front of gpu reset since i2c
may be unstable during gpu reset.
add cleanup for recovery_init and recovery_fini

v2: add more comment and print.
    remove cancel_work_sync in recovery_init.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: save umc error records
Tao Zhou [Thu, 15 Aug 2019 08:15:08 +0000 (16:15 +0800)]
drm/amdgpu: save umc error records

save umc error records to ras bad page array

v2: add bad pages before gpu reset
v3: add NULL check for adev->umc.funcs

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Hook EEPROM table to RAS
Tao Zhou [Thu, 15 Aug 2019 06:55:55 +0000 (14:55 +0800)]
drm/amdgpu: Hook EEPROM table to RAS

support eeprom records load and save for ras,
move EEPROM records storing to bad page reserving

v2: remove redundant check for con->eh_data

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: change ras bps type to eeprom table record structure
Tao Zhou [Tue, 13 Aug 2019 02:39:05 +0000 (10:39 +0800)]
drm/amdgpu: change ras bps type to eeprom table record structure

change bps type from retired page to eeprom table record, prepare for
saving umc error records to eeprom

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/madgpu: Fix EEPROM Checksum calculation.
Andrey Grodzovsky [Thu, 5 Sep 2019 02:45:20 +0000 (22:45 -0400)]
drm/madgpu: Fix EEPROM Checksum calculation.

Fix typo which messed up the calculation.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Remove clock gating restore.
Andrey Grodzovsky [Wed, 4 Sep 2019 15:16:24 +0000 (11:16 -0400)]
drm/amdgpu: Remove clock gating restore.

Restoring clock gating break SMU opeartion afterwards, avoid
this until this further invistigated with SMU.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Add smu lock around in pp_smu_i2c_bus_access
Andrey Grodzovsky [Tue, 3 Sep 2019 20:47:40 +0000 (16:47 -0400)]
drm/amdgpu: Add smu lock around in pp_smu_i2c_bus_access

Protect from concurrent SMU accesses.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: Add stereo mux and dig programming calls for dcn21
Roman Li [Wed, 4 Sep 2019 21:23:11 +0000 (17:23 -0400)]
drm/amd/display: Add stereo mux and dig programming calls for dcn21

[Why]
The earlier patch "Hook up calls to do stereo mux and dig programming..."
doesn't include update for dcn21.

[How]
Align dcn21 gpio settings with updated stereo control interface.

Signed-off-by: Roman Li <Roman.Li@amd.com>
Acked-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdkfd: Query kfd device info by CHIP id instead of pci device id
Yong Zhao [Tue, 3 Sep 2019 21:55:30 +0000 (17:55 -0400)]
drm/amdkfd: Query kfd device info by CHIP id instead of pci device id

This optimizes out the pci device id usage in KFD and makes the code
more maintainable.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Disable page faults while reading user wptrs
Felix Kuehling [Fri, 30 Aug 2019 04:10:34 +0000 (00:10 -0400)]
drm/amdgpu: Disable page faults while reading user wptrs

These wptrs must be pinned and GPU accessible when this is called
from hqd_load functions. So they should never fault. This resolves
a circular lock dependency issue involving four locks including the
DQM lock and mmap_sem.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: disable stutter mode for renoir
Aaron Liu [Wed, 4 Sep 2019 03:47:48 +0000 (11:47 +0800)]
drm/amdgpu: disable stutter mode for renoir

With stutter mode enabled, NMI prints frequently.
Disable stutter for the moment because NMI warning storm, and will
enable it back till the issue is addressed

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/display: update renoir_ip_offset.h
Aaron Liu [Wed, 4 Sep 2019 05:21:42 +0000 (13:21 +0800)]
drm/amd/display: update renoir_ip_offset.h

This patch updates MP1_BASE in renoir_ip_offset.h

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: implement sysfs for getting dpm clock
Prike Liang [Wed, 4 Sep 2019 08:34:39 +0000 (16:34 +0800)]
drm/amd/powerplay: implement sysfs for getting dpm clock

With the common interface print_clk_levels can get the following dpm clock:

-pp_dpm_dcefclk
-pp_dpm_fclk
-pp_dpm_mclk
-pp_dpm_sclk
-pp_dpm_socclk

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: clean up load TMR sequence
John Clements [Wed, 4 Sep 2019 08:23:27 +0000 (16:23 +0800)]
drm/amdgpu: clean up load TMR sequence

Removed redundant goto statement

Signed-off-by: John Clements <john.clements@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: enable TA load support in Arcturus
John Clements [Wed, 4 Sep 2019 08:23:08 +0000 (16:23 +0800)]
drm/amdgpu: enable TA load support in Arcturus

Add support for loading XGMI/RAS FW

Signed-off-by: John Clements <john.clements@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: change r type to int in gmc_v9_0_late_init
Tao Zhou [Mon, 2 Sep 2019 11:27:23 +0000 (19:27 +0800)]
drm/amdgpu: change r type to int in gmc_v9_0_late_init

change r type from bool to int, suitable for both bool and int return
value

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: replace smu->table_count with SMU_TABLE_COUNT in smu (v2)
Kevin Wang [Tue, 3 Sep 2019 08:02:33 +0000 (16:02 +0800)]
drm/amd/powerplay: replace smu->table_count with SMU_TABLE_COUNT in smu (v2)

fix bellow patch issue:
drm/amd/powerplay: introduce smu table id type to handle the smu table
for each asic
----
"This patch introduces new smu table type, it's to handle the
 different smu table
 defines for each asic with the same smu ip."

before:
use smu->table_count to represent the actual table count in smc firmware
use actual table count to check smu function parameter about smu table
after:
use logic table count "SMU_TABLE_COUNT" to check function parameter
because table id already mapped in smu driver,
and smu function will use logic table id not actual table id to check func parameter.

v2: squash in warning fix

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/amdgpu: add sw_fini interface for df_funcs
Jack Zhang [Tue, 3 Sep 2019 02:15:23 +0000 (10:15 +0800)]
drm/amd/amdgpu: add sw_fini interface for df_funcs

add sw_fini interface of df_funcs.
This interface will remove sysfs file of df_cntr_avail
function.

The old behavior only create sysfs of df_cntr_avail
in sw_init, but never remove it for lack of sw_fini
interface. With this,driver will report create
sysfs fail when it's loaded for the second time.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: init UMC & RSMU register base address
Hawking Zhang [Mon, 2 Sep 2019 19:19:31 +0000 (03:19 +0800)]
drm/amdgpu: init UMC & RSMU register base address

UMC RAS feature requires access to UMC & RSMU registers

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/nbio: switch to amdgpu_nbio_ras_late_init helper function
Hawking Zhang [Mon, 2 Sep 2019 22:48:00 +0000 (06:48 +0800)]
drm/amdgpu/nbio: switch to amdgpu_nbio_ras_late_init helper function

amdgpu_nbio_ras_late_init is used to init nbio specfic
ras debugfs/sysfs node and nbio specific interrupt handler.
It can be shared among nbio generations

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/mmhub: switch to amdgpu_mmhub_ras_late_init helper function
Hawking Zhang [Mon, 2 Sep 2019 22:23:12 +0000 (06:23 +0800)]
drm/amdgpu/mmhub: switch to amdgpu_mmhub_ras_late_init helper function

amdgpu_mmhub_ras_late_init is used to init mmhub specfic
ras debugfs/sysfs node and mmhub specific interrupt handler.
It can be shared among mmhub generations

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/sdma: switch to amdgpu_sdma_ras_late_init helper function
Hawking Zhang [Mon, 2 Sep 2019 22:02:07 +0000 (06:02 +0800)]
drm/amdgpu/sdma: switch to amdgpu_sdma_ras_late_init helper function

amdgpu_sdma_ras_late_init is used to init sdma specfic
ras debugfs/sysfs node and sdma specific interrupt handler.
It can be shared among sdma generations

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/gfx: switch to amdgpu_gfx_ras_late_init helper function
Hawking Zhang [Mon, 2 Sep 2019 22:06:08 +0000 (06:06 +0800)]
drm/amdgpu/gfx: switch to amdgpu_gfx_ras_late_init helper function

amdgpu_gfx_ras_late_init is used to init gfx specfic
ras debugfs/sysfs node and gfx specific interrupt handler.
It can be shared among gfx generations

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/gmc: switch to amdgpu_gmc_ras_late_init helper function
Hawking Zhang [Mon, 2 Sep 2019 21:24:35 +0000 (05:24 +0800)]
drm/amdgpu/gmc: switch to amdgpu_gmc_ras_late_init helper function

amdgpu_gmc_ras_late_init is used to init gmc specfic
ras debugfs/sysfs node and gmc specific interrupt handler.
It can be shared among gmc generations.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: set ip specific ras interface pointer to NULL after free it
Hawking Zhang [Mon, 2 Sep 2019 19:16:47 +0000 (03:16 +0800)]
drm/amdgpu: set ip specific ras interface pointer to NULL after free it

to prevent access to dangling pointers

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodmr/amdgpu: Add system auto reboot to RAS.
Andrey Grodzovsky [Thu, 22 Aug 2019 19:01:37 +0000 (15:01 -0400)]
dmr/amdgpu: Add system auto reboot to RAS.

In case of RAS error allow user configure auto system
reboot through ras_ctrl.
This is also part of the temproray work around for the RAS
hang problem.

v4: Use latest kernel API for disk sync.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Avoid HW GPU reset for RAS.
Andrey Grodzovsky [Fri, 13 Sep 2019 22:40:32 +0000 (17:40 -0500)]
drm/amdgpu: Avoid HW GPU reset for RAS.

Problem:
Under certain conditions, when some IP bocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.

Temporary fix until proper solution in PSP/SMU is ready:
When uncorrectable error happens the DF will unconditionally
broadcast error event packets to all its clients/slave upon
receiving fatal error event and freeze all its outbound queues,
err_event_athub interrupt  will be triggered.
In such case and we use this interrupt
to issue GPU reset. THe GPU reset code is modified for such case to avoid HW
reset, only stops schedulers, deatches all in progress and not yet scheduled
job's fences, set error code on them and signals.
Also reject any new incoming job submissions from user space.
All this is done to notify the applications of the problem.

v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count

v3:
Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive memebers.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Fix bugs in amdgpu_device_gpu_recover in XGMI case.
Andrey Grodzovsky [Fri, 30 Aug 2019 14:31:18 +0000 (10:31 -0400)]
drm/amdgpu: Fix bugs in amdgpu_device_gpu_recover in XGMI case.

Issue 1:
In  XGMI case amdgpu_device_lock_adev for other devices in hive
was called to late, after access to their repsective schedulers.
So relocate the lock to the begining of accessing the other devs.

Issue 2:
Using amdgpu_device_ip_need_full_reset to switch the device list from
all devices in hive to the single 'master' device who owns this reset
call is wrong because when stopping schedulers we iterate all the devices
in hive but when restarting we will only reactivate the 'master' device.
Also, in case amdgpu_device_pre_asic_reset conlcudes that full reset IS
needed we then have to stop schedulers for all devices in hive and not
only the 'master' but with amdgpu_device_ip_need_full_reset  we
already missed the opprotunity do to so. So just remove this logic and
always stop and start all schedulers for all devices in hive.

Also minor cleanup and print fix.

v4: Minor coding style fix.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: remove amdgpu_cs_try_evict
Christian König [Fri, 30 Aug 2019 12:42:10 +0000 (14:42 +0200)]
drm/amdgpu: remove amdgpu_cs_try_evict

Trying to evict things from the current working set doesn't work that
well anymore because of per VM BOs.

Rely on reserving VRAM for page tables to avoid contention.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: reserve at least 4MB of VRAM for page tables v2
Christian König [Fri, 30 Aug 2019 12:38:37 +0000 (14:38 +0200)]
drm/amdgpu: reserve at least 4MB of VRAM for page tables v2

This hopefully helps reduce the contention for page tables.

v2: adjust maximum reported VRAM size as well

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: use moving fence instead of exclusive for VM updates
Christian König [Fri, 13 Sep 2019 22:38:07 +0000 (17:38 -0500)]
drm/amdgpu: use moving fence instead of exclusive for VM updates

Make VM updates depend on the moving fence instead of the exclusive one.

Makes it less likely to actually have a dependency.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: do proper cleanups on hw_fini
Evan Quan [Mon, 2 Sep 2019 04:37:23 +0000 (12:37 +0800)]
drm/amd/powerplay: do proper cleanups on hw_fini

These are needed for smu_reset support.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: update cached feature enablement status V3
Evan Quan [Wed, 21 Aug 2019 09:19:52 +0000 (17:19 +0800)]
drm/amd/powerplay: update cached feature enablement status V3

Need to update in cache feature enablement status after pp_feature
settings. Another fix for the commit below:
drm/amd/powerplay: implment sysfs feature status function in smu

V2: update smu_feature_update_enable_state() and relates
V3: use bitmap_or and bitmap_andnot

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: guard manual mode prerequisite for clock level force
Evan Quan [Fri, 30 Aug 2019 09:30:46 +0000 (17:30 +0800)]
drm/amd/powerplay: guard manual mode prerequisite for clock level force

Force clock level is for dpm manual mode only.

Reported-by: Candice Li <candice.li@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: only apply gds clearing workaround when ras is supported
Hawking Zhang [Sat, 31 Aug 2019 06:27:13 +0000 (14:27 +0800)]
drm/amdgpu: only apply gds clearing workaround when ras is supported

gds clearing workaround should only be applied on asics that support gfx ras

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: fix memory leak when ras is not supported on specific ip block
Hawking Zhang [Sat, 31 Aug 2019 06:20:38 +0000 (14:20 +0800)]
drm/amdgpu: fix memory leak when ras is not supported on specific ip block

free ras_if if ras is not supported

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: check mmhub_funcs pointer before refering to it
Hawking Zhang [Sat, 31 Aug 2019 05:18:40 +0000 (13:18 +0800)]
drm/amdgpu: check mmhub_funcs pointer before refering to it

mmhub callback functions are not initialized for all the ASICs

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Remove unnecessary TLB workaround (v2)
Felix Kuehling [Fri, 30 Aug 2019 01:18:43 +0000 (21:18 -0400)]
drm/amdgpu: Remove unnecessary TLB workaround (v2)

This workaround is better handled in user mode in a way that doesn't
require allocating extra memory and breaking userptr BOs.

The TLB bug is a performance bug, not a functional or security bug.
Hence it is safe to remove this kernel part of the workaround to
allow a better workaround using only virtual address alignments in
user mode.

v2: Removed VI_BO_SIZE_ALIGN definition

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Use optimal mtypes and PTE bits for Arcturus
Felix Kuehling [Mon, 26 Aug 2019 22:46:28 +0000 (18:46 -0400)]
drm/amdgpu: Use optimal mtypes and PTE bits for Arcturus

For compute VRAM allocations on Arturus use the new RW mtype
for non-coherent local memory, CC mtype for coherent local
memory and PTE_SNOOPED bit for invalidating non-dirty cache
lines on remote XGMI mappings.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Determing PTE flags separately for each mapping (v3)
Felix Kuehling [Mon, 26 Aug 2019 22:39:11 +0000 (18:39 -0400)]
drm/amdgpu: Determing PTE flags separately for each mapping (v3)

The same BO can be mapped with different PTE flags by different GPUs.
Therefore determine the PTE flags separately for each mapping instead
of storing them in the KFD buffer object.

Add a helper function to determine the PTE flags to be extended with
ASIC and memory-type-specific logic in subsequent commits.

v2: Split Arcturus-specific MTYPE changes into separate commit
v3: Fix return type of get_pte_flags to uint64_t

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Support new arcturus mtype
Oak Zeng [Fri, 26 Jul 2019 21:03:11 +0000 (16:03 -0500)]
drm/amdgpu: Support new arcturus mtype

Arcturus repurposed mtype WC to RW. Modify gmc functions
to support the new mtype

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Extends amdgpu vm definitions (v2)
Oak Zeng [Fri, 26 Jul 2019 20:57:50 +0000 (15:57 -0500)]
drm/amdgpu: Extends amdgpu vm definitions (v2)

Add RW mtype introduced for arcturus.

v2:
* Don't add probe-invalidation bit from UAPI
* Don't add unused AMDGPU_MTYPE_ definitions

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: switch to amdgpu_ras_late_init for nbio v7_4 (v2)
Hawking Zhang [Thu, 29 Aug 2019 11:56:44 +0000 (19:56 +0800)]
drm/amdgpu: switch to amdgpu_ras_late_init for nbio v7_4 (v2)

call helper function in late init phase to handle ras init
for nbio ip block

v2: init local var r to 0 in case the function return failure
on asics that don't have ras_late_init implementation

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: add ras_late_init callback function for nbio v7_4 (v3)
Hawking Zhang [Thu, 29 Aug 2019 12:57:32 +0000 (20:57 +0800)]
drm/amdgpu: add ras_late_init callback function for nbio v7_4 (v3)

ras_late_init callback function will be used to do common ras
init in late init phase.

v2: call ras_late_fini to do cleanup when fails to enable interrupt

v3: rename sysfs/debugfs node name to pcie_bif_xxx

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: add mmhub ras_late_init callback function (v2)
Hawking Zhang [Fri, 30 Aug 2019 05:34:38 +0000 (13:34 +0800)]
drm/amdgpu: add mmhub ras_late_init callback function (v2)

The function will be called in late init phase to do mmhub
ras init

v2: check ras_late_init function pointer before invoking the
function

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: switch to amdgpu_ras_late_init for gmc v9 block (v2)
Hawking Zhang [Thu, 29 Aug 2019 11:35:50 +0000 (19:35 +0800)]
drm/amdgpu: switch to amdgpu_ras_late_init for gmc v9 block (v2)

call helper function in late init phase to handle ras init
for gmc ip block

v2: call ras_late_fini to do clean up when fail to enable interrupt

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: switch to amdgpu_ras_late_init for sdma v4 block (v2)
Hawking Zhang [Thu, 29 Aug 2019 11:30:02 +0000 (19:30 +0800)]
drm/amdgpu: switch to amdgpu_ras_late_init for sdma v4 block (v2)

call helper function in late init phase to handle ras init
for sdma ip block

v2: call ras_late_fini to do clean up when fail to enable interrupt

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: switch to amdgpu_ras_late_init for gfx v9 block (v2)
Hawking Zhang [Thu, 29 Aug 2019 11:15:16 +0000 (19:15 +0800)]
drm/amdgpu: switch to amdgpu_ras_late_init for gfx v9 block (v2)

call helper function in late init phase to handle ras init
for gfx ip block

v2: call ras_late_fini to do clean up when fail to enable interrupt

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: add helper function to do common ras_late_init/fini (v3)
Hawking Zhang [Fri, 30 Aug 2019 05:29:18 +0000 (13:29 +0800)]
drm/amdgpu: add helper function to do common ras_late_init/fini (v3)

In late_init for ras, the helper function will be used to
1). disable ras feature if the IP block is masked as disabled
2). send enable feature command if the ip block was masked as enabled
3). create debugfs/sysfs node per IP block
4). register interrupt handler

v2: check ih_info.cb to decide add interrupt handler or not

v3: add ras_late_fini for cleanup all the ras fs node and remove
interrupt handler

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: poll ras_controller_irq and err_event_athub_irq status
Hawking Zhang [Wed, 5 Jun 2019 06:40:57 +0000 (14:40 +0800)]
drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status

For the hardware that can not enable BIF ring for IH cookies for both
ras_controller_irq and err_event_athub_irq, the driver has to poll the
status register in irq handling and ack the hardware properly when there
is interrupt triggered

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: add ras_controller and err_event_athub interrupt support
Hawking Zhang [Wed, 5 Jun 2019 06:57:00 +0000 (14:57 +0800)]
drm/amdgpu: add ras_controller and err_event_athub interrupt support

Ras controller interrupt and Ras err event athub interrupt are two dedicated
interrupts for RAS support.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: update nbio v7_4 ip header files
Hawking Zhang [Wed, 5 Jun 2019 06:20:38 +0000 (14:20 +0800)]
drm/amdgpu: update nbio v7_4 ip header files

Add mmBIF_INTR_CNTL and its shift mask.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: add nbif v7_4 irq source header for vega20
Hawking Zhang [Wed, 29 May 2019 06:00:19 +0000 (14:00 +0800)]
drm/amdgpu: add nbif v7_4 irq source header for vega20

nbif v7_4 interrupt source definition

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu/nbio: add functions to query ras specific interrupt status
Hawking Zhang [Thu, 30 May 2019 03:57:20 +0000 (11:57 +0800)]
drm/amdgpu/nbio: add functions to query ras specific interrupt status

ras_controller_interrupt and err_event_interrupt are ras specific interrupts.
add functions to check their status and ack them if they are generated. both
funcitons should only be invoked in ISR when BIF ring is disabled or even not
initialized.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: switch to new amdgpu_nbio structure
Hawking Zhang [Fri, 23 Aug 2019 11:39:18 +0000 (19:39 +0800)]
drm/amdgpu: switch to new amdgpu_nbio structure

no functional change, just switch to new structures

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: add new amdgpu nbio header file
Hawking Zhang [Fri, 23 Aug 2019 11:02:14 +0000 (19:02 +0800)]
drm/amdgpu: add new amdgpu nbio header file

More nbio funcitonalities will be added and nbio could
be treated as an ip block like gfx/sdma.etc

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agoMerge branch 'etnaviv/next' of https://git.pengutronix.de/git/lst/linux into drm...
Dave Airlie [Fri, 6 Sep 2019 06:57:34 +0000 (16:57 +1000)]
Merge branch 'etnaviv/next' of https://git.pengutronix.de/git/lst/linux into drm-next

single etnaviv fix for an error path.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/4ae00cfb47c8e6fffca5dbb45ae9370cd4e5eaf4.camel@pengutronix.de
5 years agoMerge tag 'drm-next-5.4-2019-08-30' of git://people.freedesktop.org/~agd5f/linux...
Dave Airlie [Fri, 6 Sep 2019 06:40:28 +0000 (16:40 +1000)]
Merge tag 'drm-next-5.4-2019-08-30' of git://people.freedesktop.org/~agd5f/linux into drm-next

drm-next-5.4-2019-08-30:

amdgpu:
- Add DC support for Renoir
- Add some GPUVM hw bug workarounds
- add support for the smu11 i2c controller
- GPU reset vram lost bug fixes
- Navi1x powergating fixes
- Navi12 power fixes
- Renoir power fixes
- Misc bug fixes and cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190830212650.5055-1-alexander.deucher@amd.com
5 years agoMerge tag 'exynos-drm-next-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel...
Dave Airlie [Tue, 3 Sep 2019 06:05:45 +0000 (16:05 +1000)]
Merge tag 'exynos-drm-next-for-v5.4' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-next

- JUst one cleanup which drops the use of drmP.h header file.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <daeinki@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190901120619.3992-1-daeinki@gmail.com
5 years agodrm/etnaviv: fix missing unlock on error in etnaviv_iommuv1_context_alloc()
Wei Yongjun [Mon, 19 Aug 2019 06:17:33 +0000 (06:17 +0000)]
drm/etnaviv: fix missing unlock on error in etnaviv_iommuv1_context_alloc()

Add the missing unlock before return from function etnaviv_iommuv1_context_alloc()
in the error handling case.

Fixes: 27b67278e007 ("drm/etnaviv: rework MMU handling")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
5 years agodrm/exynos: drop use of drmP.h
Sam Ravnborg [Wed, 21 Aug 2019 11:28:43 +0000 (20:28 +0900)]
drm/exynos: drop use of drmP.h

There was a few uses of drmP that was missed in the last
patch removing this header from exynos.

Remove the final uses of this header.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Jingoo Han <jingoohan1@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
5 years agodrm/amdgpu: Move null pointer dereference check
Austin Kim [Fri, 30 Aug 2019 08:07:04 +0000 (17:07 +0900)]
drm/amdgpu: Move null pointer dereference check

Null pointer dereference check should have been checked,
ahead of below routine.
struct amdgpu_device *adev = hwmgr->adev;

With this commit, it could avoid potential NULL dereference.

Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Fix undefined dm_ip_block for navi12
Petr Cvek [Fri, 30 Aug 2019 14:31:58 +0000 (16:31 +0200)]
drm/amdgpu: Fix undefined dm_ip_block for navi12

There is missing "if defined" CONFIG_DRM_AMD_DC block for non DC
configurations. This will cause link error. The patch is fixing that.

Bug: https://bugs.freedesktop.org/show_bug.cgi?id=110979
Signed-off-by: Petr Cvek <petrcvekcz@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: fix no interrupt issue for renoir emu (v2)
Aaron Liu [Fri, 14 Dec 2018 03:21:41 +0000 (11:21 +0800)]
drm/amdgpu: fix no interrupt issue for renoir emu (v2)

In renoir's vega10_ih model, there's a security change in mmIH_CHICKEN
register, that limits IH to use physical address (FBPA, GPA) directly.
Those chicken bits need to be programmed first.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: update IH_CHICKEN in oss 4.0 IP header for VG/RV series
Aaron Liu [Fri, 14 Dec 2018 03:16:36 +0000 (11:16 +0800)]
drm/amdgpu: update IH_CHICKEN in oss 4.0 IP header for VG/RV series

In Renoir's emulator, those chicken bits need to be programmed.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amd/powerplay: SMU_MSG_OverridePcieParameters is unsupport for APU
Aaron Liu [Fri, 30 Aug 2019 01:53:27 +0000 (09:53 +0800)]
drm/amd/powerplay: SMU_MSG_OverridePcieParameters is unsupport for APU

For apu, SMU_MSG_OverridePcieParameters is unsupport.
So return directly in smu_override_pcie_parameters function.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: Handle job is NULL use case in amdgpu_device_gpu_recover
Andrey Grodzovsky [Tue, 27 Aug 2019 16:14:47 +0000 (12:14 -0400)]
drm/amdgpu: Handle job is NULL use case in amdgpu_device_gpu_recover

This should be checked at all places job is accessed.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>