Moti Haimovski [Mon, 27 Jun 2022 14:21:39 +0000 (17:21 +0300)]
habanalabs: add gaudi2 MMU support
Gaudi2 has new MMU units. A PMMU for device->host accesses, and HMMU
for HBM accesses.
The page tables of both MMUs are located in the host's memory (referred
to in the code as host-resident pgt).
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Mon, 27 Jun 2022 12:05:28 +0000 (15:05 +0300)]
habanalabs: add gaudi2 wait-for-CS support
In Gaudi2 we moved to a different wait for command submission
completion model. Instead of receiving interrupt only on external
queues, we use the device's sync manager to notify us when the
entire command submission finishes.
This enables us to remove the categorization of queues to external
and internal, and treat each queue equally, without the need to parse
and patch any command buffer.
This change also requires refactoring to the IRQ handling of
CS completions.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Benjamin Dotan [Sun, 26 Jun 2022 18:35:07 +0000 (21:35 +0300)]
habanalabs/gaudi2: add gaudi2 profiler module
Add the Gaudi2 code to initialize the ASIC's profiler. The profile
receives its initialization values from the user, same as in Gaudi2,
but the code to initialize is in the driver because the configuration
space of the device is not directly exposed to the user.
Signed-off-by: Benjamin Dotan <bdotan@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Ofir Bitton [Sun, 26 Jun 2022 18:30:25 +0000 (21:30 +0300)]
habanalabs/gaudi2: add gaudi2 security module
Use the generic security module to block all registers in the ASIC and
then open only those that are needed to be accessed by the user.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Ofir Bitton [Sun, 26 Jun 2022 18:24:50 +0000 (21:24 +0300)]
habanalabs: add generic security module
As the ASICs become more complex and have many more registers, we need
a better way to configure the security properties.
As a reminder, we have two dedicated mechanisms for security:
Range Registers and Protection bits. Those mechanisms protect sensitive
memory and configuration areas inside the device.
The generic module handles the low-level part of the configuration,
because the configuration mechanism is identical in all ASICs. The
difference is the address ranges and register names.
Any ASIC that use this block should first block all the register
blocks in the ASIC. Then, it should open only the registers that
need to be accessed by the user (This is opposed to Goya and Gaudi,
where we blocked only what should not be accesses by the user).
The module contains several functions, to unblock single register,
multiple registers, entire blocks, ranges, ranges with mask.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Tue, 28 Jun 2022 07:53:17 +0000 (10:53 +0300)]
habanalabs: remove obsolete device variables used for testing
There are a couple of device variables that are used for testing
purposes and they are set to fixed values.
Remove the variables that are not relevant anymore and document the
remaining variables.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Fri, 24 Jun 2022 13:58:23 +0000 (16:58 +0300)]
habanalabs: initialize new asic properties
New asic properties were added for Gaudi2. We want to initialize
and use them, when relevant, also for Goya and Gaudi.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Fri, 24 Jun 2022 13:47:13 +0000 (16:47 +0300)]
habanalabs: add unsupported functions
There are a number of new ASIC-specific functions that were added
for Gaudi2. To make the common code work, we need to define empty
implementations of those functions for Goya and Gaudi.
Some functions will return error if called with Goya/Gaudi.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Sun, 26 Jun 2022 15:20:03 +0000 (18:20 +0300)]
habanalabs: add gaudi2 asic-specific code
Add the ASIC-specific code for Gaudi2. Supply (almost) all of the
function callbacks that the driver's common code need to initialize,
finalize and submit workloads to the Gaudi2 ASIC.
It also contains the code to initialize the F/W of the Gaudi2 ASIC
and to receive events from the F/W.
It contains new debugfs entry to dump razwi events. razwi is a case
where the device's engines create a transaction that reaches an
invalid destination.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Fri, 24 Jun 2022 10:38:57 +0000 (13:38 +0300)]
uapi: habanalabs: add gaudi2 defines
Add the new defines for GAUDI2 uapi interface.
It includes the following:
1. Enums of engines and PLLs.
2. New information in the info IOCTL that is retrieved by the driver.
3. Update comments regarding the CB/CS/wait for CS ioctls.
4. New fields in the debug IOCTL for configuring the profiler for
Gaudi2.
There is no new IOCTL.
Some of the changes are also relevant for Greco (which will be
upstreamed later this year). When ever it says "Greco and onwards",
it means it is also for Gaudi2.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Fri, 24 Jun 2022 15:56:42 +0000 (18:56 +0300)]
habanalabs/gaudi2: add asic registers header files
Add the relevant GAUDI2 ASIC registers header files. These files are
generated automatically from a tool maintained by the VLSI engineers.
There are more files which are not upstreamed because only very few
defines from those files are used in the driver. For those files, I
copied the relevant defines into gaudi2_regs.h and gaudi2_masks.h, to
reduce the size of this patch.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Ofir Bitton [Mon, 27 Jun 2022 13:59:02 +0000 (16:59 +0300)]
habanalabs: remove redundant argument in access_dev_mem APIs
Region structure is derived from region type, hence no need to pass
it as an argument.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Mon, 27 Jun 2022 08:30:43 +0000 (11:30 +0300)]
habanalabs: use %pa to print pci bar size
PCI bar size is resource_size_t so we should use %pa to make it work
correctly on all architectures.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dafna Hirschfeld [Sun, 26 Jun 2022 13:18:40 +0000 (16:18 +0300)]
habanalabs/gaudi: replace hl_poll_timeout with while loop
in gaudi_scrub_device_mem, replace call to hl_poll_timeout
with a while loop to avoid using dummy variables.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Ohad Sharabi [Sat, 25 Jun 2022 20:36:13 +0000 (23:36 +0300)]
habanalabs: communicate supported page sizes to user
Because in future ASICs the driver will allow the user to set the
page size we need to make sure this data is propagated in all APIs.
In addition, since this is already an ASIC property we no longer need
ASIC function for it.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tomer Tayar [Fri, 24 Jun 2022 10:05:23 +0000 (13:05 +0300)]
habanalabs: remove dead code from free_device_memory()
free_device_memory() ends with if and else, each has a return statement,
followed by another return statement that can never be reached.
Restructure the function and remove this dead code.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Fri, 24 Jun 2022 16:11:38 +0000 (19:11 +0300)]
habanalabs/gaudi: enable error interrupt on ARB WDT
We want to receive an error interrupt in case the watchdog timer
expires on arbitration event in the queues.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Ohad Sharabi [Wed, 22 Jun 2022 12:38:56 +0000 (15:38 +0300)]
habanalabs: page size can only be a power of 2
We dropped support for page sizes that are not power of 2.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Ohad Sharabi [Sun, 12 Jun 2022 12:00:29 +0000 (15:00 +0300)]
habanalabs: refactor dma asic-specific functions
This is a pre-requisite patch for adding tracepoints to the DMA memory
operations (allocation/free) in the driver.
The main purpose is to be able to cross data with the map operations and
determine whether memory violation occurred, for example free DMA
allocation before unmapping it from device memory.
To achieve this the DMA alloc/free code flows were refactored so that a
single DMA tracepoint will catch many flows.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Fri, 24 Jun 2022 13:49:26 +0000 (16:49 +0300)]
habanalabs/gaudi: remove unused enum
Also beautify code by preferring single line wherever possible.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Fri, 24 Jun 2022 13:45:02 +0000 (16:45 +0300)]
habanalabs/gaudi: mask constant value before cast
This fixes a sparse warning of
"cast truncates bits from constant value"
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Fri, 24 Jun 2022 13:05:59 +0000 (16:05 +0300)]
habanalabs/gaudi: use correct type in assignment
packets are defined as LE so we need to convert before assigning
values to them.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Fri, 24 Jun 2022 13:04:30 +0000 (16:04 +0300)]
habanalabs/gaudi: fix function name in comment
function name in comment didn't match actual function name.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Fri, 24 Jun 2022 10:36:10 +0000 (13:36 +0300)]
habanalabs/goya: move dma direction enum to uapi file
The values in this enum are not used by h/w but are a contract
between userspace and the kernel driver so they must be defined
in the uapi file.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dafna Hirschfeld [Thu, 12 May 2022 13:16:25 +0000 (16:16 +0300)]
habanalabs: set default value for memory_scrub
Set a default value for memory scrubbing
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dafna Hirschfeld [Thu, 12 May 2022 12:20:55 +0000 (15:20 +0300)]
habanalabs: move call to scrub_device_mem after ctx_fini
In future ASICs, it would be possible to have a non-idle
device when context is released. We thus need to postpone the
scrubbing. Postpone it to hpriv release if reset is not executed
or to device late init if reset is executed.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dafna Hirschfeld [Thu, 23 Jun 2022 04:47:48 +0000 (07:47 +0300)]
habanalabs/gaudi: use memory_scrub_val from debugfs
In the callback scrub_device_mem, use 'memory_scrub_val'
from debugfs for the scrubbing value.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dafna Hirschfeld [Thu, 23 Jun 2022 04:28:17 +0000 (07:28 +0300)]
habanalabs: don't send addr and size to scrub_device_mem cb
We use scrub_device_mem only to scrub the entire SRAM and entire
DRAM. Therefore there is no need to send addr and size
args to the callback.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dafna Hirschfeld [Tue, 10 May 2022 13:36:02 +0000 (16:36 +0300)]
habanalabs: don't do memory scrubbing when unmapping
There is no need to do memory scrub when unmapping anymore as it is
an overhead as long as we have a single user at any given time.
Remove that code and change return value of free_phys_pg_pack to void
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Ofir Bitton [Wed, 22 Jun 2022 12:10:08 +0000 (15:10 +0300)]
habanalabs: print if firmware is secured during load
For easier debug, it is desirable to have a simple way
to know whether the device is secured or not, hence we dump this
indication during boot.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Yuri Nudelman [Wed, 22 Jun 2022 09:52:34 +0000 (12:52 +0300)]
habanalabs/gaudi: fix a race condition causing DMAR error
There is a rare race condition in CB completion mechanism, that can
occur under a very high pressure of command submissions.
The preconditions for this to happen are:
1. There should be enough command submissions for the pre-allocated
patched CB pool to run out of commands. At this stage we start
allocating new patched CBs as they arrive.
2. CB size has to be exactly (128*n + 104)B for some n, i.e. 24B below
a cache line end.
The flow:
1. Two command buffers being completed on different streams, at the
same time. Denote those CB1 and CB2.
2. Each command buffer is injected with two messages, 16B each - one
for a HBW update of the completion queue, another to raise
interrupt.
3. Assume CB1 updated the completion queue and raise the interrupt.
4. Assume CB2 updated the completion queue but did not raise the
interrupt yet.
5. The host receives the interrupt. It goes over the completion queue
and sees two completions - CB1 and CB2. Release them both.
6. CB2 performs the last command. The problem is that the last command
is split between 2 cache lines. So to read the last 8B of the last
command, it has to access the host again. Problem is - CB2 is
already released. This causes a DMAR error.
The solution to this problem is simply to make sure the last two
commands in the CB are always in the same cache line, using NOP padding.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Koby Elbaz [Wed, 22 Jun 2022 07:42:42 +0000 (10:42 +0300)]
habanalabs/gaudi: fix warning: var might be used uninitialized
kernel test robot:
"warning: variable 'index' is used uninitialized whenever 'if' condition
is false"
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dafna Hirschfeld [Thu, 2 Jun 2022 08:55:00 +0000 (11:55 +0300)]
habanalabs: move memory_scrub_val to hdev struct
move the field memory_scrub_val from struct hl_dbg_device_entry
to struct hl_device. This is because we want to use
this field also if debugfs is off.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Sun, 19 Jun 2022 09:41:19 +0000 (12:41 +0300)]
habanalabs: fix comment style
function name should not be preceded with @
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Sun, 19 Jun 2022 09:40:19 +0000 (12:40 +0300)]
habanalabs: use kvcalloc when possible
kvcalloc is same as kvmalloc_array with GFP_ZERO.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Sun, 19 Jun 2022 09:37:01 +0000 (12:37 +0300)]
habanalabs: print pointer with correct modifier
Use %p instead of %llx for printing pointers.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Sun, 19 Jun 2022 09:35:06 +0000 (12:35 +0300)]
habanalabs: check fence pointer before use
fence pointer can be NULL in this path, as shown by an earlier check.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
ran shalit [Wed, 15 Jun 2022 18:24:38 +0000 (21:24 +0300)]
habanalabs: add critical indication in sram ecc
Multiple SRAM SERR events are treated as critical events,
and host should be notified about it. Thus, adding is_critical
indication as part of SRAM ECC failure packet.
Signed-off-by: ran shalit <rshalit@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tal Cohen [Thu, 9 Jun 2022 15:08:31 +0000 (18:08 +0300)]
habanalabs/gaudi: notify user process on device unavailable
When a device error occurs, user process would like to get some
indication on the error by reading some device HW info. If the
device is unavailable, user process can't perform any HW device
reading.
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Sat, 18 Jun 2022 18:27:07 +0000 (21:27 +0300)]
habanalabs: remove unused get_dma_desc_list_size
This asic callback function is not called anymore from the common code.
The asic-specific function itself is called but from within the
asic-specific code.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Yuri Nudelman [Tue, 14 Jun 2022 12:14:20 +0000 (15:14 +0300)]
habanalabs: fix NULL dereference on cs timeout
Device descriptor is accessed before an assignment
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Ofir Bitton [Wed, 15 Jun 2022 13:11:31 +0000 (16:11 +0300)]
habanalabs/gaudi: fix shift out of bounds
When validating NIC queues, queue offset calculation must be
performed only for NIC queues.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
farah kassabri [Mon, 13 Jun 2022 13:22:20 +0000 (16:22 +0300)]
habanalabs: add validity check for cq counter offset
Driver performs no validity check for the user cq counter offset
used in both wait_for_interrupt and register_for_timestamp APIs.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Koby Elbaz [Sun, 12 Jun 2022 19:16:25 +0000 (22:16 +0300)]
habanalabs/gaudi: fix incorrect MME offset calculation
Once FW raised an event following a MME2 QMAN error, the driver should
have gone to the corresponding status registers, trying to gather more
info on the error, yet it was accidentally accessing MME1 QMAN address
space.
Generally, we have x4 MMEs, while 0 & 2 are marked MASTER, and
1 & 3 are marked SLAVE. The former can be addressed, yet addressing
the latter is considered an access violation, and will result in a
hung system, which is what unintentionally happened above.
Note that this cannot happen in a secured system, since these registers
are protected with range registers.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dani Liberman [Thu, 2 Jun 2022 13:15:03 +0000 (16:15 +0300)]
habanalabs: avoid unnecessary error print
When sending a packet to FW right after it made reset, we will get
packet timeout. Since it is expected behavior, we don't need to
print an error in such case.
Hence, when driver is in hard reset it will avoid from printing error
messages about packet timeout.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tal Cohen [Thu, 19 May 2022 15:00:55 +0000 (18:00 +0300)]
habanalabs: send an event notification when CS timeout occurs
The Driver needs to inform the User process whenever one of its
CS is timed out. The Driver shall recognize the CS timeout and shall
send an eventfd notification, towards user space, whenever a timeout
is expired on a CS.
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tal Cohen [Wed, 8 Jun 2022 14:34:54 +0000 (17:34 +0300)]
habanalabs/gaudi: send device reset notification
Device reset event, indicates that the device shall be reset -
after a short delay. In such case, the driver sends a notification
towards the User process. This allows the User process
to be able to take several debug actions for system
diagnostic purposes.
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tal Cohen [Wed, 8 Jun 2022 13:02:09 +0000 (16:02 +0300)]
habanalabs/gaudi: invoke device reset from one code block
In order to prepare the driver code for device reset event
notification, change the event handler function flow to call
device reset from one code block.
In addition, the commit fixes an issue that reset was performed
w/o checking the 'hard_reset_on_fw_event' state and w/o setting
the HL_DRV_RESET_DELAY flag.
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tal Cohen [Thu, 12 May 2022 08:59:41 +0000 (11:59 +0300)]
habanalabs: expose undefined opcode status via info ioctl
The info ioctl retrieves information on the last undefined opcode
occurred.
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tal Cohen [Wed, 11 May 2022 15:02:39 +0000 (18:02 +0300)]
habanalabs/gaudi: collect undefined opcode error info
when an undefined opcode error occurres, the driver collects
the relevant information from the Qman and stores it inside
the hdev data structure. An event fd indication is sent towards the
user space.
Note: another commit shall be followed which will add support to
read the error info by an ioctl.
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tomer Tayar [Sun, 22 May 2022 06:43:54 +0000 (09:43 +0300)]
habanalabs: fix race between hl_get_compute_ctx() and hl_ctx_put()
hl_get_compute_ctx() is used to get the pointer to the compute context
from the hpriv object.
The function is called in code paths that are not necessarily initiated
by user, so it is possible that a context release process will happen in
parallel.
This can lead to a race condition in which hl_get_compute_ctx()
retrieves the context pointer, and just before it increments the context
refcount, the context object is released and a freed memory is accessed.
To avoid this race, add a mutex to protect the context pointer in hpriv.
With this lock, hl_get_compute_ctx() will be able to detect if the
context has been released or is about to be released.
struct hl_ctx_mgr has a mutex for contexts IDR with a similar "ctx_lock"
name, so rename it to just "lock" to avoid a confusion with the new
lock.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Yuri Nudelman [Tue, 24 May 2022 13:29:03 +0000 (16:29 +0300)]
habanalabs: keep a record of completed CS outcomes
Often, the user is not interested in the completion timestamp of all
command submissions.
A common situation is, for example, when the user submits a burst of,
possibly, several thousands of commands, then request the completion
timestamp of only couple of specific key commands from all the burst.
The problem is that currently, the outcome of the early commands may be
lost, due to a large amount of later commands, that the user does not
really care about.
This patch creates a separate store with the outcomes of commands the
user has mark explicitly as interested in. This store does not mix the
marked commands with the unmarked ones, hence the data there will
survive for much longer.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Oded Gabbay [Sun, 5 Jun 2022 09:56:36 +0000 (12:56 +0300)]
habanalabs/gaudi: fix comment to reflect current code
Due to code changes in the past few years, the original comment of
how parser->user_cb_size is checked was not correct anymore.
Fix it to reflect current code and add more explanation as the code
is more complex now.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tal Cohen [Wed, 1 Jun 2022 08:32:35 +0000 (11:32 +0300)]
habanalabs: change the write flag name of error info structs
positive flags naming will make more clear code while adding
more 'error info' structures
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tal Cohen [Wed, 1 Jun 2022 20:14:16 +0000 (23:14 +0300)]
habanalabs/gaudi: move tpc assert raise into internal func
raising the tpc assert event in an internal function will make
the code cleaner as we are going to be adding more events
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dan Rapaport [Mon, 30 May 2022 11:11:45 +0000 (14:11 +0300)]
habanalabs: align ioctl uapi structures to 64-bit
The compiler is padding the members of the struct to be aligned to
64-bit. The content of the padded bytes is and not zeroed explicitly,
hence might copy undefined data. We add a padding member to the struct
to get a zeroed 64-bit align struct.
Signed-off-by: Dan Rapaport <drapaport@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dafna Hirschfeld [Thu, 19 May 2022 07:54:30 +0000 (10:54 +0300)]
habanalabs: add terminating NULL to attrs arrays
Arrays of struct attribute are expected to be NULL terminated.
This is required by API methods such as device_add_groups.
This fixes a crash when loading the driver for Goya device.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Jiapeng Chong [Thu, 2 Jun 2022 03:58:27 +0000 (11:58 +0800)]
habanalabs: Fix kernel-doc
Fix the following W=1 kernel warnings:
drivers/misc/habanalabs/common/mmu/mmu_v1.c:425: warning: expecting
prototype for hl_mmu_fini(). Prototype was for hl_mmu_v1_fini() instead.
drivers/misc/habanalabs/common/mmu/mmu_v1.c:449: warning: expecting
prototype for hl_mmu_ctx_init(). Prototype was for hl_mmu_v1_ctx_init()
instead.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Jiapeng Chong [Thu, 2 Jun 2022 03:58:26 +0000 (11:58 +0800)]
habanalabs: Fix kernel-doc
Fix the following W=1 kernel warnings:
drivers/misc/habanalabs/common/pci/pci.c:454: warning: expecting
prototype for hl_fw_fini(). Prototype was for hl_pci_fini() instead.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Dan Carpenter [Wed, 25 May 2022 12:25:06 +0000 (15:25 +0300)]
habanalabs: fix double unlock on error in map_device_va()
If hl_mmu_prefetch_cache_range() fails then this code calls
mutex_unlock(&ctx->mmu_lock) when it's no longer holding the mutex.
Fixes:
9e495e24003e ("habanalabs: do MMU prefetch as deferred work")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Greg Kroah-Hartman [Mon, 11 Jul 2022 18:54:28 +0000 (20:54 +0200)]
Merge tag 'coresight-next-v5.20' of git://git./linux/kernel/git/coresight/linux into char-misc-next
Suzuki writes:
CoreSight self-hosted tracing changes for v5.20.
- Fixes LOCKDEP warnings on module unload with configfs
- Conversion of DT bindings to DT schema
- Branch broadcast support for perf cs_etm
- Etm4x driver fixes for build failures with Clang and unrolled loops
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
* tag 'coresight-next-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux:
coresight: etm4x: avoid build failure with unrolled loops
Documentation: coresight: Expand branch broadcast documentation
Documentation: coresight: Link config options to existing documentation
Documentation: coresight: Turn numbered subsections into real subsections
coresight: Add config flag to enable branch broadcast
Documentation: coresight: Escape coresight bindings file wildcard
dt-bindings: arm: Convert CoreSight CPU debug to DT schema
dt-bindings: arm: Convert CoreSight bindings to DT schema
dt-bindings: arm: Rename Coresight filenames to match compatible
coresight: syscfg: Update load and unload operations
coresight: configfs: Fix unload of configurations on module exit
coresight: Clear the connection field properly
Greg Kroah-Hartman [Mon, 11 Jul 2022 10:30:24 +0000 (12:30 +0200)]
Merge tag 'fpga-late-for-5.20-rc1' of ssh://gitolite./linux/kernel/git/fpga/linux-fpga into char-misc-next
Xu writes:
Here is the second set of FPGA changes for 5.20-rc1
FPGA Manager core:
- Ivan's change to support image offset and data size setting for
reprograming. A parse_header() callback is introduced for drivers to
specify these info.
- Colin's immediate spelling fix for Ivan's patch.
Microchip:
- Ivan's change to add Microchip MPF FPGA manager driver. And MAINTAINERS
entry added for the driver.
All patches have been reviewed on the mailing list, and have been in the
last linux-next releases (as part of our for-next branch).
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
* tag 'fpga-late-for-5.20-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/fpga/linux-fpga:
fpga: fpga-mgr: Fix spelling mistake "bitsream" -> "bitstream"
MAINTAINERS: add Microchip PolarFire FPGA drivers entry
dt-bindings: fpga: add binding doc for microchip-spi fpga mgr
fpga: microchip-spi: add Microchip MPF FPGA manager
docs: fpga: mgr: document parse_header() callback
fpga: fpga-mgr: support bitstream offset in image buffer
Greg Kroah-Hartman [Mon, 11 Jul 2022 10:28:46 +0000 (12:28 +0200)]
Merge tag 'mhi-for-v5.20' of git://git./linux/kernel/git/mani/mhi into char-misc-next
Manivannan writes:
MHI Host
--------
Support for new modems:
- Quectel EM120 FCCL based on SDX24. This product MHI configuration is same
as EM120R-GL modem.
- Foxconn Cinterion MV31-W. This product is same as the existing MV31-W
modem but sold as a separate product as it uses a different firmware
baseline.
- Foxconn T99W175 based on SDX55.
Core changes:
- Moved the IRQ allocation to MHI controller registration phase. Since the
MHI endpoint may be powered up/down several times during runtime, it
makes sense to move the IRQ allocation to registration phase and just
enable/disable IRQs during endpoint power up/down.
MHI endpoint
------------
Core changes:
- Added error check for dev_set_name()
* tag 'mhi-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi:
bus: mhi: ep: Check dev_set_name() return value
bus: mhi: host: pci_generic: Add another Foxconn T99W175
bus: mhi: host: Move IRQ allocation to controller registration phase
bus: mhi: host: pci_generic: Add Cinterion MV31-W with new baseline
bus: mhi: host: pci_generic: Add support for Quectel EM120 FCCL modem
Nick Desaulniers [Fri, 8 Jul 2022 23:15:20 +0000 (16:15 -0700)]
coresight: etm4x: avoid build failure with unrolled loops
When the following configs are enabled:
* CORESIGHT
* CORESIGHT_SOURCE_ETM4X
* UBSAN
* UBSAN_TRAP
Clang fails assemble the kernel with the error:
<instantiation>:1:7: error: expected constant expression in '.inst' directive
.inst (0xd5200000|((((2) << 19) | ((1) << 16) | (((((((((((0x160 + (i * 4))))) >> 2))) >> 7) & 0x7)) << 12) | ((((((((((0x160 + (i * 4))))) >> 2))) & 0xf)) << 8) | (((((((((((0x160 + (i * 4))))) >> 2))) >> 4) & 0x7)) << 5)))|(.L__reg_num_x8))
^
drivers/hwtracing/coresight/coresight-etm4x-core.c:702:4: note: while in
macro instantiation
etm4x_relaxed_read32(csa, TRCCNTVRn(i));
^
drivers/hwtracing/coresight/coresight-etm4x.h:403:4: note: expanded from
macro 'etm4x_relaxed_read32'
read_etm4x_sysreg_offset((offset), false)))
^
drivers/hwtracing/coresight/coresight-etm4x.h:383:12: note: expanded
from macro 'read_etm4x_sysreg_offset'
__val = read_etm4x_sysreg_const_offset((offset)); \
^
drivers/hwtracing/coresight/coresight-etm4x.h:149:2: note: expanded from
macro 'read_etm4x_sysreg_const_offset'
READ_ETM4x_REG(ETM4x_OFFSET_TO_REG(offset))
^
drivers/hwtracing/coresight/coresight-etm4x.h:144:2: note: expanded from
macro 'READ_ETM4x_REG'
read_sysreg_s(ETM4x_REG_NUM_TO_SYSREG((reg)))
^
arch/arm64/include/asm/sysreg.h:1108:15: note: expanded from macro
'read_sysreg_s'
asm volatile(__mrs_s("%0", r) : "=r" (__val)); \
^
arch/arm64/include/asm/sysreg.h:1074:2: note: expanded from macro '__mrs_s'
" mrs_s " v ", " __stringify(r) "\n" \
^
Consider the definitions of TRCSSCSRn and TRCCNTVRn:
drivers/hwtracing/coresight/coresight-etm4x.h:56
#define TRCCNTVRn(n) (0x160 + (n * 4))
drivers/hwtracing/coresight/coresight-etm4x.h:81
#define TRCSSCSRn(n) (0x2A0 + (n * 4))
Where the macro parameter is expanded to i; a loop induction variable
from etm4_disable_hw.
When any compiler can determine that loops may be unrolled, then the
__builtin_constant_p check in read_etm4x_sysreg_offset() defined in
drivers/hwtracing/coresight/coresight-etm4x.h may evaluate to true. This
can lead to the expression `(0x160 + (i * 4))` being passed to
read_etm4x_sysreg_const_offset. Via the trace above, this is passed
through READ_ETM4x_REG, read_sysreg_s, and finally to __mrs_s where it
is string-ified and used directly in inline asm.
Regardless of which compiler or compiler options determine whether a
loop can or can't be unrolled, which determines whether
__builtin_constant_p evaluates to true when passed an expression using a
loop induction variable, it is NEVER safe to allow the preprocessor to
construct inline asm like:
asm volatile (".inst (0x160 + (i * 4))" : "=r"(__val));
^ expected constant expression
Instead of read_etm4x_sysreg_offset() using __builtin_constant_p(), use
__is_constexpr from include/linux/const.h instead to ensure only
expressions that are valid integer constant expressions get passed
through to read_sysreg_s().
This is not a bug in clang; it's a potentially unsafe use of the macro
arguments in read_etm4x_sysreg_offset dependent on __builtin_constant_p.
Link: https://github.com/ClangBuiltLinux/linux/issues/1310
Reported-by: Arnd Bergmann <arnd@kernel.org>
Reported-by: Tao Zhang <quic_taozha@quicinc.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220708231520.3958391-1-ndesaulniers@google.com
Greg Kroah-Hartman [Mon, 11 Jul 2022 06:32:58 +0000 (08:32 +0200)]
Merge 5.19-rc6 into char-misc-next
We need the misc driver fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Sun, 10 Jul 2022 21:40:51 +0000 (14:40 -0700)]
Linux 5.19-rc6
Linus Torvalds [Sun, 10 Jul 2022 21:26:49 +0000 (14:26 -0700)]
Merge branch 'hot-fixes' (fixes for rc6)
This is a collection of three fixes for small annoyances.
Two of these are already pending in other trees, but I really don't want
to release another -rc with these issues pending, so I picked up the
patches for these things directly. We'll end up with duplicate commits
eventually, I prefer that over having these issues pending.
The third one is just me getting rid of another BUG_ON() just because it
was reported and I dislike those things so much.
* merge 'hot-fixes' branch:
ida: don't use BUG_ON() for debugging
drm/aperture: Run fbdev removal before internal helpers
ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced()
Linus Torvalds [Sun, 10 Jul 2022 20:55:49 +0000 (13:55 -0700)]
ida: don't use BUG_ON() for debugging
This is another old BUG_ON() that just shouldn't exist (see also commit
a382f8fee42c: "signal handling: don't use BUG_ON() for debugging").
In fact, as Matthew Wilcox points out, this condition shouldn't really
even result in a warning, since a negative id allocation result is just
a normal allocation failure:
"I wonder if we should even warn here -- sure, the caller is trying to
free something that wasn't allocated, but we don't warn for
kfree(NULL)"
and goes on to point out how that current error check is only causing
people to unnecessarily do their own index range checking before freeing
it.
This was noted by Itay Iellin, because the bluetooth HCI socket cookie
code does *not* do that range checking, and ends up just freeing the
error case too, triggering the BUG_ON().
The HCI code requires CAP_NET_RAW, and seems to just result in an ugly
splat, but there really is no reason to BUG_ON() here, and we have
generally striven for allocation models where it's always ok to just do
free(alloc());
even if the allocation were to fail for some random reason (usually
obviously that "random" reason being some resource limit).
Fixes:
88eca0207cf1 ("ida: simplified functions for id allocation")
Reported-by: Itay Iellin <ieitayie@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 10 Jul 2022 18:23:01 +0000 (11:23 -0700)]
Merge tag 'dmaengine-fix-5.19' of git://git./linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
"One core fix for DMA_INTERRUPT and rest driver fixes.
Core:
- Revert verification of DMA_INTERRUPT capability as that was
incorrect
Bunch of driver fixes for:
- ti: refcount and put_device leak
- qcom_bam: runtime pm overflow
- idxd: force wq context cleanup and call idxd_enable_system_pasid()
on success
- dw-axi-dmac: RMW on channel suspend register
- imx-sdma: restart cyclic channel when enabled
- at_xdma: error handling for at_xdmac_alloc_desc
- pl330: lockdep warning
- lgm: error handling path in probe
- allwinner: Fix min/max typo in binding"
* tag 'dmaengine-fix-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dt-bindings: dma: allwinner,sun50i-a64-dma: Fix min/max typo
dmaengine: lgm: Fix an error handling path in intel_ldma_probe()
dmaengine: pl330: Fix lockdep warning about non-static key
dmaengine: idxd: Only call idxd_enable_system_pasid() if succeeded in enabling SVA feature
dmaengine: at_xdma: handle errors of at_xdmac_alloc_desc() correctly
dmaengine: imx-sdma: only restart cyclic channel when enabled
dmaengine: dw-axi-dmac: Fix RMW on channel suspend register
dmaengine: idxd: force wq context cleanup on device disable path
dmaengine: qcom: bam_dma: fix runtime PM underflow
dmaengine: imx-sdma: Allow imx8m for imx7 FW revs
dmaengine: Revert "dmaengine: add verification of DMA_INTERRUPT capability for dmatest"
dmaengine: ti: Add missing put_device in ti_dra7_xbar_route_allocate
dmaengine: ti: Fix refcount leak in ti_dra7_xbar_route_allocate
Linus Torvalds [Sun, 10 Jul 2022 16:51:56 +0000 (09:51 -0700)]
Merge tag 'staging-5.19-rc6' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fix from Greg KH:
"Here is a single staging driver fix for a reported problem that showed
up in 5.19-rc1 in the wlan-ng driver. It has been in linux-next for a
week with no reported problems"
* tag 'staging-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging/wlan-ng: get the correct struct hfa384x in work callback
Linus Torvalds [Sun, 10 Jul 2022 16:45:29 +0000 (09:45 -0700)]
Merge tag 'char-misc-5.19-rc6' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are four small char/misc driver fixes for 5.19-rc6 to resolve
some reported issues. They only affect two drivers:
- rtsx_usb: fix for of-reported DMA warning error, the driver was
handling memory buffers in odd ways, it has now been fixed up to be
much simpler and correct by Shuah.
- at25 eeprom driver bugfix for reported problem
All of these have been in linux-next for a week with no reported
problems"
* tag 'char-misc-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: rtsx_usb: set return value in rsp_buf alloc err path
misc: rtsx_usb: use separate command and response buffers
misc: rtsx_usb: fix use of dma mapped buffer for usb bulk transfer
eeprom: at25: Rework buggy read splitting
Linus Torvalds [Sun, 10 Jul 2022 16:14:54 +0000 (09:14 -0700)]
Merge tag 'io_uring-5.19-2022-07-09' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"A single fix for an issue that came up yesterday that we should plug
for -rc6.
This is a regression introduced in this cycle"
* tag 'io_uring-5.19-2022-07-09' of git://git.kernel.dk/linux-block:
io_uring: check that we have a file table when allocating update slots
Linus Torvalds [Sun, 10 Jul 2022 15:59:02 +0000 (08:59 -0700)]
Merge tag 'kbuild-fixes-v5.19-3' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Adjust gen_compile_commands.py to the format change of *.mod files
- Remove unused macro in scripts/Makefile.modinst
* tag 'kbuild-fixes-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: remove unused cmd_none in scripts/Makefile.modinst
gen_compile_commands: handle multiple lines per .mod file
Linus Torvalds [Sun, 10 Jul 2022 15:52:12 +0000 (08:52 -0700)]
Merge tag 'irq_urgent_for_v5.19_rc6' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Gracefully handle failure to request MMIO resources in the GICv3
driver
- Make a static key static in the Apple AIC driver
- Fix the Xilinx intc driver dependency on OF_ADDRESS
* tag 'irq_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/apple-aic: Make symbol 'use_fast_ipi' static
irqchip/xilinx: Add explicit dependency on OF_ADDRESS
irqchip/gicv3: Handle resource request failure consistently
Linus Torvalds [Sun, 10 Jul 2022 15:43:52 +0000 (08:43 -0700)]
Merge tag 'x86_urgent_for_v5.19_rc6' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Prepare for and clear .brk early in order to address XenPV guests
failures where the hypervisor verifies page tables and uninitialized
data in that range leads to bogus failures in those checks
- Add any potential setup_data entries supplied at boot to the identity
pagetable mappings to prevent kexec kernel boot failures. Usually,
this is not a problem for the normal kernel as those mappings are
part of the initially mapped 2M pages but if kexec gets to allocate
the second kernel somewhere else, those setup_data entries need to be
mapped there too.
- Fix objtool not to discard text references from the __tracepoints
section so that ENDBR validation still works
- Correct the setup_data types limit as it is user-visible, before 5.19
releases
* tag 'x86_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Fix the setup data types max limit
x86/ibt, objtool: Don't discard text references from tracepoint section
x86/compressed/64: Add identity mappings for setup_data entries
x86: Fix .brk attribute in linker script
x86: Clear .brk area at early boot
x86/xen: Use clear_bss() for Xen PV guests
Masahiro Yamada [Thu, 30 Jun 2022 08:09:35 +0000 (17:09 +0900)]
kbuild: remove unused cmd_none in scripts/Makefile.modinst
Commit
65ce9c38326e ("kbuild: move module strip/compression code into
scripts/Makefile.modinst") added this unused code.
Perhaps, I thought cmd_none was useful for CONFIG_MODULE_COMPRESS_NONE,
but I did not use it after all.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Borislav Petkov [Sun, 10 Jul 2022 09:15:47 +0000 (11:15 +0200)]
x86/boot: Fix the setup data types max limit
Commit in Fixes forgot to change the SETUP_TYPE_MAX definition which
contains the highest valid setup data type.
Correct that.
Fixes:
5ea98e01ab52 ("x86/boot: Add Confidential Computing type to setup_data")
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/ddba81dd-cc92-699c-5274-785396a17fb5@zytor.com
Linus Torvalds [Sat, 9 Jul 2022 18:20:15 +0000 (11:20 -0700)]
Merge tag 'i2c-for-5.19-rc6' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two I2C driver bugfixes preventing resource leaks"
* tag 'i2c-for-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: cadence: Unregister the clk notifier in error path
i2c: piix4: Fix a memory leak in the EFCH MMIO support
Thomas Zimmermann [Fri, 17 Jun 2022 12:10:27 +0000 (14:10 +0200)]
drm/aperture: Run fbdev removal before internal helpers
Always run fbdev removal first to remove simpledrm via sysfb_disable().
This clears the internal state.
The later call to drm_aperture_detach_drivers() then does nothing.
Otherwise, with drm_aperture_detach_drivers() running first, the call to
sysfb_disable() uses inconsistent state.
Example backtrace show below:
BUG: KASAN: use-after-free in device_del+0x79/0x5f0
Read of size 8 at addr
ffff888108185050 by task systemd-udevd/311
CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G E 5.19.0-rc2-1-default+ #1689
Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011
Call Trace:
device_del+0x79/0x5f0
platform_device_del.part.0+0x19/0xe0
platform_device_unregister+0x1c/0x30
sysfb_disable+0x2d/0x70
remove_conflicting_framebuffers+0x1c/0xf0
remove_conflicting_pci_framebuffers+0x130/0x1a0
drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0
mgag200_pci_probe+0x2d/0x140 [mgag200]
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes:
873eb3b11860 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Changcheng Deng <deng.changcheng@zte.com.cn>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sven Schnelle [Wed, 6 Jul 2022 10:16:25 +0000 (12:16 +0200)]
ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced()
CI reported the following splat while running the strace testsuite:
WARNING: CPU: 1 PID: 3570031 at kernel/ptrace.c:272 ptrace_check_attach+0x12e/0x178
CPU: 1 PID: 3570031 Comm: strace Tainted: G OE 5.19.0-
20220624.rc3.git0.
ee819a77d4e7.300.fc36.s390x #1
Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
Call Trace:
[<
00000000ab4b645a>] ptrace_check_attach+0x132/0x178
([<
00000000ab4b6450>] ptrace_check_attach+0x128/0x178)
[<
00000000ab4b6cde>] __s390x_sys_ptrace+0x86/0x160
[<
00000000ac03fcec>] __do_syscall+0x1d4/0x200
[<
00000000ac04e312>] system_call+0x82/0xb0
Last Breaking-Event-Address:
[<
00000000ab4ea3c8>] wait_task_inactive+0x98/0x190
This is because JOBCTL_TRACED is set, but the task is not in TASK_TRACED
state. Caused by ptrace_unfreeze_traced() which does:
task->jobctl &= ~TASK_TRACED
but it should be:
task->jobctl &= ~JOBCTL_TRACED
Fixes:
31cae1eaae4f ("sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 9 Jul 2022 17:34:08 +0000 (10:34 -0700)]
Merge tag 'powerpc-5.19-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
- On Power8 bare metal, fix creation of RNG platform devices, which are
needed for the /dev/hwrng driver to probe correctly.
Thanks to Jason A. Donenfeld, and Sachin Sant.
* tag 'powerpc-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: delay rng platform device creation until later in boot
Jens Axboe [Sat, 9 Jul 2022 13:02:10 +0000 (07:02 -0600)]
io_uring: check that we have a file table when allocating update slots
If IORING_FILE_INDEX_ALLOC is set asking for an allocated slot, the
helper doesn't check if we actually have a file table or not. The non
alloc path does do that correctly, and returns -ENXIO if we haven't set
one up.
Do the same for the allocated path, avoiding a NULL pointer dereference
when trying to find a free bit.
Fixes:
a7c41b4687f5 ("io_uring: let IORING_OP_FILES_UPDATE support choosing fixed file slots")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bo Liu [Fri, 8 Jul 2022 01:59:48 +0000 (21:59 -0400)]
bus: mhi: ep: Check dev_set_name() return value
It's possible that dev_set_name() returns -ENOMEM, catch and handle this.
Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/20220708015948.4091-1-liubo03@inspur.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Linus Torvalds [Fri, 8 Jul 2022 23:08:48 +0000 (16:08 -0700)]
Merge tag 'fscache-fixes-
20220708' of git://git./linux/kernel/git/dhowells/linux-fs
Pull fscache fixes from David Howells:
- Fix a check in fscache_wait_on_volume_collision() in which the
polarity is reversed. It should complain if a volume is still marked
acquisition-pending after 20s, but instead complains if the mark has
been cleared (ie. the condition has cleared).
Also switch an open-coded test of the ACQUIRE_PENDING volume flag to
use the helper function for consistency.
- Not a fix per se, but neaten the code by using a helper to check for
the DROPPED state.
- Fix cachefiles's support for erofs to only flush requests associated
with a released control file, not all requests.
- Fix a race between one process invalidating an object in the cache
and another process trying to look it up.
* tag 'fscache-fixes-
20220708' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
fscache: Fix invalidation/lookup race
cachefiles: narrow the scope of flushed requests when releasing fd
fscache: Introduce fscache_cookie_is_dropped()
fscache: Fix if condition in fscache_wait_on_volume_collision()
Linus Torvalds [Fri, 8 Jul 2022 20:05:56 +0000 (13:05 -0700)]
Merge tag 'acpi-5.19-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two recent regressions related to CPPC support.
Specifics:
- Prevent _CPC from being used if the platform firmware does not
confirm CPPC v2 support via _OSC (Mario Limonciello)
- Allow systems with X86_FEATURE_CPPC set to use _CPC even if CPPC
support cannot be agreed on via _OSC (Mario Limonciello)"
* tag 'acpi-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported
ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked
Linus Torvalds [Fri, 8 Jul 2022 20:01:04 +0000 (13:01 -0700)]
Merge tag 'pm-5.19-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a NULL pointer dereference in a devfreq driver and a runtime
PM framework issue that may cause a supplier device to be suspended
before its consumer.
Specifics:
- Fix NULL pointer dereference related to printing a diagnostic
message in the exynos-bus devfreq driver (Christian Marangi)
- Fix race condition in the runtime PM framework which in some cases
may cause a supplier device to be suspended when its consumer is
still active (Rafael Wysocki)"
* tag 'pm-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / devfreq: exynos-bus: Fix NULL pointer dereference
PM: runtime: Fix supplier device management during consumer probe
PM: runtime: Redefine pm_runtime_release_supplier()
Linus Torvalds [Fri, 8 Jul 2022 19:55:25 +0000 (12:55 -0700)]
Merge tag 'cxl-fixes-for-5.19-rc6' of git://git./linux/kernel/git/cxl/cxl
Pull cxl fixes from Vishal Verma:
- Update MAINTAINERS for Ben's email
- Fix cleanup of port devices on failure to probe driver
- Fix endianness in get/set LSA mailbox command structures
- Fix memregion_free() fallback definition
- Fix missing variable payload checks in CXL cmd size validation
* tag 'cxl-fixes-for-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/mbox: Fix missing variable payload checks in cmd size validation
memregion: Fix memregion_free() fallback definition
cxl/mbox: Use __le32 in get,set_lsa mailbox structures
cxl/core: Use is_endpoint_decoder
cxl: Fix cleanup of port devices on failure to probe driver.
MAINTAINERS: Update Ben's email address
Linus Torvalds [Fri, 8 Jul 2022 19:49:00 +0000 (12:49 -0700)]
Merge tag 'iommu-fixes-v5.19-rc5' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- fix device setup failures in the Intel VT-d driver when the PASID
table is shared
- fix Intel VT-d device hot-add failure due to wrong device notifier
order
- remove the old IOMMU mailing list from the MAINTAINERS file now that
it has been retired
* tag 'iommu-fixes-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
MAINTAINERS: Remove iommu@lists.linux-foundation.org
iommu/vt-d: Fix RID2PASID setup/teardown failure
iommu/vt-d: Fix PCI bus rescan device hot add
Linus Torvalds [Fri, 8 Jul 2022 19:39:52 +0000 (12:39 -0700)]
Merge tag 'gpio-fixes-for-v5.19-rc6' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix a build error in gpio-vf610
- fix a null-pointer dereference in the GPIO character device code
* tag 'gpio-fixes-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: cdev: fix null pointer dereference in linereq_free()
gpio: vf610: fix compilation error
Rafael J. Wysocki [Fri, 8 Jul 2022 18:38:51 +0000 (20:38 +0200)]
Merge branch 'pm-core'
Merge a runtime PM framework cleanup and fix related to device links.
* pm-core:
PM: runtime: Fix supplier device management during consumer probe
PM: runtime: Redefine pm_runtime_release_supplier()
Linus Torvalds [Fri, 8 Jul 2022 18:32:23 +0000 (11:32 -0700)]
Merge tag 'block-5.19-2022-07-08' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"NVMe pull request with another id quirk addition, and a tracing fix"
* tag 'block-5.19-2022-07-08' of git://git.kernel.dk/linux-block:
nvme: use struct group for generic command dwords
nvme-pci: phison e16 has bogus namespace ids
Linus Torvalds [Fri, 8 Jul 2022 18:25:01 +0000 (11:25 -0700)]
Merge tag 'io_uring-5.19-2022-07-08' of git://git.kernel.dk/linux-block
Pull io_uring tweak from Jens Axboe:
"Just a minor tweak to an addition made in this release cycle: padding
a 32-bit value that's in a 64-bit union to avoid any potential
funkiness from that"
* tag 'io_uring-5.19-2022-07-08' of git://git.kernel.dk/linux-block:
io_uring: explicit sqe padding for ioctl commands
Linus Torvalds [Fri, 8 Jul 2022 18:03:26 +0000 (11:03 -0700)]
Merge tag 'for-5.19/fbdev-3' of git://git./linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
- fbcon now prevents switching to screen resolutions which are smaller
than the font size, and prevents enabling a font which is bigger than
the current screen resolution. This fixes vmalloc-out-of-bounds
accesses found by KASAN.
- Guiling Deng fixed a bug where the centered fbdev logo wasn't
displayed correctly if the screen size matched the logo size.
- Hsin-Yi Wang provided a patch to include errno.h to fix build when
CONFIG_OF isn't enabled.
* tag 'for-5.19/fbdev-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbcon: Use fbcon_info_from_console() in fbcon_modechange_possible()
fbmem: Check virtual screen sizes in fb_set_var()
fbcon: Prevent that screen size is smaller than font size
fbcon: Disallow setting font bigger than screen size
video: of_display_timing.h: include errno.h
fbdev: fbmem: Fix logo center image dx issue
AngeloGioacchino Del Regno [Wed, 6 Jul 2022 10:06:27 +0000 (11:06 +0100)]
nvmem: mtk-efuse: Simplify with devm_platform_get_and_ioremap_resource()
Convert platform_get_resource(), devm_ioremap_resource() to a single
call to devm_platform_get_and_ioremap_resource(), as this is exactly
what this function does.
No functional changes.
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220706100627.6534-8-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Allen-KH Cheng [Wed, 6 Jul 2022 10:06:26 +0000 (11:06 +0100)]
dt-bindings: nvmem: mediatek: efuse: add support for mt8186
Add compatible for mt8186 SoC.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220706100627.6534-7-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chunfeng Yun [Wed, 6 Jul 2022 10:06:25 +0000 (11:06 +0100)]
dt-bindings: nvmem: mediatek: efuse: add support mt8183
Add "mediatek,mt8183-efuse" to fix dtbs check warning.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220706100627.6534-6-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chunfeng Yun [Wed, 6 Jul 2022 10:06:24 +0000 (11:06 +0100)]
dt-bindings: nvmem: convert mtk-efuse.txt to YAML schema
Convert mtk-efuse.txt to YAML schema mediatek,efuse.yaml
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220706100627.6534-5-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lukas Bulwahn [Wed, 6 Jul 2022 10:06:23 +0000 (11:06 +0100)]
MAINTAINERS: rectify file pattern in MICROCHIP OTPC DRIVER
Commit
6b291610dd57 ("nvmem: microchip-otpc: add support") adds the
Microchip otpc driver and a corresponding MAINTAINERS section, but slips
in a slightly wrong file pattern.
Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference.
Rectify this file pattern in MICROCHIP OTPC DRIVER.
Acked-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220706100627.6534-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Claudiu Beznea [Wed, 6 Jul 2022 10:06:22 +0000 (11:06 +0100)]
nvmem: microchip-otpc: add support
Add support for Microchip OTP controller available on SAMA7G5. The OTPC
controls the access to a non-volatile memory. The memory behind OTPC is
organized into packets, packets are composed by a fixed length header
(4 bytes long) and a variable length payload (payload length is available
in the header). When software request the data at an offset in memory
the OTPC will return (via header + data registers) the whole packet that
has a word at that offset. For the OTP memory layout like below:
offset OTP Memory layout
. .
. ... .
. .
0x0E +-----------+ <--- packet X
| header X |
0x12 +-----------+
| payload X |
0x16 | |
| |
0x1A | |
+-----------+
. .
. ... .
. .
if user requests data at address 0x16 the data started at 0x0E will be
returned by controller. User will be able to fetch the whole packet
starting at 0x0E (or parts of the packet) via proper registers. The same
packet will be returned if software request the data at offset 0x0E or
0x12 or 0x1A.
The OTP will be populated by Microchip with at least 2 packets first one
being boot configuration packet and the 2nd one being temperature
calibration packet. The packet order will be preserved b/w different chip
revisions but the packet sizes may change.
For the above reasons and to keep the same software able to work on all
chip variants the read function of the driver is working with a packet
id instead of an offset in OTP memory.
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220706100627.6534-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Claudiu Beznea [Wed, 6 Jul 2022 10:06:21 +0000 (11:06 +0100)]
dt-bindings: microchip-otpc: document Microchip OTPC
Document Microchip OTP controller.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220706100627.6534-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>