platform/kernel/linux-starfive.git
2 years agoMerge branch 'sched/fast-headers' into sched/core
Ingo Molnar [Tue, 15 Mar 2022 08:02:10 +0000 (09:02 +0100)]
Merge branch 'sched/fast-headers' into sched/core

Merge the scheduler build speedup of the fast-headers tree.

Cumulative scheduler (kernel/sched/) build time speedup on a
Linux distribution's config, which enables all scheduler features,
compared to the vanilla kernel:

      _____________________________________________________________________________
     |
     |  Vanilla kernel (v5.13-rc7):
     |_____________________________________________________________________________
     |
     |  Performance counter stats for 'make -j96 kernel/sched/' (3 runs):
     |
     |   126,975,564,374      instructions              #    1.45  insn per cycle           ( +-  0.00% )
     |    87,637,847,671      cycles                    #    3.959 GHz                      ( +-  0.30% )
     |         22,136.96 msec cpu-clock                 #    7.499 CPUs utilized            ( +-  0.29% )
     |
     |            2.9520 +- 0.0169 seconds time elapsed  ( +-  0.57% )
     |_____________________________________________________________________________
     |
     |  Patched kernel:
     |_____________________________________________________________________________
     |
     | Performance counter stats for 'make -j96 kernel/sched/' (3 runs):
     |
     |    50,420,496,914      instructions              #    1.47  insn per cycle           ( +-  0.00% )
     |    34,234,322,038      cycles                    #    3.946 GHz                      ( +-  0.31% )
     |          8,675.81 msec cpu-clock                 #    3.053 CPUs utilized            ( +-  0.45% )
     |
     |            2.8420 +- 0.0181 seconds time elapsed  ( +-  0.64% )
     |_____________________________________________________________________________

    Summary:

      - CPU time used to build the scheduler dropped by -60.9%, a reduction
        from 22.1 clock-seconds to 8.7 clock-seconds.

      - Wall-clock time to build the scheduler dropped by -3.9%, a reduction
        from 2.95 seconds to 2.84 seconds.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2 years agocgroup: Fix suspicious rcu_dereference_check() usage warning
Chengming Zhou [Sat, 5 Mar 2022 03:41:03 +0000 (11:41 +0800)]
cgroup: Fix suspicious rcu_dereference_check() usage warning

task_css_set_check() will use rcu_dereference_check() to check for
rcu_read_lock_held() on the read-side, which is not true after commit
dc6e0818bc9a ("sched/cpuacct: Optimize away RCU read lock"). This
commit drop explicit rcu_read_lock(), change to RCU-sched read-side
critical section. So fix the RCU warning by adding check for
rcu_read_lock_sched_held().

Fixes: dc6e0818bc9a ("sched/cpuacct: Optimize away RCU read lock")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: syzbot+16e3f2c77e7c5a0113f9@syzkaller.appspotmail.com
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20220305034103.57123-1-zhouchengming@bytedance.com
2 years agosched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
Frederic Weisbecker [Thu, 17 Feb 2022 11:12:40 +0000 (12:12 +0100)]
sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers

Displaying "PREEMPT" on kernel headers when CONFIG_PREEMPT_DYNAMIC=y
can be misleading for anybody involved in remote debugging because it
is then not guaranteed that there is an actual preemption behaviour. It
depends on default Kconfig or boot defined choices.

Therefore, tell about PREEMPT_DYNAMIC on static kernel headers and leave
the search for the actual preemption behaviour to browsing dmesg.

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220217111240.GA742892@lothringen
2 years agosched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
K Prateek Nayak [Fri, 18 Feb 2022 16:27:43 +0000 (21:57 +0530)]
sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains

While investigating the sparse warning reported by the LKP bot [1],
observed that we have a redundant variable "top" in the function
build_sched_domains that was introduced in the recent commit
e496132ebedd ("sched/fair: Adjust the allowed NUMA imbalance when
SD_NUMA spans multiple LLCs")

The existing variable "sd" suffices which allows us to remove the
redundant variable "top" while annotating the other variable "top_p"
with the "__rcu" annotation to silence the sparse warning.

[1] https://lore.kernel.org/lkml/202202170853.9vofgC3O-lkp@intel.com/

Fixes: e496132ebedd ("sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lore.kernel.org/r/20220218162743.1134-1-kprateek.nayak@amd.com
2 years agosched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
Dietmar Eggemann [Wed, 2 Mar 2022 18:34:33 +0000 (19:34 +0100)]
sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()

The `struct rq *rq` parameter isn't used. Remove it.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220302183433.333029-7-dietmar.eggemann@arm.com
2 years agosched/deadline,rt: Remove unused functions for !CONFIG_SMP
Dietmar Eggemann [Wed, 2 Mar 2022 18:34:32 +0000 (19:34 +0100)]
sched/deadline,rt: Remove unused functions for !CONFIG_SMP

The need_pull_[rt|dl]_task() and pull_[rt|dl]_task() functions are not
used on a !CONFIG_SMP system. Remove them.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220302183433.333029-6-dietmar.eggemann@arm.com
2 years agosched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
Dietmar Eggemann [Wed, 2 Mar 2022 18:34:31 +0000 (19:34 +0100)]
sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently

Deploy __node_2_pdl(node), __node_2_dle(node) and rb_first_cached()
consistently throughout the sched class source file which makes the
code at least easier to read.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220302183433.333029-5-dietmar.eggemann@arm.com
2 years agosched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
Dietmar Eggemann [Wed, 2 Mar 2022 18:34:30 +0000 (19:34 +0100)]
sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()

Both functions are doing almost the same, that is checking if admission
control is still respected.

With exclusive cpusets, dl_task_can_attach() checks if the destination
cpuset (i.e. its root domain) has enough CPU capacity to accommodate the
task.
dl_cpu_busy() checks if there is enough CPU capacity in the cpuset in
case the CPU is hot-plugged out.

dl_task_can_attach() is used to check if a task can be admitted while
dl_cpu_busy() is used to check if a CPU can be hotplugged out.

Make dl_cpu_busy() able to deal with a task and use it instead of
dl_task_can_attach() in task_can_attach().

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220302183433.333029-4-dietmar.eggemann@arm.com
2 years agosched/deadline: Move bandwidth mgmt and reclaim functions into sched class source...
Dietmar Eggemann [Wed, 2 Mar 2022 18:34:29 +0000 (19:34 +0100)]
sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file

Move the deadline bandwidth management (admission control) functions
__dl_add(), __dl_sub() and __dl_overflow() as well as the bandwidth
reclaim function __dl_update() from private task scheduler header file
to the deadline sched class source file.
The functions are only used internally so they don't have to be
exported.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220302183433.333029-3-dietmar.eggemann@arm.com
2 years agosched/deadline: Remove unused def_dl_bandwidth
Dietmar Eggemann [Wed, 2 Mar 2022 18:34:28 +0000 (19:34 +0100)]
sched/deadline: Remove unused def_dl_bandwidth

Since commit 1724813d9f2c ("sched/deadline: Remove the sysctl_sched_dl
knobs") the default deadline bandwidth control structure has no purpose.
Remove it.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220302183433.333029-2-dietmar.eggemann@arm.com
2 years agosched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
Valentin Schneider [Thu, 20 Jan 2022 16:25:20 +0000 (16:25 +0000)]
sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE

TASK_RTLOCK_WAIT currently isn't part of TASK_REPORT, thus a task blocking
on an rtlock will appear as having a task state == 0, IOW TASK_RUNNING.

The actual state is saved in p->saved_state, but reading it after reading
p->__state has a few issues:
o that could still be TASK_RUNNING in the case of e.g. rt_spin_lock
o ttwu_state_match() might have changed that to TASK_RUNNING

As pointed out by Eric, adding TASK_RTLOCK_WAIT to TASK_REPORT implies
exposing a new state to userspace tools which way not know what to do with
them. The only information that needs to be conveyed here is that a task is
waiting on an rt_mutex, which matches TASK_UNINTERRUPTIBLE - there's no
need for a new state.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220120162520.570782-3-valentin.schneider@arm.com
2 years agosched/tracing: Don't re-read p->state when emitting sched_switch event
Valentin Schneider [Thu, 20 Jan 2022 16:25:19 +0000 (16:25 +0000)]
sched/tracing: Don't re-read p->state when emitting sched_switch event

As of commit

  c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")

the following sequence becomes possible:

      p->__state = TASK_INTERRUPTIBLE;
      __schedule()
deactivate_task(p);
  ttwu()
    READ !p->on_rq
    p->__state=TASK_WAKING
trace_sched_switch()
  __trace_sched_switch_state()
    task_state_index()
      return 0;

TASK_WAKING isn't in TASK_REPORT, so the task appears as TASK_RUNNING in
the trace event.

Prevent this by pushing the value read from __schedule() down the trace
event.

Reported-by: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220120162520.570782-2-valentin.schneider@arm.com
2 years agosched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
Valentin Schneider [Thu, 27 Jan 2022 15:40:59 +0000 (15:40 +0000)]
sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race

John reported that push_rt_task() can end up invoking
find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
one), which causes mayhem down convert_prio().

This can happen when current gets demoted to e.g. CFS when releasing an
rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
getting the chance to reschedule. Exactly who triggers this work isn't
entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
if there are no RT tasks on the local RQ, which means the local CPU can't
be in the rto_mask.

My current suspected sequence is something along the lines of the below,
with the demoted task being current.

  mark_wakeup_next_waiter()
    rt_mutex_adjust_prio()
      rt_mutex_setprio() // deboost originally-CFS task
check_class_changed()
  switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
  switched_to_fair() // Sets need_resched
      __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
      raw_spin_rq_unlock(rq)

       // need_resched is set, so task_woken_rt() can't
       // invoke push_rt_tasks(). Best I can come up with is
       // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
       // in the rto_mask, and then:

       <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
 push_rt_task()
   // breakage follows here as rq->curr is CFS

Move an existing check to check rq->curr vs the next pushable task's
priority before getting anywhere near find_lowest_rq(). While at it, add an
explicit sched_class of rq->curr check prior to invoking
find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
of next_task's migratability.

Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: John Keeping <john@metanate.com>
Link: https://lore.kernel.org/r/20220127154059.974729-1-valentin.schneider@arm.com
2 years agosched/cpuacct: Remove redundant RCU read lock
Chengming Zhou [Sun, 20 Feb 2022 05:14:26 +0000 (13:14 +0800)]
sched/cpuacct: Remove redundant RCU read lock

The cpuacct_account_field() and it's cgroup v2 wrapper
cgroup_account_cputime_field() is only called from cputime
in task_group_account_field(), which is already in RCU read-side
critical section. So remove these redundant RCU read lock.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220220051426.5274-3-zhouchengming@bytedance.com
2 years agosched/cpuacct: Optimize away RCU read lock
Chengming Zhou [Sun, 20 Feb 2022 05:14:25 +0000 (13:14 +0800)]
sched/cpuacct: Optimize away RCU read lock

Since cpuacct_charge() is called from the scheduler update_curr(),
we must already have rq lock held, then the RCU read lock can
be optimized away.

And do the same thing in it's wrapper cgroup_account_cputime(),
but we can't use lockdep_assert_rq_held() there, which defined
in kernel/sched/sched.h.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220220051426.5274-2-zhouchengming@bytedance.com
2 years agosched/cpuacct: Fix charge percpu cpuusage
Chengming Zhou [Sun, 20 Feb 2022 05:14:24 +0000 (13:14 +0800)]
sched/cpuacct: Fix charge percpu cpuusage

The cpuacct_account_field() is always called by the current task
itself, so it's ok to use __this_cpu_add() to charge the tick time.

But cpuacct_charge() maybe called by update_curr() in load_balance()
on a random CPU, different from the CPU on which the task is running.
So __this_cpu_add() will charge that cputime to a random incorrect CPU.

Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
Reported-by: Minye Zhu <zhuminye@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220220051426.5274-1-zhouchengming@bytedance.com
2 years agosched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
Ingo Molnar [Tue, 22 Feb 2022 13:51:58 +0000 (14:51 +0100)]
sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies

Remove all headers, except the ones required to make this header
build standalone.

Also include stats.h in sched.h explicitly - dependencies already
require this.

Summary of the build speedup gained through the last ~15 scheduler build &
header dependency patches:

Cumulative scheduler (kernel/sched/) build time speedup on a
Linux distribution's config, which enables all scheduler features,
compared to the vanilla kernel:

  _____________________________________________________________________________
 |
 |  Vanilla kernel (v5.13-rc7):
 |_____________________________________________________________________________
 |
 |  Performance counter stats for 'make -j96 kernel/sched/' (3 runs):
 |
 |   126,975,564,374      instructions              #    1.45  insn per cycle           ( +-  0.00% )
 |    87,637,847,671      cycles                    #    3.959 GHz                      ( +-  0.30% )
 |         22,136.96 msec cpu-clock                 #    7.499 CPUs utilized            ( +-  0.29% )
 |
 |            2.9520 +- 0.0169 seconds time elapsed  ( +-  0.57% )
 |_____________________________________________________________________________
 |
 |  Patched kernel:
 |_____________________________________________________________________________
 |
 | Performance counter stats for 'make -j96 kernel/sched/' (3 runs):
 |
 |    50,420,496,914      instructions              #    1.47  insn per cycle           ( +-  0.00% )
 |    34,234,322,038      cycles                    #    3.946 GHz                      ( +-  0.31% )
 |          8,675.81 msec cpu-clock                 #    3.053 CPUs utilized            ( +-  0.45% )
 |
 |            2.8420 +- 0.0181 seconds time elapsed  ( +-  0.64% )
 |_____________________________________________________________________________

Summary:

  - CPU time used to build the scheduler dropped by -60.9%, a reduction
    from 22.1 clock-seconds to 8.7 clock-seconds.

  - Wall-clock time to build the scheduler dropped by -3.9%, a reduction
    from 2.95 seconds to 2.84 seconds.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Reorganize, clean up and optimize kernel/sched/build_utility.c depende...
Ingo Molnar [Mon, 19 Jul 2021 10:43:51 +0000 (12:43 +0200)]
sched/headers: Reorganize, clean up and optimize kernel/sched/build_utility.c dependencies

Use all generic headers from kernel/sched/sched.h that are required
for it to build.

Sort the sections alphabetically.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Reorganize, clean up and optimize kernel/sched/build_policy.c dependencies
Ingo Molnar [Mon, 21 Jun 2021 08:33:56 +0000 (10:33 +0200)]
sched/headers: Reorganize, clean up and optimize kernel/sched/build_policy.c dependencies

Use all generic headers from kernel/sched/sched.h that are required
for it to build.

Sort the sections alphabetically.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Reorganize, clean up and optimize kernel/sched/fair.c dependencies
Ingo Molnar [Tue, 22 Feb 2022 12:50:01 +0000 (13:50 +0100)]
sched/headers: Reorganize, clean up and optimize kernel/sched/fair.c dependencies

Use all generic headers from kernel/sched/sched.h that are required
for it to build.

Sort the sections alphabetically.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Reorganize, clean up and optimize kernel/sched/core.c dependencies
Ingo Molnar [Wed, 23 Feb 2022 07:17:15 +0000 (08:17 +0100)]
sched/headers: Reorganize, clean up and optimize kernel/sched/core.c dependencies

Use all generic headers from kernel/sched/sched.h that are required
for it to build.

Sort the sections alphabetically.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Standardize kernel/sched/sched.h header dependencies
Ingo Molnar [Sun, 13 Feb 2022 07:19:43 +0000 (08:19 +0100)]
sched/headers: Standardize kernel/sched/sched.h header dependencies

kernel/sched/sched.h is a weird mix of ad-hoc headers included
in the middle of the header.

Two of them rely on being included in the middle of kernel/sched/sched.h,
due to definitions they require:

 - "stat.h" needs the rq definitions.
 - "autogroup.h" needs the task_group definition.

Move the inclusion of these two files out of kernel/sched/sched.h, and
include them in all files that require them.

Move of the rest of the header dependencies to the top of the
kernel/sched/sched.h file.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files...
Ingo Molnar [Tue, 22 Feb 2022 12:46:03 +0000 (13:46 +0100)]
sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files there

Similarly to kernel/sched/build_utility.c, collect all 'scheduling policy' related
source code files into kernel/sched/build_policy.c:

    kernel/sched/idle.c

    kernel/sched/rt.c

    kernel/sched/cpudeadline.c
    kernel/sched/pelt.c

    kernel/sched/cputime.c
    kernel/sched/deadline.c

With the exception of fair.c, which we continue to build as a separate file
for build efficiency and parallelism reasons.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files...
Ingo Molnar [Tue, 22 Feb 2022 12:23:24 +0000 (13:23 +0100)]
sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there

Collect all utility functionality source code files into a single kernel/sched/build_utility.c file,
via #include-ing the .c files:

    kernel/sched/clock.c
    kernel/sched/completion.c
    kernel/sched/loadavg.c
    kernel/sched/swait.c
    kernel/sched/wait_bit.c
    kernel/sched/wait.c

CONFIG_CPU_FREQ:
    kernel/sched/cpufreq.c

CONFIG_CPU_FREQ_GOV_SCHEDUTIL:
    kernel/sched/cpufreq_schedutil.c

CONFIG_CGROUP_CPUACCT:
    kernel/sched/cpuacct.c

CONFIG_SCHED_DEBUG:
    kernel/sched/debug.c

CONFIG_SCHEDSTATS:
    kernel/sched/stats.c

CONFIG_SMP:
   kernel/sched/cpupri.c
   kernel/sched/stop_task.c
   kernel/sched/topology.c

CONFIG_SCHED_CORE:
   kernel/sched/core_sched.c

CONFIG_PSI:
   kernel/sched/psi.c

CONFIG_MEMBARRIER:
   kernel/sched/membarrier.c

CONFIG_CPU_ISOLATION:
   kernel/sched/isolation.c

CONFIG_SCHED_AUTOGROUP:
   kernel/sched/autogroup.c

The goal is to amortize the 60+ KLOC header bloat from over a dozen build units into
a single build unit.

The build time of build_utility.c also roughly matches the build time of core.c and
fair.c - allowing better load-balancing of scheduler-only rebuilds.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Make the <linux/sched/deadline.h> header build standalone
Ingo Molnar [Tue, 22 Feb 2022 12:28:20 +0000 (13:28 +0100)]
sched/headers: Make the <linux/sched/deadline.h> header build standalone

This header depends on various scheduler definitions.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Fix comment typo in kernel/sched/cpudeadline.c
Ingo Molnar [Mon, 21 Jun 2021 06:50:48 +0000 (08:50 +0200)]
sched/headers: Fix comment typo in kernel/sched/cpudeadline.c

File name changed.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Add initial new headers as identity mappings
Ingo Molnar [Tue, 22 Feb 2022 12:23:59 +0000 (13:23 +0100)]
sched/headers: Add initial new headers as identity mappings

This allows code sharing between fast-headers tree and the vanilla
scheduler tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: sched/clock: Mark all functions 'notrace', remove CC_FLAGS_FTRACE...
Ingo Molnar [Mon, 21 Jun 2021 06:41:43 +0000 (08:41 +0200)]
sched/headers: sched/clock: Mark all functions 'notrace', remove CC_FLAGS_FTRACE build asymmetry

Mark all non-init functions in kernel/sched.c as 'notrace', instead of
turning them all off via CC_FLAGS_FTRACE.

This is going to allow the treatment of this file as any other scheduler
file, and it can be #include-ed in compound compilation units as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Add header guard to kernel/sched/stats.h and kernel/sched/autogroup.h
Ingo Molnar [Sat, 20 Nov 2021 09:39:20 +0000 (10:39 +0100)]
sched/headers: Add header guard to kernel/sched/stats.h and kernel/sched/autogroup.h

Protect against multiple inclusion.

Also include "sched.h" in "stat.h", as it relies on it.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Add header guard to kernel/sched/sched.h
Ingo Molnar [Tue, 22 Feb 2022 13:50:43 +0000 (14:50 +0100)]
sched/headers: Add header guard to kernel/sched/sched.h

Use the canonical header guard naming of the full path to the header.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agosched/headers: Fix header to build standalone: <linux/sched_clock.h>
Ingo Molnar [Thu, 23 Sep 2021 18:42:44 +0000 (20:42 +0200)]
sched/headers: Fix header to build standalone: <linux/sched_clock.h>

Uses various kernel types that don't build standalone.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2 years agoMerge tag 'v5.17-rc5' into sched/core, to resolve conflicts
Ingo Molnar [Mon, 21 Feb 2022 10:53:51 +0000 (11:53 +0100)]
Merge tag 'v5.17-rc5' into sched/core, to resolve conflicts

New conflicts in sched/core due to the following upstream fixes:

  44585f7bc0cb ("psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n")
  a06247c6804f ("psi: Fix uaf issue when psi trigger is destroyed while being polled")

Conflicts:
include/linux/psi_types.h
kernel/sched/psi.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2 years agoLinux 5.17-rc5
Linus Torvalds [Sun, 20 Feb 2022 21:07:20 +0000 (13:07 -0800)]
Linux 5.17-rc5

2 years agoMerge tag 'locking_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 Feb 2022 20:50:50 +0000 (12:50 -0800)]
Merge tag 'locking_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/tip/tip

Pull locking fix from Borislav Petkov:
 "Fix a NULL ptr dereference when dumping lockdep chains through
  /proc/lockdep_chains"

* tag 'locking_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Correct lock_classes index mapping

2 years agoMerge tag 'x86_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 Feb 2022 20:46:21 +0000 (12:46 -0800)]
Merge tag 'x86_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Fix the ptrace regset xfpregs_set() callback to behave according to
   the ABI

 - Handle poisoned pages properly in the SGX reclaimer code

* tag 'x86_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing
  x86/sgx: Fix missing poison handling in reclaimer

2 years agoMerge tag 'sched_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 Feb 2022 20:40:20 +0000 (12:40 -0800)]
Merge tag 'sched_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:
 "Fix task exposure order when forking tasks"

* tag 'sched_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix yet more sched_fork() races

2 years agoMerge tag 'edac_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 Feb 2022 20:04:14 +0000 (12:04 -0800)]
Merge tag 'edac_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/ras/ras

Pull EDAC fix from Borislav Petkov:
 "Fix a long-standing struct alignment bug in the EDAC struct allocation
  code"

* tag 'edac_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC: Fix calculation of returned address and next offset in edac_align_ptr()

2 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sun, 20 Feb 2022 19:51:49 +0000 (11:51 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three fixes, all in drivers.

  The ufs and qedi fixes are minor; the lpfc one is a bit bigger because
  it involves adding a heuristic to detect and deal with common but not
  standards compliant behaviour"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Fix divide by zero in ufshcd_map_queues()
  scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop
  scsi: qedi: Fix ABBA deadlock in qedi_process_tmf_resp() and qedi_process_cmd_cleanup_resp()

2 years agoMerge tag 'dmaengine-fix-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul...
Linus Torvalds [Sun, 20 Feb 2022 19:30:18 +0000 (11:30 -0800)]
Merge tag 'dmaengine-fix-5.17' of git://git./linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:
 "A bunch of driver fixes for:

   - ptdma error handling in init

   - lock fix in at_hdmac

   - error path and error num fix for sh dma

   - pm balance fix for stm32"

* tag 'dmaengine-fix-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: shdma: Fix runtime PM imbalance on error
  dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size
  dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe
  dmaengine: sh: rcar-dmac: Check for error num after setting mask
  dmaengine: at_xdmac: Fix missing unlock in at_xdmac_tasklet()
  dmaengine: ptdma: Fix the error handling path in pt_core_init()

2 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sun, 20 Feb 2022 19:23:48 +0000 (11:23 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Some driver updates, a MAINTAINERS fix, and additions to COMPILE_TEST
  (so we won't miss build problems again)"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: remove duplicate entry for i2c-qcom-geni
  i2c: brcmstb: fix support for DSL and CM variants
  i2c: qup: allow COMPILE_TEST
  i2c: imx: allow COMPILE_TEST
  i2c: cadence: allow COMPILE_TEST
  i2c: qcom-cci: don't put a device tree node before i2c_add_adapter()
  i2c: qcom-cci: don't delete an unregistered adapter
  i2c: bcm2835: Avoid clock stretching timeouts

2 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sun, 20 Feb 2022 19:15:46 +0000 (11:15 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - a fix for Synaptics touchpads in RMI4 mode failing to suspend/resume
   properly because I2C client devices are now being suspended and
   resumed asynchronously which changed the ordering

 - a change to make sure we do not set right and middle buttons
   capabilities on touchpads that are "buttonpads" (i.e. do not have
   separate physical buttons)

 - a change to zinitix touchscreen driver adding more compatible
   strings/IDs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: psmouse - set up dependency between PS/2 and SMBus companions
  Input: zinitix - add new compatible strings
  Input: clear BTN_RIGHT/MIDDLE on buttonpads

2 years agoMerge tag 'for-v5.17-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux...
Linus Torvalds [Sun, 20 Feb 2022 19:07:46 +0000 (11:07 -0800)]
Merge tag 'for-v5.17-rc' of git://git./linux/kernel/git/sre/linux-power-supply

Pull power supply fixes from Sebastian Reichel:
 "Three regression fixes for the 5.17 cycle:

   - build warning fix for power-supply documentation

   - pointer size fix in cw2015 battery driver

   - OOM handling in bq256xx charger driver"

* tag 'for-v5.17-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: bq256xx: Handle OOM correctly
  power: supply: core: fix application of sizeof to pointer
  power: supply: fix table problem in sysfs-class-power

2 years agoMerge tag 'fs.mount_setattr.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 Feb 2022 19:01:47 +0000 (11:01 -0800)]
Merge tag 'fs.mount_setattr.v5.17-rc4' of git://git./linux/kernel/git/brauner/linux

Pull mount_setattr test/doc fixes from Christian Brauner:
 "This contains a fix for one of the selftests for the mount_setattr
  syscall to create idmapped mounts, an entry for idmapped mounts for
  maintainers, and missing kernel documentation for the helper we split
  out some time ago to get and yield write access to a mount when
  changing mount properties"

* tag 'fs.mount_setattr.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: add kernel doc for mnt_{hold,unhold}_writers()
  MAINTAINERS: add entry for idmapped mounts
  tests: fix idmapped mount_setattr test

2 years agoMerge tag 'pidfd.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner...
Linus Torvalds [Sun, 20 Feb 2022 18:55:05 +0000 (10:55 -0800)]
Merge tag 'pidfd.v5.17-rc4' of git://git./linux/kernel/git/brauner/linux

Pull pidfd fix from Christian Brauner:
 "This fixes a problem reported by lockdep when installing a pidfd via
  fd_install() with siglock and the tasklisk write lock held in
  copy_process() when calling clone()/clone3() with CLONE_PIDFD.

  Originally a pidfd was created prior to holding any of these locks but
  this required a call to ksys_close(). So quite some time ago in
  6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups") we
  switched to a get_unused_fd_flags() + fd_install() model.

  As part of that we moved fd_install() as late as possible. This was
  done for two main reasons. First, because we needed to ensure that we
  call fd_install() past the point of no return as once that's called
  the fd is live in the task's file table. Second, because we tried to
  ensure that the fd is visible in /proc/<pid>/fd/<pidfd> right when the
  task is visible.

  This fix moves the fd_install() to an even later point which means
  that a task will be visible in proc while the pidfd isn't yet under
  /proc/<pid>/fd/<pidfd>.

  While this is a user visible change it's very unlikely that this will
  have any impact. Nobody should be relying on that and if they do we
  need to come up with something better but again, it's doubtful this is
  relevant"

* tag 'pidfd.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  copy_process(): Move fd_install() out of sighand->siglock critical section

2 years agoMerge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 20 Feb 2022 18:44:11 +0000 (10:44 -0800)]
Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git./linux/kernel/git/ebiederm/user-namespace

Pull ucounts fixes from Eric Biederman:
 "Michal Koutný recently found some bugs in the enforcement of
  RLIMIT_NPROC in the recent ucount rlimit implementation.

  In this set of patches I have developed a very conservative approach
  changing only what is necessary to fix the bugs that I can see
  clearly. Cleanups and anything that is making the code more consistent
  can follow after we have the code working as it has historically.

  The problem is not so much inconsistencies (although those exist) but
  that it is very difficult to figure out what the code should be doing
  in the case of RLIMIT_NPROC.

  All other rlimits are only enforced where the resource is acquired
  (allocated). RLIMIT_NPROC by necessity needs to be enforced in an
  additional location, and our current implementation stumbled it's way
  into that implementation"

* 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: Handle wrapping in is_ucounts_overlimit
  ucounts: Move RLIMIT_NPROC handling after set_user
  ucounts: Base set_cred_ucounts changes on the real user
  ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1
  rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user

2 years agoMAINTAINERS: remove duplicate entry for i2c-qcom-geni
Wolfram Sang [Fri, 18 Feb 2022 10:49:04 +0000 (11:49 +0100)]
MAINTAINERS: remove duplicate entry for i2c-qcom-geni

The driver is already covered in the ARM/QUALCOMM section. Also, Akash
Asthana's email bounces meanwhile and Mukesh Savaliya has never
responded to mails regarding this driver.

Signed-off-by: Wolfram Sang <wsa@kernel.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2 years agoarm64: Support PREEMPT_DYNAMIC
Mark Rutland [Mon, 14 Feb 2022 16:52:16 +0000 (16:52 +0000)]
arm64: Support PREEMPT_DYNAMIC

This patch enables support for PREEMPT_DYNAMIC on arm64, allowing the
preemption model to be chosen at boot time.

Specifically, this patch selects HAVE_PREEMPT_DYNAMIC_KEY, so that each
preemption function is an out-of-line call with an early return
depending upon a static key. This leaves almost all the codegen up to
the compiler, and side-steps a number of pain points with static calls
(e.g. interaction with CFI schemes). This should have no worse overhead
than using non-inline static calls, as those use out-of-line trampolines
with early returns.

For example, the dynamic_cond_resched() wrapper looks as follows when
enabled. When disabled, the first `B` is replaced with a `NOP`,
resulting in an early return.

| <dynamic_cond_resched>:
|        bti     c
|        b       <dynamic_cond_resched+0x10>     // or `nop`
|        mov     w0, #0x0
|        ret
|        mrs     x0, sp_el0
|        ldr     x0, [x0, #8]
|        cbnz    x0, <dynamic_cond_resched+0x8>
|        paciasp
|        stp     x29, x30, [sp, #-16]!
|        mov     x29, sp
|        bl      <preempt_schedule_common>
|        mov     w0, #0x1
|        ldp     x29, x30, [sp], #16
|        autiasp
|        ret

... compared to the regular form of the function:

| <__cond_resched>:
|        bti     c
|        mrs     x0, sp_el0
|        ldr     x1, [x0, #8]
|        cbz     x1, <__cond_resched+0x18>
|        mov     w0, #0x0
|        ret
|        paciasp
|        stp     x29, x30, [sp, #-16]!
|        mov     x29, sp
|        bl      <preempt_schedule_common>
|        mov     w0, #0x1
|        ldp     x29, x30, [sp], #16
|        autiasp
|        ret

Since arm64 does not yet use the generic entry code, we must define our
own `sk_dynamic_irqentry_exit_cond_resched`, which will be
enabled/disabled by the common code in kernel/sched/core.c. All other
preemption functions and associated static keys are defined there.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-8-mark.rutland@arm.com
2 years agoarm64: entry: Centralize preemption decision
Mark Rutland [Mon, 14 Feb 2022 16:52:15 +0000 (16:52 +0000)]
arm64: entry: Centralize preemption decision

For historical reasons, the decision of whether or not to preempt is
spread across arm64_preempt_schedule_irq() and __el1_irq(), and it would
be clearer if this were all in one place.

Also, arm64_preempt_schedule_irq() calls lockdep_assert_irqs_disabled(),
but this is redundant, as we have a subsequent identical assertion in
__exit_to_kernel_mode(), and preempt_schedule_irq() will
BUG_ON(!irqs_disabled()) anyway.

This patch removes the redundant assertion and centralizes the
preemption decision making within arm64_preempt_schedule_irq().

Other than the slight change to assertion behaviour, there should be no
functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-7-mark.rutland@arm.com
2 years agosched/preempt: Add PREEMPT_DYNAMIC using static keys
Mark Rutland [Mon, 14 Feb 2022 16:52:14 +0000 (16:52 +0000)]
sched/preempt: Add PREEMPT_DYNAMIC using static keys

Where an architecture selects HAVE_STATIC_CALL but not
HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
which will either branch to a callee or return to the caller.

On such architectures, a number of constraints can conspire to make
those trampolines more complicated and potentially less useful than we'd
like. For example:

* Hardware and software control flow integrity schemes can require the
  addition of "landing pad" instructions (e.g. `BTI` for arm64), which
  will also be present at the "real" callee.

* Limited branch ranges can require that trampolines generate or load an
  address into a register and perform an indirect branch (or at least
  have a slow path that does so). This loses some of the benefits of
  having a direct branch.

* Interaction with SW CFI schemes can be complicated and fragile, e.g.
  requiring that we can recognise idiomatic codegen and remove
  indirections understand, at least until clang proves more helpful
  mechanisms for dealing with this.

For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
really only need to enable/disable specific preemption functions. We can
achieve the same effect without a number of the pain points above by
using static keys to fold early returns into the preemption functions
themselves rather than in an out-of-line trampoline, effectively
inlining the trampoline into the start of the function.

For arm64, this results in good code generation. For example, the
dynamic_cond_resched() wrapper looks as follows when enabled. When
disabled, the first `B` is replaced with a `NOP`, resulting in an early
return.

| <dynamic_cond_resched>:
|        bti     c
|        b       <dynamic_cond_resched+0x10>     // or `nop`
|        mov     w0, #0x0
|        ret
|        mrs     x0, sp_el0
|        ldr     x0, [x0, #8]
|        cbnz    x0, <dynamic_cond_resched+0x8>
|        paciasp
|        stp     x29, x30, [sp, #-16]!
|        mov     x29, sp
|        bl      <preempt_schedule_common>
|        mov     w0, #0x1
|        ldp     x29, x30, [sp], #16
|        autiasp
|        ret

... compared to the regular form of the function:

| <__cond_resched>:
|        bti     c
|        mrs     x0, sp_el0
|        ldr     x1, [x0, #8]
|        cbz     x1, <__cond_resched+0x18>
|        mov     w0, #0x0
|        ret
|        paciasp
|        stp     x29, x30, [sp, #-16]!
|        mov     x29, sp
|        bl      <preempt_schedule_common>
|        mov     w0, #0x1
|        ldp     x29, x30, [sp], #16
|        autiasp
|        ret

Any architecture which implements static keys should be able to use this
to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
calls. Since this is likely to have greater overhead than (inlined)
static calls, PREEMPT_DYNAMIC is only defaulted to enabled when
HAVE_PREEMPT_DYNAMIC_CALL is selected.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-6-mark.rutland@arm.com
2 years agosched/preempt: Decouple HAVE_PREEMPT_DYNAMIC from GENERIC_ENTRY
Mark Rutland [Mon, 14 Feb 2022 16:52:13 +0000 (16:52 +0000)]
sched/preempt: Decouple HAVE_PREEMPT_DYNAMIC from GENERIC_ENTRY

Now that the enabled/disabled states for the preemption functions are
declared alongside their definitions, the core PREEMPT_DYNAMIC logic is
no longer tied to GENERIC_ENTRY, and can safely be selected so long as
an architecture provides enabled/disabled states for
irqentry_exit_cond_resched().

Make it possible to select HAVE_PREEMPT_DYNAMIC without GENERIC_ENTRY.

For existing users of HAVE_PREEMPT_DYNAMIC there should be no functional
change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-5-mark.rutland@arm.com
2 years agosched/preempt: Simplify irqentry_exit_cond_resched() callers
Mark Rutland [Mon, 14 Feb 2022 16:52:12 +0000 (16:52 +0000)]
sched/preempt: Simplify irqentry_exit_cond_resched() callers

Currently callers of irqentry_exit_cond_resched() need to be aware of
whether the function should be indirected via a static call, leading to
ugly ifdeffery in callers.

Save them the hassle with a static inline wrapper that does the right
thing. The raw_irqentry_exit_cond_resched() will also be useful in
subsequent patches which will add conditional wrappers for preemption
functions.

Note: in arch/x86/entry/common.c, xen_pv_evtchn_do_upcall() always calls
irqentry_exit_cond_resched() directly, even when PREEMPT_DYNAMIC is in
use. I believe this is a latent bug (which this patch corrects), but I'm
not entirely certain this wasn't deliberate.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-4-mark.rutland@arm.com
2 years agosched/preempt: Refactor sched_dynamic_update()
Mark Rutland [Mon, 14 Feb 2022 16:52:11 +0000 (16:52 +0000)]
sched/preempt: Refactor sched_dynamic_update()

Currently sched_dynamic_update needs to open-code the enabled/disabled
function names for each preemption model it supports, when in practice
this is a boolean enabled/disabled state for each function.

Make this clearer and avoid repetition by defining the enabled/disabled
states at the function definition, and using helper macros to perform the
static_call_update(). Where x86 currently overrides the enabled
function, it is made to provide both the enabled and disabled states for
consistency, with defaults provided by the core code otherwise.

In subsequent patches this will allow us to support PREEMPT_DYNAMIC
without static calls.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-3-mark.rutland@arm.com
2 years agosched/preempt: Move PREEMPT_DYNAMIC logic later
Mark Rutland [Mon, 14 Feb 2022 16:52:10 +0000 (16:52 +0000)]
sched/preempt: Move PREEMPT_DYNAMIC logic later

The PREEMPT_DYNAMIC logic in kernel/sched/core.c patches static calls
for a bunch of preemption functions. While most are defined prior to
this, the definition of cond_resched() is later in the file, and so we
only have its declarations from include/linux/sched.h.

In subsequent patches we'd like to define some macros alongside the
definition of each of the preemption functions, which we can use within
sched_dynamic_update(). For this to be possible, the PREEMPT_DYNAMIC
logic needs to be placed after the various preemption functions.

As a preparatory step, this patch moves the PREEMPT_DYNAMIC logic after
the various preemption functions, with no other changes -- this is
purely a move.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-2-mark.rutland@arm.com
2 years agosched: Fix yet more sched_fork() races
Peter Zijlstra [Mon, 14 Feb 2022 09:16:57 +0000 (10:16 +0100)]
sched: Fix yet more sched_fork() races

Where commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an
invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
race vs syscalls by not placing the task on the runqueue before it
gets exposed through the pidhash.

Commit 13765de8148f ("sched/fair: Fix fault in reweight_entity") is
trying to fix a single instance of this, instead fix the whole class
of issues, effectively reverting this commit.

Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Zhang Qiao <zhangqiao22@huawei.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net
2 years agoMerge tag 'nfs-for-5.17-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Sat, 19 Feb 2022 00:24:44 +0000 (16:24 -0800)]
Merge tag 'nfs-for-5.17-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:

 - Fix unnecessary changeattr revalidations

 - Fix resolving symlinks during directory lookups

 - Don't report writeback errors in nfs_getattr()

* tag 'nfs-for-5.17-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Do not report writeback errors in nfs_getattr()
  NFS: LOOKUP_DIRECTORY is also ok with symlinks
  NFS: Remove an incorrect revalidation in nfs4_update_changeattr_locked()

2 years agoMerge tag 'acpi-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sat, 19 Feb 2022 00:19:14 +0000 (16:19 -0800)]
Merge tag 'acpi-5.17-rc5' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These make an excess warning message go away and fix a recently
  introduced boot failure on a vintage machine.

  Specifics:

   - Change the log level of the "table not found" message in
     acpi_table_parse_entries_array() to debug to prevent it from
     showing up in the logs unnecessarily (Dan Williams)

   - Add a C-state limit quirk for 32-bit ThinkPad T40 to prevent it
     from crashing on boot after recent changes in the ACPI processor
     driver (Woody Suwalski)"

* tag 'acpi-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40
  ACPI: tables: Quiet ACPI table not found warning

2 years agoMerge tag 'riscv-for-linus-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 19 Feb 2022 00:14:13 +0000 (16:14 -0800)]
Merge tag 'riscv-for-linus-5.17-rc5' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "A set of three fixes, all aimed at fixing some fallout from the recent
  sparse hart ID support"

* tag 'riscv-for-linus-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Fix IPI/RFENCE hmask on non-monotonic hartid ordering
  RISC-V: Fix handling of empty cpu masks
  RISC-V: Fix hartid mask handling for hartid 31 and up

2 years agoInput: psmouse - set up dependency between PS/2 and SMBus companions
Dmitry Torokhov [Tue, 15 Feb 2022 21:32:26 +0000 (13:32 -0800)]
Input: psmouse - set up dependency between PS/2 and SMBus companions

When we switch from emulated PS/2 to native (RMI4 or Elan) protocols, we
create SMBus companion devices that are attached to I2C/SMBus controllers.
However, when suspending and resuming, we also need to make sure that we
take into account the PS/2 device they are associated with, so that PS/2
device is suspended after the companion and resumed before it, otherwise
companions will not work properly. Before I2C devices were marked for
asynchronous suspend/resume, this ordering happened naturally, but now we
need to enforce it by establishing device links, with PS/2 devices being
suppliers and SMBus companions being consumers.

Fixes: 172d931910e1 ("i2c: enable async suspend/resume on i2c client devices")
Reported-and-tested-by: Hugh Dickins <hughd@google.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Link: https://lore.kernel.org/r/89456fcd-a113-4c82-4b10-a9bcaefac68f@google.com
Link: https://lore.kernel.org/r/YgwQN8ynO88CPMju@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2 years agoMerge branch 'acpi-processor'
Rafael J. Wysocki [Fri, 18 Feb 2022 18:36:36 +0000 (19:36 +0100)]
Merge branch 'acpi-processor'

Merge fix for a recent boot lockup regression on 32-bit ThinkPad T40.

* acpi-processor:
  ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40

2 years agoMerge tag 'mtd/fixes-for-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Feb 2022 17:33:23 +0000 (09:33 -0800)]
Merge tag 'mtd/fixes-for-5.17-rc5' of git://git./linux/kernel/git/mtd/linux

Pull MTD fixes from Miquel Raynal:
 "MTD changes:

   - Qcom:
      - Don't print error message on -EPROBE_DEFER
      - Fix kernel panic on skipped partition
      - Fix missing free for pparts in cleanup

   - phram: Prevent divide by zero bug in phram_setup()

  Raw NAND controller changes:

   - ingenic: Fix missing put_device in ingenic_ecc_get

   - qcom: Fix clock sequencing in qcom_nandc_probe()

   - omap2: Prevent invalid configuration and build error

   - gpmi: Don't leak PM reference in error path

   - brcmnand: Fix incorrect sub-page ECC status"

* tag 'mtd/fixes-for-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: brcmnand: Fixed incorrect sub-page ECC status
  mtd: rawnand: gpmi: don't leak PM reference in error path
  mtd: phram: Prevent divide by zero bug in phram_setup()
  mtd: rawnand: omap2: Prevent invalid configuration and build error
  mtd: parsers: qcom: Fix missing free for pparts in cleanup
  mtd: parsers: qcom: Fix kernel panic on skipped partition
  mtd: parsers: qcom: Don't print error message on -EPROBE_DEFER
  mtd: rawnand: qcom: Fix clock sequencing in qcom_nandc_probe()
  mtd: rawnand: ingenic: Fix missing put_device in ingenic_ecc_get

2 years agoMerge tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 18 Feb 2022 17:27:10 +0000 (09:27 -0800)]
Merge tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Surprise removal fix (Christoph)

 - Ensure that pages are zeroed before submitted for userspace IO
   (Haimin)

 - Fix blk-wbt accounting issue with BFQ (Laibin)

 - Use bsize for discard granularity in loop (Ming)

 - Fix missing zone handling in blk_complete_request() (Pankaj)

* tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block:
  block/wbt: fix negative inflight counter when remove scsi device
  block: fix surprise removal for drivers calling blk_set_queue_dying
  block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
  block: loop:use kstatfs.f_bsize of backing file to set discard granularity
  block: Add handling for zone append command in blk_complete_request

2 years agoMerge tag 'sound-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 18 Feb 2022 17:20:52 +0000 (09:20 -0800)]
Merge tag 'sound-5.17-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small patches, mostly for old and new regressions and
  device-specific fixes.

   - Regression fixes regarding ALSA core SG-buffer helpers

   - Regression fix for Realtek HD-audio mutex deadlock

   - Regression fix for USB-audio PM resume error

   - More coverage of ASoC core control API notification fixes

   - Old regression fixes for HD-audio probe mask

   - Fixes for ASoC Realtek codec work handling

   - Other device-specific quirks / fixes"

* tag 'sound-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
  ASoC: intel: skylake: Set max DMA segment size
  ASoC: SOF: hda: Set max DMA segment size
  ALSA: hda: Set max DMA segment size
  ALSA: hda/realtek: Fix deadlock by COEF mutex
  ALSA: usb-audio: Don't abort resume upon errors
  ALSA: hda: Fix missing codec probe on Shenker Dock 15
  ALSA: hda: Fix regression on forced probe mask option
  ALSA: hda/realtek: Add quirk for Legion Y9000X 2019
  ALSA: usb-audio: revert to IMPLICIT_FB_FIXED_DEV for M-Audio FastTrack Ultra
  ASoC: wm_adsp: Correct control read size when parsing compressed buffer
  ASoC: qcom: Actually clear DMA interrupt register for HDMI
  ALSA: memalloc: invalidate SG pages before sync
  ALSA: memalloc: Fix dma_need_sync() checks
  MAINTAINERS: update cros_ec_codec maintainers
  ASoC: rt5682: do not block workqueue if card is unbound
  ASoC: rt5668: do not block workqueue if card is unbound
  ASoC: rt5682s: do not block workqueue if card is unbound
  ASoC: tas2770: Insert post reset delay
  ASoC: Revert "ASoC: mediatek: Check for error clk pointer"
  ASoC: amd: acp: Set gpio_spkr_en to None for max speaker amplifer in machine driver
  ...

2 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 18 Feb 2022 17:14:19 +0000 (09:14 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Fix wrong branch label in the EL2 GICv3 initialisation code"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Correct wrong label in macro __init_el2_gicv3

2 years agoMerge tag 'powerpc-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Fri, 18 Feb 2022 17:10:14 +0000 (09:10 -0800)]
Merge tag 'powerpc-5.17-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix boot failure on 603 with DEBUG_PAGEALLOC and KFENCE

 - Fix 32-build with newer binutils that rejects 'ptesync' etc

Thanks to Anders Roxell, Christophe Leroy, and Maxime Bizon.

* tag 'powerpc-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/lib/sstep: fix 'ptesync' build error
  powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE

2 years agoMerge tag '5.17-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 18 Feb 2022 17:04:27 +0000 (09:04 -0800)]
Merge tag '5.17-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Six small smb3 client fixes, three for stable:

   - fix for snapshot mount option

   - two ACL related fixes

   - use after free race fix

   - fix for confusing warning message logged with older dialects"

* tag '5.17-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix confusing unneeded warning message on smb2.1 and earlier
  cifs: modefromsids must add an ACE for authenticated users
  cifs: fix double free race when mount fails in cifs_get_root()
  cifs: do not use uninitialized data in the owner/group sid
  cifs: fix set of group SID via NTSD xattrs
  smb3: fix snapshot mount option

2 years agox86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing
Andy Lutomirski [Mon, 14 Feb 2022 12:05:49 +0000 (13:05 +0100)]
x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing

xfpregs_set() handles 32-bit REGSET_XFP and 64-bit REGSET_FP. The actual
code treats these regsets as modern FX state (i.e. the beginning part of
XSTATE). The declarations of the regsets thought they were the legacy
i387 format. The code thought they were the 32-bit (no xmm8..15) variant
of XSTATE and, for good measure, made the high bits disappear by zeroing
the wrong part of the buffer. The latter broke ptrace, and everything
else confused anyone trying to understand the code. In particular, the
nonsense definitions of the regsets confused me when I wrote this code.

Clean this all up. Change the declarations to match reality (which
shouldn't change the generated code, let alone the ABI) and fix
xfpregs_set() to clear the correct bits and to only do so for 32-bit
callers.

Fixes: 6164331d15f7 ("x86/fpu: Rewrite xfpregs_set()")
Reported-by: Luís Ferreira <contact@lsferreira.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215524
Link: https://lore.kernel.org/r/YgpFnZpF01WwR8wU@zn.tnic
2 years agoi2c: brcmstb: fix support for DSL and CM variants
Rafał Miłecki [Tue, 15 Feb 2022 07:27:35 +0000 (08:27 +0100)]
i2c: brcmstb: fix support for DSL and CM variants

DSL and CM (Cable Modem) support 8 B max transfer size and have a custom
DT binding for that reason. This driver was checking for a wrong
"compatible" however which resulted in an incorrect setup.

Fixes: e2e5a2c61837 ("i2c: brcmstb: Adding support for CM and DSL SoCs")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2 years agoMerge tag 'linux-kselftest-fixes-5.17-rc5' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 17 Feb 2022 23:21:42 +0000 (15:21 -0800)]
Merge tag 'linux-kselftest-fixes-5.17-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:
 "Fixes to ftrace, exec, and seccomp tests build, run-time and install
  bugs. These bugs are in the way of running the tests"

* tag 'linux-kselftest-fixes-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: Do not trace do_softirq because of PREEMPT_RT
  selftests/seccomp: Fix seccomp failure by adding missing headers
  selftests/exec: Add non-regular to TEST_GEN_PROGS

2 years agoMerge tag 'drm-fixes-2022-02-18' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Thu, 17 Feb 2022 21:11:46 +0000 (13:11 -0800)]
Merge tag 'drm-fixes-2022-02-18' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular fixes for rc5, nothing really stands out, mostly some amdgpu
  and i915 fixes with mediatek, radeon and some misc fixes.

  cma-helper:
   - set VM_DONTEXPAND

  atomic:
   - error handling fix

  mediatek:
   - fix probe defer loop with external bridge

  amdgpu:
   - Stable pstate clock fixes for Dimgrey Cavefish and Beige Goby
   - S0ix SDMA fix
   - Yellow Carp GPU reset fix

  radeon:
   - Backlight fix for iMac 12,1

  i915:
   - GVT kerneldoc cleanup.
   - GVT Kconfig should depend on X86
   - Prevent out of range access in SWSCI display code
   - Fix mbus join and dbuf slice config lookup
   - Fix inverted priority selection in the TTM backend
   - Fix FBC plane end Y offset check"

* tag 'drm-fixes-2022-02-18' of git://anongit.freedesktop.org/drm/drm:
  drm/atomic: Don't pollute crtc_state->mode_blob with error pointers
  drm/radeon: Fix backlight control on iMac 12,1
  drm/amd/pm: correct the sequence of sending gpu reset msg
  drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.
  drm/amd/pm: correct UMD pstate clocks for Dimgrey Cavefish and Beige Goby
  drm/i915/fbc: Fix the plane end Y offset check
  drm/i915/opregion: check port number bounds for SWSCI display power state
  drm/i915/ttm: tweak priority hint selection
  drm/i915: Fix mbus join config lookup
  drm/i915: Fix dbuf slice config lookup
  drm/cma-helper: Set VM_DONTEXPAND for mmap
  drm/mediatek: mtk_dsi: Avoid EPROBE_DEFER loop with external bridge
  drm/i915/gvt: Make DRM_I915_GVT depend on X86
  drm/i915/gvt: clean up kernel-doc in gtt.c

2 years agoMerge tag 'drm-intel-fixes-2022-02-17' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Thu, 17 Feb 2022 19:44:44 +0000 (05:44 +1000)]
Merge tag 'drm-intel-fixes-2022-02-17' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- GVT kerneldoc cleanup. (Randy Dunlap)
- GVT Kconfig should depend on X86. (Siva Mullati)
- Prevent out of range access in SWSCI display code. (Jani Nikula)
- Fix mbus join and dbuf slice config lookup. (Ville Syrjälä)
- Fix inverted priority selection in the TTM backend. (Matthew Auld)
- Fix FBC plane end Y offset check. (Ville Syrjälä)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Yg4lA6k8+xp8u3aB@tursulin-mobl2
2 years agoMerge tag 'drm-misc-fixes-2022-02-17' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Thu, 17 Feb 2022 19:39:53 +0000 (05:39 +1000)]
Merge tag 'drm-misc-fixes-2022-02-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

 * drm/cma-helper: Set VM_DONTEXPAND
 * drm/atomic: Fix error handling in drm_atomic_set_mode_for_crtc()

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Yg4mzQALMX69UmA3@linux-uq9g
2 years agoMerge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 17 Feb 2022 19:33:59 +0000 (11:33 -0800)]
Merge tag 'net-5.17-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless and netfilter.

  Current release - regressions:

   - dsa: lantiq_gswip: fix use after free in gswip_remove()

   - smc: avoid overwriting the copies of clcsock callback functions

  Current release - new code bugs:

   - iwlwifi:
      - fix use-after-free when no FW is present
      - mei: fix the pskb_may_pull check in ipv4
      - mei: retry mapping the shared area
      - mvm: don't feed the hardware RFKILL into iwlmei

  Previous releases - regressions:

   - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()

   - tipc: fix wrong publisher node address in link publications

   - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
     assertion

   - bgmac: make idm and nicpm resource optional again

   - atl1c: fix tx timeout after link flap

  Previous releases - always broken:

   - vsock: remove vsock from connected table when connect is
     interrupted by a signal

   - ping: change destination interface checks to match raw sockets

   - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
     semantics (and null-deref) after SO_RESERVE_MEM was added

   - ipv6: make exclusive flowlabel checks per-netns

   - bonding: force carrier update when releasing slave

   - sched: limit TC_ACT_REPEAT loops

   - bridge: multicast: notify switchdev driver whenever MC processing
     gets disabled because of max entries reached

   - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found

   - iwlwifi: fix locking when "HW not ready"

   - phy: mediatek: remove PHY mode check on MT7531

   - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN

   - dsa: lan9303:
      - fix polarity of reset during probe
      - fix accelerated VLAN handling"

* tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  bonding: force carrier update when releasing slave
  nfp: flower: netdev offload check for ip6gretap
  ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
  ipv4: fix data races in fib_alias_hw_flags_set
  net: dsa: lan9303: add VLAN IDs to master device
  net: dsa: lan9303: handle hwaccel VLAN tags
  vsock: remove vsock from connected table when connect is interrupted by a signal
  Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
  ping: fix the dif and sdif check in ping_lookup
  net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
  net: sched: limit TC_ACT_REPEAT loops
  tipc: fix wrong notification node addresses
  net: dsa: lantiq_gswip: fix use after free in gswip_remove()
  ipv6: per-netns exclusive flowlabel checks
  net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
  CDC-NCM: avoid overflow in sanity checking
  mctp: fix use after free
  net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
  bonding: fix data-races around agg_select_timer
  dpaa2-eth: Initialize mutex used in one step timestamping path
  ...

2 years agobonding: force carrier update when releasing slave
Zhang Changzhong [Wed, 16 Feb 2022 14:18:08 +0000 (22:18 +0800)]
bonding: force carrier update when releasing slave

In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.

Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.

Reproducer:
$ insmod bonding.ko mode=0 miimon=100 max_bonds=2
$ ifconfig bond0 up
$ ifenslave bond0 eth0 eth1
$ ifconfig eth0 down
$ ifenslave -d bond0 eth1
$ cat /proc/net/bonding/bond0

Fixes: ff59c4563a8d ("[PATCH] bonding: support carrier state for master")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agox86/sgx: Fix missing poison handling in reclaimer
Reinette Chatre [Wed, 2 Feb 2022 19:41:12 +0000 (11:41 -0800)]
x86/sgx: Fix missing poison handling in reclaimer

The SGX reclaimer code lacks page poison handling in its main
free path. This can lead to avoidable machine checks if a
poisoned page is freed and reallocated instead of being
isolated.

A troublesome scenario is:
 1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
 2. arch_memory_failure() is eventually called
 3. (SGX) page->poison set to 1
 4. Page is reclaimed
 5. Page added to normal free lists by sgx_reclaim_pages()
    ^ This is the bug (poison pages should be isolated on the
    sgx_poison_page_list instead)
 6. Page is reallocated by some innocent enclave, a second (synchronous)
    in-kernel #MC is induced, probably during EADD instruction.
    ^ This is the fallout from the bug

(6) is unfortunate and can be avoided by replacing the open coded
enclave page freeing code in the reclaimer with sgx_free_epc_page()
to obtain support for poison page handling that includes placing the
poisoned page on the correct list.

Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark free pages")
Fixes: 992801ae9243 ("x86/sgx: Initial poison handling for dirty and free pages")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/dcc95eb2aaefb042527ac50d0a50738c7c160dac.1643830353.git.reinette.chatre@intel.com
2 years agofs/file_table: fix adding missing kmemleak_not_leak()
Luis Chamberlain [Tue, 15 Feb 2022 02:08:28 +0000 (18:08 -0800)]
fs/file_table: fix adding missing kmemleak_not_leak()

Commit b42bc9a3c511 ("Fix regression due to "fs: move binfmt_misc sysctl
to its own file") fixed a regression, however it failed to add a
kmemleak_not_leak().

Fixes: b42bc9a3c511 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file")
Reported-by: Tong Zhang <ztong0001@gmail.com>
Cc: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm...
Linus Torvalds [Thu, 17 Feb 2022 18:06:09 +0000 (10:06 -0800)]
Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git./linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix corrupt inject files when only last branch option is enabled with
   ARM CoreSight ETM

 - Fix use-after-free for realloc(..., 0) in libsubcmd, found by gcc 12

 - Defer freeing string after possible strlen() on it in the BPF loader,
   found by gcc 12

 - Avoid early exit in 'perf trace' due SIGCHLD from non-workload
   processes

 - Fix arm64 perf_event_attr 'perf test's wrt --call-graph
   initialization

 - Fix libperf 32-bit build for 'perf test' wrt uint64_t printf

 - Fix perf_cpu_map__for_each_cpu macro in libperf, providing access to
   the CPU iterator

 - Sync linux/perf_event.h UAPI with the kernel sources

 - Update Jiri Olsa's email address in MAINTAINERS

* tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf bpf: Defer freeing string after possible strlen() on it
  perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization
  libsubcmd: Fix use-after-free for realloc(..., 0)
  libperf: Fix perf_cpu_map__for_each_cpu macro
  perf cs-etm: Fix corrupt inject files when only last branch option is enabled
  perf cs-etm: No-op refactor of synth opt usage
  libperf: Fix 32-bit build for tests uint64_t printf
  tools headers UAPI: Sync linux/perf_event.h with the kernel sources
  perf trace: Avoid early exit due SIGCHLD from non-workload processes
  MAINTAINERS: Update Jiri's email address

2 years agoMerge tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof...
Linus Torvalds [Thu, 17 Feb 2022 17:54:00 +0000 (09:54 -0800)]
Merge tag 'modules-5.17-rc5' of git://git./linux/kernel/git/mcgrof/linux

Pull module fix from Luis Chamberlain:
 "Fixes module decompression when CONFIG_SYSFS=n

  The only fix trickled down for v5.17-rc cycle so far is the fix for
  module decompression when CONFIG_SYSFS=n. This was reported through
  0-day"

* tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: fix building with sysfs disabled

2 years agonfp: flower: netdev offload check for ip6gretap
Danie du Toit [Thu, 17 Feb 2022 12:48:20 +0000 (14:48 +0200)]
nfp: flower: netdev offload check for ip6gretap

IPv6 GRE tunnels are not being offloaded, this is caused by a missing
netdev offload check. The functionality of IPv6 GRE tunnel offloading
was previously added but this check was not included. Adding the
ip6gretap check allows IPv6 GRE tunnels to be offloaded correctly.

Fixes: f7536ffb0986 ("nfp: flower: Allow ipv6gretap interface for offloading")
Signed-off-by: Danie du Toit <danie.dutoit@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220217124820.40436-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
Eric Dumazet [Wed, 16 Feb 2022 17:32:17 +0000 (09:32 -0800)]
ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt

Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to gi6->offload, fi->trap and fi->offload_failed
need some basic protection like READ_ONCE()/WRITE_ONCE().

BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt

read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
 fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
 fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
 fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
 fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
 __ip6_del_rt net/ipv6/route.c:3876 [inline]
 ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
 __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
 ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
 __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
 ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
 inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
 __sock_release net/socket.c:650 [inline]
 sock_close+0x6c/0x150 net/socket.c:1318
 __fput+0x295/0x520 fs/file_table.c:280
 ____fput+0x11/0x20 fs/file_table.c:313
 task_work_run+0x8e/0x110 kernel/task_work.c:164
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
 exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
 syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
 do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x44/0xae

write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
 fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
 nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
 nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
 nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
 nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
 nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x2c7/0x2e0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30

value changed: 0x22 -> 0x2a

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 0c5fcf9e249e ("IPv6: Add "offload failed" indication to routes")
Fixes: bb3c4ab93e44 ("ipv6: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoipv4: fix data races in fib_alias_hw_flags_set
Eric Dumazet [Wed, 16 Feb 2022 17:32:16 +0000 (09:32 -0800)]
ipv4: fix data races in fib_alias_hw_flags_set

fib_alias_hw_flags_set() can be used by concurrent threads,
and is only RCU protected.

We need to annotate accesses to following fields of struct fib_alias:

    offload, trap, offload_failed

Because of READ_ONCE()WRITE_ONCE() limitations, make these
field u8.

BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set

read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
 fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
 nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
 nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
 nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
 nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 process_scheduled_works kernel/workqueue.c:2370 [inline]
 worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
 fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
 nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
 nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
 nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
 nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 process_scheduled_works kernel/workqueue.c:2370 [inline]
 worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

value changed: 0x00 -> 0x02

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e82623-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 90b93f1b31f8 ("ipv4: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: lan9303: add VLAN IDs to master device
Mans Rullgard [Wed, 16 Feb 2022 20:48:18 +0000 (20:48 +0000)]
net: dsa: lan9303: add VLAN IDs to master device

If the master device does VLAN filtering, the IDs used by the switch
must be added for any frames to be received.  Do this in the
port_enable() function, and remove them in port_disable().

Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: lan9303: handle hwaccel VLAN tags
Mans Rullgard [Wed, 16 Feb 2022 12:46:34 +0000 (12:46 +0000)]
net: dsa: lan9303: handle hwaccel VLAN tags

Check for a hwaccel VLAN tag on rx and use it if present.  Otherwise,
use __skb_vlan_pop() like the other tag parsers do.  This fixes the case
where the VLAN tag has already been consumed by the master.

Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomm: don't try to NUMA-migrate COW pages that have other uses
Linus Torvalds [Thu, 17 Feb 2022 16:57:47 +0000 (08:57 -0800)]
mm: don't try to NUMA-migrate COW pages that have other uses

Oded Gabbay reports that enabling NUMA balancing causes corruption with
his Gaudi accelerator test load:

 "All the details are in the bug, but the bottom line is that somehow,
  this patch causes corruption when the numa balancing feature is
  enabled AND we don't use process affinity AND we use GUP to pin pages
  so our accelerator can DMA to/from system memory.

  Either disabling numa balancing, using process affinity to bind to
  specific numa-node or reverting this patch causes the bug to
  disappear"

and Oded bisected the issue to commit 09854ba94c6a ("mm: do_wp_page()
simplification").

Now, the NUMA balancing shouldn't actually be changing the writability
of a page, and as such shouldn't matter for COW.  But it appears it
does.  Suspicious.

However, regardless of that, the condition for enabling NUMA faults in
change_pte_range() is nonsensical.  It uses "page_mapcount(page)" to
decide if a COW page should be NUMA-protected or not, and that makes
absolutely no sense.

The number of mappings a page has is irrelevant: not only does GUP get a
reference to a page as in Oded's case, but the other mappings migth be
paged out and the only reference to them would be in the page count.

Since we should never try to NUMA-balance a page that we can't move
anyway due to other references, just fix the code to use 'page_count()'.
Oded confirms that that fixes his issue.

Now, this does imply that something in NUMA balancing ends up changing
page protections (other than the obvious one of making the page
inaccessible to get the NUMA faulting information).  Otherwise the COW
simplification wouldn't matter - since doing the GUP on the page would
make sure it's writable.

The cause of that permission change would be good to figure out too,
since it clearly results in spurious COW events - but fixing the
nonsensical test that just happened to work before is obviously the
CorrectThing(tm) to do regardless.

Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215616
Link: https://lore.kernel.org/all/CAFCwf10eNmwq2wD71xjUhqkvv5+_pJMR1nPug2RqNDcFT4H86Q@mail.gmail.com/
Reported-and-tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agovsock: remove vsock from connected table when connect is interrupted by a signal
Seth Forshee [Thu, 17 Feb 2022 14:13:12 +0000 (08:13 -0600)]
vsock: remove vsock from connected table when connect is interrupted by a signal

vsock_connect() expects that the socket could already be in the
TCP_ESTABLISHED state when the connecting task wakes up with a signal
pending. If this happens the socket will be in the connected table, and
it is not removed when the socket state is reset. In this situation it's
common for the process to retry connect(), and if the connection is
successful the socket will be added to the connected table a second
time, corrupting the list.

Prevent this by calling vsock_remove_connected() if a signal is received
while waiting for a connection. This is harmless if the socket is not in
the connected table, and if it is in the table then removing it will
prevent list corruption from a double add.

Note for backporting: this patch requires d5afa82c977e ("vsock: correct
removal of socket from the list"), which is in all current stable trees
except 4.9.y.

Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoRevert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
Jonas Gorski [Wed, 16 Feb 2022 18:46:34 +0000 (10:46 -0800)]
Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"

This reverts commit 3710e80952cf2dc48257ac9f145b117b5f74e0a5.

Since idm_base and nicpm_base are still optional resources not present
on all platforms, this breaks the driver for everything except Northstar
2 (which has both).

The same change was already reverted once with 755f5738ff98 ("net:
broadcom: fix a mistake about ioremap resource").

So let's do it again.

Fixes: 3710e80952cf ("net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
[florian: Added comments to explain the resources are optional]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220216184634.2032460-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoucounts: Handle wrapping in is_ucounts_overlimit
Eric W. Biederman [Thu, 10 Feb 2022 00:09:41 +0000 (18:09 -0600)]
ucounts: Handle wrapping in is_ucounts_overlimit

While examining is_ucounts_overlimit and reading the various messages
I realized that is_ucounts_overlimit fails to deal with counts that
may have wrapped.

Being wrapped should be a transitory state for counts and they should
never be wrapped for long, but it can happen so handle it.

Cc: stable@vger.kernel.org
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Link: https://lkml.kernel.org/r/20220216155832.680775-5-ebiederm@xmission.com
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2 years agoucounts: Move RLIMIT_NPROC handling after set_user
Eric W. Biederman [Mon, 14 Feb 2022 15:40:25 +0000 (09:40 -0600)]
ucounts: Move RLIMIT_NPROC handling after set_user

During set*id() which cred->ucounts to charge the the current process
to is not known until after set_cred_ucounts.  So move the
RLIMIT_NPROC checking into a new helper flag_nproc_exceeded and call
flag_nproc_exceeded after set_cred_ucounts.

This is very much an arbitrary subset of the places where we currently
change the RLIMIT_NPROC accounting, designed to preserve the existing
logic.

Fixing the existing logic will be the subject of another series of
changes.

Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220216155832.680775-4-ebiederm@xmission.com
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2 years agoucounts: Base set_cred_ucounts changes on the real user
Eric W. Biederman [Wed, 9 Feb 2022 22:22:20 +0000 (16:22 -0600)]
ucounts: Base set_cred_ucounts changes on the real user

Michal Koutný <mkoutny@suse.com> wrote:
> Tasks are associated to multiple users at once. Historically and as per
> setrlimit(2) RLIMIT_NPROC is enforce based on real user ID.
>
> The commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
> made the accounting structure "indexed" by euid and hence potentially
> account tasks differently.
>
> The effective user ID may be different e.g. for setuid programs but
> those are exec'd into already existing task (i.e. below limit), so
> different accounting is moot.
>
> Some special setresuid(2) users may notice the difference, justifying
> this fix.

I looked at cred->ucount and it is only used for rlimit operations
that were previously stored in cred->user.  Making the fact
cred->ucount can refer to a different user from cred->user a bug,
affecting all uses of cred->ulimit not just RLIMIT_NPROC.

Fix set_cred_ucounts to always use the real uid not the effective uid.

Further simplify set_cred_ucounts by noticing that set_cred_ucounts
somehow retained a draft version of the check to see if alloc_ucounts
was needed that checks the new->user and new->user_ns against the
current_real_cred().  Remove that draft version of the check.

All that matters for setting the cred->ucounts are the user_ns and uid
fields in the cred.

Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220207121800.5079-4-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220216155832.680775-3-ebiederm@xmission.com
Reported-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2 years agoucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1
Eric W. Biederman [Thu, 10 Feb 2022 02:03:19 +0000 (20:03 -0600)]
ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1

Michal Koutný <mkoutny@suse.com> wrote:

> It was reported that v5.14 behaves differently when enforcing
> RLIMIT_NPROC limit, namely, it allows one more task than previously.
> This is consequence of the commit 21d1c5e386bc ("Reimplement
> RLIMIT_NPROC on top of ucounts") that missed the sharpness of
> equality in the forking path.

This can be fixed either by fixing the test or by moving the increment
to be before the test.  Fix it my moving copy_creds which contains
the increment before is_ucounts_overlimit.

In the case of CLONE_NEWUSER the ucounts in the task_cred changes.
The function is_ucounts_overlimit needs to use the final version of
the ucounts for the new process.  Which means moving the
is_ucounts_overlimit test after copy_creds is necessary.

Both the test in fork and the test in set_user were semantically
changed when the code moved to ucounts.  The change of the test in
fork was bad because it was before the increment.  The test in
set_user was wrong and the change to ucounts fixed it.  So this
fix only restores the old behavior in one lcation not two.

Link: https://lkml.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220216155832.680775-2-ebiederm@xmission.com
Cc: stable@vger.kernel.org
Reported-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2 years agorlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
Eric W. Biederman [Fri, 11 Feb 2022 19:57:44 +0000 (13:57 -0600)]
rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user

Solar Designer <solar@openwall.com> wrote:
> I'm not aware of anyone actually running into this issue and reporting
> it.  The systems that I personally know use suexec along with rlimits
> still run older/distro kernels, so would not yet be affected.
>
> So my mention was based on my understanding of how suexec works, and
> code review.  Specifically, Apache httpd has the setting RLimitNPROC,
> which makes it set RLIMIT_NPROC:
>
> https://httpd.apache.org/docs/2.4/mod/core.html#rlimitnproc
>
> The above documentation for it includes:
>
> "This applies to processes forked from Apache httpd children servicing
> requests, not the Apache httpd children themselves. This includes CGI
> scripts and SSI exec commands, but not any processes forked from the
> Apache httpd parent, such as piped logs."
>
> In code, there are:
>
> ./modules/generators/mod_cgid.c:        ( (cgid_req.limits.limit_nproc_set) && ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC,
> ./modules/generators/mod_cgi.c:        ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC,
> ./modules/filters/mod_ext_filter.c:    rv = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC, conf->limit_nproc);
>
> For example, in mod_cgi.c this is in run_cgi_child().
>
> I think this means an httpd child sets RLIMIT_NPROC shortly before it
> execs suexec, which is a SUID root program.  suexec then switches to the
> target user and execs the CGI script.
>
> Before 2863643fb8b9, the setuid() in suexec would set the flag, and the
> target user's process count would be checked against RLIMIT_NPROC on
> execve().  After 2863643fb8b9, the setuid() in suexec wouldn't set the
> flag because setuid() is (naturally) called when the process is still
> running as root (thus, has those limits bypass capabilities), and
> accordingly execve() would not check the target user's process count
> against RLIMIT_NPROC.

In commit 2863643fb8b9 ("set_user: add capability check when
rlimit(RLIMIT_NPROC) exceeds") capable calls were added to set_user to
make it more consistent with fork.  Unfortunately because of call site
differences those capable calls were checking the credentials of the
user before set*id() instead of after set*id().

This breaks enforcement of RLIMIT_NPROC for applications that set the
rlimit and then call set*id() while holding a full set of
capabilities.  The capabilities are only changed in the new credential
in security_task_fix_setuid().

The code in apache suexec appears to follow this pattern.

Commit 909cc4ae86f3 ("[PATCH] Fix two bugs with process limits
(RLIMIT_NPROC)") where this check was added describes the targes of this
capability check as:

  2/ When a root-owned process (e.g. cgiwrap) sets up process limits and then
      calls setuid, the setuid should fail if the user would then be running
      more than rlim_cur[RLIMIT_NPROC] processes, but it doesn't.  This patch
      adds an appropriate test.  With this patch, and per-user process limit
      imposed in cgiwrap really works.

So the original use case of this check also appears to match the broken
pattern.

Restore the enforcement of RLIMIT_NPROC by removing the bad capable
checks added in set_user.  This unfortunately restores the
inconsistent state the code has been in for the last 11 years, but
dealing with the inconsistencies looks like a larger problem.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20210907213042.GA22626@openwall.com/
Link: https://lkml.kernel.org/r/20220212221412.GA29214@openwall.com
Link: https://lkml.kernel.org/r/20220216155832.680775-1-ebiederm@xmission.com
Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Reviewed-by: Solar Designer <solar@openwall.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2 years agoping: fix the dif and sdif check in ping_lookup
Xin Long [Wed, 16 Feb 2022 05:20:52 +0000 (00:20 -0500)]
ping: fix the dif and sdif check in ping_lookup

When 'ping' changes to use PING socket instead of RAW socket by:

   # sysctl -w net.ipv4.ping_group_range="0 100"

There is another regression caused when matching sk_bound_dev_if
and dif, RAW socket is using inet_iif() while PING socket lookup
is using skb->dev->ifindex, the cmd below fails due to this:

  # ip link add dummy0 type dummy
  # ip link set dummy0 up
  # ip addr add 192.168.111.1/24 dev dummy0
  # ping -I dummy0 192.168.111.1 -c1

The issue was also reported on:

  https://github.com/iputils/iputils/issues/104

But fixed in iputils in a wrong way by not binding to device when
destination IP is on device, and it will cause some of kselftests
to fail, as Jianlin noticed.

This patch is to use inet(6)_iif and inet(6)_sdif to get dif and
sdif for PING socket, and keep consistent with RAW socket.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoblock/wbt: fix negative inflight counter when remove scsi device
Laibin Qiu [Sat, 22 Jan 2022 11:10:45 +0000 (19:10 +0800)]
block/wbt: fix negative inflight counter when remove scsi device

Now that we disable wbt by set WBT_STATE_OFF_DEFAULT in
wbt_disable_default() when switch elevator to bfq. And when
we remove scsi device, wbt will be enabled by wbt_enable_default.
If it become false positive between wbt_wait() and wbt_track()
when submit write request.

The following is the scenario that triggered the problem.

T1                          T2                           T3
                            elevator_switch_mq
                            bfq_init_queue
                            wbt_disable_default <= Set
                            rwb->enable_state (OFF)
Submit_bio
blk_mq_make_request
rq_qos_throttle
<= rwb->enable_state (OFF)
                                                         scsi_remove_device
                                                         sd_remove
                                                         del_gendisk
                                                         blk_unregister_queue
                                                         elv_unregister_queue
                                                         wbt_enable_default
                                                         <= Set rwb->enable_state (ON)
q_qos_track
<= rwb->enable_state (ON)
^^^^^^ this request will mark WBT_TRACKED without inflight add and will
lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung.

Fix this by move wbt_enable_default() from elv_unregister to
bfq_exit_queue(). Only re-enable wbt when bfq exit.

Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")

Remove oneline stale comment, and kill one oneshot local variable.

Signed-off-by: Ming Lei <ming.lei@rehdat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-block/20211214133103.551813-1-qiulaibin@huawei.com/
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: fix surprise removal for drivers calling blk_set_queue_dying
Christoph Hellwig [Thu, 17 Feb 2022 07:52:31 +0000 (08:52 +0100)]
block: fix surprise removal for drivers calling blk_set_queue_dying

Various block drivers call blk_set_queue_dying to mark a disk as dead due
to surprise removal events, but since commit 8e141f9eb803 that doesn't
work given that the GD_DEAD flag needs to be set to stop I/O.

Replace the driver calls to blk_set_queue_dying with a new (and properly
documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the
only remaining caller.

Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk")
Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
Haimin Zhang [Wed, 16 Feb 2022 08:40:38 +0000 (16:40 +0800)]
block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern

Add __GFP_ZERO flag for alloc_page in function bio_copy_kern to initialize
the buffer of a bio.

Signed-off-by: Haimin Zhang <tcs.kernel@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220216084038.15635-1-tcs.kernel@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agonet: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
Daniele Palmas [Tue, 15 Feb 2022 11:13:35 +0000 (12:13 +0100)]
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990

Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FN990
0x1071 composition in order to avoid bind error.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoperf bpf: Defer freeing string after possible strlen() on it
Arnaldo Carvalho de Melo [Wed, 16 Feb 2022 19:01:00 +0000 (16:01 -0300)]
perf bpf: Defer freeing string after possible strlen() on it

This was detected by the gcc in Fedora Rawhide's gcc:

  50    11.01 fedora:rawhide                : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
        inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9:
    util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free]
     1225 |                 *key_scan_pos += strlen(map_opt);
          |                                  ^~~~~~~~~~~~~~~
    util/bpf-loader.c:1223:9: note: call to 'free' here
     1223 |         free(map_name);
          |         ^~~~~~~~~~~~~~
    cc1: all warnings being treated as errors

So do the calculations on the pointer before freeing it.

Fixes: 04f9bf2bac72480c ("perf bpf-loader: Add missing '*' for key_scan_pos")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoMerge tag 'amd-drm-fixes-5.17-2022-02-16' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Thu, 17 Feb 2022 09:06:07 +0000 (19:06 +1000)]
Merge tag 'amd-drm-fixes-5.17-2022-02-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.17-2022-02-16:

amdgpu:
- Stable pstate clock fixes for Dimgrey Cavefish and Beige Goby
- S0ix SDMA fix
- Yellow Carp GPU reset fix

radeon:
- Backlight fix for iMac 12,1

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220217035242.8084-1-alexander.deucher@amd.com
2 years agoASoC: intel: skylake: Set max DMA segment size
Takashi Iwai [Tue, 15 Feb 2022 13:27:56 +0000 (14:27 +0100)]
ASoC: intel: skylake: Set max DMA segment size

The recent code refactoring to use the standard DMA helper requires
the max DMA segment size setup for SG list management.  Without it,
the kernel may spew warnings when a large buffer is allocated.

This patch sets up dma_set_max_seg_size() for avoiding spurious
warnings.

Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)")
Acked-by: Cezary Rojewski <cezary.rojewski@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
BugLink: https://github.com/thesofproject/linux/issues/3430
Link: https://lore.kernel.org/r/20220215132756.31236-4-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 years agoASoC: SOF: hda: Set max DMA segment size
Takashi Iwai [Tue, 15 Feb 2022 13:27:55 +0000 (14:27 +0100)]
ASoC: SOF: hda: Set max DMA segment size

The recent code refactoring to use the standard DMA helper requires
the max DMA segment size setup for SG list management. Without it,
the kernel may spew warnings when a large buffer is allocated.

This patch sets up dma_set_max_seg_size() for avoiding spurious
warnings.

Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)")
Acked-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
BugLink: https://github.com/thesofproject/linux/issues/3430
Link: https://lore.kernel.org/r/20220215132756.31236-3-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2 years agoALSA: hda: Set max DMA segment size
Takashi Iwai [Tue, 15 Feb 2022 13:27:54 +0000 (14:27 +0100)]
ALSA: hda: Set max DMA segment size

The recent code refactoring to use the standard DMA helper requires
the max DMA segment size setup for SG list management. Without it,
the kernel may spew warnings when a large buffer is allocated.

This patch sets up dma_set_max_seg_size() for avoiding spurious
warnings.

Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)")
Cc: <stable@vger.kernel.org>
BugLink: https://github.com/thesofproject/linux/issues/3430
Link: https://lore.kernel.org/r/20220215132756.31236-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>