Jerry Huang [Wed, 14 Mar 2012 09:08:28 +0000 (17:08 +0800)]
powerpc/85xx: add P1020UTM-PC platform support
The p1020utm-pc has features similar to the p1020rdb, so the
p1020utm-pc uses the same platform file as the p1/p2 rdb boards.
Overview of P1020UTM-PC platform:
- DDR3 1GB
- NOR flash 32MB
- I2C EEPROM 256Kb
- eTSEC1 (RGMII PHY Atheros AR8021)
- eTSEC2 (SGMII PHY Vitesse VSC8221)
- eTSEC3 (RGMII PHY Atheros AR8021)
- SDHC
- 2 USB ports
- PCIe (Lane1 to dual SATA controller)
Signed-off-by: Jerry Huang <Chang-Ming.Huang@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Jerry Huang [Wed, 14 Mar 2012 09:08:27 +0000 (17:08 +0800)]
powerpc/85xx: add P1020MBG-PC platform support
The p1020mbg-pc has features similar to the p1020rdb, so the
p1020mbg-pc uses the same platform file as the p1/p2 rdb boards.
Overview of P1020MBG-PC platform:
- DDR3 2GB
- NOR flash 64MB
- I2C EEPROM 256Kb
- eTSEC1 (RGMII PHY) connected to VSC7385 L2 switch
- eTSEC2 (SGMII PHY)
- eTSEC3 (RGMII PHY)
- SDHC
- 2 USB ports
- 4 TDM ports
- PCIe (Lane1 to dual SATA controller)
Signed-off-by: Jerry Huang <Chang-Ming.Huang@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Prabhakar Kushwaha [Thu, 15 Mar 2012 05:34:23 +0000 (11:04 +0530)]
NAND Machine support for Integrated Flash Controller
The Integrated Flash Controller (IFC) can be used to attach NAND flash
chips using the NAND Flash Machine available on it.
Signed-off-by: Dipen Dudhat <Dipen.Dudhat@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Liu Shuo <b35362@freescale.com>
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Jia Hongtao [Tue, 21 Feb 2012 02:11:23 +0000 (10:11 +0800)]
powerpc/85xx: Clean up partition nodes in dts for MPC8572DS
Signed-off-by: Jin Qing <b24347@freescale.com>
Signed-off-by: Jia Hongtao <B38951@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Timur Tabi [Thu, 16 Feb 2012 00:25:47 +0000 (18:25 -0600)]
powerpc/85xx: p1022ds: disable the NOR flash node if video is enabled
The Freescale P1022 has a unique pin muxing "feature" where the DIU video
controller's video signals are muxed with 24 of the local bus address signals.
When the DIU is enabled, the bulk of the local bus is disabled, preventing
access to memory-mapped devices like NOR flash and the pixis FPGA.
Therefore, if the DIU is going to be enabled, then memory-mapped devices on
the localbus, like NOR flash, need to be disabled.
This also means that the localbus is no longer a 'simple-bus', so remove
that string from the node's compatible property.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
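For illustration only, the policy above (if the DIU will be used, disable
the memory-mapped devices behind the localbus) can be sketched in plain C.
This assumes that a "video=" argument on the kernel command line is what
signals that the DIU will be enabled, and it models device-tree nodes with
a hypothetical struct dt_node instead of the kernel's OF API; it is not
the p1022ds platform code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a device-tree node. */
struct dt_node {
	const char *name;
	const char *status;	/* "okay" or "disabled" */
};

/*
 * Assumption: the DIU counts as "going to be enabled" when the command
 * line carries a "video=" argument. The token must start the line or
 * follow a space, so "novideo=" does not match.
 */
static bool diu_requested(const char *cmdline)
{
	const char *p = cmdline;

	while ((p = strstr(p, "video=")) != NULL) {
		if (p == cmdline || p[-1] == ' ')
			return true;
		p++;
	}
	return false;
}

/* Disable every localbus child passed in (e.g. NOR flash) if the DIU wins. */
static void fixup_localbus(struct dt_node *children, size_t n, const char *cmdline)
{
	assert(children != NULL || n == 0);
	if (!diu_requested(cmdline))
		return;
	for (size_t i = 0; i < n; i++)
		children[i].status = "disabled";
}
```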
Timur Tabi [Thu, 16 Feb 2012 00:25:48 +0000 (18:25 -0600)]
powerpc/85xx: create 32-bit DTS for the P1022DS
Create a 32-bit address space version of p1022ds.dts. To avoid confusion,
p1022ds.dts is renamed to p1022ds_36b.dts. We also create p1022ds.dtsi
to store some common nodes.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Xie Xiaobo [Tue, 17 Jan 2012 09:59:51 +0000 (17:59 +0800)]
powerpc/85xx: Add magic-packet properties for etsec
These properties indicate that the hardware supports waking up via magic
packet.
Signed-off-by: Xie Xiaobo <X.Xie@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
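The magic-packet property only advertises the capability. For reference,
a Wake-on-LAN magic packet carries six 0xFF bytes followed by the target
MAC address repeated sixteen times, 102 bytes in all. The sketch below
builds that payload; the function and macro names are hypothetical and
not part of gianfar.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAGIC_SYNC_LEN    6
#define MAGIC_MAC_REPS    16
#define MAGIC_PAYLOAD_LEN (MAGIC_SYNC_LEN + MAGIC_MAC_REPS * 6)	/* 102 bytes */

/*
 * Build a Wake-on-LAN magic packet payload: six 0xFF sync bytes
 * followed by the target MAC address repeated sixteen times.
 */
static size_t build_magic_payload(uint8_t out[MAGIC_PAYLOAD_LEN], const uint8_t mac[6])
{
	assert(out != NULL && mac != NULL);
	memset(out, 0xff, MAGIC_SYNC_LEN);
	for (int i = 0; i < MAGIC_MAC_REPS; i++)
		memcpy(out + MAGIC_SYNC_LEN + i * 6, mac, 6);
	return MAGIC_PAYLOAD_LEN;
}
```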
Xie Xiaobo [Tue, 17 Jan 2012 09:59:50 +0000 (17:59 +0800)]
powerpc/85xx: Add some DTS nodes and attributes for mpc8536ds
Add partitions for NOR and NAND Flash.
Signed-off-by: Xie Xiaobo <X.Xie@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Liu Shuo [Thu, 8 Mar 2012 22:47:37 +0000 (14:47 -0800)]
powerpc/fsl_msi: return proper error value when ioremap failed.
Signed-off-by: Liu Shuo <soniccat.liu@gmail.com>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Gustavo Zacarias [Tue, 28 Feb 2012 19:43:08 +0000 (16:43 -0300)]
powerpc/85xx: fix typo in p1010rdb.dtsi
Fix typo introduced by "powerpc: Add TBI PHY node to first MDIO bus"
from Andy Fleming.
The property is device_type, not device-type; the misspelling causes the
mdio probe to fail, making all gianfar Ethernet interfaces unusable.
Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Sebastian Andrzej Siewior [Thu, 15 Mar 2012 17:40:28 +0000 (18:40 +0100)]
powerpc/85xx: p2020rdb & p1010rdb - lower spi flash freq to 40MHz
This value has most likely been here since the FSL BSP. Back in the FSL
BSP it was set to 50MHz and worked. However, that driver divided the SoC
frequency only by 2. According to the TRM, the platform clock (which the
manual refers to in its formula) is the system clock divided by two, so
in the end the driver has to divide by 4, and this is what the in-tree
fsl-spi driver does. I guess the flash has not been working since then.
After changing the frequency from 50MHz to 40MHz, as other boards do, I
can access the flash.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
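A minimal sketch of the divider arithmetic described above, under a
simplified model that is an assumption rather than the fsl-spi source:
the platform clock is the system clock divided by two, the baud-rate
generator divides by at least another two, and an integer prescaler PM
divides further, so SCK = sys_clk / 4 / (PM + 1). The function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Modeled SCK for a given system clock and prescaler PM (assumption). */
static uint32_t spi_sck_hz(uint32_t sys_hz, uint32_t pm)
{
	return sys_hz / 4 / (pm + 1);
}

/* Smallest PM whose resulting SCK does not exceed max_hz. */
static uint32_t spi_pick_pm(uint32_t sys_hz, uint32_t max_hz)
{
	uint32_t base = sys_hz / 4;	/* fastest SCK in this model */

	assert(max_hz > 0);
	if (max_hz >= base)
		return 0;
	return (base + max_hz - 1) / max_hz - 1;	/* ceil(base / max_hz) - 1 */
}
```

With a hypothetical 400 MHz system clock, a 50 MHz limit gives PM = 1 and
exactly 50 MHz, while a 40 MHz limit gives PM = 2 and about 33.3 MHz,
since only integer prescalers are available in this model.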
Sebastian Andrzej Siewior [Thu, 15 Mar 2012 17:40:27 +0000 (18:40 +0100)]
powerpc/85xx: p2020rdb - move the NAND address.
It is not at 0xffa00000. According to the current U-Boot source, the NAND
controller is always at 0xff800000, on either CS0 or CS1 depending on
NAND or NAND+NOR mode. In 36-bit mode it is shifted to 0xfff800000, but
there is always an 8 there, never an A.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
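To make the address arithmetic concrete: assuming the 36-bit map simply
places 0xf above the 32-bit localbus address (an assumption for
illustration, not taken from U-Boot), 0xff800000 becomes 0xfff800000 and
keeps the 8, while the old 0xffa00000 would map to 0xfffa00000 instead.
The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 32-bit to 36-bit localbus mapping: prepend 0xf above bit 31. */
static uint64_t lbc_phys_36bit(uint32_t phys32)
{
	return 0xf00000000ULL | phys32;
}
```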
Liu Gang [Fri, 9 Mar 2012 08:10:38 +0000 (16:10 +0800)]
powerpc/srio: Fix the compile errors when building with 64bit
Building the file "arch/powerpc/sysdev/fsl_rmu.c" with
corenet64_smp_defconfig produces the following compile errors:
.../fsl_rmu.c:315: error: cast from pointer to integer of different size
.../fsl_rmu.c:320: error: cast to pointer from integer of different size
.../fsl_rmu.c:320: error: cast to pointer from integer of different size
.../fsl_rmu.c:320: error: cast to pointer from integer of different size
.../fsl_rmu.c:330: error: cast to pointer from integer of different size
.../fsl_rmu.c:332: error: cast to pointer from integer of different size
.../fsl_rmu.c:339: error: cast to pointer from integer of different size
.../fsl_rmu.c:340: error: cast to pointer from integer of different size
.../fsl_rmu.c:341: error: cast to pointer from integer of different size
.../fsl_rmu.c:348: error: cast to pointer from integer of different size
.../fsl_rmu.c:348: error: cast to pointer from integer of different size
.../fsl_rmu.c:348: error: cast to pointer from integer of different size
.../fsl_rmu.c:659: error: cast from pointer to integer of different size
.../fsl_rmu.c:659: error: format '%8.8x' expects type 'unsigned int',
but argument 5 has type 'size_t'
.../fsl_rmu.c:985: error: cast from pointer to integer of different size
.../fsl_rmu.c:997: error: cast to pointer from integer of different size
Rewrote the corresponding code to support 64-bit builds.
Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
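The commit does not show the rewritten code. The usual fix for this class
of error is sketched below as a generic, hypothetical example (struct
msg_desc and its helpers are invented here, not taken from fsl_rmu.c):
carry addresses in uintptr_t rather than a 32-bit integer, and print
size_t with %zu rather than %8.8x.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * On a 64-bit build a pointer no longer fits in a 32-bit integer, so
 * casts such as (u32)ptr and (void *)u32_val trigger "cast ... of
 * different size". uintptr_t keeps the round trip lossless on both
 * 32-bit and 64-bit targets.
 */
struct msg_desc {
	uintptr_t buf_addr;	/* hypothetical field; was a 32-bit integer */
	size_t len;
};

static void desc_set_buf(struct msg_desc *d, void *buf, size_t len)
{
	assert(d != NULL);
	d->buf_addr = (uintptr_t)buf;
	d->len = len;
}

static void *desc_get_buf(const struct msg_desc *d)
{
	return (void *)d->buf_addr;
}

/* size_t needs %zu, not %8.8x, or the format check fails on 64-bit. */
static int desc_format(char *out, size_t n, const struct msg_desc *d)
{
	return snprintf(out, n, "buf=0x%" PRIxPTR " len=%zu", d->buf_addr, d->len);
}
```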
Liu Gang [Tue, 6 Mar 2012 02:58:12 +0000 (10:58 +0800)]
powerpc/srio: Fix the relocation errors when building with 64bit
Building the file "arch/powerpc/sysdev/fsl_rio.c" with
corenet64_smp_defconfig produces the following relocation errors:
WARNING: modpost: Found 6 section mismatch(es).
To see full details build your kernel with:
'make CONFIG_DEBUG_SECTION_MISMATCH=y'
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
arch/powerpc/sysdev/built-in.o:(__ex_table+0x0):
relocation truncated to fit: R_PPC64_ADDR16 against `.text'+3208
arch/powerpc/sysdev/built-in.o:(__ex_table+0x2):
relocation truncated to fit: R_PPC64_ADDR16 against `.fixup'
arch/powerpc/sysdev/built-in.o:(__ex_table+0x4):
relocation truncated to fit: R_PPC64_ADDR16 against `.text'+3230
arch/powerpc/sysdev/built-in.o:(__ex_table+0x6):
relocation truncated to fit: R_PPC64_ADDR16 against `.fixup'+c
arch/powerpc/sysdev/built-in.o:(__ex_table+0x8):
relocation truncated to fit: R_PPC64_ADDR16 against `.text'+3250
arch/powerpc/sysdev/built-in.o:(__ex_table+0xa):
relocation truncated to fit: R_PPC64_ADDR16 against `.fixup'+18
Rewrote the corresponding code to support 64-bit builds.
Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
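As background for the errors above: each exception-table entry pairs the
address of an instruction that may fault with the address of its fixup
code, so on a 64-bit kernel every slot must be wide enough for a 64-bit
address. The following is a hypothetical user-space model of such a
table with pointer-sized fields; it is not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Each entry maps a possibly-faulting instruction to its fixup code.
 * Both fields are pointer-sized; a narrower slot cannot hold a full
 * 64-bit address, which the linker reports as "relocation truncated
 * to fit".
 */
struct ex_entry {
	uintptr_t insn;
	uintptr_t fixup;
};

/* Linear search; returns the fixup address, or 0 if insn is not listed. */
static uintptr_t search_ex_table(const struct ex_entry *tbl, size_t n, uintptr_t insn)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == insn)
			return tbl[i].fixup;
	return 0;
}
```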
Zhicheng Fan [Mon, 13 Feb 2012 22:06:23 +0000 (22:06 +0000)]
powerpc/85xx: Add dts for p1025rdb board
P1025RDB Overview
------------------
1 Gbyte DDR3 SDRAM
32 Mbyte NAND flash
16 Mbyte NOR flash
16 Mbyte SPI flash
SD connector to interface with the SD memory card
Real-time clock on I2C bus
PCIe:
- x1 PCIe slot
- x1 mini-PCIe slot
10/100/1000 BaseT Ethernet ports:
- eTSEC1, RGMII: one 10/100/1000 port using Atheros AR8021
- eTSEC2, SGMII: one 10/100/1000 port using Vitesse VSC8221
- eTSEC3, RGMII: one 10/100/1000 port using Atheros AR8021
USB 2.0 port:
- Two USB2.0 Type A receptacles
- One USB2.0 signal to Mini PCIe slot
Dual RJ45 UART ports:
- DUART interface: supports two UARTs up to 115200 bps for console display
Signed-off-by: Zhicheng Fan <b32736@freescale.com>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Zhicheng Fan [Mon, 13 Feb 2012 22:06:22 +0000 (22:06 +0000)]
powerpc/85xx: Add p1025rdb platform support
Signed-off-by: Zhicheng Fan <b32736@freescale.com>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Ramneek Mehresh [Wed, 18 Jan 2012 05:40:48 +0000 (11:10 +0530)]
powerpc/85xx: Add usb controller version info
Add USB controller version info for the following:
MPC8536, P1010, P1020, P1021, P1022, P1023, P2020, P2041,
P3041, P3060, P5020
Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Tang Yuantian [Thu, 9 Feb 2012 21:59:57 +0000 (21:59 +0000)]
powerpc/85xx: Add p2020rdb-pc dts support
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Tang Yuantian <b29983@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Tang Yuantian [Wed, 28 Dec 2011 03:41:47 +0000 (11:41 +0800)]
powerpc/85xx: Adds Support for P2020RDB-PC board
The P2020RDB-PC board shares the same design (PCB) as the P102x RDB-style
platforms. The difference between this platform and the existing
P2020RDB is mainly the DDR: the P2020RDB-PC has DDR3 memory.
The P2020RDB-PC also has a CPLD device connected to the local bus.
The main differences from the P102x RDB-PC are the 64-bit DDR and a
SYSCLK of 100MHz.
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Tang Yuantian <b29983@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Xu Jiucheng [Tue, 17 Jan 2012 08:01:30 +0000 (16:01 +0800)]
powerpc/85xx: Added P1021RDB-PC Platform support
Signed-off-by: Xu Jiucheng <B37781@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Xu Jiucheng [Tue, 17 Jan 2012 08:01:29 +0000 (16:01 +0800)]
powerpc/85xx: Added dts for P1021RDB-PC board
P1021RDB-PC Overview
--------------------
1Gbyte DDR3 (on board DDR)
16Mbyte NOR flash
32Mbyte eSLC NAND Flash
256 Kbit M24256 I2C EEPROM
128 Mbit SPI Flash memory
Real-time clock on I2C bus
SD/MMC connector to interface with the SD memory card
PCIe
- x1 PCIe slot or x1 PCIe to dual SATA controller
- x1 mini-PCIe slot
USB 2.0
- ULPI PHY interface: SMSC USB3300 USB PHY and Genesys Logic’s GL850A
- Two USB2.0 Type A receptacles
- One USB2.0 signal to Mini PCIe slot
eTSEC1: Connected to RGMII PHY VSC7385
eTSEC2: Connected to SGMII PHY VSC8221
eTSEC3: Connected to SGMII PHY AR8021
DUART interface: supports two UARTs up to 115200 bps for console display
Signed-off-by: Matthew McClintock <msm@freescale.com>
Signed-off-by: Xu Jiucheng <B37781@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Kumar Gala [Sun, 6 Nov 2011 17:51:07 +0000 (11:51 -0600)]
powerpc: Add initial e6500 cpu support
Add basic support for e6500 core in its single threaded mode.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Kumar Gala [Thu, 5 Jan 2012 18:37:16 +0000 (12:37 -0600)]
powerpc/fsl-booke: Fixup calc_cam_sz to support MMU v2
On MMU v2, the registers that describe the page sizes supported by the
TLB are different, and power-of-two page sizes are supported. For now, we
continue to assume that the FSL variable-size array supports all page
sizes up to the maximum one reported in TLB1PS.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
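A user-space sketch of choosing one TLB1 entry size along the lines
described above. It is modeled on this commit's description and is not
the kernel's calc_cam_sz(): the register fields are passed in as plain
parameters, and the decoding assumed here (TLB1CFG reporting a maximum of
4^max KB on MMU v1, and TLB1PS bit n meaning 2^n KB on MMU v2) should be
checked against the core reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the highest set bit; x must be non-zero. */
static unsigned int ilog2_u64(uint64_t x)
{
	unsigned int r = 0;

	assert(x != 0);
	while (x >>= 1)
		r++;
	return r;
}

/* Position of the lowest set bit; 63 if x is zero (no alignment limit). */
static unsigned int lowbit_u64(uint64_t x)
{
	unsigned int r = 0;

	if (x == 0)
		return 63;
	while (!(x & 1)) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Returns the size in bytes of one entry covering the start of ram. */
static uint64_t calc_cam_sz(uint64_t ram, uint64_t virt, uint64_t phys,
			    int mmu_v2, unsigned int tlb1cfg_max, uint32_t tlb1ps)
{
	unsigned int camsize = ilog2_u64(ram);
	unsigned int align = lowbit_u64(virt | phys);
	unsigned int max_cam;

	if (!mmu_v2) {
		max_cam = tlb1cfg_max * 2 + 10;	/* 4^max KB as 2^n bytes */
		camsize &= ~1U;			/* only power-of-4 sizes */
		align &= ~1U;
	} else {
		/* Assume every size up to the top TLB1PS bit is supported. */
		max_cam = ilog2_u64(tlb1ps) + 10;	/* 2^bit KB as 2^n bytes */
	}

	if (camsize > align)
		camsize = align;
	if (camsize > max_cam)
		camsize = max_cam;
	return 1ULL << camsize;
}
```

For example, mapping 512 MB at 0xc0000000 yields a 256 MB entry under the
MMU v1 rules, because only power-of-4 sizes exist there, but a single
512 MB entry under the MMU v2 rules.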
Paul Gortmaker [Fri, 20 Jan 2012 01:23:20 +0000 (20:23 -0500)]
powerpc/85xx: fix Kconfig warning about missing 8250 dependency
The SERIAL_8250_EXTENDED option just enables access to other
less regularly used options, like SERIAL_8250_SHARE_IRQ.
Select it to get rid of this warning when selecting the child
option living underneath it.
warning: (FSL_SOC_BOOKE && SERIAL_8250_RM9K) selects
SERIAL_8250_SHARE_IRQ which has unmet direct dependencies
(HAS_IOMEM && SERIAL_8250_EXTENDED)
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 12 Mar 2012 23:15:35 +0000 (10:15 +1100)]
Merge branch 'eeh' into next
Benjamin Herrenschmidt [Tue, 6 Mar 2012 07:27:59 +0000 (18:27 +1100)]
powerpc: Rework lazy-interrupt handling
The current implementation of lazy interrupt handling has some
issues that this patch tries to address.
We don't do the various workarounds we need to do when re-enabling
interrupts in some cases such as when returning from an interrupt
and thus we may still lose or get delayed decrementer or doorbell
interrupts.
The current scheme also makes it much harder to handle the external
"edge" interrupts provided by some BookE processors when using the
EPR facility (External Proxy) and the Freescale Hypervisor.
Additionally, we tend to keep interrupts hard disabled in a number
of cases, such as decrementer interrupts, external interrupts, or
when a masked decrementer interrupt is pending. This is sub-optimal.
This is an attempt at fixing it all in one go by reworking the way
we do the lazy interrupt disabling from the ground up.
The base idea is to replace the "hard_enabled" field with a
"irq_happened" field in which we store a bit mask of what interrupt
occurred while soft-disabled.
When re-enabling, either via arch_local_irq_restore() or when returning
from an interrupt, we can now decide what to do by testing bits in that
field.
We then implement replaying of the missed interrupts either by
re-using the existing exception frame (in exception exit case) or via
the creation of a new one from an assembly trampoline (in the
arch_local_irq_enable case).
This removes the need to play with the decrementer to try to create
fake interrupts, among others.
In addition, this adds a few refinements:
- We no longer hard disable decrementer interrupts that occur
while soft-disabled. We now simply bump the decrementer back to max
(on BookS) or leave it stopped (on BookE) and continue with hard interrupts
enabled, which means that we'll potentially get better sample quality from
performance monitor interrupts.
- Timer, decrementer and doorbell interrupts now hard-enable
shortly after removing the source of the interrupt, which means
they no longer run entirely hard disabled. Again, this will improve
perf sample quality.
- On Book3E 64-bit, we now make the performance monitor interrupt
act as an NMI like Book3S (the necessary C code for that to work
appears to already be present in the FSL perf code, notably calling
nmi_enter instead of irq_enter). (This also fixes a bug where BookE
perfmon interrupts could clobber r14 ... oops)
- We could make "masked" decrementer interrupts act as NMIs when doing
timer-based perf sampling to improve the sample quality.
Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
v2:
- Add hard-enable to decrementer, timer and doorbells
- Fix CR clobber in masked irq handling on BookE
- Make embedded perf interrupt act as an NMI
- Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
to retrigger an interrupt without preventing hard-enable
v3:
- Fix or vs. ori bug on Book3E
- Fix enabling of interrupts for some exceptions on Book3E
v4:
- Fix resend of doorbells on return from interrupt on Book3E
v5:
- Rebased on top of my latest series, which involves some significant
rework of some aspects of the patch.
v6:
- 32-bit compile fix
- more compile fixes with various .config combos
- factor out the asm code to soft-disable interrupts
- remove the C wrapper around preempt_schedule_irq
v7:
- Fix a bug with hard irq state tracking on native power7
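A toy, single-CPU model of the irq_happened idea described above, for
illustration only: while soft-disabled, an arriving interrupt just sets
a bit; re-enabling tests the bits and replays the recorded sources. The
bit names, handler counters, and replay order are hypothetical, and the
model ignores hard-disable state, decrementer reprogramming, and
exception frames entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit names: one bit per interrupt source. */
enum {
	HAPPENED_DBELL = 0x01,
	HAPPENED_EE    = 0x02,
	HAPPENED_DEC   = 0x04,
};

struct cpu_irq_state {
	bool soft_enabled;
	unsigned int irq_happened;	/* sources that fired while soft-disabled */
	int handled_dec, handled_ee, handled_dbell;	/* demo counters */
};

static void run_handler(struct cpu_irq_state *s, unsigned int src)
{
	switch (src) {
	case HAPPENED_DEC:   s->handled_dec++;   break;
	case HAPPENED_EE:    s->handled_ee++;    break;
	case HAPPENED_DBELL: s->handled_dbell++; break;
	default:             assert(!"unknown interrupt source");
	}
}

/* An interrupt arrives: handle it now, or only record it if masked. */
static void irq_arrives(struct cpu_irq_state *s, unsigned int src)
{
	if (s->soft_enabled)
		run_handler(s, src);
	else
		s->irq_happened |= src;
}

static void model_irq_disable(struct cpu_irq_state *s)
{
	s->soft_enabled = false;
}

/* Re-enable, then replay each recorded source (order is arbitrary here). */
static void model_irq_enable(struct cpu_irq_state *s)
{
	static const unsigned int order[] = { HAPPENED_DEC, HAPPENED_EE, HAPPENED_DBELL };

	s->soft_enabled = true;
	for (unsigned int i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (s->irq_happened & order[i]) {
			s->irq_happened &= ~order[i];
			run_handler(s, order[i]);
		}
	}
}
```

Because irq_happened is a bit mask, two decrementer events that arrive
while soft-disabled are replayed once in this model.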
Gavin Shan [Mon, 27 Feb 2012 20:04:11 +0000 (20:04 +0000)]
powerpc/eeh: pseries platform config space access in EEH
In the original EEH implementation, config space of the corresponding
PCI device is accessed through RTAS-specific functions, which depend
heavily on pci_dn. That limits extending EEH to other platforms such as
powernv, because other platforms may access PCI config space in
different ways.
This patch splits out the functions used to access PCI config space and
implements them in the platform-specific EEH component. This will help
support EEH on multiple platforms simultaneously in the future.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Wed, 29 Feb 2012 15:47:45 +0000 (15:47 +0000)]
powerpc/eeh: Introduce struct eeh_stats for EEH
In the original EEH implementation, the EEH global statistics are
maintained in individual global variables, which makes the code a
little hard to maintain.
This patch introduces struct eeh_stats for the EEH global statistics so
that they can be maintained collectively.
This is a rework of the corresponding v5 patch. Following comments from
David Laight, the EEH global statistics have been changed slightly so
that they have the fixed type "u64", and the format used to print them
has been changed to "%llu" as David suggested. Also, the output format
of the EEH global statistics is kept intact, following Michael's
suggestion that there might be tools parsing it.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
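One way to picture the change, as a hypothetical user-space sketch: the
counters live in one struct of fixed-width 64-bit fields and are printed
with "%llu" (the sketch uses unsigned long long so that the format
matches on every target). The field names and output labels are invented
for illustration and are not the actual kernel output.

```c
#include <assert.h>
#include <stdio.h>

/* All global counters grouped in one place, each a fixed 64-bit width. */
struct eeh_stats_model {
	unsigned long long no_device;
	unsigned long long false_positives;
	unsigned long long slot_resets;
};

static struct eeh_stats_model eeh_stats;

/* Print every counter with %llu; the labels here are hypothetical. */
static int eeh_stats_show(char *buf, size_t len, const struct eeh_stats_model *st)
{
	assert(buf != NULL && st != NULL);
	return snprintf(buf, len,
			"no device=%llu\n"
			"false positives=%llu\n"
			"slot resets=%llu\n",
			st->no_device, st->false_positives, st->slot_resets);
}
```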
Gavin Shan [Mon, 27 Feb 2012 20:04:09 +0000 (20:04 +0000)]
powerpc/eeh: Replace pci_dn with eeh_dev for EEH on pSeries
struct pci_dn has been replaced with eeh_dev. To follow that change, the
EEH platform implementation on pSeries also needs a small adjustment so
that it depends on eeh_dev instead of pci_dn.
This patch replaces pci_dn with eeh_dev; the corresponding information
is now retrieved from eeh_dev instead of pci_dn.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:04:08 +0000 (20:04 +0000)]
powerpc/eeh: Replace pci_dn with eeh_dev for EEH aux components
The original EEH implementation depends heavily on struct pci_dn, so
EEH-related information has to be stored in pci_dn. Instead, we can
split the EEH-specific information out of struct pci_dn into a separate
struct, making EEH more independent.
This patch replaces pci_dn with eeh_dev for EEH aux components such as
event and driver. The eeh_event struct has also been adjusted slightly:
since eeh_dev already links the associated FDT (Flat Device Tree) node
and PCI device, eeh_event no longer needs to track the FDT node and PCI
device and can simply track the eeh_dev.
The patch also renames pcid_name() to eeh_pcid_name(), which was missed
in the previous patch that cleaned up the EEH aux components.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:04:07 +0000 (20:04 +0000)]
powerpc/eeh: Replace pci_dn with eeh_dev for EEH core
The original EEH implementation depends heavily on struct pci_dn, so
EEH-related information has to be stored in pci_dn. Instead, we can
split the EEH-specific information out of struct pci_dn into a separate
struct, making EEH more independent.
This patch replaces pci_dn with eeh_dev for the EEH core.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:04:06 +0000 (20:04 +0000)]
powerpc/eeh: Replace pci_dn with eeh_dev for EEH address cache
In the original EEH implementation, struct pci_dn is used while building
the PCI I/O address cache, which helps find the PCI device corresponding
to a given physical I/O address. In addition, pci_dn is associated with
the corresponding PCI device while its I/O cache is built.
This patch replaces struct pci_dn with struct eeh_dev so that the EEH
address cache no longer depends on struct pci_dn, which will help EEH
become an independent module in the future. The eeh_dev is now bound to
the PCI device while the PCI device's I/O cache is built.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
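A hypothetical sketch of what the address cache provides: each entry maps
a physical address range to the EEH device that owns it, so a lookup for
a faulting address is a range search. A sorted array with binary search
stands in here for whatever structure the kernel actually uses, and the
struct names are placeholders rather than the kernel's types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct eeh_dev_model {
	const char *name;
};

/* One cached range; entries are sorted by start and never overlap. */
struct addr_range {
	uint64_t start, end;		/* inclusive bounds */
	struct eeh_dev_model *edev;
};

/* Binary search for the range containing addr; NULL if none does. */
static struct eeh_dev_model *addr_cache_lookup(const struct addr_range *r, size_t n,
					       uint64_t addr)
{
	size_t lo = 0, hi = n;

	assert(r != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < r[mid].start)
			hi = mid;
		else if (addr > r[mid].end)
			lo = mid + 1;
		else
			return r[mid].edev;
	}
	return NULL;
}
```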
Gavin Shan [Mon, 27 Feb 2012 20:04:05 +0000 (20:04 +0000)]
powerpc/eeh: Replace pci_dn with eeh_dev for EEH sysfs
In the original EEH implementation, all EEH-related statistics are
stored in struct pci_dn. We have introduced struct eeh_dev to replace
struct pci_dn in the EEH core components, including the EEH sysfs
component.
This patch shows EEH statistics from struct eeh_dev instead of struct
pci_dn in the EEH sysfs component. It also fixes the retrieval of the
EEH device from the PCI device, a bug introduced by a previous patch in
this series.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:04:04 +0000 (20:04 +0000)]
powerpc/eeh: Introduce EEH device
The original EEH implementation depends heavily on struct pci_dn.
However, EEH does not actually need to, because it shares little
information with other PCI components; EEH should be able to work
independently.
This patch introduces struct eeh_dev so that the EEH core components no
longer need to depend on struct pci_dn in the future. Instances of
struct pci_dn and struct eeh_dev are created dynamically, and the
binding between the EEH device, OF node, and PCI device is implemented
as well.
The EEH devices are created after the PHBs are detected and initialized,
but before PCI enumeration starts. In addition, a PHB might be created
dynamically through the DLPAR component, and its EEH devices should be
created as well. Another case is an OF node created dynamically by DR
(Dynamic Reconfiguration), as defined by PAPR; EEH devices should also be
created for such OF nodes. The binding between an EEH device and its OF
node is made when the EEH device is created.
The binding between an EEH device and its PCI device should be made
after PCI enumeration is done. PCI hotplug also needs this binding so
that EEH devices can be traced from newly added PCI buses or PCI
devices.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:04:03 +0000 (20:04 +0000)]
powerpc/eeh: Cleanup function names in EEH aux components
This patch cleans up the function names of the EEH aux components.
Currently, only a couple of function names from eeh_cache have been
adjusted so that:
* The function name has prefix "eeh_addr_cache".
* Move around pci_addr_cache_build() in the header file
to reflect function call sequence.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:04:02 +0000 (20:04 +0000)]
powerpc/pseries: Cleanup comments in EEH aux components
There are several EEH aux components, and this patch cleans them up:
* Duplicated comments have been removed from the header file.
* Comments have been reorganized for clarity.
* The leading comments of functions have been adjusted slightly so
that the output of "make pdfdocs" is more uniform.
* Function calls written as "xxx ()" have been replaced by "xxx()".
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:04:01 +0000 (20:04 +0000)]
powerpc/eeh: pseries platform EEH configure bridge
To enable a particular PCI device included in its parent PE, the
involved PCI bridges, if any, must be enabled explicitly. On the pSeries
platform, there are dedicated RTAS calls for this purpose.
This patch implements configuring PCI bridges through those dedicated
RTAS calls. The function is also abstracted through
struct eeh_ops::configure_bridge so that the EEH core components can
support multiple platforms in the future.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:04:00 +0000 (20:04 +0000)]
powerpc/eeh: pseries platform EEH error log retrieval
On RTAS-compliant pSeries platforms, a dedicated RTAS call retrieves the
temporary or permanent EEH error log.
This patch implements retrieving the EEH error log through that RTAS
call. It is also abstracted through struct eeh_ops::get_log so that the
EEH core components can support multiple platforms in the future.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:03:59 +0000 (20:03 +0000)]
powerpc/eeh: pseries platform EEH reset PE
On RTAS-compliant pSeries platforms, there is a dedicated RTAS call
(ibm,set-slot-reset) to reset the specified PE. Two types of reset are
supported: hot and fundamental. Which type is actually used depends on
the requirements of the PCI devices included in the PE.
This patch implements resetting a PE on the pSeries platform through the
RTAS call. It is also abstracted through struct eeh_ops::reset so that
the EEH core components can support multiple platforms in the future.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:03:58 +0000 (20:03 +0000)]
powerpc/eeh: pseries platform EEH wait PE state
On the pSeries platform, the PE state might be temporarily unavailable.
In that case, the firmware returns a suggested wait time, so the kernel
has to wait for an appropriate time before retrieving the PE state.
This patch implements that. The function is also abstracted through
struct eeh_ops::wait_state so that the EEH core components can support
multiple platforms in the future.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:03:57 +0000 (20:03 +0000)]
powerpc/eeh: pseries platform PE state retrieval
On the pSeries platform, there are two dedicated RTAS calls to retrieve
a PE's state: ibm,read-slot-reset-state and
ibm,read-slot-reset-state2.
This patch implements retrieving a PE's state for a given PE address.
The implementation is also abstracted through struct eeh_ops::get_state
so that the EEH core components can support multiple platforms in the
future.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:03:56 +0000 (20:03 +0000)]
powerpc/eeh: pseries platform EEH PE address retrieval
There are two types of addresses used for EEH operations. The first is
the BDF (Bus/Device/Function) address, which is retrieved from the reg
property of the corresponding FDT node. The other is the PE address,
which must be queried from firmware through an RTAS call on the pSeries
platform. When issuing an EEH operation, the PE address takes precedence
over the BDF address.
This patch implements retrieving the PE address for a given BDF address
on the pSeries platform. Also, struct eeh_early_enable_info has been
removed, since the information can be obtained directly from
dn->pdn->phb->buid, which simplifies the code.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
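For reference, the PCI bus binding for Open Firmware encodes bus, device,
and function in the first cell of a PCI node's "reg" property (bits
23:16, 15:11, and 10:8). The sketch below decodes that cell and applies
the precedence rule stated above. Treating a PE address of 0 as "not
supplied by firmware" is an assumption made for the example, and the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct pci_bdf {
	unsigned int bus, dev, fn;
};

/* Decode bus/device/function from the first "reg" cell. */
static struct pci_bdf bdf_from_reg0(uint32_t reg0)
{
	struct pci_bdf b = {
		.bus = (reg0 >> 16) & 0xff,
		.dev = (reg0 >> 11) & 0x1f,
		.fn  = (reg0 >> 8) & 0x7,
	};
	return b;
}

/* Address for an EEH operation: the PE address wins when present. */
static uint32_t eeh_op_addr(uint32_t reg0, uint32_t pe_addr)
{
	uint32_t config_addr = reg0 & 0x00ffff00;	/* bus/dev/fn bits only */

	return pe_addr ? pe_addr : config_addr;
}
```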
Gavin Shan [Mon, 27 Feb 2012 20:03:55 +0000 (20:03 +0000)]
powerpc/eeh: pseries platform EEH operations
There are four EEH operations covered by the dedicated RTAS call
<ibm,set-eeh-option>: enable EEH, disable EEH, enable MMIO, and enable
DMA. During early boot, the kernel tries to enable EEH on the device
nodes of PCI devices. MMIO and DMA for a particular PE should be enabled
during recovery from EEH errors so that the PE can function properly
again.
This patch implements these operations and abstracts them through
struct eeh_ops::set_eeh, which will help EEH support multiple platforms
in the future.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:03:54 +0000 (20:03 +0000)]
powerpc/eeh: pseries platform EEH initialization
The platform-specific EEH operations have been abstracted by
struct eeh_ops. Individual platforms, including pSeries, need to do the
necessary initialization before the platform-dependent EEH operations
can work properly.
This patch does the necessary initialization for the pSeries platform.
More specifically, it looks up the tokens of the EEH-related RTAS calls.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:03:53 +0000 (20:03 +0000)]
powerpc/eeh: Platform dependent EEH operations
EEH has been implemented on the RTAS-compliant pSeries platform, which
means the EEH operations are ultimately implemented through RTAS calls.
This limits how EEH can be extended. To support EEH on multiple
platforms, such as pseries and powernv, simultaneously, the
platform-dependent EEH operations have to be split out of the current
implementation.
This patch adds support for EEH on multiple platforms. The pseries
platform-dependent EEH operations will be abstracted by struct eeh_ops,
and the EEH core components will be built on the registered EEH
operations. With this mechanism, all an individual platform needs to do
is implement its platform-dependent EEH operations.
For now, only the pseries platform is covered by the mechanism, so other
platforms such as powernv still need work to support EEH. Also, only the
framework exists so far; it will be implemented for the pseries platform
later.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
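A hypothetical sketch of the registration pattern described above: the
platform fills in one table of function pointers and registers it once,
and the core dispatches through the table instead of calling RTAS
directly. Most member names follow the callbacks named in this series
(set_eeh, get_state, reset, wait_state, get_log, configure_bridge);
read_config and write_config are placeholders for the config-space
accessors. All signatures are simplified and are not the kernel's
struct eeh_ops.

```c
#include <assert.h>
#include <stddef.h>

struct eeh_ops_model {
	const char *name;
	int (*init)(void);
	int (*set_eeh)(int pe_addr, int option);
	int (*get_state)(int pe_addr, int *delay_ms);
	int (*reset)(int pe_addr, int fundamental);
	int (*wait_state)(int pe_addr, int max_wait_ms);
	int (*get_log)(int pe_addr, char *buf, size_t len);
	int (*configure_bridge)(int pe_addr);
	int (*read_config)(int bdf, int where, int size, unsigned int *val);
	int (*write_config)(int bdf, int where, int size, unsigned int val);
};

static const struct eeh_ops_model *eeh_ops;

/* Register the single platform backend. */
static int eeh_ops_register(const struct eeh_ops_model *ops)
{
	if (ops == NULL || ops->name == NULL)
		return -1;
	if (eeh_ops != NULL)
		return -2;		/* only one backend at a time */
	eeh_ops = ops;
	return 0;
}

/* Core code dispatches through the table and fails cleanly if unset. */
static int eeh_core_reset(int pe_addr, int fundamental)
{
	if (eeh_ops == NULL || eeh_ops->reset == NULL)
		return -1;
	return eeh_ops->reset(pe_addr, fundamental);
}

/* A stub "pseries" backend for demonstration. */
static int last_reset_pe = -1;

static int stub_reset(int pe_addr, int fundamental)
{
	(void)fundamental;
	last_reset_pe = pe_addr;
	return 0;
}

static const struct eeh_ops_model stub_pseries_ops = {
	.name  = "pseries",
	.reset = stub_reset,
};
```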
Gavin Shan [Mon, 27 Feb 2012 20:03:52 +0000 (20:03 +0000)]
powerpc/eeh: Cleanup function names in the EEH core
EEH has been implemented on the pSeries platform, but the original code
is a little messy. This patch cleans up the current EEH implementation:
* Add the "eeh" prefix to function names where possible.
* Adjust some function names to make them shorter and more
meaningful.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 27 Feb 2012 20:03:51 +0000 (20:03 +0000)]
powerpc/eeh: Cleanup comments in the EEH core
EEH has been implemented on the pSeries platform. The original
code looks a little bit nasty. This patch cleans up the
current EEH implementation so that it looks cleaner.
* Duplicated comments have been removed from the corresponding
header files.
* Comments have been reorganized so that they look cleaner.
* The leading comments of functions have been adjusted slightly
so that the output of "make pdfdocs" is more
unified.
* Function definitions and calls now use the unified format "xxx()".
That means the format "xxx ()" has been replaced by "xxx()".
* There are multiple functions implemented for resetting a PE. Those
functions have been moved around so that they
are adjacent to each other, reflecting their relationship.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Fri, 2 Mar 2012 00:33:52 +0000 (11:33 +1100)]
powerpc: Replace mfmsr instructions with load from PACA kernel_msr field
On 64-bit, the mfmsr instruction can be quite slow, slower
than loading a field from the cache-hot PACA, which happens
to already contain the value we want in most cases.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Sun, 4 Mar 2012 23:55:04 +0000 (10:55 +1100)]
powerpc: Fix 64-bit BookE FP unavailable exceptions
We were using CR0.EQ after EXCEPTION_COMMON, hoping it still
contained whether we came from userspace or kernel space.
However, under some circumstances, EXCEPTION_COMMON will
call C code and clobber non-volatile registers, so we really
need to re-load the previous MSR from the stackframe and
re-test.
While there, invert the condition to make the fast path more
obvious, remove the BUG_OPCODE, which was a debugging
leftover, and call .ret_from_except as we should.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Fri, 2 Mar 2012 00:01:31 +0000 (11:01 +1100)]
powerpc: Fix register clobbering when accumulating stolen time
When running under a hypervisor that supports stolen time accounting,
we may call C code from the macro EXCEPTION_PROLOG_COMMON in the
exception entry path, which clobbers CR0.
However, the FPU and vector traps rely on CR0 indicating whether we
are coming from userspace or kernel to decide what to do.
So we need to restore that value after the C call.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 1 Mar 2012 23:10:09 +0000 (10:10 +1100)]
powerpc/xmon: Add display of soft & hard irq states
Also use local_paca instead of get_paca() to avoid getting into
the smp_processor_id() debugging code from the debugger.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 1 Mar 2012 07:14:45 +0000 (18:14 +1100)]
powerpc: Add support for page fault retry and fatal signals
Other architectures such as x86 and ARM have been growing
new support for features like retrying page faults after
dropping the mm semaphore to break contention, or being
able to return from a stuck page fault when a SIGKILL is
pending.
This refactors our implementation of do_page_fault() to
move the error handling out of line in a way similar to
x86 and adds support for those two features.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 1 Mar 2012 04:47:44 +0000 (15:47 +1100)]
powerpc: Disable interrupts in 64-bit kernel FP and vector faults
If we get a floating point, altivec or vsx unavailable interrupt in
the kernel, we trigger a kernel error. There is no point preserving
the interrupt state; in fact, that can even make debugging harder,
as the processor state might change (we may even preempt) between
taking the exception and landing in a debugger.
So just make those three handlers disable interrupts unconditionally.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
v2: On BookE only disable when hitting the kernel unavailable
path, otherwise it will fail to restore softe as
fast_exception_return doesn't do it.
Benjamin Herrenschmidt [Wed, 7 Mar 2012 05:48:45 +0000 (16:48 +1100)]
powerpc: Call do_page_fault() with interrupts off
We currently turn interrupts back to their previous state before
calling do_page_fault(). This can be annoying when debugging as
a bad fault will potentially have lost some processor state before
getting into the debugger.
We also end up calling some generic code, such as notify_page_fault(),
with interrupts enabled, which could
be unexpected.
This changes our code to behave more like other architectures
and makes the assembly entry code call into do_page_fault() with
interrupts disabled. They are conditionally re-enabled from
within do_page_fault() in the same spot x86 does it.
While there, add the might_sleep() test in the case of a successful
trylock of the mmap semaphore, again like x86.
Also fix a bug in the existing assembly where r12 (_MSR) could get
clobbered by C calls (the DTL accounting in the exception common
macro and DISABLE_INTS) in some cases.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
v2. Add the r12 clobber fix
Benjamin Herrenschmidt [Thu, 1 Mar 2012 04:42:56 +0000 (15:42 +1100)]
powerpc: Improve behaviour of irq tracing on 64-bit exception entry
Some exceptions would unconditionally disable interrupts on entry,
which is fine, but calling lockdep every time not only adds more
overhead than strictly needed, but also means we get quite a few
"redundant" disables logged, which makes it hard to spot the really
bad ones.
So instead, split the macro used by the exception code into a
normal one and a separate one used when CONFIG_TRACE_IRQFLAGS is
enabled, and make the latter skip the tracing if interrupts were
already disabled.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 1 Mar 2012 04:40:23 +0000 (15:40 +1100)]
powerpc: Improve 64-bit syscall entry/exit
We unconditionally hard enable interrupts. This is unnecessary as
syscalls are expected to always be called with interrupts enabled.
While at it, we add a WARN_ON if that is not the case and
CONFIG_TRACE_IRQFLAGS is enabled (we don't want to add overhead
to the fast path when this is not set though).
Thus let's remove the enabling (and associated irq tracing) from
the syscall entry path. Also on Book3S, replace a few mfmsr
instructions with loads of PACAMSR from the PACA, which should be
faster & schedule better.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 1 Mar 2012 01:45:27 +0000 (12:45 +1100)]
powerpc: Rework runlatch code
This moves the inlines into system.h and changes the runlatch
code to use the thread local flags (non-atomic) rather than
the TIF flags (atomic) to keep track of the latch state.
The code to turn it back on in an asynchronous interrupt is
now simplified and partially inlined.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Wed, 29 Feb 2012 23:52:01 +0000 (10:52 +1100)]
powerpc: Use the same interrupt prolog for perfmon as other interrupts
The perfmon interrupt is the sole user of a special variant of the
interrupt prolog which differs from the one used by external and timer
interrupts in that it saves the non-volatile GPRs and doesn't turn the
runlatch on.
The former is unnecessary and the latter is arguably incorrect, so
let's clean that up by using the same prolog. While at it, we rename
that prolog to use the _ASYNC prefix.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 28 Feb 2012 02:44:58 +0000 (13:44 +1100)]
powerpc: Remove legacy iSeries bits from assembly files
This removes the various bits of assembly in the kernel entry,
exception handling and SLB management code that were specific
to running under the legacy iSeries hypervisor which is no
longer supported.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Wed, 7 Mar 2012 18:43:10 +0000 (18:43 +0000)]
powerpc: clean up vio.c
This cleans up vio.c after the removal of the legacy iSeries platform.
It also removes some no longer referenced include files.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Wed, 7 Mar 2012 18:41:09 +0000 (18:41 +0000)]
driver-core: remove legacy iSeries hack
The PowerPC legacy iSeries platform is being removed along with the
"one looney iseries driver", so this code can now be removed as well.
cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Wed, 7 Mar 2012 18:39:31 +0000 (18:39 +0000)]
tty: powerpc: remove SERIAL_ICOM dependency on PPC_ISERIES
The PowerPC legacy iSeries platform is being removed so this is no
longer selectable.
Cc: Alan Cox <alan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-serial@vger.kernel.org
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Wed, 7 Mar 2012 18:37:40 +0000 (18:37 +0000)]
tty: powerpc: remove hvc_iseries
The PowerPC legacy iSeries platform is being removed, so this code is no
longer needed.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Wed, 7 Mar 2012 18:35:38 +0000 (18:35 +0000)]
powerpc: remove the legacy iSeries part of ibmvscsi
The PowerPC legacy iSeries platform is being removed and this code is
no longer selectable. There is more clean up that can be done, but this
just gets the old code out of the way.
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Wed, 7 Mar 2012 18:33:53 +0000 (18:33 +0000)]
net: powerpc: remove the legacy iSeries ethernet driver
This driver is specific to the PowerPC legacy iSeries platform, which is
being removed.
Cc: David Miller <davem@davemloft.net>
Cc: <netdev@vger.kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Wed, 7 Mar 2012 17:02:07 +0000 (17:02 +0000)]
powerpc: Remove the main legacy iSeries platform code
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Akinobu Mita [Fri, 27 Jan 2012 04:24:48 +0000 (04:24 +0000)]
powerpc/pmac: Use string library in nvram code
- Use memchr_inv to check if the data contains all 0xFF bytes.
It is faster than looping for each byte.
- Use memcmp to compare memory areas
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Grant Likely [Mon, 30 Jan 2012 08:02:19 +0000 (08:02 +0000)]
powerpc: Make SPARSE_IRQ required
All IRQs on powerpc are managed via irq_domain anyway; there isn't really
any advantage to turning SPARSE_IRQ off, and it's the direction we want
to take the kernel design anyway. This patch makes powerpc always use
SPARSE_IRQ.
On pseries_defconfig, SPARSE_IRQ adds only about 0x300 bytes to the
.text sections, and removes about 0x20000 from the data section for the
static irq_desc table.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Mon, 27 Feb 2012 08:55:15 +0000 (08:55 +0000)]
powerpc/prom: Remove limit on maximum size of properties
On a 16TB system (using AMS/CMO), I get:
WARNING: ignoring large property [/ibm,dynamic-reconfiguration-memory] ibm,dynamic-memory length 0x000000000017ffec
and significantly less memory is thus shown to the partition. As far as
I can tell, the constant used is arbitrary. Ben Herrenschmidt provided
additional background that
> The limit was originally set because of Apple machines carrying ROM
> images in the device-tree, at a time where we were much more memory
> constrained than we are now.
and that it is likely not very useful any longer.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Matt Fleming [Tue, 14 Feb 2012 01:40:59 +0000 (01:40 +0000)]
powerpc: Use set_current_blocked() and block_sigmask()
As described in e6fa16ab ("signal: sigprocmask() should do
retarget_shared_pending()"), the modification of current->blocked is
incorrect, as we need to check whether the signal we're about to block
is pending in the shared queue.
Also, use the new helper function introduced in commit 5e6292c0f28f
("signal: add block_sigmask() for adding sigmask to current->blocked"),
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate
code across architectures. In the past some architectures got this
code wrong, so using this helper function should stop that from
happening again.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Joe Perches [Tue, 28 Feb 2012 08:49:34 +0000 (08:49 +0000)]
powerpc: Use vsprintf extension %pf with builtin_return_address
Emit the function name not the address when possible.
builtin_return_address() gives an address. When building
a kernel with CONFIG_KALLSYMS, emit the actual function
name not the address.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Jimi Xenidis [Tue, 28 Feb 2012 13:27:07 +0000 (13:27 +0000)]
powerpc/icswx: Fix race condition with IPI setting ACOP
There is a race where a thread causes a coprocessor type to be valid
in its own ACOP _and_ in the current context, but it does not
propagate to the ACOP register of other threads in time for them to
use it. The original code tries to solve this by sending an IPI to
all threads on the system, which is heavy-handed, but unfortunately
still provides a window where the icswx is issued by other threads and
the ACOP is not up to date.
This patch detects that the ACOP DSI fault was a "false positive" and
syncs the ACOP and causes the icswx to be replayed.
Signed-off-by: Jimi Xenidis <jimix@pobox.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Wed, 29 Feb 2012 21:12:16 +0000 (21:12 +0000)]
powerpc/atomic: Implement atomic*_inc_not_zero
Implement atomic_inc_not_zero and atomic64_inc_not_zero. At the
moment we use atomic*_add_unless which requires us to put 0 and
1 constants into registers. We can also avoid a subtract by
saving the original value in a second temporary.
This removes 3 instructions from fget:
-c0000000001b63c0:  39 00 00 00   li      r8,0
-c0000000001b63c4:  39 40 00 01   li      r10,1
...
-c0000000001b63e8:  7c 0a 00 50   subf    r0,r10,r0
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Wed, 29 Feb 2012 21:09:53 +0000 (21:09 +0000)]
atomic: Allow atomic_inc_not_zero to be overridden
We want to implement a ppc64 specific version of atomic_inc_not_zero
so wrap it in an ifdef to allow it to be overridden.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ira Snyder [Thu, 26 Jan 2012 11:00:14 +0000 (11:00 +0000)]
carma-fpga: fix race between data dumping and DMA callback
When the system is under heavy load, we occasionally saw a problem where
the system would get a legitimate interrupt when interrupts should have
been disabled.
This was caused by the data_dma_cb() DMA callback unconditionally
re-enabling FPGA interrupts even when data dumping is disabled. When
data dumping was re-enabled, the irq handler would fire while a DMA was
in progress. The "BUG_ON(priv->inflight != NULL);" during the second
invocation of the DMA callback caused the system to crash.
To fix the issue, the priv->enabled boolean is moved under the
protection of the priv->lock spinlock. The DMA callback checks the
boolean to know whether to re-enable FPGA interrupts before it returns.
Now that it is fixed, the driver keeps FPGA interrupts disabled when it
expects that they are disabled, fixing the bug.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ira Snyder [Thu, 26 Jan 2012 10:59:54 +0000 (10:59 +0000)]
carma-fpga: fix lockdep warning
Lockdep occasionally complains with the message:
INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
This is caused by calling videobuf_dma_unmap() under spin_lock_irq(). To
fix the warning, we drop the lock before unmapping and freeing the
buffer.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Masanari Iida [Mon, 23 Jan 2012 07:26:36 +0000 (07:26 +0000)]
macintosh: Fix typo in mediabay.c
Fix typo "unsuported" to "unsupported" in
drivers/macintosh/mediabay.c
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Danny Kukawka [Thu, 16 Feb 2012 03:56:03 +0000 (03:56 +0000)]
arch/powerpc/platforms/powernv/setup.c: included asm/xics.h twice
arch/powerpc/platforms/powernv/setup.c: included 'asm/xics.h' twice,
remove the duplicate.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Danny Kukawka [Thu, 16 Feb 2012 03:55:54 +0000 (03:55 +0000)]
arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice
arch/powerpc/kvm/book3s_hv.c: included 'linux/sched.h' twice,
remove the duplicate.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Wed, 22 Feb 2012 14:10:12 +0000 (14:10 +0000)]
powerpc: remove CONFIG_PPC_ISERIES from the architecture Kconfig files
After this, we can remove the legacy iSeries code more easily.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Wed, 22 Feb 2012 13:50:13 +0000 (13:50 +0000)]
powerpc/mpic: Fix allocation of reverse-map for multi-ISU mpics
When using a multi-ISU MPIC, we can have interrupts up to
isu_size * MPIC_MAX_ISU, not just isu_size, so allocate
a reverse map of the right size.
Without this, the code will constantly fall back to
a linear search.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Sun, 26 Feb 2012 23:50:11 +0000 (10:50 +1100)]
Merge remote-tracking branch 'origin/master' into next
Linus Torvalds [Sun, 26 Feb 2012 20:47:17 +0000 (12:47 -0800)]
Merge git://git./linux/kernel/git/davem/net
1) ICMP sockets leave err uninitialized but we try to return it for the
unsupported MSG_OOB case, reported by Dave Jones.
2) Add new Zaurus device ID entries, from Dave Jones.
3) Pointer calculation in hso driver memset is wrong, from Dan
Carpenter.
4) ks8851_probe() checks unsigned value as negative, fix also from Dan
Carpenter.
5) Fix crashes in atl1c driver due to TX queue handling, from Eric
Dumazet. I anticipate some TX side locking fixes coming in the near
future for this driver as well.
6) The inline directive fix in Bluetooth which was breaking the build
only with very new versions of GCC, from Johan Hedberg.
7) Fix crashes in the ATM CLIP code due to ARP cleanups this merge
window, reported by Meelis Roos and fixed by Eric Dumazet.
8) JME driver doesn't flush RX FIFO correctly, from Guo-Fu Tseng.
9) Some ip6_route_output() callers test the return value for NULL, but
this never happens as the convention is to return a dst entry with
dst->error set. Fixes from RonQing Li.
10) Logitech Harmony 900 should be handled by zaurus driver not
cdc_ether, update white lists and black lists accordingly. From
Scott Talbert.
11) Receiving from certain kinds of devices there won't be a MAC header,
so there is no MAC header to fixup in the IPSEC code, and if we try
to do it we'll crash. Fix from Eric Dumazet.
12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny
Petrilin.
13) Fix regression in link-down handling in davinci_emac which causes
all RX descriptors to be freed up and therefore RX to wedge
completely, from Christian Riesch.
14) It took two attempts, but ctnetlink soft lockups seem to be
cured now, from Pablo Neira Ayuso.
15) Endianness bug fix in ENIC driver, from Santosh Nayak.
16) The long ago conversion of the PPP fragmentation code over to
abstracted SKB list handling wasn't perfect, once we get an
out of sequence SKB we don't flush the rest of them like we
should. From Ben McKeegan.
17) Fix regression of ->ip_summed initialization in sfc driver.
From Ben Hutchings.
18) Bluetooth timeout mistakenly using msecs instead of jiffies,
from Andrzej Kaczmarek.
19) Using _sync variant of work cancellation results in deadlocks,
use the non _sync variants instead. From Andre Guedes.
20) Bluetooth rfcomm code had reference counting problems leading
to crashes, fix from Octavian Purdila.
21) The conversion of netem over to classful qdisc handling added
two bugs to netem_dequeue(), fixes from Eric Dumazet.
22) Missing pci_iounmap() in ATM Solos driver. Fix from Julia Lawall.
23) b44_pci_exit() should not have __exit tag since it's invoked from
non-__exit code. From Nikola Pajkovsky.
24) The conversion of the neighbour hash tables over to RCU added a
race, fixed here by adding the necessary reread of tbl->nht, fix
from Michel Machado.
25) When we added VF (virtual function) attributes for network device
dumps, this potentially bloats up the size of the dump of one
network device such that the dump size is too large for the buffer
allocated by properly written netlink applications.
In particular, if you add 255 VFs to a network device, parts of
GLIBC stop working.
To fix this, we add an attribute that is used to turn on these
extended portions of the network device dump. Sophisticated
applications like 'ip' that want to see this stuff will be changed
to set the attribute, whereas things like GLIBC that don't care
about VFs simply will not, and therefore won't be busted by the
mere presence of VFs on a network device.
Thanks to the tireless work of Greg Rose on this fix.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
sfc: Fix assignment of ip_summed for pre-allocated skbs
ppp: fix 'ppp_mp_reconstruct bad seq' errors
enic: Fix endianness bug.
gre: fix spelling in comments
netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
davinci_emac: Do not free all rx dma descriptors during init
mlx4_core: Fixing array indexes when setting port types
phy: IC+101G and PHY_HAS_INTERRUPT flag
netdev/phy/icplus: Correct broken phy_init code
ipsec: be careful of non existing mac headers
Move Logitech Harmony 900 from cdc_ether to zaurus
hso: memsetting wrong data in hso_get_count()
netfilter: ip6_route_output() never returns NULL.
ethernet/broadcom: ip6_route_output() never returns NULL.
ipv6: ip6_route_output() never returns NULL.
jme: Fix FIFO flush issue
atm: clip: remove clip_tbl
ipv4: ping: Fix recvmsg MSG_OOB error handling.
rtnetlink: Fix problem with buffer allocation
...
Linus Torvalds [Sun, 26 Feb 2012 17:44:55 +0000 (09:44 -0800)]
Fix autofs compile without CONFIG_COMPAT
The autofs compat handling fix caused a compile failure when
CONFIG_COMPAT isn't defined.
Instead of adding random #ifdef'fery in autofs, let's just make the
compat helpers easier to use: without CONFIG_COMPAT, is_compat_task()
just hardcodes to zero.
We could probably do something similar for a number of other cases where
we have #ifdef's in code, but this is the low-hanging fruit.
Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 25 Feb 2012 20:18:16 +0000 (12:18 -0800)]
Linux 3.3-rc5
Linus Torvalds [Sat, 25 Feb 2012 20:12:08 +0000 (12:12 -0800)]
Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging
Couple of minor driver fixes.
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (max34440) Fix resetting temperature history
hwmon: (f75375s) Fix register write order when setting fans to full speed
hwmon: (ads1015) Fix file leak in probe function
hwmon: (max6639) Fix PPR register initialization to set both channels
hwmon: (max6639) Fix FAN_FROM_REG calculation
Linus Torvalds [Sat, 25 Feb 2012 20:11:25 +0000 (12:11 -0800)]
Merge branch 'rc-fixes' of git://git./linux/kernel/git/mmarek/kbuild
three kbuild fixes for 3.3:
- make deb-pkg symlink race fix.
- make coccicheck fix.
- Dropping the check for modutils. This is not a regression fix, but
it allows kmod, the module-init-tools replacement, to work with the 3.3
kernel.
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
coccicheck: change handling of C={1,2} when M= is set
builddeb: Don't create files in /tmp with predictable names
kbuild: do not check for ancient modutils tools
Ian Kent [Wed, 22 Feb 2012 12:45:44 +0000 (20:45 +0800)]
autofs: work around unhappy compat problem on x86-64
When the autofs protocol version 5 packet type was added in commit
5c0a32fc2cd0 ("autofs4: add new packet type for v5 communications"), it
obviously tried quite hard to be word-size agnostic, and uses explicitly
sized fields that are all correctly aligned.
However, with the final "char name[NAME_MAX+1]" array at the end, the
actual size of the structure ends up being not very well defined:
because the struct isn't marked 'packed', doing a "sizeof()" on it will
align the size of the struct up to the biggest alignment of the members
it has.
And despite all the members being the same, the alignment of them is
different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte
alignment on x86-64. And while 'NAME_MAX+1' ends up being a nice round
number (256), the name[] array starts out 4-byte aligned.
End result: the "packed" size of the structure is 300 bytes: 4-byte, but
not 8-byte aligned.
As a result, despite all the fields being in the same place on all
architectures, sizeof() will round up that size to 304 bytes on
architectures that have 8-byte alignment for u64.
Note that this is *not* a problem for 32-bit compat mode on POWER, since
there __u64 is 8-byte aligned even in 32-bit mode. But on x86, 32-bit
and 64-bit alignment is different for 64-bit entities, and as a result
the structure that has exactly the same layout has different sizes.
So on x86-64, but no other architecture, we will just subtract 4 from
the size of the structure when running in a compat task. That way we
will write the properly sized packet that user mode expects.
Not pretty. Sadly, this very subtle, and unnecessary, size difference
has been encoded in user space that wants to read packets of *exactly*
the right size, and will refuse to touch anything else.
Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 25 Feb 2012 04:03:14 +0000 (20:03 -0800)]
Merge tag 'rdma-for-linus' of git://git./linux/kernel/git/roland/infiniband
One InfiniBand/RDMA regression fix for 3.3:
- mlx4 SR-IOV changes added static exported functions, which don't
build on powerpc at least. Fix from Doug Ledford for this.
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Exported functions can't be static
David S. Miller [Sat, 25 Feb 2012 03:12:44 +0000 (22:12 -0500)]
Merge branch 'sfc-3.3' of git://git./linux/kernel/git/bwh/sfc
Ben Hutchings [Sat, 25 Feb 2012 00:03:10 +0000 (00:03 +0000)]
sfc: Fix assignment of ip_summed for pre-allocated skbs
When pre-allocating skbs for received packets, we set ip_summed =
CHECKSUM_UNNECESSARY. We used to change it back to CHECKSUM_NONE when
the received packet had an incorrect checksum or unhandled protocol.
Commit bc8acf2c8c3e43fcc192762a9f964b3e9a17748b ('drivers/net: avoid
some skb->ip_summed initializations') mistakenly replaced the latter
assignment with a DEBUG-only assertion that ip_summed ==
CHECKSUM_NONE. This assertion is always false, but it seems no-one
has exercised this code path in a DEBUG build.
Fix this by moving our assignment of CHECKSUM_UNNECESSARY into
efx_rx_packet_gro().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Linus Torvalds [Sat, 25 Feb 2012 00:08:51 +0000 (16:08 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi-rc-fixes-2.6
SCSI fixes on 20120224:
"This is a set of assorted bug fixes for power management, mpt2sas,
ipr, the rdac device handler and quite a big chunk for qla2xxx (plus a
use after free of scsi_host in scsi_scan.c). "
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] scsi_dh_rdac: Fix for unbalanced reference count
[SCSI] scsi_pm: Fix bug in the SCSI power management handler
[SCSI] scsi_scan: Fix 'Poison overwritten' warning caused by using freed 'shost'
[SCSI] qla2xxx: Update version number to 8.03.07.13-k.
[SCSI] qla2xxx: Proper detection of firmware abort error code for ISP82xx.
[SCSI] qla2xxx: Remove resetting memory during device initialization for ISP82xx.
[SCSI] qla2xxx: Complete mailbox command timedout to avoid initialization failures during next reset cycle.
[SCSI] qla2xxx: Remove check for null fcport from host reset handler.
[SCSI] qla2xxx: Correct out of bounds read of ISP2200 mailbox registers.
[SCSI] qla2xxx: Remove errant clearing of MBX_INTERRUPT flag during CT-IOCB processing.
[SCSI] qla2xxx: Clear options-flags while issuing stop-firmware mbx command.
[SCSI] qla2xxx: Add an "is reset active" helper.
[SCSI] qla2xxx: Add check for null fcport references in qla2xxx_queuecommand.
[SCSI] qla2xxx: Propagate up abort failures.
[SCSI] isci: Fix NULL ptr dereference when no firmware is being loaded
[SCSI] ipr: fix eeh recovery for 64-bit adapters
[SCSI] mpt2sas: Fix mismatch in mpt2sas_base_hard_reset_handler() mutex lock-unlock
Ben McKeegan [Fri, 24 Feb 2012 06:33:56 +0000 (06:33 +0000)]
ppp: fix 'ppp_mp_reconstruct bad seq' errors
This patch fixes a (mostly cosmetic) bug introduced by the patch
'ppp: Use SKB queue abstraction interfaces in fragment processing'
found here: http://www.spinics.net/lists/netdev/msg153312.html
The above patch rewrote and moved the code responsible for cleaning
up discarded fragments but the new code does not catch every case
where this is necessary. This results in some discarded fragments
remaining in the queue, and triggering a 'bad seq' error on the
subsequent call to ppp_mp_reconstruct. Fragments are discarded
whenever other fragments of the same frame have been lost.
This can generate a lot of unwanted and misleading log messages.
This patch also adds additional detail to the debug logging to
make it clearer which fragments were lost and which other fragments
were discarded as a result of losses. (Run pppd with 'kdebug 1'
option to enable debug logging.)
Signed-off-by: Ben McKeegan <ben@netservers.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Santosh Nayak [Fri, 24 Feb 2012 06:56:39 +0000 (06:56 +0000)]
enic: Fix endianness bug.
Sparse complains about the endianness bug.
Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Dietsche [Fri, 20 Jan 2012 23:10:35 +0000 (17:10 -0600)]
coccicheck: change handling of C={1,2} when M= is set
This patch reverts a portion of d0bc1fb4 so that coccicheck will
work properly when C=1 or C=2.
Reported-and-tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michal Marek <mmarek@suse.cz>
David S. Miller [Fri, 24 Feb 2012 22:41:57 +0000 (17:41 -0500)]
Merge branch 'master' of git://1984.lsi.us.es/net
stephen hemminger [Fri, 24 Feb 2012 08:08:20 +0000 (08:08 +0000)]
gre: fix spelling in comments
The original spelling and bad word choice make these comments hard to read.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 24 Feb 2012 20:32:51 +0000 (12:32 -0800)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] hdpvr: update picture controls to support firmware versions > 0.15
[media] wl128x: fix build errors when GPIOLIB is not enabled
[media] hdpvr: fix race condition during start of streaming
[media] omap3isp: Fix crash caused by subdevs now having a pointer to devnodes
[media] imon: don't wedge hardware after early callbacks
Oleg Nesterov [Fri, 24 Feb 2012 19:07:29 +0000 (20:07 +0100)]
epoll: ep_unregister_pollwait() can use the freed pwq->whead
signalfd_cleanup() ensures that ->signalfd_wqh is not used, but
this is not enough. eppoll_entry->whead still points to the memory
we are going to free, so ep_unregister_pollwait()->remove_wait_queue()
is obviously unsafe.
Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL,
change ep_unregister_pollwait() to check pwq->whead != NULL under
rcu_read_lock() before remove_wait_queue(). We add the new helper,
ep_remove_wait_queue(), for this.
This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because
->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand().
ep_unregister_pollwait()->remove_wait_queue() can touch an already
freed and potentially reused ->sighand, but this is fine: that memory
must still hold a valid ->signalfd_wqh until rcu_read_unlock().
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 24 Feb 2012 19:07:11 +0000 (20:07 +0100)]
epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()
This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.
epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in the case
of signalfd: the task which does EPOLL_CTL_ADD uses its own ->sighand,
which is not connected to the file.
This patch adds the special event, POLLFREE, currently only for
epoll. It expects the init_poll_funcptr()'ed hook to do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.
__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty; this is done by the new signalfd_cleanup()
helper.
ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This makes this poll entry inconsistent, but we don't care. If you
share an epoll fd which contains our sigfd with another process, you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it is used with
epoll.
The main problem is that epoll calls signalfd_poll() once to establish
the connection with the wait queue; after that, signalfd_poll(NULL)
returns different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file; it works with the current thread.
In short: this patch is a hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks; this seems to be true.
Note:
- we do not have wake_up_all_poll(), but wake_up_poll()
  is fine since poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.
- signalfd_cleanup() uses POLLHUP along with POLLFREE,
we need a couple of simple changes in eventpoll.c to
make sure it can't be "lost".
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>